Regular Expression

 

What Is a Regular Expression?

A regular expression (regex) is a string that describes a pattern used to search for or match text.

Think of it as a search formula for strings.

📦 The re Module

Python has a built-in module called re for working with regular expressions.

✅ Importing:

import re

Common Functions in re Module

Function         Description
re.match()       Matches the pattern only at the beginning of the string
re.search()      Finds the first match anywhere in the string
re.findall()     Returns a list of all non-overlapping matches
re.finditer()    Returns an iterator of match objects
re.sub()         Replaces matches with another string
re.split()       Splits the string wherever the pattern matches
re.compile()     Compiles a regex pattern into a reusable pattern object
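
re.finditer() is the one function in the table that the examples further down do not cover. As a minimal sketch (the sample text and variable names are only illustrative), it yields match objects, so the position of each match is available as well as the matched text:

```python
import re

text = "cat bat rat"

# Each item yielded by finditer() is a match object, so we can ask it
# for the matched text and for its start/end positions in the string.
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[a-z]at", text)]

print(matches)  # [('cat', 0, 3), ('bat', 4, 7), ('rat', 8, 11)]
```

This tends to be preferable to re.findall() when the positions matter or when the text is large and the matches should be processed one at a time.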

Basic Pattern Examples

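Pattern    Meaning
.          Any character except newline
^          Start of string
$          End of string (or just before a trailing newline)
*          0 or more times
+          1 or more times
?          0 or 1 time
{m}        Exactly m times
{m,n}      Between m and n times (inclusive)
[...]      Any one of the characters listed inside the brackets
\d         Digit (0–9; also other Unicode digits unless re.ASCII is used)
\D         Not a digit
\w         Word character (a-z, A-Z, 0-9, _; also other Unicode letters and digits unless re.ASCII is used)
\W         Not a word character
\s         Whitespace
\S         Not whitespace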
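
To make the table more concrete, here is a small sketch that tries a few patterns against sample strings. The patterns and test strings are made up for illustration; only the symbols come from the table above.

```python
import re

# Map each pattern to a few candidate strings (all values are illustrative).
samples = {
    r"^\d{3}$": ["123", "1234", "12a"],            # exactly three digits
    r"^colou?r$": ["color", "colour", "colr"],     # the "u" is optional
    r"^[aeiou]\w*$": ["apple", "banana", "_x"],    # starts with a lowercase vowel
    r"^\S+\s\S+$": ["hello world", "hello", "a  b"],  # two words, one whitespace char
}

# Keep only the strings that each pattern matches.
results = {
    pattern: [s for s in strings if re.search(pattern, s)]
    for pattern, strings in samples.items()
}

for pattern, matched in results.items():
    print(pattern, "->", matched)
```

Changing a quantifier (for example `{3}` to `{3,4}`) and rerunning is a quick way to see how each symbol affects what is accepted.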

✅ Examples

1. re.match()

import re

result = re.match("Hello", "Hello World")
print(result.group())  # Output: Hello

2. re.search()

result = re.search("World", "Hello World")
print(result.group())  # Output: World
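
Both re.match() and re.search() return None when nothing matches, so calling .group() directly raises an AttributeError in that case. The sketch below (variable names are illustrative) shows the difference between the two functions on the same text and a safe way to handle a missing match:

```python
import re

text = "Hello World"

at_start = re.match("World", text)    # None: "World" is not at position 0
anywhere = re.search("World", text)   # a match object

for label, m in [("match", at_start), ("search", anywhere)]:
    if m is not None:
        print(label, "found", m.group(), "at", m.span())
    else:
        print(label, "found nothing")
```

Here re.match() finds nothing, while re.search() reports the match at span (6, 11).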

3. re.findall()

text = "My number is 9876543210 and 1234567890"
numbers = re.findall(r'\d{10}', text)
print(numbers)  # Output: ['9876543210', '1234567890']

4. re.sub()

text = "abc abc abc"
replaced = re.sub("abc", "xyz", text)
print(replaced)  # Output: xyz xyz xyz
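
re.sub() can do more than swap one fixed string for another. As a brief sketch (the date text is made up for illustration), parenthesised groups in the pattern can be reused in the replacement as \1, \2, and so on, and the count argument limits how many replacements are made:

```python
import re

dates = "2024-05-17 and 2023-12-01"

# \1, \2 and \3 refer to the first, second and third parenthesised groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", dates)
print(swapped)  # 17/05/2024 and 01/12/2023

# count=1 replaces only the first occurrence.
first_only = re.sub("abc", "xyz", "abc abc abc", count=1)
print(first_only)  # xyz abc abc
```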

5. re.split()

text = "apple,banana,grapes"
fruits = re.split(",", text)
print(fruits)  # Output: ['apple', 'banana', 'grapes']
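
For a single fixed delimiter like the comma above, str.split(",") would give the same result. re.split() becomes useful when the delimiter itself is a pattern. A small sketch with an illustrative, inconsistently delimited string:

```python
import re

text = "apple, banana;grapes  mango"

# Split on any run of commas, semicolons or whitespace.
fruits = re.split(r"[,;\s]+", text)

print(fruits)  # ['apple', 'banana', 'grapes', 'mango']
```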

🎯 Using re.compile() for Better Reuse

pattern = re.compile(r'\d+')
result = pattern.findall("There are 2 apples and 10 bananas")
print(result)  # Output: ['2', '10']
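
The benefit of compiling shows up when the same pattern is applied many times. The following sketch (the list of lines is invented for illustration) reuses one compiled pattern across several strings:

```python
import re

digits = re.compile(r"\d+")

lines = [
    "There are 2 apples",
    "No numbers here",
    "10 bananas and 3 kiwis",
]

# The compiled pattern object offers the same methods as the re module.
found = {line: digits.findall(line) for line in lines}
total = sum(int(n) for numbers in found.values() for n in numbers)

print(found)
print(total)  # 15
```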

📌 Practical Example: Email Validation

email = "user@example.com"
if re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
    print("Valid email")
else:
    print("Invalid email")
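
One subtlety of the pattern above is that $ also matches just before a trailing newline, so "user@example.com\n" would be accepted. A possible variant, sketched here with the hypothetical names EMAIL_RE and looks_like_email, uses fullmatch() so that the whole string must fit the pattern. Like the original, it is a rough format check, not a complete email validator.

```python
import re

# Same character classes as the pattern above; fullmatch() makes ^ and $ unnecessary.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

def looks_like_email(candidate):
    """Return True if the entire string fits the simple email pattern."""
    return EMAIL_RE.fullmatch(candidate) is not None

for candidate in ["user@example.com", "user@example", "user@example.com\n"]:
    print(repr(candidate), looks_like_email(candidate))
```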

Best Practices

  • Use raw strings (r"pattern") so that backslashes reach the regex engine unchanged.

  • Test regex patterns using online tools like regex101.com.

  • Use re.compile() when the same pattern is used multiple times.
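
The raw-string advice is easiest to see with \b, which means "word boundary" to the regex engine but "backspace" inside an ordinary Python string. A short sketch with an illustrative sentence:

```python
import re

text = "cat concat cat."

# Without the r prefix, Python turns "\b" into a backspace character
# before the regex engine ever sees it, so nothing matches.
plain = re.findall("\bcat\b", text)

# With a raw string, \b reaches the regex engine as a word boundary.
raw = re.findall(r"\bcat\b", text)

print(plain)  # []
print(raw)    # ['cat', 'cat']
```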


📚 Summary Table

Function         Use
re.match()       Checks the beginning of a string
re.search()      Searches for the pattern anywhere in the string
re.findall()     Returns all matches
re.sub()         Replaces substrings
re.split()       Splits by pattern
re.compile()     Precompiles a pattern

Write a function using regex to check whether a password meets the following rules: minimum 8
characters, and at least one uppercase letter, one number, and one special character.
(University question)

import re

def is_valid_password(password):
    # Check minimum length
    if len(password) < 8:
        return False

    # Regex checks
    has_uppercase = re.search(r'[A-Z]', password)
    has_digit = re.search(r'\d', password)
    has_special = re.search(r'[!@#$%^&*(),.?":{}|<>]', password)

    if has_uppercase and has_digit and has_special:
        return True
    else:
        return False

# Test cases
passwords = [
    "Password1!",  # Valid
    "password1!",  # Missing uppercase
    "Password!",   # Missing number
    "Password1",   # Missing special character
    "Pass1!",      # Less than 8 characters
]

for pwd in passwords:
    print(f"{pwd}: {'Valid' if is_valid_password(pwd) else 'Invalid'}")

Output
Password1!: Valid
password1!: Invalid
Password!: Invalid
Password1: Invalid
Pass1!: Invalid
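
The same rules can also be expressed as a single pattern using lookaheads, which check a condition without consuming characters. This is an alternative sketch, not a replacement for the solution above: the names PASSWORD_RE and is_valid_password_single_regex are hypothetical, the special-character set is copied from the solution, and because . does not match a newline, passwords containing newlines are rejected here.

```python
import re

PASSWORD_RE = re.compile(
    r'(?=.*[A-Z])'                       # at least one uppercase letter
    r'(?=.*\d)'                          # at least one digit
    r'(?=.*[!@#$%^&*(),.?":{}|<>])'      # at least one special character
    r'.{8,}'                             # at least 8 characters in total
)

def is_valid_password_single_regex(password):
    """Return True if the whole password satisfies all four rules."""
    return PASSWORD_RE.fullmatch(password) is not None

for pwd in ["Password1!", "password1!", "Password!", "Password1", "Pass1!"]:
    print(f"{pwd}: {'Valid' if is_valid_password_single_regex(pwd) else 'Invalid'}")
```

The separate-checks version is usually easier to read and to extend with specific error messages; the single pattern is more compact when only a yes/no answer is needed.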

From a multi-line string containing log entries like "User: John, ID: 001", extract all user names
using regular expressions. Convert the extracted names into a NumPy array, then sort them
alphabetically and count how many start with each letter.
Write code to do this and return the results. (University question)

The solution below works in four steps:

  • Extract all user names from a multi-line log string using regex
  • Convert them into a NumPy array
  • Sort the names alphabetically
  • Count how many names start with each letter

import re
import numpy as np
from collections import Counter

# Multi-line log string
log_data = """
User: John, ID: 001
User: Alice, ID: 002
User: Bob, ID: 003
User: David, ID: 004
User: Charlie, ID: 005
User: Anna, ID: 006
"""

# Step 1: Extract all user names using regex
user_names = re.findall(r'User:\s*(\w+)', log_data)
print("Extracted names:", user_names)

# Step 2: Convert to NumPy array
names_array = np.array(user_names)

# Step 3: Sort the array alphabetically
sorted_names = np.sort(names_array)
print("Sorted names:", sorted_names)

# Step 4: Count how many names start with each letter
first_letters = [name[0].upper() for name in sorted_names]
letter_counts = Counter(first_letters)

# Convert to a sorted dictionary for neat output
sorted_counts = dict(sorted(letter_counts.items()))

# Final output
print("Counts by starting letter:", sorted_counts)

Output
Extracted names: ['John', 'Alice', 'Bob', 'David', 'Charlie', 'Anna']
Sorted names: ['Alice' 'Anna' 'Bob' 'Charlie' 'David' 'John']
Counts by starting letter: {'A': 2, 'B': 1, 'C': 1, 'D': 1, 'J': 1}
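
Note that (\w+) captures only a single run of word characters, which is fine for the names in this log but would cut off names containing spaces or apostrophes. As a sketch under the assumption that a comma always follows the name (the extra log entries below are invented for illustration), capturing everything up to the comma is more forgiving:

```python
import re

# Hypothetical entries with names that \w+ cannot capture in full.
log_data = """
User: Mary Ann, ID: 007
User: O'Neil, ID: 008
"""

simple = re.findall(r"User:\s*(\w+)", log_data)

# [^,\n]+ means "one or more characters that are neither a comma nor a newline".
robust = [name.strip() for name in re.findall(r"User:\s*([^,\n]+)", log_data)]

print(simple)  # ['Mary', 'O']
print(robust)  # ['Mary Ann', "O'Neil"]
```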
