Regular Expression
What Is a Regular Expression?
A regular expression (regex) is a special string used to describe a pattern for searching or matching text.
Think of it as a search formula for strings.
📦 The re Module
Python has a built-in module called re for working with regular expressions.
✅ Importing:
import re
Common Functions in re Module
Function        Description
re.match()      Matches pattern at the beginning of string
re.search()     Searches pattern anywhere in string
re.findall()    Returns all matching substrings
re.finditer()   Returns iterator of matches
re.sub()        Replaces pattern with another string
re.split()      Splits string based on pattern
re.compile()    Compiles a regex pattern into an object
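Of the functions listed above, re.finditer() is the one that gives you match objects one at a time, which helps when you need positions as well as the matched text. A minimal sketch, assuming a made-up sample sentence (the text and variable names are illustrative only):

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "Order 12 shipped, order 345 pending"

# finditer yields match objects, so start/end positions are available
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(matches)  # Output: [('12', 6, 8), ('345', 24, 27)]
```

re.findall() would return only ['12', '345'] here; finditer() is the better fit when the location of each match matters.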
Basic Pattern Examples
Pattern   Meaning
.         Any character except newline
^         Start of string
$         End of string
*         0 or more times
+         1 or more times
?         0 or 1 time
{m}       Exactly m times
{m,n}     Between m and n times
[...]     Match any one of the characters
\d        Digit (0–9)
\D        Not a digit
\w        Word character (a-z, A-Z, 0-9, _)
\W        Not a word character
\s        Whitespace
\S        Not whitespace
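The patterns in the table are easier to compare when run against the same input. The sketch below assumes a small made-up sample string and uses re.findall() to show what each pattern picks up:

```python
import re

# Hypothetical sample string, chosen so each pattern behaves differently
sample = "cat cot ct caat 42 a_b"

dot = re.findall(r"c.t", sample)        # . matches exactly one character
star = re.findall(r"ca*t", sample)      # * allows zero or more 'a'
plus = re.findall(r"ca+t", sample)      # + requires at least one 'a'
optional = re.findall(r"ca?t", sample)  # ? allows zero or one 'a'
exact = re.findall(r"ca{2}t", sample)   # {m} requires exactly two 'a'
char_set = re.findall(r"c[ao]t", sample)  # [...] matches one listed character
digits = re.findall(r"\d+", sample)     # \d matches digits
words = re.findall(r"\w+", sample)      # \w matches letters, digits and _

print(dot)       # Output: ['cat', 'cot']
print(star)      # Output: ['cat', 'ct', 'caat']
print(plus)      # Output: ['cat', 'caat']
print(optional)  # Output: ['cat', 'ct']
print(exact)     # Output: ['caat']
print(char_set)  # Output: ['cat', 'cot']
print(digits)    # Output: ['42']
print(words)     # Output: ['cat', 'cot', 'ct', 'caat', '42', 'a_b']
```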
✅ Examples
1. re.match()
import re
result = re.match("Hello", "Hello World")
print(result.group()) # Output: Hello
2. re.search()
result = re.search("World", "Hello World")
print(result.group()) # Output: World
3. re.findall()
text = "My number is 9876543210 and 1234567890"
numbers = re.findall(r'\d{10}', text)
print(numbers) # Output: ['9876543210', '1234567890']
4. re.sub()
text = "abc abc abc"
replaced = re.sub("abc", "xyz", text)
print(replaced) # Output: xyz xyz xyz
5. re.split()
text = "apple,banana,grapes"
fruits = re.split(",", text)
print(fruits) # Output: ['apple', 'banana', 'grapes']
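One detail the examples above do not show: re.match() and re.search() return None when nothing matches, so calling .group() directly on the result would raise an AttributeError. A small sketch of the difference between the two functions, with a None check (the variable names are illustrative):

```python
import re

text = "Hello World"

at_start = re.match("World", text)   # None: "World" is not at position 0
anywhere = re.search("World", text)  # a match object: found later in the string

for label, m in (("match", at_start), ("search", anywhere)):
    if m is None:
        print(f"{label}: no match")
    else:
        print(f"{label}: found {m.group()!r} at index {m.start()}")
# Output:
# match: no match
# search: found 'World' at index 6
```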
🎯 Using re.compile() for Better Reuse
pattern = re.compile(r'\d+')
result = pattern.findall("There are 2 apples and 10 bananas")
print(result) # Output: ['2', '10']
📌 Practical Example: Email Validation
email = "user@example.com"
if re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
    print("Valid email")
else:
    print("Invalid email")
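A caveat with the pattern above: in Python, $ also matches just before a trailing newline, so "user@example.com\n" would pass. One possible variation, sketched here with the hypothetical names EMAIL_RE and looks_like_email, uses fullmatch() so the whole string must fit the pattern. Like the original, this is a loose format check and not a full email validator.

```python
import re

# Same character classes as the pattern above; fullmatch() makes ^ and $ unnecessary
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")

def looks_like_email(value):
    """Return True if the entire string fits the simple email pattern."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))    # Output: True
print(looks_like_email("user@example.com\n"))  # Output: False
print(looks_like_email("user@@example.com"))   # Output: False
```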
Best Practices
Use raw strings (r"pattern") to avoid escape issues.
Test regex patterns using online tools like regex101.com.
Use re.compile() when using a pattern multiple times.
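To see why raw strings matter, consider \b. In a normal Python string it is the backspace character, while in a regex it is meant to be a word boundary. A short sketch with a made-up sample string:

```python
import re

# Hypothetical sample text
text = "cat concatenate cat"

# Without r, Python converts "\b" to a backspace character before re sees it
not_raw = re.findall("\bcat\b", text)
# With r, the regex engine receives \b and treats it as a word boundary
raw = re.findall(r"\bcat\b", text)

print(not_raw)  # Output: []
print(raw)      # Output: ['cat', 'cat']
```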
📚 Summary Table
Function        Use
re.match()      Checks beginning of a string
re.search()     Searches for pattern
re.findall()    Returns all matches
re.sub()        Replace substrings
re.split()      Split by pattern
re.compile()    Precompile pattern
Write a function using regex to check if a password meets the following rules: minimum 8 characters, and includes at least one uppercase letter, one number, and one special character. (University question)
import re

def is_valid_password(password):
    # Check minimum length
    if len(password) < 8:
        return False
    # Regex checks
    has_uppercase = re.search(r'[A-Z]', password)
    has_digit = re.search(r'\d', password)
    has_special = re.search(r'[!@#$%^&*(),.?":{}|<>]', password)
    if has_uppercase and has_digit and has_special:
        return True
    else:
        return False

# Test cases
passwords = [
    "Password1!",  # Valid
    "password1!",  # Missing uppercase
    "Password!",   # Missing number
    "Password1",   # Missing special character
    "Pass1!",      # Less than 8 characters
]
for pwd in passwords:
    print(f"{pwd}: {'Valid' if is_valid_password(pwd) else 'Invalid'}")
Output
Password1!: Valid
password1!: Invalid
Password!: Invalid
Password1: Invalid
Pass1!: Invalid
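The same rules can also be written as one pattern using lookaheads, which is a common alternative to several separate searches. This is only a sketch: PASSWORD_RE and is_valid_password_single_regex are hypothetical names, and because . does not match a newline, a password containing a newline would be rejected here even though the step-by-step version would accept it.

```python
import re

# Each (?=...) lookahead checks one rule without consuming characters
PASSWORD_RE = re.compile(
    r'(?=.*[A-Z])'                   # at least one uppercase letter
    r'(?=.*\d)'                      # at least one digit
    r'(?=.*[!@#$%^&*(),.?":{}|<>])'  # at least one special character
    r'.{8,}'                         # at least 8 characters in total
)

def is_valid_password_single_regex(password):
    """Return True if the whole password satisfies all four rules."""
    return PASSWORD_RE.fullmatch(password) is not None

for pwd in ["Password1!", "password1!", "Password!", "Password1", "Pass1!"]:
    print(f"{pwd}: {'Valid' if is_valid_password_single_regex(pwd) else 'Invalid'}")
```

The separate-search version is usually easier to read and to extend with specific error messages; the single pattern is handy when a rule has to be passed around as one string.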
From a multi-line string containing log entries like "User: John, ID: 001", extract all user names using regular expressions. Convert the extracted names into a NumPy array. Assume you now want to sort them alphabetically and count how many start with each letter. Write code to do this and return the results. (University question)
Below is a step-by-step solution to:
- Extract all user names from a multiline log string using regex
- Convert them into a NumPy array
- Sort the names alphabetically
- Count how many names start with each letter
import re
import numpy as np
from collections import Counter
# Multiline log string
log_data = """
User: John, ID: 001
User: Alice, ID: 002
User: Bob, ID: 003
User: David, ID: 004
User: Charlie, ID: 005
User: Anna, ID: 006
"""
# Step 1: Extract all user names using regex
user_names = re.findall(r'User:\s*(\w+)', log_data)
print("Extracted names:", user_names)
# Step 2: Convert to NumPy array
names_array = np.array(user_names)
# Step 3: Sort the array alphabetically
sorted_names = np.sort(names_array)
print("Sorted names:", sorted_names)
# Step 4: Count how many names start with each letter
first_letters = [name[0].upper() for name in sorted_names]
letter_counts = Counter(first_letters)
# Convert to a sorted dictionary for neat output
sorted_counts = dict(sorted(letter_counts.items()))
# Final Output
print("Counts by starting letter:", sorted_counts)
Output
Extracted names: ['John', 'Alice', 'Bob', 'David', 'Charlie', 'Anna']
Sorted names: ['Alice' 'Anna' 'Bob' 'Charlie' 'David' 'John']
Counts by starting letter: {'A': 2, 'B': 1, 'C': 1, 'D': 1, 'J': 1}
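The capture group (\w+) used above stops at the first character that is not a letter, digit, or underscore, so it works for single-word names only. If the log could contain names with spaces or apostrophes, one option is to capture everything up to the comma. The log lines below are hypothetical and exist only to show the difference:

```python
import re

# Hypothetical log entries with multi-word and punctuated names
log_data = """
User: John, ID: 001
User: Mary Ann, ID: 002
User: O'Brien, ID: 003
"""

# \w+ stops at the first space or apostrophe
single_word = re.findall(r"User:\s*(\w+)", log_data)
# [^,\n]+ captures everything up to the comma (or end of line)
full_name = re.findall(r"User:\s*([^,\n]+)", log_data)

print(single_word)  # Output: ['John', 'Mary', 'O']
print(full_name)    # Output: ['John', 'Mary Ann', "O'Brien"]
```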