65. Regular Expressions with re

Regular Expressions with re: Using the re module for pattern matching and string manipulation

The re module in Python supports pattern matching and string manipulation through regular expressions (regex): compact patterns that describe the text you want to search for, extract, split, or replace. The examples below walk through the module's most commonly used functions and pattern syntax.

1. Basic Matching with re.match() and re.search()

re.match(): Tries to match the pattern only at the start of the string.
re.search(): Scans the string and returns the first location where the pattern matches.
Both return a match object on success and None otherwise.

import re

# re.match() example
result = re.match(r'hello', 'hello world')
if result:
    print("Match found:", result.group())
else:
    print("No match")

# re.search() example
result = re.search(r'world', 'hello world')
if result:
    print("Search found:", result.group())
else:
    print("No search match")
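The difference between the two functions shows up when the pattern does not sit at the start of the string. A minimal sketch reusing the same sample string (the variable names are illustrative):

```python
import re

text = "hello world"

# re.match() only tries position 0, so 'world' is not found.
at_start = re.match(r'world', text)

# re.search() scans forward and finds 'world' at index 6.
anywhere = re.search(r'world', text)

print(at_start)         # None
print(anywhere.span())  # (6, 11)
```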

2. Finding All Matches with re.findall()

re.findall() returns a list of all non-overlapping matches in the string.

import re

text = "The quick brown fox jumps over the lazy dog."
matches = re.findall(r'\b\w{3}\b', text)  # Matches words of exactly 3 word characters
print(matches)  # Output: ['The', 'fox', 'the', 'dog']
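One behavior worth knowing: when the pattern contains capture groups, re.findall() returns the groups rather than the whole match. A short sketch with an illustrative sample string:

```python
import re

text = "width=10, height=20"

# No groups: each item is the whole match.
whole = re.findall(r'\w+=\d+', text)

# Two groups: each item is a tuple of the captured parts.
pairs = re.findall(r'(\w+)=(\d+)', text)

print(whole)  # ['width=10', 'height=20']
print(pairs)  # [('width', '10'), ('height', '20')]
```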

3. Replacing Text with re.sub()

The re.sub() function allows you to replace occurrences of a pattern in a string with a specified replacement.

import re

text = "The price is 100 dollars."
new_text = re.sub(r'\d+', '200', text)  # Replaces each run of digits with '200'
print(new_text)  # Output: The price is 200 dollars.
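The replacement can also refer back to captured groups, and the count argument limits how many matches are replaced. A minimal sketch, assuming an illustrative date string:

```python
import re

text = "2024-01-15 and 2023-12-31"

# \1, \2, \3 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)

# count=1 limits the substitution to the first match.
first_only = re.sub(r'\d{4}', 'YYYY', text, count=1)

print(swapped)     # 15/01/2024 and 31/12/2023
print(first_only)  # YYYY-01-15 and 2023-12-31
```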

4. Compiling Regular Expressions with re.compile()

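You can compile a regular expression pattern into a regex object and reuse it. This keeps the pattern in one place and avoids repeated lookups when the same pattern is applied many times (the re module also caches recently used patterns internally, so the speed gain is often modest).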

import re

pattern = re.compile(r'\d+')  # Compiled regex pattern
matches = pattern.findall("There are 12 apples and 25 oranges.")
print(matches)  # Output: ['12', '25']
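Reuse is where a compiled pattern pays off. The sketch below applies one pattern object to several strings; the list of lines is made up for illustration:

```python
import re

# Compile once, then call methods on the pattern object.
number = re.compile(r'\d+')

lines = ["order 17", "no digits here", "items 3 and 44"]
found = [number.findall(line) for line in lines]

print(found)  # [['17'], [], ['3', '44']]
```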

5. Using Groups with re.search()

You can use parentheses () in your regular expressions to create capture groups, which allow you to extract parts of the match.

import re

text = "My name is John and I am 30 years old."
match = re.search(r'name is (\w+) and I am (\d+)', text)  # Captures name and age
if match:
    print("Name:", match.group(1))  # John
    print("Age:", match.group(2))   # 30
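Groups can also be given names with the (?P<name>...) syntax, which tends to be easier to read than numeric indexes. A small sketch on the same sentence; the group names are chosen here for illustration:

```python
import re

text = "My name is John and I am 30 years old."
match = re.search(r'My name is (?P<name>\w+) and I am (?P<age>\d+)', text)

if match:
    print(match.group('name'))  # John
    print(match.groupdict())    # {'name': 'John', 'age': '30'}
```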

6. Matching at the Start or End of a String

^: Asserts the start of a string.
$: Asserts the end of a string (or the position just before a trailing newline).

import re

text = "hello world"
# Matches 'hello' at the start of the string (re.match() already anchors at the start, so ^ is optional here)
if re.match(r'^hello', text):
    print("Starts with 'hello'")

# Matches 'world' at the end of the string
if re.search(r'world$', text):
    print("Ends with 'world'")
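Using both anchors together is a common way to validate a whole string. The sketch below checks for a four-digit value and also shows re.fullmatch(), which performs a closely related whole-string check without explicit anchors:

```python
import re

# Both anchors together require the whole string to fit the pattern.
anchored = re.search(r'^\d{4}$', "2024")
too_long = re.search(r'^\d{4}$', "20245")

# re.fullmatch() succeeds only if the entire string matches.
full = re.fullmatch(r'\d{4}', "2024")

print(bool(anchored), bool(too_long), bool(full))  # True False True
```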

7. Using re.split() to Split a String

The re.split() function splits the string based on the given regular expression pattern.

import re

text = "apple,banana,orange"
fruits = re.split(r',', text)  # Split by comma
print(fruits)  # Output: ['apple', 'banana', 'orange']
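A single comma could be handled by str.split(); re.split() becomes useful when the separators vary. A minimal sketch with an illustrative string that mixes delimiters:

```python
import re

text = "apple, banana;orange  grape"

# Split on a comma, semicolon, or whitespace, treating runs as one separator.
fruits = re.split(r'[,;\s]+', text)

print(fruits)  # ['apple', 'banana', 'orange', 'grape']
```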

8. Using Wildcards with .

The dot . wildcard matches any single character except a newline (unless the re.DOTALL flag is set).

import re

text = "abc def"
match = re.search(r'a.c', text)  # 'a', then any one character, then 'c'
if match:
    print("Pattern found:", match.group())  # Output: 'abc'
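The newline exception can be seen directly. This sketch runs one pattern against a string containing a newline, first with default settings and then with the re.DOTALL flag:

```python
import re

text = "a\nc"

# By default the dot does not match the newline character.
default = re.search(r'a.c', text)

# With re.DOTALL the dot matches any character, newline included.
dotall = re.search(r'a.c', text, re.DOTALL)

print(default)                   # None
print(dotall.group() == "a\nc")  # True
```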

9. Using Character Classes

Character classes allow you to match specific types of characters.

\d: Matches any digit.
\w: Matches any alphanumeric character or underscore.
\s: Matches any whitespace character.

import re

text = "123abc def 456"
# Match digits
digits = re.findall(r'\d+', text)
print("Digits:", digits)  # Output: ['123', '456']

# Match word characters (alphanumeric and underscore)
words = re.findall(r'\w+', text)
print("Words:", words)  # Output: ['123abc', 'def', '456']
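The \s class can be illustrated the same way. The sketch below also uses \S, the uppercase form that negates the class (\D and \W work likewise); the sample string is made up:

```python
import re

text = "tab\there  and\nnewline"

# \s+ matches each run of whitespace (space, tab, newline).
gaps = re.findall(r'\s+', text)

# \S+ is the negated class: runs of non-whitespace characters.
tokens = re.findall(r'\S+', text)

print(gaps)    # ['\t', '  ', '\n']
print(tokens)  # ['tab', 'here', 'and', 'newline']
```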

10. Using Quantifiers

Quantifiers specify how many times a pattern should match. Common quantifiers include:

*: Matches zero or more times.
+: Matches one or more times.
{n}: Matches exactly n times.
{n,}: Matches n or more times.
{n,m}: Matches between n and m times.

import re

text = "aaabbbcccc"

# Matches three 'a's
match = re.search(r'a{3}', text)
if match:
    print("Matched:", match.group())  # Output: 'aaa'

# Matches one or more 'b's
match = re.search(r'b+', text)
if match:
    print("Matched:", match.group())  # Output: 'bbb'

# Matches between 2 and 4 'c's
match = re.search(r'c{2,4}', text)
if match:
    print("Matched:", match.group())  # Output: 'cccc'
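The * and {n,} quantifiers from the list can be tried on the same string. Note that * is allowed to match nothing at all, which can be surprising with re.search():

```python
import re

text = "aaabbbcccc"

# 'b*' may match zero characters, so it succeeds at index 0 with an empty string.
star = re.search(r'b*', text)

# {n,} has no upper bound: 'c{2,}' takes all four 'c's.
open_ended = re.search(r'c{2,}', text)

print(repr(star.group()))  # ''
print(open_ended.group())  # cccc
```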

11. Anchoring with Word Boundaries \b

The \b symbol matches word boundaries, useful for matching whole words.

import re

text = "hello world, hello"
# Matches 'hello' as a whole word
matches = re.findall(r'\bhello\b', text)
print(matches)  # Output: ['hello', 'hello']
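The effect of \b is clearest when the target word also appears inside longer words. A short comparison on an illustrative string:

```python
import re

text = "cat concatenate cat's bobcat"

# Without boundaries, 'cat' is found inside longer words too.
without_boundary = re.findall(r'cat', text)

# With \b on both sides, only standalone occurrences match.
with_boundary = re.findall(r'\bcat\b', text)

print(len(without_boundary))  # 4
print(with_boundary)          # ['cat', 'cat']
```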

12. Case-Insensitive Matching with re.IGNORECASE

You can perform case-insensitive matching using the re.IGNORECASE flag.

import re

text = "Hello World"
# Case-insensitive matching
match = re.search(r'hello', text, re.IGNORECASE)
if match:
    print("Match found:", match.group())  # Output: 'Hello'
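The flag works with the other functions as well, and a compiled pattern takes it at compile time. A minimal sketch using re.I, the short alias for re.IGNORECASE:

```python
import re

text = "Hello HELLO hello"

# re.I is the short alias for re.IGNORECASE.
matches = re.findall(r'hello', text, re.I)

# A compiled pattern takes the flag at compile time.
pattern = re.compile(r'hello', re.IGNORECASE)

print(matches)                     # ['Hello', 'HELLO', 'hello']
print(len(pattern.findall(text)))  # 3
```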

Summary of Key Features:

Basic Functions: re.match(), re.search(), re.findall(), re.sub(), re.split(), re.compile()
Groups: Capture specific parts of the match using parentheses.
Anchors: Use ^ for start, $ for end, and \b for word boundaries.
Wildcards and Quantifiers: Use . for any character and quantifiers like *, +, {n} to control match repetitions.
Character Classes: Use \d, \w, \s for matching digits, word characters, and whitespace characters.
Flags: Use re.IGNORECASE for case-insensitive matching.

The re module is a versatile tool that, when used effectively, can significantly simplify string searching and manipulation in Python.

