Generators in Python are a great way to handle large datasets efficiently without loading everything into memory. They let you iterate over data one item at a time, keeping memory usage low, which is especially useful when working with large files or streams of data.
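For intuition, here is a minimal sketch (not part of the original snippets) comparing the memory footprint of a fully materialized list against an equivalent generator expression:

```python
import sys

# A list comprehension materializes a million elements up front.
nums_list = [n * 2 for n in range(1_000_000)]

# A generator expression stores only its iteration state, not the elements.
nums_gen = (n * 2 for n in range(1_000_000))

print(sys.getsizeof(nums_list))  # several megabytes
print(sys.getsizeof(nums_gen))   # a few hundred bytes at most
```

`sys.getsizeof` reports only the container's own size, but the gap still illustrates the point: the generator's size is constant regardless of how many items it will eventually yield.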
Here are 10 Python code snippets demonstrating various ways of handling large data with generators:
1. Basic Generator for Large Data
def large_range(start, end):
    for number in range(start, end):
        yield number

# Example usage
for num in large_range(1, 1000000):
    if num > 10:
        break
    print(num)
This generator yields numbers one at a time instead of building a collection up front. (Note that in Python 3, range is itself lazy; the snippet mainly illustrates the yield pattern.)
2. Reading Large Files Line by Line Using a Generator
def read_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

# Example usage
for line in read_large_file('large_file.txt'):
    if 'keyword' in line:
        print(line)
This example reads a file line by line, yielding one line at a time, which is memory efficient for large files.
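A related pattern (a variation beyond the original snippets) uses yield from to delegate to an inner iterable, which is handy when reading several files as one stream:

```python
def read_files(*file_paths):
    """Yield stripped lines from several files in sequence."""
    for file_path in file_paths:
        with open(file_path, 'r') as file:
            # Delegate to an inner generator expression; lines are still
            # produced one at a time, so memory usage stays flat.
            yield from (line.strip() for line in file)
```

The caller iterates over read_files('a.txt', 'b.txt') exactly as it would over a single file, and only one line is ever held in memory at a time.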
3. Using yield to Simulate a Chunked File Reader

def chunked_reader(file_path, chunk_size=1024):
    with open(file_path, 'rb') as file:
        # The walrus operator (:=) requires Python 3.8+
        while chunk := file.read(chunk_size):
            yield chunk

# Example usage
for chunk in chunked_reader('large_binary_file.bin'):
    process_chunk(chunk)  # Process each chunk without loading the entire file into memory

This generator reads the file in fixed-size chunks, allowing you to process large binary files in parts.

4. Filtering Data with Generators

def filter_even_numbers(numbers):
    for num in numbers:
        if num % 2 == 0:
            yield num

# Example usage
large_data = range(1, 1000000)
for even_num in filter_even_numbers(large_data):
    print(even_num)

This generator keeps only the even numbers from a large dataset, discarding the odd ones as it goes.

5. Generating Infinite Sequences

def infinite_counter(start=0):
    count = start
    while True:
        yield count
        count += 1

# Example usage
counter = infinite_counter()
for _ in range(10):
    print(next(counter))

This example demonstrates an infinite generator that keeps yielding numbers indefinitely.

6. Working with Large Data from an API (Mocked Example)

import requests

def get_large_data_from_api(url):
    with requests.get(url, stream=True) as response:
        # decode_unicode=True yields str instead of raw bytes
        for line in response.iter_lines(decode_unicode=True):
            if line:
                yield line

# Example usage
for data in get_large_data_from_api('https://example.com/large_data.txt'):
    print(data)

This generator streams data from an API with the third-party requests library, yielding each line without storing the entire response in memory.

7. Processing Large Logs with a Generator

def process_logs(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            # Split on the first ' - ' only, so messages containing
            # ' - ' themselves stay intact
            yield line.strip().split(' - ', 1)

# Example usage
for log in process_logs('large_logs.txt'):
    timestamp, message = log
    print(f"Timestamp: {timestamp}, Message: {message}")

This generator processes each log line, splitting it into a timestamp and a message, without storing the entire log file in memory.

8. Creating a Generator to Calculate Large Fibonacci Sequences

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Example usage
fib_gen = fibonacci()
for _ in range(10):
    print(next(fib_gen))

This generator produces an infinite Fibonacci sequence, one number at a time.

9. Using Generators for Lazy Data Transformation

def transform_data(data):
    for item in data:
        yield item.upper()

# Example usage
large_data = ["apple", "banana", "cherry"]
for transformed_item in transform_data(large_data):
    print(transformed_item)

This generator transforms data lazily, converting each string to uppercase as it's processed.

10. Generator with itertools for Efficient Data Processing

import itertools

def get_large_range(start, end, step=1):
    # A generator expression: values are produced on demand
    return (x for x in range(start, end, step))

# Example usage: take only the first 10 items without exhausting the range
for number in itertools.islice(get_large_range(1, 1000000, 2), 10):
    print(number)

Using itertools.islice, we can efficiently take slices from a generator without materializing the rest of the sequence.

These examples demonstrate how to use generators to handle large datasets in an efficient, memory-friendly manner. Whether you're reading large files, processing data from APIs, or working with infinite sequences, generators allow you to process data one item at a time, significantly reducing memory usage.
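Because generators compose, the techniques above can also be chained into a single lazy pipeline. The sketch below (the file name, its contents, and the keyword are illustrative placeholders) combines a file reader, a filter, and a transformation without ever building an intermediate list:

```python
def read_lines(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.strip()

def keep_matching(lines, keyword):
    for line in lines:
        if keyword in line:
            yield line

def uppercase(lines):
    for line in lines:
        yield line.upper()

# Write a small demo file so the example is runnable as-is.
with open('demo.log', 'w') as f:
    f.write('error: disk full\ninfo: ok\nerror: timeout\n')

# Each stage pulls one item at a time from the previous one,
# so only a single line is in flight at any moment.
pipeline = uppercase(keep_matching(read_lines('demo.log'), 'error'))
for line in pipeline:
    print(line)
```

Nothing is read or processed until the final for loop starts pulling values, which is what makes this style practical for files far larger than memory.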