Mastering Python Generators
This guide takes you from the absolute basics to advanced, practical applications with 20 distinct examples.
What is a Generator?
A generator is a special kind of iterator, and a generator function is a simpler way to create one than writing the iterator protocol by hand. Instead of building and returning a whole list of items at once, a generator yields items one by one, on demand.
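To make the "simpler way to create iterators" point concrete, here is a minimal sketch that writes the same iterator twice: once as a class and once as a generator function. The names `CountUpTo` and `one_to_n` are hypothetical and exist only for this illustration.

```python
class CountUpTo:
    """Hand-written iterator over 1..max_num (hypothetical example class)."""

    def __init__(self, max_num):
        self.max_num = max_num
        self.current = 0

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current >= self.max_num:
            raise StopIteration
        self.current += 1
        return self.current


def one_to_n(max_num):
    """Generator function with the same behavior (hypothetical name)."""
    for n in range(1, max_num + 1):
        yield n


class_result = list(CountUpTo(3))
gen_result = list(one_to_n(3))
print(class_result, gen_result)  # [1, 2, 3] [1, 2, 3]

# A generator object is itself an iterator: iter() hands back the same object.
gen = one_to_n(3)
print(iter(gen) is gen)  # True
```

The generator version gets `__iter__`, `__next__`, and the `StopIteration` handling for free, which is where the saving in code comes from.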
Key Benefits:
• Memory Efficient: Generators don't store all values in memory. They generate them on the fly, making them perfect for working with very large data sets or infinite sequences.
• Lazy Evaluation: The next value is only computed when you ask for it. This can save significant processing time if you only need a few items from a potentially long sequence.
• Elegant Code: They can make your code for complex iteration tasks cleaner and more readable.
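The memory and laziness claims above can be checked with a rough sketch like the following. The size `n` is an arbitrary choice for illustration, and `sys.getsizeof` reports only the container itself (not the integers a list refers to), so the real gap is larger than the numbers printed.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

squares_list = [x * x for x in range(n)]   # all values computed and stored now
squares_gen = (x * x for x in range(n))    # nothing computed yet

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# Lazy evaluation: only the three values requested here are ever computed.
first_three = [next(squares_gen) for _ in range(3)]
print(first_three)  # [0, 1, 4]
```

The exact byte counts vary by Python version and platform, but the generator object stays small no matter how large `n` is.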
Core Concepts: yield vs. return
A regular function uses return to send back a value and then exits completely. A generator function uses yield.
• yield: Pauses the function, saves its state, and hands a value back to the caller. When the caller asks for the next value, the function resumes right where it left off.
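The pause-and-resume behavior can be observed directly. This is a small sketch, with a hypothetical `two_steps` generator that records when each part of its body runs.

```python
events = []  # records when each part of the generator body runs


def two_steps():
    step = "first"  # local state survives across yields
    events.append(f"before {step} yield")
    yield 1
    step = "second"
    events.append(f"before {step} yield")
    yield 2
    events.append("after last yield")


gen = two_steps()
events_before_next = list(events)  # still empty: calling the function runs no body code

first = next(gen)   # runs up to the first yield, then pauses
second = next(gen)  # resumes after the first yield, pauses at the second

try:
    next(gen)       # runs the rest of the body, then signals exhaustion
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # 1 2 True
print(events)
```

A `for` loop performs the same `next()` calls and catches `StopIteration` for you.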
---
Part 1: The Basics (Examples 1-5)
These examples illustrate the fundamental mechanics of generators.
#### 1. A Simple Counter
The "Hello, World!" of generators. It yields numbers up to a specified limit.
def count_up_to(max_num):
    count = 1
    while count <= max_num:
        yield count
        count += 1
# Usage
counter = count_up_to(5)
print(next(counter))  # Output: 1
print(next(counter))  # Output: 2
for number in counter:
    print(number)  # Output: 3, 4, 5
#### 2. Generator Expression
A concise, inline way to create a generator, similar to a list comprehension but with parentheses () instead of square brackets [].
# List comprehension (builds a full list in memory)
list_comp = [x * x for x in range(1, 6)]
# Generator expression (yields values one by one)
gen_expr = (x * x for x in range(1, 6))
# Usage
print(f"List: {list_comp}")  # Output: List: [1, 4, 9, 16, 25]
print(f"Generator object: {gen_expr}")  # Output: Generator object: <generator object <genexpr> at ...>
print("Values from generator:")
for val in gen_expr:
    print(val, end=' ')  # Output: 1 4 9 16 25
#### 3. Powers of Two
A generator that yields successive powers of two.
def powers_of_two(n):
    power = 1
    for _ in range(n):
        yield power
        power *= 2
# Usage
for num in powers_of_two(5):
    print(num)  # Output: 1, 2, 4, 8, 16
#### 4. Random Number Generator
Yield a specified number of random integers within a range.
import random
def random_numbers(count, min_val, max_val):
    for _ in range(count):
        yield random.randint(min_val, max_val)
# Usage
print("3 random numbers between 1 and 100:")
for num in random_numbers(3, 1, 100):
    print(num)
#### 5. A Simple Countdown
A generator that counts down from a starting number.
import time
def countdown(start):
    print("Starting countdown...")
    while start > 0:
        yield start
        start -= 1
        time.sleep(0.5)
    print("Blast off!")
# Usage
for i in countdown(3):
    print(i)
# Output:
# Starting countdown...
# 3
# 2
# 1
# Blast off!
---
Part 2: Working with Data & Files (Examples 6-9)
This is where generators shine: they can process large files without loading them into memory.
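Before the file-based examples, here is a self-contained sketch of the same idea that runs without any file on disk. It uses `io.StringIO` as a stand-in for an open file; the log contents and the `error_lines` helper are made up for illustration.

```python
import io

# Stand-in for an open text file; iterating it yields one line at a time.
fake_log = io.StringIO(
    "INFO start\n"
    "ERROR disk full\n"
    "INFO retry\n"
    "ERROR timeout\n"
)


def error_lines(lines):
    """Yield only the lines containing 'ERROR', one at a time (hypothetical helper)."""
    for line in lines:
        if "ERROR" in line:
            yield line.rstrip("\n")


errors = error_lines(fake_log)
first_error = next(errors)        # reads lines only until the first match
remaining_errors = list(errors)   # consumes the rest of the input

print(first_error)       # ERROR disk full
print(remaining_errors)  # ['ERROR timeout']
```

At no point does the code hold more than the current line plus the results it chooses to keep, which is the property that matters when the input is gigabytes in size.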
#### 6. Reading a Large File Line by Line
The most common and important use case. This reads a file of any size while keeping only one line in memory at a time.
def read_large_file(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()
# Usage (imagine 'my_huge_log.txt' is gigabytes in size)
# for log_entry in read_large_file('my_huge_log.txt'):
#     if "ERROR" in log_entry:
#         print(log_entry)
#### 7. CSV Reader Generator
Parses a CSV file and yields each row as a dictionary.
import csv
def csv_reader(file_path):
    # newline='' is what the csv module documentation recommends when opening files
    with open(file_path, 'r', newline='') as f:
        # csv.DictReader is already an iterator that yields one row at a time.
        # We wrap it in our own generator for demonstration or added logic.
        reader = csv.DictReader(f)
        for row in reader:
            yield row
# Usage (assuming 'data.csv' exists and has 'name' and 'age' columns)
# for row_dict in csv_reader('data.csv'):
#     print(f"Name: {row_dict['name']}, Age: {row_dict['age']}")
#### 8. Data Processing Pipeline
Chain generators together to create an efficient data processing pipeline.
# Generator 1: Read the file
def read_log(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line
# Generator 2: Filter for specific lines
def filter_lines(sequence, phrase):
    for line in sequence:
        if phrase in line:
            yield line
# Generator 3: Extract information
def extract_ip(sequence):
    for line in sequence:
        # Simple example: assumes IP is the first word
        yield line.split()[0]
# Usage
# log_lines = read_log('access.log')
# error_lines = filter_lines(log_lines, '404')
# ip_addresses = extract_ip(error_lines)
# for ip in ip_addresses:
#     print(ip)
#### 9. Generating Chunks of a List
Yield successive n-sized chunks from a list. Useful for batch processing.
def get_chunks(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]
# Usage
my_list = list(range(25))
for chunk in get_chunks(my_list, 10):
    print(chunk)
# Output:
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
# [20, 21, 22, 23, 24]
---
Part 3: Infinite Sequences (Examples 10-12)
An infinite sequence cannot be stored in a list, but a generator can represent it because each value is produced only when requested.
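Since an infinite generator never stops on its own, the caller decides how much to take. One way to do that, sketched here with the standard library's `itertools.islice` and `itertools.takewhile`, is shown with a hypothetical `naturals` generator.

```python
from itertools import islice, takewhile


def naturals(start=1):
    """Yield start, start + 1, start + 2, ... forever (hypothetical helper)."""
    n = start
    while True:
        yield n
        n += 1


# Take a fixed number of items from an endless stream.
first_five = list(islice(naturals(), 5))
print(first_five)  # [1, 2, 3, 4, 5]

# Take items until a condition fails; the generator expression is also lazy.
small_squares = list(takewhile(lambda sq: sq < 20, (n * n for n in naturals())))
print(small_squares)  # [1, 4, 9, 16]
```

Calling `list(naturals())` directly would never return, so a bound like one of these is always needed.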
#### 10. Infinite Fibonacci Sequence
Generates Fibonacci numbers forever. You must use a break or other logic to stop it.
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
# Usage
fib_gen = fibonacci()
print("First 10 Fibonacci numbers:")
for _ in range(10):
    print(next(fib_gen), end=' ')  # Output: 0 1 1 2 3 5 8 13 21 34
#### 11. Infinite Sequence of Prime Numbers
A more complex algorithm that yields prime numbers endlessly.
def primes():
    num = 2
    while True:
        is_prime = True
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            yield num
        num += 1
# Usage
prime_gen = primes()
print("\nFirst 5 prime numbers:")
for _ in range(5):
    print(next(prime_gen))  # Output: 2, 3, 5, 7, 11
#### 12. Cycling Through an Iterable
A generator that repeats a sequence indefinitely.
def cycle_iterable(iterable):
    while True:
        for item in iterable:
            yield item
# Usage
traffic_light = cycle_iterable(['Red', 'Yellow', 'Green'])
print("Traffic light sequence:")
for _ in range(7):
    print(next(traffic_light))
---
Part 4: Advanced & Algorithmic (Examples 13-20)
These examples showcase more complex patterns and use cases.
#### 13. yield from for Chaining Generators
yield from is a convenient syntax to delegate part of a generator's operations to another generator.
def generator_one():
    yield "A"
    yield "B"
def generator_two():
    yield "C"
    yield "D"
def chain_generators(*gens):
    for gen in gens:
        yield from gen
# Usage
chained = chain_generators(generator_one(), generator_two())
for item in chained:
    print(item, end='')  # Output: ABCD
#### 14. Flatten a Nested List
Use recursion and yield from to elegantly flatten a list of any depth.
def flatten(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item
# Usage
my_list = [1, [2, 3], [4, [5, 6]], 7]
print("\nFlattened list:")
for item in flatten(my_list):
    print(item, end=' ')  # Output: 1 2 3 4 5 6 7
#### 15. Tree Traversal (Pre-order)
Generators are perfect for traversing complex data structures like trees without building a result list.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
def preorder_traversal(node):
    if node:
        yield node.value
        yield from preorder_traversal(node.left)
        yield from preorder_traversal(node.right)
# Usage
#       1
#      / \
#     2   3
#    /
#   4
tree = Node(1, Node(2, Node(4)), Node(3))
print("\nPre-order traversal:")
for val in preorder_traversal(tree):
    print(val, end=' ')  # Output: 1 2 4 3
#### 16. Coroutine with .send()
Generators can also receive values through the send() method. This lets them act as simple coroutines: functions that pause, accept input from the caller, and then resume.
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Yields the current average and waits for a new value to be sent
        new_value = yield average
        total += new_value
        count += 1
        average = total / count
# Usage
averager = running_average()
next(averager)  # Prime the coroutine (runs until the first yield)
print(f"Sending 10, Average: {averager.send(10)}")  # Output: 10.0
print(f"Sending 20, Average: {averager.send(20)}")  # Output: 15.0
print(f"Sending 3, Average: {averager.send(3)}")  # Output: 11.0
#### 17. Generating Date Ranges
Like range(), but for dates.
from datetime import date, timedelta
def date_range(start_date, end_date):
    current_date = start_date
    while current_date <= end_date:
        yield current_date
        current_date += timedelta(days=1)
# Usage
start = date(2023, 12, 28)
end = date(2024, 1, 2)
print("\nDate Range:")
for d in date_range(start, end):
    print(d)
#### 18. File Finder Generator
A generator that walks a directory tree and yields files matching a pattern, like the find command.
import fnmatch
import os
def find_files(root_dir, pattern):
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            # fnmatch applies shell-style wildcards, so '*.py' matches
            # 'app.py' but not 'app.pyc' (a plain substring test would match both)
            if fnmatch.fnmatch(filename, pattern):
                yield os.path.join(dirpath, filename)
# Usage: Find all Python files in the current directory and subdirectories
# for py_file in find_files('.', '*.py'):
#     print(py_file)
#### 19. Sentence Tokenizer
Yields sentences one by one from a block of text. Note that re.split() builds the full list of sentences up front, so the text must already be in memory; the generator only makes the consumption lazy.
import re
def get_sentences(text):
    # A simple regex to split by sentence-ending punctuation
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s', text)
    for sentence in sentences:
        if sentence:
            yield sentence.strip()
# Usage
long_text = "This is the first sentence. Here is another one! And what about a third?"
for s in get_sentences(long_text):
    print(s)
#### 20. Moving Window / Sliding Window
Yield a "window" of a certain size as it slides over an iterable. Very common in signal processing and data analysis.
from collections import deque
def moving_window(iterable, size):
    window = deque(maxlen=size)
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield list(window)
# Usage
data = [1, 2, 3, 4, 5, 6, 7]
print("\nMoving window of size 3:")
for window_slice in moving_window(data, 3):
    print(window_slice)
# Output:
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
# [4, 5, 6]
# [5, 6, 7]
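As one possible application of the sliding window in data analysis, the sketch below computes a moving average. The `moving_average` name and the sample readings are assumptions made for illustration; the input is deliberately a generator rather than a list, to show that the deque-based approach works on a stream that can only be read once.

```python
from collections import deque


def moving_average(iterable, size):
    """Yield the mean of each full window of `size` items (hypothetical helper)."""
    window = deque(maxlen=size)  # oldest item is dropped automatically
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield sum(window) / size


# The input is itself a generator: it has no len() and cannot be sliced.
readings = (x for x in [1, 2, 3, 4, 5, 6, 7])

averages = list(moving_average(readings, 3))
print(averages)  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

Because only `size` items are held at any moment, memory use stays constant regardless of how long the input stream is.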