Mastering Python Generators

DataScience4

This guide takes you from the absolute basics of Python generators to advanced, practical applications, using 20 distinct examples.

What is a Generator?

A generator is a special kind of iterator, and writing a generator function is a simpler way to create one than implementing the iterator protocol by hand. Instead of building and returning a whole list of items at once, a generator yields items one at a time, on demand.

Key Benefits:
Memory Efficient: Generators don't store all values in memory. They generate them on the fly, making them perfect for working with very large data sets or infinite sequences.
Lazy Evaluation: The next value is only computed when you ask for it. This can save significant processing time if you only need a few items from a potentially long sequence.
Elegant Code: They can make your code for complex iteration tasks cleaner and more readable.
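
To make the memory claim concrete, here is a minimal sketch that compares the size of a list with the size of an equivalent generator object. It assumes CPython; `sys.getsizeof` reports only the container's own size, and exact byte counts vary between versions, so none are shown. The names `squares_list` and `squares_gen` are illustrative.

```python
import sys

# A list stores a reference to every element up front.
squares_list = [n * n for n in range(100_000)]

# A generator stores only its paused execution state.
squares_gen = (n * n for n in range(100_000))

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(f"list: {list_size} bytes, generator: {gen_size} bytes")
print(gen_size < list_size)  # True on CPython
```

The generator's size stays the same no matter how large the range is, because no values exist until they are requested.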

Core Concepts: yield vs. return

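A regular function uses return to send back a value and then exits completely, discarding its local state. A generator function uses yield instead. Calling it does not run the body; it returns a generator object that runs the body step by step.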

yield: Pauses the function, saves its state, and hands a value back to the caller. When the caller asks for the next value, the function resumes right where it left off.
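
The pause-and-resume behaviour described above can be observed directly. The following is a small sketch with a hypothetical `two_steps` generator; the print calls show when each part of the body actually runs.

```python
def two_steps():
    print("running up to the first yield")
    yield "first"
    print("resumed after the first yield")
    yield "second"
    print("resumed after the second yield; the function now ends")

gen = two_steps()  # nothing is printed yet: the body has not started
values = [next(gen), next(gen)]

try:
    next(gen)          # runs the rest of the body
    finished = False
except StopIteration:
    finished = True    # raised when the generator function returns

print(values)    # ['first', 'second']
print(finished)  # True
```

A for loop handles StopIteration for you, which is why most code never calls next() by hand.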

---

Part 1: The Basics (Examples 1-5)

These examples illustrate the fundamental mechanics of generators.

#### 1. A Simple Counter
The "Hello, World!" of generators. It yields numbers up to a specified limit.

def count_up_to(max_num):
    count = 1
    while count <= max_num:
        yield count
        count += 1

# Usage
counter = count_up_to(5)
print(next(counter))  # Output: 1
print(next(counter))  # Output: 2
for number in counter:
    print(number)  # Output: 3, 4, 5

#### 2. Generator Expression
A concise, inline way to create a generator, similar to a list comprehension but with parentheses () instead of square brackets [].

# List comprehension (builds a full list in memory)
list_comp = [x * x for x in range(1, 6)]

# Generator expression (yields values one by one)
gen_expr = (x * x for x in range(1, 6))

# Usage
print(f"List: {list_comp}") # Output: List: [1, 4, 9, 16, 25]
print(f"Generator object: {gen_expr}") # Output: Generator object: <generator object <genexpr> at ...>

print("Values from generator:")
for val in gen_expr:
    print(val, end=' ')  # Output: 1 4 9 16 25
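
One consequence of yielding values one by one is that a generator can be consumed only once. A short sketch of that behaviour, using an illustrative `squares` generator expression:

```python
squares = (x * x for x in range(1, 6))

first_total = sum(squares)   # consumes every value
second_total = sum(squares)  # nothing is left to consume

print(first_total)   # 55
print(second_total)  # 0
```

If the values are needed more than once, either build a list or create a fresh generator for each pass.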

#### 3. Powers of Two
A generator that yields successive powers of two.

def powers_of_two(n):
    power = 1
    for _ in range(n):
        yield power
        power *= 2

# Usage
for num in powers_of_two(5):
    print(num)  # Output: 1, 2, 4, 8, 16

#### 4. Random Number Generator
Yield a specified number of random integers within a range.

import random

def random_numbers(count, min_val, max_val):
    for _ in range(count):
        yield random.randint(min_val, max_val)

# Usage
print("3 random numbers between 1 and 100:")
for num in random_numbers(3, 1, 100):
    print(num)

#### 5. A Simple Countdown
A generator that counts down from a starting number.

import time

def countdown(start):
    print("Starting countdown...")
    while start > 0:
        yield start
        start -= 1
        time.sleep(0.5)
    print("Blast off!")

# Usage
for i in countdown(3):
    print(i)
# Output:
# Starting countdown...
# 3
# 2
# 1
# Blast off!

---

Part 2: Working with Data & Files (Examples 6-9)

This is where generators shine, by processing large files without loading them into memory.

#### 6. Reading a Large File Line by Line
The most common and important use case. This reads a file of any size while holding only one line in memory at a time.

def read_large_file(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

# Usage (imagine 'my_huge_log.txt' is gigabytes in size)
# for log_entry in read_large_file('my_huge_log.txt'):
#     if "ERROR" in log_entry:
#         print(log_entry)

#### 7. CSV Reader Generator
Parses a CSV file and yields each row as a dictionary.

import csv

def csv_reader(file_path):
    # newline='' lets the csv module handle line endings itself
    with open(file_path, 'r', newline='') as f:
        # csv.DictReader is already a lazy iterator over rows.
        # We wrap it in our own generator for demonstration or added logic.
        reader = csv.DictReader(f)
        for row in reader:
            yield row

# Usage (assuming 'data.csv' exists)
# for row_dict in csv_reader('data.csv'):
#     print(f"Name: {row_dict['name']}, Age: {row_dict['age']}")

#### 8. Data Processing Pipeline
Chain generators together to create an efficient data processing pipeline.

# Generator 1: Read the file
def read_log(file_path):
    with open(file_path, 'r') as f:
        for line in f:
            yield line

# Generator 2: Filter for specific lines
def filter_lines(sequence, phrase):
    for line in sequence:
        if phrase in line:
            yield line

# Generator 3: Extract information
def extract_ip(sequence):
    for line in sequence:
        # Simple example: assumes IP is the first word
        yield line.split()[0]

# Usage
# log_lines = read_log('access.log')
# error_lines = filter_lines(log_lines, '404')
# ip_addresses = extract_ip(error_lines)

# for ip in ip_addresses:
#     print(ip)
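
Because the usage above depends on a log file, here is a self-contained sketch of the same filter and extract stages running on an in-memory list. The `sample_log` lines are made up for illustration and stand in for 'access.log'.

```python
def filter_lines(sequence, phrase):
    for line in sequence:
        if phrase in line:
            yield line

def extract_ip(sequence):
    for line in sequence:
        # Assumes the IP is the first whitespace-separated word
        yield line.split()[0]

# Hypothetical sample data standing in for a real log file
sample_log = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.3 GET /about.html 200",
    "10.0.0.4 GET /gone 404",
]

# No line is processed until list() starts pulling values through the chain
ips = list(extract_ip(filter_lines(sample_log, '404')))
print(ips)  # ['10.0.0.2', '10.0.0.4']
```

Each stage pulls one line at a time from the stage before it, so the pipeline never holds more than one line in flight.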

#### 9. Generating Chunks of a List
Yield successive n-sized chunks from a list. Useful for batch processing.

def get_chunks(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

# Usage
my_list = list(range(25))
for chunk in get_chunks(my_list, 10):
    print(chunk)
# Output:
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
# [20, 21, 22, 23, 24]
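
The slicing approach needs len() and indexing, so it does not work on another generator. A possible variant for arbitrary iterables, sketched here with `itertools.islice`, is shown below; `iter_chunks` is a hypothetical name, not part of the original example.

```python
from itertools import islice

def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # the source is exhausted
        yield chunk

chunks = list(iter_chunks((n for n in range(7)), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

On Python 3.12 and later, `itertools.batched` covers the same need and yields tuples instead of lists.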

---

Part 3: Infinite Sequences (Examples 10-12)

An infinite sequence cannot be stored in a list, but a generator can represent it because each value is produced only when requested.

#### 10. Infinite Fibonacci Sequence
Generates Fibonacci numbers forever. You must use a break or other logic to stop it.

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Usage
fib_gen = fibonacci()
print("First 10 Fibonacci numbers:")
for i in range(10):
    print(next(fib_gen), end=' ')  # Output: 0 1 1 2 3 5 8 13 21 34
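
Besides calling next() in a counted loop, the standard library's `itertools` module offers tools for taking a finite slice of an infinite generator. A brief sketch, repeating the `fibonacci` definition so it runs on its own:

```python
from itertools import islice, takewhile

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take a fixed number of items
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Take items until a condition fails
under_100 = list(takewhile(lambda n: n < 100, fibonacci()))
print(under_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```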

#### 11. Infinite Sequence of Prime Numbers
A more complex algorithm that yields prime numbers endlessly.

def primes():
    num = 2
    while True:
        is_prime = True
        for i in range(2, int(num**0.5) + 1):
            if num % i == 0:
                is_prime = False
                break
        if is_prime:
            yield num
        num += 1

# Usage
prime_gen = primes()
print("\nFirst 5 prime numbers:")
for _ in range(5):
    print(next(prime_gen))  # Output: 2, 3, 5, 7, 11

#### 12. Cycling Through an Iterable
A generator that repeats a sequence indefinitely.

def cycle_iterable(iterable):
    while True:
        for item in iterable:
            yield item

# Usage
traffic_light = cycle_iterable(['Red', 'Yellow', 'Green'])
print("Traffic light sequence:")
for _ in range(7):
    print(next(traffic_light))
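
The standard library already provides this behaviour as `itertools.cycle`. A short sketch of the equivalent usage follows. Unlike a plain re-iterating loop, `itertools.cycle` saves a copy of each item on the first pass, so it also works when given a one-shot iterator.

```python
from itertools import cycle, islice

lights = list(islice(cycle(['Red', 'Yellow', 'Green']), 7))
print(lights)
# ['Red', 'Yellow', 'Green', 'Red', 'Yellow', 'Green', 'Red']

# A one-shot iterator is replayed from the saved copies
letters = list(islice(cycle(iter(['a', 'b'])), 5))
print(letters)  # ['a', 'b', 'a', 'b', 'a']
```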

---

Part 4: Advanced & Algorithmic (Examples 13-20)

These examples showcase more complex patterns and use cases.

#### 13. yield from for Chaining Generators
yield from is a convenient syntax to delegate part of a generator's operations to another generator.

def generator_one():
    yield "A"
    yield "B"

def generator_two():
    yield "C"
    yield "D"

def chain_generators(*gens):
    for gen in gens:
        yield from gen

# Usage
chained = chain_generators(generator_one(), generator_two())
for item in chained:
    print(item, end='')  # Output: ABCD
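
Delegation with yield from does more than loop over the inner generator: the `yield from` expression also evaluates to the inner generator's return value. A minimal sketch, with hypothetical `subtotal` and `report` generators:

```python
def subtotal(values):
    total = 0
    for v in values:
        yield v
        total += v
    return total  # becomes the value of the `yield from` expression

def report(values):
    total = yield from subtotal(values)
    yield f"total={total}"

out = list(report([1, 2, 3]))
print(out)  # [1, 2, 3, 'total=6']
```

A plain for loop over `subtotal(values)` would pass the items through but discard the returned total.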

#### 14. Flatten a Nested List
Use recursion and yield from to elegantly flatten a list of any depth.

def flatten(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

# Usage
my_list = [1, [2, 3], [4, [5, 6]], 7]
print("\nFlattened list:")
for item in flatten(my_list):
    print(item, end=' ')  # Output: 1 2 3 4 5 6 7

#### 15. Tree Traversal (Pre-order)
Generators are perfect for traversing complex data structures like trees without building a result list.

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def preorder_traversal(node):
    if node:
        yield node.value
        yield from preorder_traversal(node.left)
        yield from preorder_traversal(node.right)

# Usage
#       1
#      / \
#     2   3
#    /
#   4
tree = Node(1, Node(2, Node(4)), Node(3))
print("\nPre-order traversal:")
for val in preorder_traversal(tree):
    print(val, end=' ')  # Output: 1 2 4 3

#### 16. Coroutine with .send()
Generators can also receive values through the send() method. This lets them act as simple coroutines: functions that both consume and produce values while keeping their state between calls.

def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Yields the current average and waits for a new value to be sent
        new_value = yield average
        total += new_value
        count += 1
        average = total / count

# Usage
averager = running_average()
next(averager) # Prime the coroutine (runs until the first yield)
print(f"Sending 10, Average: {averager.send(10)}") # Output: 10.0
print(f"Sending 20, Average: {averager.send(20)}") # Output: 15.0
print(f"Sending 3, Average: {averager.send(3)}") # Output: 11.0
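
Forgetting the priming call is a common mistake, since send() with a value raises TypeError on a generator that has not started. One possible way to avoid it, sketched here, is a small decorator that primes the generator on creation. The `primed` helper is hypothetical and not part of the standard library.

```python
from functools import wraps

def primed(gen_func):
    """Hypothetical helper: advance a new generator to its first yield."""
    @wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        next(gen)  # run up to the first yield so send() can be used at once
        return gen
    return wrapper

@primed
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        new_value = yield average
        total += new_value
        count += 1
        average = total / count

averager = running_average()
results = [averager.send(10), averager.send(20), averager.send(3)]
averager.close()  # raises GeneratorExit inside the generator to stop it
print(results)  # [10.0, 15.0, 11.0]
```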

#### 17. Generating Date Ranges
Like range(), but for dates.

from datetime import date, timedelta

def date_range(start_date, end_date):
    current_date = start_date
    while current_date <= end_date:
        yield current_date
        current_date += timedelta(days=1)

# Usage
start = date(2023, 12, 28)
end = date(2024, 1, 2)
print("\nDate Range:")
for d in date_range(start, end):
    print(d)

#### 18. File Finder Generator
A generator that walks a directory tree and yields files matching a pattern, like the find command.

import fnmatch
import os

def find_files(root_dir, pattern):
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            # Shell-style match, so '*.py' does not also match 'notes.pyc'
            if fnmatch.fnmatch(filename, pattern):
                yield os.path.join(dirpath, filename)

# Usage: Find all Python files in the current directory and subdirectories
# for py_file in find_files('.', '*.py'):
#     print(py_file)

#### 19. Sentence Tokenizer
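Yields sentences one by one from a block of text. Note that the text is already in memory here and re.split builds the full list of pieces internally, so the generator only makes consumption lazy; a truly streamed source would need incremental splitting.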

import re

def get_sentences(text):
    # A simple regex to split by sentence-ending punctuation
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s', text)
    for sentence in sentences:
        if sentence:
            yield sentence.strip()

# Usage
long_text = "This is the first sentence. Here is another one! And what about a third?"
for s in get_sentences(long_text):
print(s)

#### 20. Moving Window / Sliding Window
Yield a "window" of a certain size as it slides over an iterable. Very common in signal processing and data analysis.

from collections import deque

def moving_window(iterable, size):
    window = deque(maxlen=size)
    for item in iterable:
        window.append(item)
        if len(window) == size:
            yield list(window)

# Usage
data = [1, 2, 3, 4, 5, 6, 7]
print("\nMoving window of size 3:")
for window_slice in moving_window(data, 3):
    print(window_slice)
# Output:
# [1, 2, 3]
# [2, 3, 4]
# [3, 4, 5]
# [4, 5, 6]
# [5, 6, 7]
