Python generators are a crucial feature with multiple uses, from lazy iteration to continuous data streaming. They are heavily adopted in popular packages such as FastAPI, SQLAlchemy, pytest, and others, highlighting their power and the importance of understanding how they work.
This article answers a series of common questions about generators, from the basics of how they work to how popular libraries rely on them.
Each section covers both theoretical and practical aspects of Python generators. By the end, you will have a strong comprehension of generators and their importance. The first section begins by defining what a generator is and how it operates.
At first glance, the generator function appears to be no different from a regular function. Consider the following example:
from typing import Generator


def calculate(a: int, b: int) -> Generator[int | float, None, None]:
    print(f"Starting calculations for {a} and {b}")
    print("Sum")
    yield a + b
    print("Subtraction")
    yield a - b
    print("Multiplication")
    yield a * b
    print("Division")
    yield a / b
    print("Done calculating")
The defining characteristic of a generator function is the use of the yield keyword. When called, a generator function does not execute its body; it returns a generator object, so it can be seen as a generator factory.
There are two ways to work with a generator. Consider the first example of how to use the calculate function:
calculations = calculate(10, 5)

for result in calculations:
    print("Result:", result)
To start, the calculate generator function is used to create a generator object. That call does nothing else: without the for loop, no print output would appear. The code then iterates over the generator, which runs it until each yield, where it pauses and hands the yielded value to the loop. Overall, running this code produces the following result:
Starting calculations for 10 and 5
Sum
Result: 15
Subtraction
Result: 5
Multiplication
Result: 50
Division
Result: 2.0
Done calculating
If you want to work with a generator in a more direct way, you can use the next
function. Here's an example of how to use the calculate
generator in this manner:
calculations = calculate(10, 5)
print(next(calculations))
print(next(calculations))
print(next(calculations))
print(next(calculations))
print(next(calculations))
The computed values are the same, but the fifth next call raises a StopIteration exception. This happens because the generator is exhausted after the final yield statement: it cannot be used again, and a new generator object is required to start over. A for loop stops automatically when the generator is exhausted, which is why there was no exception in the first example.
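For reference, here is a rough sketch (not from the original article) of what a for loop effectively does with a generator: it repeatedly calls next and ends quietly when StopIteration is raised.

calculations = calculate(10, 5)

while True:
    try:
        result = next(calculations)
    except StopIteration:
        break  # the for loop swallows this exception and simply ends
    print("Result:", result)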
When using a generator, it's crucial to pay attention to where it pauses and resumes. If you make only one next call, you'll see only the print statement for the sum, because the generator stops at the first yield. Any remaining computations and print statements will not be executed. This behavior is what makes it possible to add some "laziness" to your code.
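Here is a minimal sketch of that laziness: with a single next call, only the code up to the first yield runs.

calculations = calculate(10, 5)
print(next(calculations))
# Output:
# Starting calculations for 10 and 5
# Sum
# 15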
It is worth noting that the final print statement with the "Done calculating" message only runs on the fifth next call. That call is the one that exhausts the generator, but it is also the only way to reach the code after the last yield statement. To sum up, whether you use a for loop or the next function, the generator advances to the next yield statement, executing the code in between, until no yield statements remain. At that point, the code after the last yield runs and the generator is exhausted.
Although generators are not an easy topic, this introduction should provide you with a good understanding of how they operate. With this knowledge covered, you can now move on to further interactions with the generator object.
Generators use the yield keyword to pause execution and produce a value; that is their core feature. However, generators can also receive values (send) and return a final value. In the first example, the type annotation Generator[int | float, None, None] indicates that the generator yields values of type int or float, while the remaining two parameters are the send and return types.
The send method of a generator object allows you to pass a value into a generator while it is paused; the value becomes the result of the yield expression at which the generator stopped. Here is an example to illustrate this functionality:
import random
from typing import Generator


def send_message(message: str) -> str:
    print(f"Sending: {message}")
    return str(random.randint(1, 1000))


def chat(message: str) -> Generator[str, str, None]:
    print("Starting a new chat")
    history = []
    while True:
        history.append(message)
        response = send_message(message)
        history.append(response)
        message = yield response


quick_chat = chat("hello")
print(next(quick_chat))
print(quick_chat.send("how are you doing?"))
print(quick_chat.send("oh, that is nice!"))
The chat
generator function acts like a conversation. It starts with an initial message and uses the send_message
function to send and receive messages while keeping a record of the conversation history. The generator object won't get exhausted because of the while True
loop. When executed, the code produces the following output:
Starting a new chat
Sending: hello
786
Sending: how are you doing?
392
Sending: oh, that is nice!
416
To receive values sent into a generator, assign the result of yield to a variable (e.g. message = yield response). However, before sending anything, you must first advance the generator to the first yield statement; otherwise, there is nothing to receive the value. In the example above, this is done by calling the next function, which also triggers sending of the initial "hello" message. Once the generator is paused at the yield statement, you can use the send method. In the example, send is used twice to reassign the message variable, which is then passed to the send_message function.
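If you skip that priming step and call send with a non-None value right away, Python refuses the call. A quick illustration, reusing the chat function from above:

quick_chat = chat("hello")
quick_chat.send("how are you doing?")
# TypeError: can't send non-None value to a just-started generator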
Adding send support makes a generator more interactive. By including a return statement within the generator, you can expand its functionality even further. Check out the tweaked chat generator for an example:
...  # truncated code


def chat(message: str) -> Generator[str, str, list[str]]:
    print("Starting a new chat")
    history = []
    while True:
        history.append(message)
        response = send_message(message)
        history.append(response)
        message = yield response
        if not message:
            return history


quick_chat = chat("hello")
print(next(quick_chat))
print(quick_chat.send("how are you doing?"))
print(quick_chat.send("oh, that is nice!"))

try:
    quick_chat.send("")
except StopIteration as e:
    print(e.value)
A new type annotation, list[str], has been added as the return type of the generator. If an empty value is passed via the send method, the generator now returns the conversation history. The updated output is as follows:
Starting a new chat
Sending: hello
459
Sending: how are you doing?
817
Sending: oh, that is nice!
84
['hello', '459', 'how are you doing?', '817', 'oh, that is nice!', '84']
The return statement exhausts the generator and raises StopIteration, and the returned value is available on the exception's value attribute, which is why the try/except block can print it. This may look unusual, since it is not the typical way of consuming a generator: usually some form of iteration goes over the generator and handles the exhaustion behind the scenes. The examples above use the next and send methods directly just to showcase the underlying mechanics.
This knowledge should help you interact with generators more effectively. Two more techniques for building and consuming generators follow.
Generator expressions provide a concise way of creating generators. They let you define a generator in a single line of code, using syntax similar to a list comprehension. For example, suppose you need to take a list of numbers (which might be a large one), keep only the even numbers, and yield their squares. You could start with a regular generator function:
from typing import Generator

large_numbers_dataset = list(range(1, 11))
print(large_numbers_dataset)


def square_even(numbers: list[int]) -> Generator[int, None, None]:
    for n in numbers:
        if n % 2 == 0:
            yield n**2


squared = square_even(large_numbers_dataset)
print(squared)
print(list(squared))
And the output is the following:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
<generator object square_even at 0x7fc5789f4110>
[4, 16, 36, 64, 100]
The results are as expected. In this scenario, using a generator lets you compute only part of the dataset: for instance, you can iterate over the squared generator and stop after the second result, receiving two values without processing the entire dataset. A one-line list comprehension could do the same job, but it would compute everything up front, so it is not as efficient. A one-line generator expression, however, keeps the laziness. To use one, replace the square_even function with the implementation below:
large_numbers_dataset = list(range(1, 11))
squared_exp = (n**2 for n in large_numbers_dataset if n % 2 == 0)
print(squared_exp)
print(list(squared_exp))
The outcome is now nearly the same:
<generator object <genexpr> at 0x7fc5789f5cb0>
[4, 16, 36, 64, 100]
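To see the partial-consumption idea mentioned above in practice, here is a small sketch (using itertools.islice, which is not part of the original example) that takes only the first two results:

from itertools import islice

large_numbers_dataset = list(range(1, 11))
squared_exp = (n**2 for n in large_numbers_dataset if n % 2 == 0)

# Only the first two values are ever computed; the rest of the dataset is untouched.
print(list(islice(squared_exp, 2)))  # [4, 16]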
The generator expression ultimately creates a generator object, just like the generator function, but in a more concise and expressive form. It reads like a list comprehension while keeping the laziness of a generator. Moreover, a generator expression can be passed directly as a function argument. For instance, consider the following example:
large_numbers_dataset = list(range(1, 11))
print(sum(n**2 for n in large_numbers_dataset if n % 2 == 0))
The result is 220, which means the generator yielded the same values as before and they were all consumed by the sum function. This is a clean and memory-efficient way of processing data without creating intermediate lists.
In Python, you can create hierarchical generators by using subgenerators. These are generators that are nested within other generators, making it possible to structure and organize your code in a modular and reusable manner. Essentially, subgenerators serve as an alternative method for "unpacking" a generator within another generator. Take a look at the code example below for reference:
numbers_board = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

for row in numbers_board:
    for cell in row:
        print(cell)
Naturally, the code displays the numbers 1 to 9. However, a nested double for loop is not the most elegant solution. A cleaner approach is to use a generator:
def cells(board: list[list[int]]) -> Generator[int, None, None]:
    for row in board:
        for cell in row:
            yield cell


for cell in cells(numbers_board):
    print(cell)
Although the outcome remains unchanged, the code outside the function is now more concise and benefits from using a generator. There is one more refinement you can make in this scenario: separate the two loops and delegate the inner for cell in row loop to a distinct generator. This can be accomplished with a subgenerator and the yield from syntax:
def columns(row: list[int]) -> Generator[int, None, None]:
    for column in row:
        yield column


def cells(board: list[list[int]]) -> Generator[int, None, None]:
    for row in board:
        yield from columns(row)


for cell in cells(numbers_board):
    print(cell)
The program prints the numbers 1 through 9 again, but this time the cells generator delegates to the columns subgenerator. The yield from statement lets the outer generator hand control to the inner one and pass along its values without iterating over them explicitly. Although this may seem like an unnecessary addition in such a simple example, it is included to demonstrate how a subgenerator works.
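yield from does more than forward yielded values: it also forwards values passed in with send and makes the subgenerator's return value available as the value of the yield from expression. A minimal sketch (not from the original article) illustrating the return value:

from typing import Generator


def inner() -> Generator[int, None, str]:
    yield 1
    yield 2
    return "inner is done"


def outer() -> Generator[int, None, None]:
    result = yield from inner()  # receives the subgenerator's return value
    print("Subgenerator returned:", result)


for value in outer():
    print(value)
# 1
# 2
# Subgenerator returned: inner is done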
With the information covered in the previous sections, you should now have a good understanding of generators. The next section explains why a correct understanding of how they work is so important.
Python's generators are a distinctive feature of the language, yet they do not appear that often in everyday code. This is likely because the default iteration and list comprehension syntax are sufficient in most cases. However, incorporating lazy iteration and other generator-based enhancements can sometimes be a game-changer in a codebase.
Despite their limited use in everyday code, generators are essential when working with external libraries such as the FastAPI web framework or the pytest testing library. To go through some examples, first take a look at a class that mimics a database and keeps track of its connected status:
# Note: the `exceptions` module providing DatabaseError is assumed to be defined
# elsewhere in the project; its import is omitted here.
class Database:
    def __init__(self) -> None:
        self._entries: list[dict] = []
        self._connected = False

    def connect(self) -> None:
        self._connected = True

    def disconnect(self) -> None:
        self._connected = False

    def add(self, entry: dict) -> None:
        if self._connected is False:
            raise exceptions.DatabaseError("Database is disconnected")
        self._entries.append(entry)

    @property
    def entries(self) -> list[dict]:
        if self._connected is False:
            raise exceptions.DatabaseError("Database is disconnected")
        return self._entries
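As a quick illustration (not part of the original article), here is how the class behaves when used directly:

db = Database()
db.connect()
db.add({"name": "Jon"})
print(db.entries)  # [{'name': 'Jon'}]
db.disconnect()

db.add({"name": "Jane"})  # raises DatabaseError: Database is disconnected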
If you were to use this class in a FastAPI endpoint, it would appear as follows:
from fastapi import FastAPI, APIRouter

from sigma_py.generators.utils import Database

app = FastAPI()
router = APIRouter()


@router.get("/entries")
async def get_entries() -> list[dict]:
    db = Database()
    db.connect()
    entries = db.entries
    db.disconnect()
    return entries
The issue in this case is that, in every endpoint, you have to manually create the database object, connect to it, and remember to disconnect it when you are done. FastAPI solves this problem with its dependency system, which uses generators. Take a look at the enhanced version of the get_entries endpoint:
from typing import Generator

from fastapi import FastAPI, APIRouter, Depends

from sigma_py.generators.utils import Database

app = FastAPI()
router = APIRouter()


def get_db() -> Generator[Database, None, None]:
    db = Database()
    db.connect()
    yield db
    db.disconnect()


@router.get("/entries")
async def get_entries(db: Database = Depends(get_db)) -> list[dict]:
    return db.entries
The get_db dependency now handles the whole database lifecycle, allowing the endpoint to focus on its own logic. The dependency is a generator function that creates a database object, connects it, yields it to the endpoint, and disconnects it afterwards, when the framework resumes and exhausts the generator. FastAPI drives the generator's mechanics behind the scenes.
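As a rough, simplified sketch of what a framework does with such a dependency (this is not FastAPI's actual implementation), the generator can be driven manually like this:

dependency = get_db()
db = next(dependency)        # run up to the yield: create and connect the database
try:
    entries = db.entries     # the "endpoint" logic uses the yielded object
finally:
    next(dependency, None)   # resume after the yield: disconnect; the generator is exhausted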
When using pytest to write tests, a similar process occurs. The library also uses generators to create fixtures that resemble FastAPI dependencies:
import pytest
from typing import Generator

from sigma_py.generators.utils import Database


@pytest.fixture
def db() -> Generator[Database, None, None]:
    db = Database()
    db.connect()
    yield db
    db._entries = []
    db.disconnect()


def test_add_entry(db: Database) -> None:
    assert db.entries == []
    db.add({"name": "Jon"})
    assert db.entries == [{"name": "Jon"}]
As you can see, pytest marks a generator function with the @pytest.fixture decorator: the code before the yield creates and initializes the dependency, and the code after the yield is teardown logic that runs after the test, when pytest resumes and exhausts the generator.
Generators are a powerful feature in Python that enable efficient and flexible handling of data generation, iteration, and processing. They offer lazy evaluation, saving memory and enabling infinite sequences. Generators provide advantages such as improved memory utilization, enhanced code readability, and the ability to handle large or infinite data streams. Popular libraries like FastAPI, pytest, Django, TensorFlow, and asyncio utilize generators for dependency management, fixtures, pagination, batch processing, and asynchronous iteration respectively. By understanding and leveraging generators, you can optimize code, improve performance, and design modular and reusable solutions. Embrace generators to write efficient, readable, and scalable code in various domains.
Code examples used in the article can be found here: link.
The biggest inspiration for these articles and source of my knowledge is the Fluent Python book by Luciano Ramalho. I highly encourage you to check it out; you will not be disappointed.
Documentation page of each library:
Also published here.