Opening Thoughts
Recently, while mentoring newcomers in coding, I've noticed a pattern: many colleagues write functions that modify global variables and mix in print statements and database operations. Looking at this code, I wonder why we so readily write "dirty" functions instead of pure ones. Today, let's discuss pure functions in Python and explore how to write more elegant and reliable code.
Understanding Pure Functions
I remember being confused when I first encountered the concept of pure functions. What does "pure" mean? Why should a function be "pure"? Later I gradually realized that pure functions are like mathematical functions: given the same input, they always produce the same output, and they have no side effects.
Let's look at a simple example:
def add(x, y):
    return x + y

result1 = add(3, 5)  # always returns 8
result2 = add(3, 5)  # always returns 8
This add function is a typical pure function. Like a mathematical formula, it gives the same output for the same input no matter when you call it. It neither depends on external variables nor modifies external state.
In contrast, this is not a pure function:
total = 0

def add_to_total(x):
    global total
    total += x
    return total

print(add_to_total(3))  # outputs 3
print(add_to_total(3))  # outputs 6
print(add_to_total(3))  # outputs 9
See the difference? This function returns a different result on each call because it both reads and modifies the external variable total. The same input gives different answers depending on what happened earlier, which makes the function hard to predict and test.
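One way to make this function pure is to pass the running total in as an argument and return the new value, so the caller owns the state. The following is a minimal sketch; the name add_to_total_pure is made up for illustration and does not appear elsewhere in this article.

```python
def add_to_total_pure(total, x):
    """Return the new running total without touching any outside state."""
    return total + x


# The caller keeps the state and threads it through each call explicitly.
running_total = 0
running_total = add_to_total_pure(running_total, 3)
running_total = add_to_total_pure(running_total, 3)
```

Calling add_to_total_pure(0, 3) gives the same answer every time, regardless of what was called before it.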
Practical Applications of Pure Functions
Data Transformation
In real development, we often need to transform data. For example, if we want to apply a 20% discount to all prices in a product list, the traditional approach might be:
def apply_discount(products):
    for product in products:
        product['price'] = product['price'] * 0.8

items = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000}
]
apply_discount(items)
This function directly modifies the original data, making it a typical impure function. If we want to write it as a pure function, we should do it like this:
def apply_discount_pure(products):
    return [
        {**product, 'price': product['price'] * 0.8}
        for product in products
    ]

items = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000}
]
discounted_items = apply_discount_pure(items)
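One caveat worth checking: {**product} makes a shallow copy, so nested mutable values are still shared between the original and the result. The sketch below illustrates this with a 'tags' field that I added purely for the example; apply_discount_deep is a hypothetical variant, not part of the original code.

```python
from copy import deepcopy


def apply_discount_pure(products):
    return [{**product, 'price': product['price'] * 0.8} for product in products]


def apply_discount_deep(products):
    """Hypothetical variant that also copies nested values."""
    return [
        {**deepcopy(product), 'price': product['price'] * 0.8}
        for product in products
    ]


# 'tags' is an illustrative nested field, not part of the article's data.
items = [{'name': 'phone', 'price': 1000, 'tags': ['new']}]

shallow = apply_discount_pure(items)
deep = apply_discount_deep(items)

# The shallow copy shares the nested list with the original; the deep copy does not.
shares_tags_shallow = shallow[0]['tags'] is items[0]['tags']
shares_tags_deep = deep[0]['tags'] is items[0]['tags']
```

For flat dictionaries like the ones in this article the shallow copy is enough; the deep copy only matters once values themselves are mutable.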
Data Filtering
Here's another example, this time data filtering. Let's say we want to keep only the products that cost more than 1000:
def filter_expensive_products(products, price_threshold=1000):
    return [
        product
        for product in products
        if product['price'] > price_threshold
    ]

all_products = [
    {'name': 'phone', 'price': 800},
    {'name': 'laptop', 'price': 2000},
    {'name': 'tablet', 'price': 1200}
]
expensive_products = filter_expensive_products(all_products)
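Because neither function touches its input, the two can be chained freely. The sketch below repeats both definitions so it runs on its own, then filters and discounts in one expression; the variable names are illustrative.

```python
def apply_discount_pure(products):
    return [{**product, 'price': product['price'] * 0.8} for product in products]


def filter_expensive_products(products, price_threshold=1000):
    return [product for product in products if product['price'] > price_threshold]


all_products = [
    {'name': 'phone', 'price': 800},
    {'name': 'laptop', 'price': 2000},
    {'name': 'tablet', 'price': 1200},
]

# Filter first, then discount; all_products is left exactly as it was.
discounted_expensive = apply_discount_pure(filter_expensive_products(all_products))
discounted_names = [product['name'] for product in discounted_expensive]
```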
Benefits of Pure Functions
Testability
One of the biggest advantages of pure functions is that they're extremely easy to test. Because the input and output are deterministic, we can easily write unit tests:
def test_apply_discount_pure():
    # Prepare test data
    test_products = [
        {'name': 'test_item', 'price': 100}
    ]
    # Execute function
    result = apply_discount_pure(test_products)
    # Verify results
    assert result[0]['price'] == 80
    assert test_products[0]['price'] == 100  # Original data unchanged
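The same style works for the filtering function, including edge cases such as an empty list or a price exactly at the threshold. This is a sketch with plain assert statements; the test names are my own, and a test runner such as pytest would be one way to collect them.

```python
def filter_expensive_products(products, price_threshold=1000):
    return [product for product in products if product['price'] > price_threshold]


def test_filter_empty_list():
    assert filter_expensive_products([]) == []


def test_filter_threshold_is_exclusive():
    # A price equal to the threshold is not "more than" the threshold.
    products = [{'name': 'edge', 'price': 1000}]
    assert filter_expensive_products(products) == []


def test_filter_custom_threshold():
    products = [{'name': 'a', 'price': 50}, {'name': 'b', 'price': 150}]
    result = filter_expensive_products(products, price_threshold=100)
    assert result == [{'name': 'b', 'price': 150}]
    assert len(products) == 2  # Original list unchanged


# Run the tests directly when no test runner is available.
test_filter_empty_list()
test_filter_threshold_is_exclusive()
test_filter_custom_threshold()
```

No setup, mocks, or teardown are needed, which is the practical payoff of keeping the function pure.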
Parallelization
Pure functions have another important characteristic: since they neither depend on nor modify external state, each call is independent of the others, which makes them well suited to parallel computing. For example, if we need to perform calculations on a large list of numbers:
from multiprocessing import Pool

def expensive_calculation(x):
    # Assume this is a time-consuming pure function calculation
    result = 0
    for i in range(1000000):
        result += x * i
    return result

def parallel_process(numbers):
    # Distribute the calls across 4 worker processes
    with Pool(4) as p:
        return p.map(expensive_calculation, numbers)

# The guard is required where worker processes re-import this module (e.g. on Windows)
if __name__ == '__main__':
    numbers = list(range(1000))
    results = parallel_process(numbers)
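A quick way to convince yourself that a pure function is safe to run concurrently is to compare a serial run with a concurrent one. The sketch below uses ThreadPoolExecutor from the standard library only because it is easy to run anywhere; in CPython, threads will not speed up CPU-bound work, so treat it as a correctness check rather than a benchmark. scaled_sum is a small stand-in I made up for expensive_calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def scaled_sum(x, n=1000):
    """Small pure stand-in for expensive_calculation."""
    return sum(x * i for i in range(n))


numbers = list(range(20))

# Serial baseline
serial_results = [scaled_sum(x) for x in numbers]

# Concurrent run; executor.map returns results in input order
with ThreadPoolExecutor(max_workers=4) as executor:
    concurrent_results = list(executor.map(scaled_sum, numbers))

results_match = serial_results == concurrent_results
```

Since scaled_sum shares no state between calls, the order in which workers pick up tasks cannot change the results.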
How to Write Good Pure Functions
Avoid Global State
To write good pure functions, first avoid using global variables. If you need to maintain state, consider encapsulating it in a class, and have methods return new values rather than silently changing that state:
class Counter:
    def __init__(self):
        self._count = 0

    def increment(self, value):
        return self._count + value  # Return new value instead of modifying state

    def get_count(self):
        return self._count

counter = Counter()
new_value = counter.increment(5)  # Don't modify state, return new value
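If you want the counter itself to advance while still avoiding mutation, one option is a frozen dataclass whose method returns a new instance. This is a sketch of that idea; ImmutableCounter and incremented are hypothetical names, not part of the Counter class above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmutableCounter:
    count: int = 0

    def incremented(self, value):
        """Return a new counter; the current one is never changed."""
        return ImmutableCounter(self.count + value)


c0 = ImmutableCounter()
c1 = c0.incremented(5)
c2 = c1.incremented(2)
```

Each step produces a new object, so c0 and c1 remain valid snapshots of earlier states. Assigning to c0.count directly raises dataclasses.FrozenInstanceError.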
Use Immutable Data Structures
In Python, we should try to use immutable data structures, like tuples instead of lists:
def process_coordinates(coords):
    # Use tuples to store coordinates
    return tuple(x * 2 for x in coords)

original_coords = (1, 2, 3)
new_coords = process_coordinates(original_coords)
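For records with named fields, typing.NamedTuple gives the same immutability with more readable access, and its _replace method returns a modified copy. A small sketch, with Point and move_right as illustrative names:

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def move_right(point, dx):
    """Return a new Point shifted along x; the input is left untouched."""
    return point._replace(x=point.x + dx)


origin = Point(0, 0)
moved = move_right(origin, 3)
```

Trying to assign origin.x = 1 raises AttributeError, so accidental mutation is caught immediately.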
Use Functional Programming Tools
Python provides several functional programming tools: map and filter are built-ins, and reduce lives in the functools module. These tools can help us write more concise pure functions:
from functools import reduce

def calculate_total_price(products):
    return reduce(
        lambda total, product: total + product['price'],
        products,
        0
    )

products = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000}
]
total = calculate_total_price(products)
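For a plain sum, the built-in sum with a generator expression does the same job and is often easier to read than reduce. The sketch below also shows map and filter on the same data; the 1500 cutoff and the variable names are arbitrary choices for illustration.

```python
products = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000},
]

# Same result as the reduce-based version
total_with_sum = sum(product['price'] for product in products)

# map and filter return lazy iterators, so wrap them in list() to see the values
names = list(map(lambda product: product['name'], products))
under_1500 = list(filter(lambda product: product['price'] < 1500, products))
```

None of these calls modify products; each one builds a new value from it.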
Real-world Case: Order Processing System
Let's look at a more complex example. Suppose we need to implement an order processing system:
from dataclasses import dataclass
from typing import Optional, Tuple
from decimal import Decimal

@dataclass(frozen=True)
class OrderItem:
    product_id: str
    quantity: int
    price: Decimal

@dataclass(frozen=True)
class Order:
    # A tuple keeps the item collection immutable too; frozen=True alone
    # would not stop code from mutating a list stored in this field.
    items: Tuple[OrderItem, ...]
    customer_id: str

def calculate_order_total(order: Order) -> Decimal:
    # Start from Decimal("0") so an empty order still yields a Decimal
    return sum((item.price * item.quantity for item in order.items), Decimal("0"))

def apply_discount(order: Order, discount_percent: Decimal) -> Order:
    # discount_percent is a fraction, e.g. Decimal("0.1") for 10%
    new_items = tuple(
        OrderItem(
            product_id=item.product_id,
            quantity=item.quantity,
            price=item.price * (1 - discount_percent)
        )
        for item in order.items
    )
    return Order(items=new_items, customer_id=order.customer_id)

def validate_order(order: Order) -> Optional[str]:
    # Return an error message, or None if the order is valid
    if not order.items:
        return "Order must contain at least one item"
    if any(item.quantity <= 0 for item in order.items):
        return "Item quantity must be greater than 0"
    if any(item.price <= 0 for item in order.items):
        return "Item price must be greater than 0"
    return None

order = Order(
    items=(
        OrderItem("prod1", 2, Decimal("100.00")),
        OrderItem("prod2", 1, Decimal("200.00"))
    ),
    customer_id="cust1"
)

error = validate_order(order)
if error:
    print(f"Order validation failed: {error}")
else:
    # Calculate total
    total = calculate_order_total(order)
    print(f"Order total: {total}")
    # Apply discount
    discounted_order = apply_discount(order, Decimal("0.1"))
    new_total = calculate_order_total(discounted_order)
    print(f"Total after discount: {new_total}")
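When only one field of a frozen dataclass changes, dataclasses.replace can build the copy without listing every field by hand. The sketch below repeats the OrderItem definition so it runs on its own; discount_item is a hypothetical helper, not part of the code above.

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    product_id: str
    quantity: int
    price: Decimal


def discount_item(item, discount_percent):
    """Return a copy of item with a reduced price; all other fields carry over."""
    return replace(item, price=item.price * (1 - discount_percent))


item = OrderItem("prod1", 2, Decimal("100.00"))
cheaper = discount_item(item, Decimal("0.1"))
```

This keeps the function correct if OrderItem later gains more fields, since replace copies whatever is not explicitly overridden.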
Performance Considerations
You might worry: do pure functions hurt performance? After all, they create new data structures instead of modifying existing ones. This is a good question, and the best way to answer it for your own workload is to measure. Here is a simple timing test you can run:
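import time
from copy import deepcopy
from functools import wraps

def measure_time(func):
    @wraps(func)  # Keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        # perf_counter is a monotonic, high-resolution clock meant for timing
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took: {end - start:.4f} seconds")
        return result
    return wrapper

@measure_time
def process_data_impure(data):
    # Modifies each dictionary in place
    for item in data:
        item['value'] *= 2
    return data

@measure_time
def process_data_pure(data):
    # Builds a new list of new dictionaries
    return [{'value': item['value'] * 2} for item in data]

test_data = [{'value': i} for i in range(1000000)]

# Each function gets its own copy so both start from identical input
data_copy1 = deepcopy(test_data)
result1 = process_data_impure(data_copy1)
data_copy2 = deepcopy(test_data)
result2 = process_data_pure(data_copy2)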
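A single timed run can be noisy, so it may help to repeat the measurement and take the best time. The sketch below uses timeit.repeat from the standard library on a smaller data set; the function names and the size of 10,000 items are my own choices, and the actual numbers will depend on your machine and Python version.

```python
import timeit


def double_in_place(data):
    for item in data:
        item['value'] *= 2
    return data


def double_pure(data):
    return [{'value': item['value'] * 2} for item in data]


def fresh_data(n=10_000):
    return [{'value': i} for i in range(n)]


impure_data = fresh_data()
pure_data = fresh_data()

# number=1 times one call per repeat. The in-place version keeps doubling the
# same dictionaries across repeats, which does not change the amount of work.
impure_times = timeit.repeat(lambda: double_in_place(impure_data), number=1, repeat=5)
pure_times = timeit.repeat(lambda: double_pure(pure_data), number=1, repeat=5)

print(f"in-place best of 5: {min(impure_times):.6f} seconds")
print(f"pure best of 5:     {min(pure_times):.6f} seconds")
```

Taking the minimum reduces the influence of background activity on the machine, which makes the two approaches easier to compare.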
Balance in Real Applications
In real projects, we need to find a balance between pure and impure functions. Some operations are inherently side-effectful, like logging or saving to a database. In these cases, we can borrow the idea behind the "onion architecture": keep the core business logic pure and push side effects out to the outer layer:
class OrderService:
    def __init__(self, db_connection):
        self.db = db_connection

    def process_order(self, order: Order):
        # Core business logic uses pure functions
        error = validate_order(order)
        if error:
            return False, error
        total = calculate_order_total(order)
        # Side effects are kept at the outer layer
        try:
            self.save_order_to_db(order, total)
            return True, None
        except Exception as e:
            return False, str(e)

    def save_order_to_db(self, order: Order, total: Decimal):
        # Database operations go here
        pass
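A side benefit of this layering is that the impure shell becomes easy to test with an in-memory stand-in for the database. The following is a simplified sketch rather than the OrderService above: validate_total, RecordingStore, and process are hypothetical names, and the validation is reduced to a single check to keep the example short.

```python
from decimal import Decimal


def validate_total(total):
    """Pure check; a simplified stand-in for validate_order."""
    return None if total > 0 else "Order total must be greater than 0"


class RecordingStore:
    """Hypothetical in-memory replacement for a database connection."""

    def __init__(self):
        self.saved = []

    def save(self, record):
        self.saved.append(record)


def process(total, store):
    # Pure core first
    error = validate_total(total)
    if error:
        return False, error
    # The only side effect, isolated at the edge
    store.save({'total': total})
    return True, None


store = RecordingStore()
ok_result = process(Decimal("400.00"), store)
bad_result = process(Decimal("0"), store)
```

Because the store is passed in, a test can inspect store.saved afterwards and confirm that invalid input never reached the storage layer.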
Summary and Reflection
In this article, we've explored the concept of pure functions in Python, their benefits, and practical ways to write them. Pure functions make code easier to test and maintain and help us write more reliable programs. However, we also need to find the right balance in real projects.
What do you think? How do you handle pure functions and side effects in your projects? Have you encountered any problems due to impure functions? Feel free to share your experiences and thoughts in the comments.
Remember, there's no silver bullet in programming. Choosing the right approach is more important than dogmatically pursuing pure functions. Next time you write code, think about: does this function need to be pure? If so, how should I refactor it?