Pure Functions and Functional Programming in Python: A Journey from Basics to Advanced Thinking

Opening Thoughts

Recently, while mentoring newcomers in coding, I've noticed an interesting phenomenon - many colleagues tend to write functions that modify global variables and mix in various print statements and database operations. Looking at this code, I wonder: why do we tend to write "dirty" functions instead of pure functions? Today, let's discuss pure functions in Python and explore how to write more elegant and reliable code.

Understanding Pure Functions

I remember being confused when I first encountered the concept of pure functions. What does "pure" mean? Why should it be "pure"? Later I gradually realized that pure functions are like mathematical functions - given the same input, they always produce the same output, without any side effects.

Let's look at a simple example:

def add(x, y):
    return x + y


result1 = add(3, 5)  # always returns 8
result2 = add(3, 5)  # always returns 8

This add function is a typical pure function. It's like a mathematical formula: no matter when you call it, the same input always gives the same output. It neither depends on external variables nor modifies external state.

In contrast, this is not a pure function:

total = 0

def add_to_total(x):
    global total
    total += x
    return total


print(add_to_total(3))  # outputs 3
print(add_to_total(3))  # outputs 6
print(add_to_total(3))  # outputs 9

See the difference? This function returns a different result on every call because it both reads and modifies the external variable total. The same input gives a different answer each time, which makes the function hard to predict and hard to test.
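
One way to make this function pure, sketched here on the assumption that the caller is willing to carry the running total itself, is to pass the current total in as an argument and return the new one. The name add_to_total_pure is illustrative and not part of the original code.

```python
def add_to_total_pure(total, x):
    """Return the new total without touching any outside state."""
    return total + x


# The caller owns the state and decides what to do with each result.
total = 0
total = add_to_total_pure(total, 3)
total = add_to_total_pure(total, 3)

# The same arguments always give the same answer.
first = add_to_total_pure(6, 3)
second = add_to_total_pure(6, 3)
print(total, first, second)
```

The state has not disappeared; it has moved to the call site, where it is visible and easy to control in a test.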

Practical Applications of Pure Functions

Data Transformation

In real development, we often need to transform data. For example, if we want to apply a 20% discount to all prices in a product list, the traditional approach might be:

def apply_discount(products):
    for product in products:
        product['price'] = product['price'] * 0.8


items = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000}
]
apply_discount(items)

This function directly modifies the original data, making it a typical impure function. If we want to write it as a pure function, we should do it like this:

def apply_discount_pure(products):
    return [
        {**product, 'price': product['price'] * 0.8}
        for product in products
    ]


items = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000}
]
discounted_items = apply_discount_pure(items)
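
A quick check, written as a rough sketch, shows what the pure version does and does not protect. The rate parameter and the 'tags' field are assumptions added for illustration. Note that {**product} makes a shallow copy: top-level keys are independent, but nested objects are still shared with the original.

```python
def apply_discount_pure(products, rate=0.8):
    # Build a new dict per product; the input dicts are never assigned to.
    return [{**p, 'price': p['price'] * rate} for p in products]


items = [{'name': 'phone', 'price': 1000, 'tags': ['new']}]
discounted = apply_discount_pure(items)

original_price = items[0]['price']          # still the undiscounted price
discounted_price = discounted[0]['price']   # price after the discount
shares_tags = discounted[0]['tags'] is items[0]['tags']  # nested list is shared

print(original_price, discounted_price, shares_tags)
```

If the products contain nested mutable values that callers might change, copy.deepcopy or immutable field types are worth considering.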

Data Filtering

Here's another example, this time filtering data. Let's say we want to keep only the products that cost more than 1000:

def filter_expensive_products(products, price_threshold=1000):
    return [
        product
        for product in products
        if product['price'] > price_threshold
    ]


all_products = [
    {'name': 'phone', 'price': 800},
    {'name': 'laptop', 'price': 2000},
    {'name': 'tablet', 'price': 1200}
]
expensive_products = filter_expensive_products(all_products)

Benefits of Pure Functions

Testability

One of the biggest advantages of pure functions is that they're extremely easy to test. Because the input and output are deterministic, we can easily write unit tests:

def test_apply_discount_pure():
    # Prepare test data
    test_products = [
        {'name': 'test_item', 'price': 100}
    ]

    # Execute function
    result = apply_discount_pure(test_products)

    # Verify results
    assert result[0]['price'] == 80
    assert test_products[0]['price'] == 100  # Original data unchanged
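
The same style of test works for the filtering function. The sketch below repeats a compact version of filter_expensive_products so it runs on its own; the test data and the boundary case at exactly 1000 are assumptions chosen for illustration.

```python
def filter_expensive_products(products, price_threshold=1000):
    return [p for p in products if p['price'] > price_threshold]


def test_filter_expensive_products():
    products = [
        {'name': 'cheap', 'price': 999},
        {'name': 'edge', 'price': 1000},
        {'name': 'pricey', 'price': 1001},
    ]
    snapshot = [dict(p) for p in products]  # copy to compare against afterwards

    result = filter_expensive_products(products)

    # The comparison is a strict >, so the item priced exactly 1000 is excluded.
    assert [p['name'] for p in result] == ['pricey']
    # Empty input gives empty output.
    assert filter_expensive_products([]) == []
    # The input list and its dicts are left untouched.
    assert products == snapshot


test_filter_expensive_products()
```

No setup, teardown, or mocking is needed, because everything the function uses arrives through its arguments.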

Parallelization

Pure functions have another important characteristic: since they neither depend on nor modify shared state, calls can run in parallel without locks or coordination. For example, if we need to perform calculations on a large list of numbers:

from multiprocessing import Pool

def expensive_calculation(x):
    # Assume this is a time-consuming pure function calculation
    result = 0
    for i in range(1000000):
        result += x * i
    return result

def parallel_process(numbers):
    with Pool(4) as p:
        return p.map(expensive_calculation, numbers)


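if __name__ == "__main__":
    # Without this guard, platforms that start workers by re-importing the
    # main module (Windows, macOS by default) would re-run this block.
    numbers = list(range(1000))
    results = parallel_process(numbers)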
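
Because a pure function's result depends only on its argument, a parallel map should give exactly the same list as a plain loop. The sketch below checks that with a thread pool, chosen only to keep the example portable and short. For CPU-bound work in CPython, a process pool is the usual choice. The function square_plus_one is a hypothetical stand-in for a real calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def square_plus_one(x):
    # Pure: the result depends only on x.
    return x * x + 1


numbers = list(range(20))

# Plain sequential evaluation.
serial = [square_plus_one(n) for n in numbers]

# Executor.map returns results in the order of the inputs,
# regardless of which worker finishes first.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square_plus_one, numbers))

print(serial == parallel)
```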

How to Write Good Pure Functions

Avoid Global State

To write good pure functions, first avoid using global variables. If you need to maintain state, keep it in an object and have methods return a new object instead of modifying the existing one:

class Counter:
    def __init__(self, count=0):
        self._count = count

    def increment(self, value):
        # Return a new Counter instead of modifying this one
        return Counter(self._count + value)

    def get_count(self):
        return self._count


counter = Counter()
new_counter = counter.increment(5)  # counter still holds 0; new_counter holds 5

Use Immutable Data Structures

In Python, we should try to use immutable data structures, like tuples instead of lists:

def process_coordinates(coords):
    # Use tuples to store coordinates
    return tuple(x * 2 for x in coords)


original_coords = (1, 2, 3)
new_coords = process_coordinates(original_coords)
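
A small sketch of what immutability means in practice: assigning to a tuple element raises TypeError, so a "change" has to be expressed as building a new tuple. The variable names here are illustrative.

```python
point = (1, 2, 3)

try:
    point[0] = 10          # tuples do not support item assignment
    mutated = True
except TypeError:
    mutated = False

# "Changing" a tuple means building a new one from the old pieces.
moved = (10,) + point[1:]

print(mutated, point, moved)
```

Any function that receives point can rely on it staying the same, which is the property pure functions need from their inputs.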

Use Functional Programming Tools

Python provides functional programming tools such as the built-ins map and filter, and reduce from the functools module. These tools can help us write more concise pure functions:

from functools import reduce

def calculate_total_price(products):
    return reduce(
        lambda total, product: total + product['price'],
        products,
        0
    )


products = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000}
]
total = calculate_total_price(products)
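
For completeness, here is a short sketch of map and filter on the same kind of data, plus the built-in sum, which is often clearer than reduce when the operation is plain addition. The 'tablet' entry is added for illustration.

```python
products = [
    {'name': 'phone', 'price': 1000},
    {'name': 'laptop', 'price': 2000},
    {'name': 'tablet', 'price': 1200},
]

# map: extract one field from every product (it is lazy, so wrap it in list)
prices = list(map(lambda p: p['price'], products))

# filter: keep only the products that match a predicate
over_1000 = list(filter(lambda p: p['price'] > 1000, products))

# sum: for simple addition, the built-in reads more directly than reduce
total = sum(p['price'] for p in products)

print(prices, [p['name'] for p in over_1000], total)
```

List comprehensions and generator expressions cover the same ground and are often considered more idiomatic in Python. Which to use is largely a matter of team style.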

Real-world Case: Order Processing System

Let's look at a more complex example. Suppose we need to implement an order processing system:

from dataclasses import dataclass
from typing import List, Optional
from decimal import Decimal

@dataclass(frozen=True)
class OrderItem:
    product_id: str
    quantity: int
    price: Decimal

@dataclass(frozen=True)
class Order:
    items: List[OrderItem]
    customer_id: str

def calculate_order_total(order: Order) -> Decimal:
    # Start from Decimal("0") so an empty order returns a Decimal, not the int 0
    return sum((item.price * item.quantity for item in order.items), Decimal("0"))

def apply_discount(order: Order, discount_percent: Decimal) -> Order:
    new_items = [
        OrderItem(
            product_id=item.product_id,
            quantity=item.quantity,
            price=item.price * (1 - discount_percent)
        )
        for item in order.items
    ]
    return Order(items=new_items, customer_id=order.customer_id)

def validate_order(order: Order) -> Optional[str]:
    if not order.items:
        return "Order must contain at least one item"
    if any(item.quantity <= 0 for item in order.items):
        return "Item quantity must be greater than 0"
    if any(item.price <= 0 for item in order.items):
        return "Item price must be greater than 0"
    return None


order = Order(
    items=[
        OrderItem("prod1", 2, Decimal("100.00")),
        OrderItem("prod2", 1, Decimal("200.00"))
    ],
    customer_id="cust1"
)


error = validate_order(order)
if error:
    print(f"Order validation failed: {error}")
else:
    # Calculate total
    total = calculate_order_total(order)
    print(f"Order total: {total}")

    # Apply discount
    discounted_order = apply_discount(order, Decimal("0.1"))
    new_total = calculate_order_total(discounted_order)
    print(f"Total after discount: {new_total}")
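
The item-copying step in apply_discount could also be written with dataclasses.replace, which copies every field and overrides only the ones named. The sketch below is an alternative under that assumption, not the version used above. It also shows that a frozen dataclass rejects attribute assignment.

```python
from dataclasses import dataclass, replace, FrozenInstanceError
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    product_id: str
    quantity: int
    price: Decimal


item = OrderItem("prod1", 2, Decimal("100.00"))

# replace() builds a new OrderItem; the original is left as it was.
discounted = replace(item, price=item.price * (1 - Decimal("0.1")))

try:
    item.price = Decimal("1.00")    # frozen dataclasses reject assignment
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True

print(discounted.price, item.price, assignment_blocked)
```

Keep in mind that frozen=True only blocks attribute assignment. A list stored in a field can still be mutated in place, so a tuple is the safer field type when deep immutability matters.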

Performance Considerations

You might worry: will pure functions affect performance? After all, they create new data structures instead of modifying existing ones. This is a good question. Let's look at a performance test you can run on your own machine:

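import time
from copy import deepcopy
from functools import wraps

def measure_time(func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic, high-resolution clock for timing
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"{func.__name__} took: {end - start:.4f} seconds")
        return result
    return wrapper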

@measure_time
def process_data_impure(data):
    for item in data:
        item['value'] *= 2
    return data

@measure_time
def process_data_pure(data):
    return [{'value': item['value'] * 2} for item in data]


test_data = [{'value': i} for i in range(1000000)]


data_copy1 = deepcopy(test_data)
result1 = process_data_impure(data_copy1)


data_copy2 = deepcopy(test_data)
result2 = process_data_pure(data_copy2)
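
A single timed run can be noisy. As a rough sketch, the helper below (best_of, a hypothetical name) rebuilds fresh input for every run, times only the function call, and keeps the fastest of several runs. The function names and the input size are assumptions for illustration. Actual numbers depend on the machine and Python version, so none are shown here.

```python
import time


def double_in_place(data):
    for item in data:
        item['value'] *= 2
    return data


def double_copy(data):
    return [{'value': item['value'] * 2} for item in data]


def make_data(n=10_000):
    return [{'value': i} for i in range(n)]


def best_of(func, make_input, repeat=5):
    """Return the fastest of `repeat` timed calls, each on fresh input."""
    timings = []
    for _ in range(repeat):
        data = make_input()              # built outside the timed region
        start = time.perf_counter()
        func(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


in_place = best_of(double_in_place, make_data)
copied = best_of(double_copy, make_data)
print(f"in place: {in_place:.6f}s  copy: {copied:.6f}s")
```

Taking the minimum reduces the influence of other activity on the machine. Whether any difference matters depends on how large the data is and how often the function runs.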

Balance in Real Applications

In real projects, we need to find a balance between pure and impure functions. Some operations are inherently side-effectful, like logging or saving to databases. In these cases, we can keep the business logic pure and push the side effects to the outer layer, in the spirit of the "onion architecture":

class OrderService:
    def __init__(self, db_connection):
        self.db = db_connection

    def process_order(self, order: Order):
        # Core business logic uses pure functions
        error = validate_order(order)
        if error:
            return False, error

        total = calculate_order_total(order)

        # Side effects are kept at the outer layer
        try:
            self.save_order_to_db(order, total)
            return True, None
        except Exception as e:
            return False, str(e)

    def save_order_to_db(self, order: Order, total: Decimal):
        # Database operations go here
        pass
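
One payoff of this layering is that the outer layer becomes easy to test when the side effect is injected. The sketch below is a simplified, self-contained variant, not the OrderService above: check_quantities and process are hypothetical names, and an order is reduced to a list of quantities to keep the example short.

```python
from typing import Callable, List, Optional, Tuple


def check_quantities(quantities: List[int]) -> Optional[str]:
    """Pure core: decide whether the order is acceptable."""
    if not quantities:
        return "Order must contain at least one item"
    if any(q <= 0 for q in quantities):
        return "Item quantity must be greater than 0"
    return None


def process(quantities: List[int],
            save: Callable[[List[int]], None]) -> Tuple[bool, Optional[str]]:
    """Thin shell: call the pure core, then perform the side effect."""
    error = check_quantities(quantities)
    if error:
        return False, error
    save(quantities)
    return True, None


# In a test, the "database" can be an ordinary list.
saved = []
ok_result = process([2, 1], saved.append)
bad_result = process([0], saved.append)
print(ok_result, bad_result, saved)
```

The rejected order never reaches save, and the accepted one is recorded exactly once, all without a real database connection.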

Summary and Reflection

Through this article, we've deeply explored the concept, benefits, and practical methods of pure functions in Python. Pure functions not only make code easier to test and maintain but also help us write more reliable programs. However, we also need to find the right balance in real projects.

What do you think? How do you handle pure functions and side effects in your projects? Have you encountered any problems due to impure functions? Feel free to share your experiences and thoughts in the comments.

Remember, there's no silver bullet in programming. Choosing the right approach is more important than dogmatically pursuing pure functions. Next time you write code, think about: does this function need to be pure? If so, how should I refactor it?
