I optimized a data processing script last month. Changed maybe 10 lines of code. Runtime went from 4 hours to 5 minutes.

The original code wasn't wrong—it produced correct output. It just used a list where it should have used a set. That single change was a 48x speedup.

Python gets blamed for being slow, but in my experience, slow Python is usually slow algorithms hiding behind readable syntax. Here are the mistakes I see most often.

String Concatenation in Loops

Strings in Python are immutable. Every time you do += on a string, Python creates a brand new string and copies the old content plus the new part. In a loop, this means copying more and more data with each iteration, so the total work grows quadratically. (CPython can sometimes optimize the simple case in place, but it's an implementation detail you shouldn't rely on.)

# ❌ Gets slower and slower as the string grows
def build_report(records):
    report = ""
    for record in records:
        report += f"ID: {record['id']}, Value: {record['value']}\n"
    return report

# ✅ Stays fast regardless of size
def build_report(records):
    parts = []
    for record in records:
        parts.append(f"ID: {record['id']}, Value: {record['value']}\n")
    return "".join(parts)

The second version appends to a list (fast), then joins once at the end. For a few dozen strings, you won't notice. For thousands, it's night and day.
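If you want to see the gap yourself, here's a quick timeit sketch (sizes and repeat counts are arbitrary; absolute numbers vary by machine and Python version):

```python
import timeit

def concat(n):
    # Quadratic: each += copies the whole string built so far
    s = ""
    for i in range(n):
        s += f"line {i}\n"
    return s

def join(n):
    # Linear: collect pieces, copy everything exactly once
    return "".join(f"line {i}\n" for i in range(n))

# Both produce identical output; only the cost differs
assert concat(1_000) == join(1_000)

for n in (1_000, 10_000):
    t_concat = timeit.timeit(lambda: concat(n), number=10)
    t_join = timeit.timeit(lambda: join(n), number=10)
    print(f"n={n}: concat={t_concat:.4f}s  join={t_join:.4f}s")
```
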

Lists for Membership Testing

This is the big one. Checking if x in some_list scans the list until it finds a match—the whole list if there isn't one. Do that inside a loop, and you've got O(n²) complexity without realizing it.

# ❌ Scanning the list every single time
def filter_valid_users(all_users, valid_ids):
    result = []
    for user in all_users:
        if user.id in valid_ids:  # O(n) scan each time
            result.append(user)
    return result

# ✅ Set lookup is O(1)
def filter_valid_users(all_users, valid_ids):
    valid_set = set(valid_ids)  # One-time conversion
    result = []
    for user in all_users:
        if user.id in valid_set:  # Instant lookup
            result.append(user)
    return result

That 4-hour-to-5-minute optimization I mentioned? This was it. 100,000 users, 50,000 valid IDs. The list version does up to 5 billion comparisons in the worst case. The set version does 100,000 hash lookups.

Rule of thumb: if you're checking if x in collection more than a few times, make that collection a set.
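You can measure the difference directly. This sketch probes for the last ID—the worst case for a list, a non-event for a set (timings will vary, but the shape of the result won't):

```python
import timeit

ids = list(range(50_000))
id_list = ids
id_set = set(ids)

probe = 49_999  # worst case for the list: scans all 50,000 elements

t_list = timeit.timeit(lambda: probe in id_list, number=200)
t_set = timeit.timeit(lambda: probe in id_set, number=200)
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```

The one-time cost of set(valid_ids) is O(n), so it pays for itself after a handful of lookups.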

Attribute Lookups in Tight Loops

Every dot in Python triggers an attribute lookup—typically a dictionary search. result.append means "find the 'append' attribute on this object." In a loop that runs millions of times, these lookups add up.

# Slightly slower
def process(data):
    result = []
    for item in data:
        result.append(item.upper())
    return result

# Slightly faster
def process(data):
    result = []
    append = result.append  # Cache the method
    for item in data:
        append(item.upper())
    return result

Honestly, this is micro-optimization territory. Don't bother unless you've profiled and this specific loop is a bottleneck. Readability matters more for 99% of code.
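If you do decide to measure it, the gap is easy to quantify with timeit (the data here is a made-up placeholder; expect a difference of a few percent, not an order of magnitude):

```python
import timeit

words = ["hello"] * 100_000

def plain():
    result = []
    for w in words:
        result.append(w.upper())  # attribute lookup every iteration
    return result

def cached():
    result = []
    append = result.append  # resolved once, reused as a local name
    for w in words:
        append(w.upper())
    return result

assert plain() == cached()  # identical output, different cost
print("plain: ", timeit.timeit(plain, number=10))
print("cached:", timeit.timeit(cached, number=10))
```
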

Not Using Built-ins

Python's built-in functions are implemented in C. They're way faster than equivalent Python loops.

# ❌ Pure Python loop
total = 0
for x in range(1000000):
    total += x

# ✅ C implementation
total = sum(range(1000000))

Same goes for max(), min(), any(), all(). If there's a built-in for what you're doing, use it.
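For example, an early-exit search written as a manual loop versus any()—both short-circuit at the first match, but the built-in pushes the iteration into C and reads better (values here are just illustrative):

```python
values = [3, 8, 12, 5]

# Manual loop with an explicit flag and break
found = False
for v in values:
    if v > 10:
        found = True
        break

# Built-in: same short-circuit behavior, one line
found_fast = any(v > 10 for v in values)

assert found == found_fast == True
```
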

List comprehensions are also faster than manual loops with .append() because the list building happens in C.
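Here's the side-by-side—same result, but the comprehension skips the per-iteration method lookup and uses specialized bytecode to build the list:

```python
data = ["alpha", "beta", "gamma"]

# Manual loop with .append()
result = []
for s in data:
    result.append(s.upper())

# List comprehension: same output, faster list construction
result_fast = [s.upper() for s in data]

assert result == result_fast == ["ALPHA", "BETA", "GAMMA"]
```
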

Global Variables

Accessing a global is slower than accessing a local variable. Python checks local scope first (fast array lookup), then global scope (slower dictionary lookup).

import math

# ❌ Repeated global lookups
data = range(1000000)

def calculate():
    result = []
    for i in data:  # Global 'data'
        result.append(math.sqrt(i))  # Global 'math'
    return result

# ✅ Local variable access
def calculate_fast():
    local_data = range(1000000)
    sqrt = math.sqrt  # Local reference
    result = []
    for i in local_data:
        result.append(sqrt(i))
    return result

Fun fact: just wrapping your script in def main(): ... and calling it gives a small speedup because everything becomes local scope instead of global.
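The pattern is just this (the sqrt-summing body is a stand-in for whatever your script does):

```python
import math

def main():
    # Inside a function, these are local names: fast, array-indexed
    # lookups instead of global dictionary lookups.
    sqrt = math.sqrt  # resolved once
    total = 0.0
    for i in range(1_000_000):
        total += sqrt(i)
    return total

if __name__ == "__main__":
    print(main())
```
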

When to Optimize

Don't optimize blindly. Profile first. Python's cProfile module shows you exactly where time is spent:

python -m cProfile -s cumtime your_script.py

Focus on the hot spots. A 10% improvement in a function that takes 0.1% of total time is worthless. A 10% improvement in the function that takes 80% of time actually matters.
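You can also profile a single function from inside Python rather than the whole script. A sketch using the standard cProfile and pstats modules (slow_membership is a deliberately bad placeholder function):

```python
import cProfile
import io
import pstats

def slow_membership(n):
    # Deliberately O(n^2): list membership check inside a loop
    valid = list(range(n))
    return sum(1 for i in range(n) if i in valid)

profiler = cProfile.Profile()
profiler.enable()
slow_membership(2_000)
profiler.disable()

# Print the 5 most expensive entries by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```
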

The Checklist

Most Python performance problems aren't about Python being slow. They're about accidentally writing O(n²) algorithms that look like O(n) because the syntax is so clean. Before you blame the language, run through this:

- Building a string in a loop? Collect parts in a list and join once.
- Checking membership repeatedly? Convert the collection to a set.
- Hot inner loop? Consider caching attribute lookups—but only after profiling.
- Writing a manual loop? Check for a built-in (sum, max, min, any, all) or a comprehension first.
- Speed-critical code at module level? Move it into a function.
- Above all: profile before you optimize.
