Working with Files in Python Safely and Efficiently
A script I wrote worked perfectly on my Mac. Teammate ran it on Windows. Crashed with UnicodeDecodeError. The file had some accented characters—nothing exotic, just names like "José" and "Müller."
I had forgotten to specify encoding. Mac defaults to UTF-8. Windows often defaults to a legacy code page like cp1252. Two hours of debugging for what should have been a one-line fix.
Always Use Context Managers
The classic mistake:
# ❌ What if an error happens before close()?
f = open('data.txt')
data = f.read()
process(data) # If this crashes...
f.close() # ...this never runs
The file handle stays open, potentially leaking resources or keeping the file locked. Use with instead:
# ✅ File closes automatically, even if error occurs
with open('data.txt', encoding='utf-8') as f:
    data = f.read()
    process(data)
# File is closed here, guaranteed
with handles cleanup no matter what—success, exception, early return. No excuses for not using it.
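A single with statement can also manage several files at once. A small sketch (in.txt and out.txt are just scratch files created for the demo): both handles close automatically, even if an exception occurs partway through.

```python
from pathlib import Path

# Set up a small input file for the demo.
Path('in.txt').write_text('hello\nworld\n', encoding='utf-8')

# One with statement, two files; both close automatically.
with open('in.txt', encoding='utf-8') as src, \
     open('out.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line.upper())
```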
Always Specify Encoding
If you don't specify encoding, Python uses the platform default. That means your code behaves differently on different machines.
# ❌ Platform-dependent behavior
with open('data.txt') as f:
    content = f.read()

# ✅ Consistent everywhere
with open('data.txt', encoding='utf-8') as f:
    content = f.read()
UTF-8 handles almost everything: English, Chinese, emoji, accented characters. Make it your default.
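As a quick sanity check (names.txt is a throwaway file for the demo), an explicit encoding makes a write-then-read round trip bit-for-bit identical on Mac, Linux, and Windows:

```python
# Round-trip a non-ASCII string: with an explicit encoding,
# the bytes on disk don't depend on platform defaults.
text = 'José and Müller'
with open('names.txt', 'w', encoding='utf-8') as f:
    f.write(text)
with open('names.txt', encoding='utf-8') as f:
    assert f.read() == text  # identical on every platform
```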
pathlib Over String Manipulation
Old style:
import os

path = os.path.join(os.getcwd(), 'data', 'users.json')
if not os.path.exists(os.path.dirname(path)):
    os.makedirs(os.path.dirname(path))
New style:
from pathlib import Path
path = Path.cwd() / 'data' / 'users.json'
path.parent.mkdir(parents=True, exist_ok=True)
pathlib treats paths as objects. The / operator joins them. It handles Windows backslashes vs Unix forward slashes automatically. Much cleaner.
# More pathlib goodness
file = Path('data/report.csv')
file.exists() # True/False
file.is_file() # Is it a file?
file.is_dir() # Is it a directory?
file.suffix # '.csv'
file.stem # 'report'
file.name # 'report.csv'
file.parent # Path('data')
# Read/write shortcuts
content = file.read_text(encoding='utf-8')
file.write_text('new content', encoding='utf-8')
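pathlib also covers directory listing and pattern matching via glob. A small sketch, using a throwaway directory (demo_dir and its files are made up for the example):

```python
from pathlib import Path

# Build a throwaway directory with a few files for the demo.
root = Path('demo_dir')
root.mkdir(exist_ok=True)
(root / 'a.csv').write_text('1,2\n', encoding='utf-8')
(root / 'b.csv').write_text('3,4\n', encoding='utf-8')
(root / 'notes.txt').write_text('hi\n', encoding='utf-8')

# glob matches by pattern; sorted() gives a stable order.
csv_files = sorted(p.name for p in root.glob('*.csv'))
print(csv_files)  # ['a.csv', 'b.csv']
```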
Processing Large Files
Don't load everything into memory:
# ❌ Loads 5GB file into RAM
with open('huge.log') as f:
    lines = f.readlines()  # Crash
    for line in lines:
        process(line)
# ✅ One line at a time
with open('huge.log', encoding='utf-8') as f:
    for line in f:
        process(line.strip())
# Memory usage stays constant
File objects are iterators. Looping directly over them reads line by line, using minimal memory.
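The same idea applies to binary files, where there are no lines to iterate over: read fixed-size chunks instead. A sketch (the 64 KB chunk size and blob.bin are arbitrary choices for the demo):

```python
# Stream a binary file in fixed-size chunks; memory use stays
# bounded by the chunk size, not the file size.
def iter_chunks(path, chunk_size=64 * 1024):
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Demo: write some bytes, then stream them back.
with open('blob.bin', 'wb') as f:
    f.write(b'x' * 200_000)

total = sum(len(c) for c in iter_chunks('blob.bin'))
print(total)  # 200000
```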
Atomic Writes
What happens if your script crashes while writing a config file? You get a half-written, corrupted file. Bad.
The safe pattern: write to a temp file, then rename. Renaming is atomic—it either happens completely or not at all.
import os

def save_safely(filepath, content):
    temp_path = filepath + '.tmp'
    with open(temp_path, 'w', encoding='utf-8') as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # Force write to disk
    os.replace(temp_path, filepath)  # Atomic rename
If the script crashes during writing, filepath still contains the old valid data. The corrupt .tmp file is obvious and can be cleaned up.
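A variant of the same pattern uses the stdlib tempfile module and cleans up the partial file on failure. The function name save_safely_tmp and the config.txt target are made up for this sketch; note the temp file is created in the target's own directory so the rename stays on one filesystem:

```python
import os
import tempfile

def save_safely_tmp(filepath, content):
    # Create the temp file next to the target so os.replace is a
    # same-filesystem rename and stays atomic.
    dir_name = os.path.dirname(os.path.abspath(filepath))
    fd, temp_path = tempfile.mkstemp(dir=dir_name, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # force write to disk
        os.replace(temp_path, filepath)  # atomic rename
    except BaseException:
        os.unlink(temp_path)  # clean up the partial file on failure
        raise

save_safely_tmp('config.txt', 'key=value\n')
```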
JSON Files
JSON is everywhere. Handle it properly:
import json

# Writing
data = {'name': 'José', 'score': 42}
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
# ensure_ascii=False: writes "José" not "Jos\u00e9"
# indent=2: human-readable formatting

# Reading
with open('data.json', encoding='utf-8') as f:
    data = json.load(f)
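Reading hand-edited or untrusted JSON is where things break. When parsing fails, json raises json.JSONDecodeError, which carries the line and column of the problem; a small sketch:

```python
import json

# Malformed JSON: a trailing comma is not allowed.
bad = '{"name": "José", "score": 42,}'
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # e.lineno and e.colno point at the offending character.
    print(f'Invalid JSON at line {e.lineno}, column {e.colno}')
```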
Quick Checklist
- Always use with open(), no exceptions
- Always specify encoding='utf-8' for text files
- Use pathlib, cleaner than string manipulation
- Iterate over file objects for large files
- Write to temp, then rename, for critical data
File handling is boring until something goes wrong. Then it's very not boring. Getting these basics right prevents a whole category of production issues.