Working with Files in Python Safely and Efficiently
A script I wrote worked perfectly on my Mac. Teammate ran it on Windows. Crashed with UnicodeDecodeError. The file had some accented characters—nothing exotic, just names like "José" and "Müller."
I had forgotten to specify encoding. Mac defaults to UTF-8. Windows often defaults to a legacy code page like cp1252. Two hours of debugging for what should have been a one-line fix.
Always Use Context Managers
The classic mistake:
# ❌ What if an error happens before close()?
f = open('data.txt')
data = f.read()
process(data) # If this crashes...
f.close() # ...this never runs
The file handle stays open, potentially leaking resources or keeping the file locked. Use with instead:
# ✅ File closes automatically, even if error occurs
with open('data.txt', encoding='utf-8') as f:
    data = f.read()
    process(data)
# File is closed here, guaranteed
with handles cleanup no matter what—success, exception, early return. No excuses for not using it.
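A single with statement can also manage several files at once. A small sketch (in.txt and out.txt are just scratch files created for the demo): both handles close automatically, even if an exception occurs partway through.

```python
from pathlib import Path

# Set up a small input file for the demo.
Path('in.txt').write_text('hello\nworld\n', encoding='utf-8')

# One with statement, two files; both close automatically.
with open('in.txt', encoding='utf-8') as src, \
     open('out.txt', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line.upper())
```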
Always Specify Encoding
If you don't specify encoding, Python uses the platform default. That means your code behaves differently on different machines.
# ❌ Platform-dependent behavior
with open('data.txt') as f:
    content = f.read()

# ✅ Consistent everywhere
with open('data.txt', encoding='utf-8') as f:
    content = f.read()
UTF-8 handles almost everything: English, Chinese, emoji, accented characters. Make it your default.
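As a quick sanity check (names.txt is a throwaway file for the demo), an explicit encoding makes a write-then-read round trip bit-for-bit identical on Mac, Linux, and Windows:

```python
# Round-trip a non-ASCII string: with an explicit encoding,
# the bytes on disk don't depend on platform defaults.
text = 'José and Müller'
with open('names.txt', 'w', encoding='utf-8') as f:
    f.write(text)
with open('names.txt', encoding='utf-8') as f:
    assert f.read() == text  # identical on every platform
```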
pathlib Over String Manipulation
Old style:
import os

path = os.path.join(os.getcwd(), 'data', 'users.json')
if not os.path.exists(os.path.dirname(path)):
    os.makedirs(os.path.dirname(path))
New style:
from pathlib import Path
path = Path.cwd() / 'data' / 'users.json'
path.parent.mkdir(parents=True, exist_ok=True)
pathlib treats paths as objects. The / operator joins them. It handles Windows backslashes vs Unix forward slashes automatically. Much cleaner.
# More pathlib goodness
file = Path('data/report.csv')
file.exists() # True/False
file.is_file() # Is it a file?
file.is_dir() # Is it a directory?
file.suffix # '.csv'
file.stem # 'report'
file.name # 'report.csv'
file.parent # Path('data')
# Read/write shortcuts
content = file.read_text(encoding='utf-8')
file.write_text('new content', encoding='utf-8')
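pathlib also covers directory listing and pattern matching via glob. A small sketch, using a throwaway directory (demo_dir and its files are made up for the example):

```python
from pathlib import Path

# Build a throwaway directory with a few files for the demo.
root = Path('demo_dir')
root.mkdir(exist_ok=True)
(root / 'a.csv').write_text('1,2\n', encoding='utf-8')
(root / 'b.csv').write_text('3,4\n', encoding='utf-8')
(root / 'notes.txt').write_text('hi\n', encoding='utf-8')

# glob matches by pattern; sorted() gives a stable order.
csv_files = sorted(p.name for p in root.glob('*.csv'))
print(csv_files)  # ['a.csv', 'b.csv']
```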
Processing Large Files
Don't load everything into memory:
# ❌ Loads 5GB file into RAM
with open('huge.log') as f:
    lines = f.readlines()  # Crash
    for line in lines:
        process(line)
# ✅ One line at a time
with open('huge.log', encoding='utf-8') as f:
    for line in f:
        process(line.strip())
# Memory usage stays constant
File objects are iterators. Looping directly over them reads line by line, using minimal memory.
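The same idea applies to binary files, where there are no lines to iterate over: read fixed-size chunks instead. A sketch (the 64 KB chunk size and blob.bin are arbitrary choices for the demo):

```python
# Stream a binary file in fixed-size chunks; memory use stays
# bounded by the chunk size, not the file size.
def iter_chunks(path, chunk_size=64 * 1024):
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Demo: write some bytes, then stream them back.
with open('blob.bin', 'wb') as f:
    f.write(b'x' * 200_000)

total = sum(len(c) for c in iter_chunks('blob.bin'))
print(total)  # 200000
```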
Atomic Writes
What happens if your script crashes while writing a config file? You get a half-written, corrupted file. Bad.
The safe pattern: write to a temp file, then rename. Renaming is atomic—it either happens completely or not at all.
import os

def save_safely(filepath, content):
    temp_path = filepath + '.tmp'
    with open(temp_path, 'w', encoding='utf-8') as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # Force write to disk
    os.replace(temp_path, filepath)  # Atomic rename
If the script crashes during writing, filepath still contains the old valid data. The corrupt .tmp file is obvious and can be cleaned up.
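A variant of the same pattern uses the stdlib tempfile module and cleans up the partial file on failure. The function name save_safely_tmp and the config.txt target are made up for this sketch; note the temp file is created in the target's own directory so the rename stays on one filesystem:

```python
import os
import tempfile

def save_safely_tmp(filepath, content):
    # Create the temp file next to the target so os.replace is a
    # same-filesystem rename and stays atomic.
    dir_name = os.path.dirname(os.path.abspath(filepath))
    fd, temp_path = tempfile.mkstemp(dir=dir_name, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # force write to disk
        os.replace(temp_path, filepath)  # atomic rename
    except BaseException:
        os.unlink(temp_path)  # clean up the partial file on failure
        raise

save_safely_tmp('config.txt', 'key=value\n')
```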
JSON Files
JSON is everywhere. Handle it properly:
import json

# Writing
data = {'name': 'José', 'score': 42}
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
# ensure_ascii=False: writes "José" not "Jos\u00e9"
# indent=2: human-readable formatting

# Reading
with open('data.json', encoding='utf-8') as f:
    data = json.load(f)
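Reading hand-edited or untrusted JSON is where things break. When parsing fails, json raises json.JSONDecodeError, which carries the line and column of the problem; a small sketch:

```python
import json

# Malformed JSON: a trailing comma is not allowed.
bad = '{"name": "José", "score": 42,}'
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # e.lineno and e.colno point at the offending character.
    print(f'Invalid JSON at line {e.lineno}, column {e.colno}')
```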
Quick Checklist
- Always use with open(), no exceptions
- Always specify encoding='utf-8' for text files
- Use pathlib, cleaner than string manipulation
- Iterate over file objects for large files
- Write to temp, then rename, for critical data
File handling is boring until something goes wrong. Then it's very not boring. Getting these basics right prevents a whole category of production issues.