A user reported a bug at 2am. I pulled up the logs. Here's what I found:

Starting...
Processing...
Done.

That's it. No user ID. No request ID. No timestamps. No indication of what "processing" meant. I spent the next 4 hours adding logs, redeploying, and trying to reproduce the issue, just to get enough information to understand what happened.

In production, logs are your only witness. If they're useless, you're a detective with no evidence.

Why Print Statements Aren't Enough

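print("here") works for quick debugging. It fails in production: the output carries no timestamp, no severity, and no context such as a user ID or request ID, and you can't filter it or turn it off without editing the code.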

Use Log Levels

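Proper logging frameworks have levels, typically DEBUG, INFO, WARNING, ERROR, and CRITICAL in increasing order of severity (exact names vary a little between frameworks).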

When investigating a crash, you filter to ERROR and above. You don't wade through a million "User logged in" messages.
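As a rough sketch of that filtering with Python's standard logging module: the logger name, messages, and in-memory stream below are made up for illustration. The handler's level acts as the filter, so only ERROR and CRITICAL records are written.

```python
import io
import logging

# Hypothetical logger name, chosen for this example.
logger = logging.getLogger("payments_demo")
logger.setLevel(logging.DEBUG)   # the logger itself lets every level through
logger.propagate = False         # keep demo output away from the root logger

stream = io.StringIO()           # stands in for a log file or stderr
handler = logging.StreamHandler(stream)
handler.setLevel(logging.ERROR)  # the handler keeps only ERROR and above
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("cache miss for key user:123")
logger.info("User logged in")
logger.warning("Retrying payment provider")
logger.error("Payment failed")
logger.critical("Database unreachable")

output = stream.getvalue()
print(output)
```

Lowering the handler's level to logging.DEBUG would bring the other three messages back without touching any of the call sites.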

Structured Logging

The biggest upgrade: switch from text to JSON.

Text log:

[2025-04-15 10:00:00] ERROR: Payment failed for user 123. Reason: Timeout.

Readable by humans. Terrible for machines. Want to count failures by reason? Write regex.

JSON log:

{
  "timestamp": "2025-04-15T10:00:00Z",
  "level": "ERROR",
  "event": "payment_failed",
  "user_id": 123,
  "reason": "timeout"
}

Feed this into Datadog, Splunk, or ELK. Query instantly: level="ERROR" AND reason="timeout". Build dashboards. Set alerts.
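Here is a minimal sketch of producing logs like that with Python's standard library, assuming no third-party JSON logging package. The JsonFormatter class, its EXTRA_FIELDS tuple, the logger name, and the extra user IDs and reasons are all invented for this example. The last lines count failures by reason without any regex, which is roughly what a log platform does for you at scale.

```python
import io
import json
import logging
from collections import Counter
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    # Hypothetical list of extra fields this sketch copies from the record.
    EXTRA_FIELDS = ("event", "user_id", "reason")

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)

stream = io.StringIO()  # stands in for stdout or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("structured_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Illustrative events; only the first mirrors the example in the text.
logger.error("Payment failed", extra={"event": "payment_failed", "user_id": 123, "reason": "timeout"})
logger.error("Payment failed", extra={"event": "payment_failed", "user_id": 456, "reason": "card_declined"})
logger.error("Payment failed", extra={"event": "payment_failed", "user_id": 789, "reason": "timeout"})

# Machine-readable logs: parse each line and aggregate, no regex needed.
records = [json.loads(line) for line in stream.getvalue().splitlines()]
failures_by_reason = Counter(r["reason"] for r in records if r["level"] == "ERROR")
print(failures_by_reason)
```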

Correlation IDs

In microservices, one user action triggers logs in five services. How do you connect them?

Generate a unique ID when a request enters your system. Pass it to every service. Include it in every log:

{"request_id": "abc-123", "service": "auth", "event": "login_success"}
{"request_id": "abc-123", "service": "orders", "event": "fetching_history"}
{"request_id": "abc-123", "service": "orders", "event": "db_timeout", "level": "ERROR"}

Search for abc-123, see the entire journey across your system.
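One way to attach the ID to every log line without passing it through each function call is a context variable plus a logging filter. The sketch below simulates two services inside one process purely for illustration; in a real system each service would read the ID from the incoming request (commonly an HTTP header, which is an assumption here) and forward it on outgoing calls. All class, function, and logger names are hypothetical.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the current request's ID; "-" means no request is in flight.
request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record passing through."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True  # never drop the record, only annotate it

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "request_id": record.request_id,
            "service": record.name,
            "level": record.levelname,
            "event": record.getMessage(),
        })

stream = io.StringIO()  # stands in for the shared log pipeline
handler = logging.StreamHandler(stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(JsonFormatter())

def get_service_logger(name):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    return logger

auth_log = get_service_logger("auth")
orders_log = get_service_logger("orders")

def handle_request(incoming_id=None):
    # Reuse the caller's ID if one arrived, otherwise mint a new one.
    token = request_id_var.set(incoming_id or str(uuid.uuid4()))
    try:
        auth_log.info("login_success")
        orders_log.info("fetching_history")
        orders_log.error("db_timeout")
    finally:
        request_id_var.reset(token)  # don't leak the ID into the next request

handle_request("abc-123")
lines = [json.loads(line) for line in stream.getvalue().splitlines()]
print(lines)
```

None of the logging calls mention the request ID, yet every line carries it, so forgetting to include it in a new log statement is not possible.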

What to Log

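Do log: timestamps, user IDs, request IDs, and what each step was actually doing. These are exactly the details missing from the 2am logs above.

Don't log: secrets and sensitive data such as passwords, API tokens, and full card numbers.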

Log the Stack Trace

When catching exceptions, don't just log the message:

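# ❌ Loses the location: only the exception message is logged
try:
    do_work()  # placeholder for the operation that can fail
except Exception as e:
    logger.error(f"Failed: {e}")

# ✅ Includes full stack trace
try:
    do_work()  # placeholder for the operation that can fail
except Exception:
    logger.exception("Operation failed")  # Logs at ERROR and appends the traceback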

The message tells you what happened. The stack trace tells you exactly where.
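To see the difference concretely, here is a small self-contained sketch that captures both styles into an in-memory stream. The parse_amount helper, the logger name, and the bad input are hypothetical, chosen only to force an exception.

```python
import io
import logging

stream = io.StringIO()  # captures log output so it can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("trace_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

def parse_amount(raw):
    # Hypothetical helper; int() raises ValueError on bad input.
    return int(raw)

def log_message_only(raw):
    try:
        parse_amount(raw)
    except Exception as e:
        logger.error("Failed: %s", e)

def log_with_trace(raw):
    try:
        parse_amount(raw)
    except Exception:
        logger.exception("Operation failed")

log_message_only("12x")
message_only = stream.getvalue()

# Clear the buffer before the second run.
stream.seek(0)
stream.truncate()

log_with_trace("12x")
with_trace = stream.getvalue()

print(message_only)
print(with_trace)
```

The first output is a single line containing the error message. The second adds the traceback, including the file, line number, and the parse_amount frame where the ValueError was raised.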

The Payoff

Good logs turn debugging from hours to minutes. When the 2am alert fires, you search for the request ID, see exactly what happened, and fix it. No guessing. No reproducing. Just evidence.

Treat logs as a feature, not an afterthought.
