How to Use Logs Effectively Instead of Guessing in the Dark

A user reported a bug at 2am. I pulled up the logs, and here's what I found:

Starting...
Processing...
Done.

That's it. No user ID. No request ID. No timestamps. No indication of what "processing" meant. I spent the next 4 hours adding logs, redeploying, and trying to reproduce the issue—just to get enough information to understand what happened.

In production, logs are your only witness. If they're useless, you're a detective with no evidence.

Why Print Statements Aren't Enough

print("here") works for quick debugging. It fails in production:

No context: "Error: 500" tells you nothing. Which user? Which endpoint?
No levels: A crash and a minor warning look identical
No filtering: You can't search for just errors
No persistence: Stdout vanishes unless captured
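Here is a minimal sketch of the alternative, using Python's standard `logging` module. The format string stamps every line with a time, a level, and a logger name. A `FileHandler` keeps the output on disk instead of relying on someone capturing stdout. `make_file_logger` is a hypothetical helper written for this example, not part of the library.

```python
import logging


def make_file_logger(name: str, path: str) -> logging.Logger:
    """Hypothetical helper: a logger that appends timestamped, leveled lines to `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)  # DEBUG records are dropped, INFO and above are kept
    handler = logging.FileHandler(path)  # output persists on disk, unlike bare stdout
    handler.setFormatter(
        # Every line gets a timestamp, a level, and the logger's name
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling it might look like `log = make_file_logger("checkout", "app.log")` followed by `log.info("Processing order %s for user %s", order_id, user_id)`. The logger name, file name, and variables are placeholders. Passing values as arguments rather than pre-formatting the string lets `logging` skip the formatting when a record is filtered out.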

Use Log Levels

Proper logging frameworks have levels:

DEBUG: Granular details for developers. Usually off in production.
INFO: Normal operations. "Server started", "Job completed"
WARNING: Something unexpected, but handled. "Disk 80% full", "Retrying connection"
ERROR: Operation failed. "Database refused connection", "Payment declined"
CRITICAL: App can't continue. "Config file missing", "Out of memory"

When investigating a crash, you filter to ERROR and above. You don't wade through a million "User logged in" messages.
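As a rough sketch with Python's standard `logging` module, the handler below is set to ERROR, so everything below that level is discarded. The logger name `payments` and the messages are illustrative.

```python
import io
import logging


def build_logger(stream: io.StringIO, level: int) -> logging.Logger:
    """Create a logger whose handler writes 'LEVEL message' lines to `stream`."""
    logger = logging.getLogger("payments")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # the logger itself accepts every level...
    logger.propagate = False  # ...and does not forward records to the root logger
    handler = logging.StreamHandler(stream)
    handler.setLevel(level)  # ...but the handler only emits `level` and above
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]  # replace handlers left over from earlier calls
    return logger


buffer = io.StringIO()
log = build_logger(buffer, logging.ERROR)  # "filter to ERROR and above"
log.debug("Cart contents: %s", ["sku-1", "sku-2"])
log.info("Job completed")
log.warning("Retrying connection")
log.error("Database refused connection")
log.critical("Config file missing")
print(buffer.getvalue())
# ERROR Database refused connection
# CRITICAL Config file missing
```

Setting the threshold on the handler rather than the logger is a deliberate choice here. It would let you attach a second handler, such as a debug file, that keeps everything the first one drops.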

Structured Logging

The biggest upgrade: switch from text to JSON.

Text log:

[2025-04-15 10:00:00] ERROR: Payment failed for user 123. Reason: Timeout.

Readable by humans. Terrible for machines. Want to count failures by reason? Write regex.

JSON log:

{
  "timestamp": "2025-04-15T10:00:00Z",
  "level": "ERROR",
  "event": "payment_failed",
  "user_id": 123,
  "reason": "timeout"
}

Feed this into Datadog, Splunk, or ELK. Query instantly: level="ERROR" AND reason="timeout". Build dashboards. Set alerts.
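Python's standard library has no built-in JSON formatter. Below is a minimal sketch of one that produces records shaped like the example above. It assumes structured fields are passed through the standard `extra=` argument under a key we've named `fields`. That key is our own convention, not something `logging` defines.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # ISO 8601 UTC timestamp derived from the record's creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
            .isoformat(timespec="seconds")
            .replace("+00:00", "Z"),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields passed via extra={"fields": {...}} (our own convention)
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            # Keep tracebacks inside the JSON object instead of on separate lines
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


logger = logging.getLogger("payments")  # hypothetical logger name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.propagate = False

logger.error("payment_failed", extra={"fields": {"user_id": 123, "reason": "timeout"}})
```

Writing one object per line keeps the output simple to ship and parse. The event name stays a stable identifier like `payment_failed` rather than a free-form sentence.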

Correlation IDs

In microservices, a single user action can trigger logs in five different services. How do you connect them?

Generate a unique ID when a request enters your system. Pass it to every service. Include it in every log:

{"request_id": "abc-123", "service": "auth", "event": "login_success"}
{"request_id": "abc-123", "service": "orders", "event": "fetching_history"}
{"request_id": "abc-123", "service": "orders", "event": "db_timeout", "level": "ERROR"}

Search for abc-123 and you see the request's entire journey across your system.
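Within a single Python service, one way to sketch this is to hold the current request ID in a `contextvars.ContextVar`. A `logging.Filter` then stamps that ID onto every record. The header name `X-Request-ID` and the `handle_request` function are hypothetical. How the ID actually travels between services depends on your framework and transport.

```python
import contextvars
import io
import logging
import uuid

# ID of the request currently being handled (isolated per thread / asyncio task)
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record passing through the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True  # never drop records, only annotate them


def handle_request(headers: dict, logger: logging.Logger) -> str:
    """Hypothetical entry point: reuse an incoming ID or generate a fresh one."""
    rid = headers.get("X-Request-ID") or str(uuid.uuid4())
    token = request_id_var.set(rid)
    try:
        logger.info("fetching_history")
        # When calling a downstream service, forward rid in the same header.
        return rid
    finally:
        request_id_var.reset(token)  # don't leak the ID into unrelated work


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(request_id)s %(name)s %(message)s"))
log = logging.getLogger("orders")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

handle_request({"X-Request-ID": "abc-123"}, log)
print(stream.getvalue())  # abc-123 orders fetching_history
```

The filter is attached to the handler rather than the logger. That way it also annotates records that propagate up from child loggers.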

What to Log

Do log:

Entry and exit points (API requests/responses)
External calls (database, third-party APIs)
Business decisions ("User is premium, applying discount")
Errors with full context

Don't log:

Passwords (even hashed)
API keys
Credit card numbers
PII (personally identifiable information), unless it's necessary and you're compliant with applicable privacy rules
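A filter can mask known sensitive keys before a record is written. This is a last line of defense, not a substitute for simply not logging secrets. The sketch below assumes structured fields arrive as a dict under `extra={"fields": ...}`, a hypothetical convention. The key list is illustrative, not exhaustive.

```python
import io
import logging

# Illustrative, not exhaustive: extend with your own field names.
SENSITIVE_KEYS = {"password", "password_hash", "api_key", "card_number", "ssn"}


class RedactFilter(logging.Filter):
    """Mask sensitive values in a record's structured `fields` dict."""

    def filter(self, record: logging.LogRecord) -> bool:
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            # Build a new dict so the caller's original data is not mutated
            record.fields = {
                k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v
                for k, v in fields.items()
            }
        return True  # annotate, never drop


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactFilter())
handler.setFormatter(logging.Formatter("%(message)s %(fields)s"))
log = logging.getLogger("auth")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("login_attempt", extra={"fields": {"user_id": 123, "password": "hunter2"}})
print(stream.getvalue())  # login_attempt {'user_id': 123, 'password': '[REDACTED]'}
```

Key-based masking only catches secrets stored under names you anticipated. A password interpolated into a free-text message would slip through.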

Log the Stack Trace

When catching exceptions, don't just log the message:

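import logging
logger = logging.getLogger(__name__)

# ❌ Loses the location
try:
    process_order()  # hypothetical function that may raise
except Exception as e:
    logger.error(f"Failed: {e}")

# ✅ Includes full stack trace
try:
    process_order()
except Exception:
    logger.exception("Operation failed")  # Logs at ERROR with the trace; call only inside except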

The message tells you what happened. The stack trace tells you exactly where.
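`logger.exception()` is shorthand for logging at ERROR level with `exc_info=True`. Passing `exc_info=True` yourself attaches the same traceback at another level, such as WARNING for a failure you handle and retry. In this minimal sketch, `charge_card` is a hypothetical function that always fails.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger = logging.getLogger("billing")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def charge_card(amount: int) -> None:
    """Hypothetical call that always fails, standing in for a real gateway."""
    raise TimeoutError(f"gateway timed out charging {amount}")


try:
    charge_card(42)
except TimeoutError:
    # Same traceback as logger.exception(), but recorded at WARNING level
    logger.warning("Charge timed out, will retry", exc_info=True)

output = stream.getvalue()
print(output)
```

The printed record starts with the message and then shows the full traceback, including the `charge_card` frame where the error was raised.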

The Payoff

Good logs cut debugging time from hours to minutes. When the 2am alert fires, you search for the request ID, see exactly what happened, and fix it. No guessing. No reproducing. Just evidence.

Treat logs as a feature, not an afterthought.
