I broke production on a Friday afternoon. Deployed a migration that added a new required column. The code that used it deployed 3 minutes later. In those 3 minutes, every API request failed.

I knew better. I just... forgot. Got excited about shipping the feature, skipped my mental checklist, hit deploy.

Pilots don't skip checklists. Surgeons don't skip checklists. After that incident, I stopped trusting my memory and started writing things down.

Before You Stage Files

Before git add, scan your diff for debris:

Also: did you actually run the app and click around? Unit tests pass, great. But did you verify the button is visible and clickable? One manual walkthrough catches things tests miss.

The Logic Check

For every external call—API, database, file system—ask: "What happens if this fails?"

If the answer is "white screen" or "infinite spinner," you're not done. Add error handling. Show the user something useful.

Also consider backward compatibility. If you change a function signature:

// ❌ Breaking change
// Old: getUser(id)
// New: getUser(id, region)  ← Existing calls break

// ✅ Safe change
function getUser(id, region = 'US') {
    // Default preserves existing behavior
}

Before You Merge

How do I rollback?

Every deploy should be reversible. For code, that's usually just reverting a commit. For database changes, it's trickier.

Adding a column? Fine—rollback just ignores it. Renaming a column? Dangerous. The old code still running expects the old name. Safe approach: add new column → migrate data → remove old column. Three separate deploys.

Can I tell if it's broken?

If this breaks at 3am, how will you know? Did you add logs? Not just "Starting..." and "Done"—actual useful logs with IDs and values. When user 12345 reports a bug, can you find their session?

The Feature Flag Safety Net

For risky changes, deploy behind a flag:

if (featureFlags.isEnabled('new-checkout', user)) {
    return renderNewCheckout();
} else {
    return renderOldCheckout();
}

If something goes wrong, you don't redeploy. You flip a switch in your dashboard. A potential outage becomes a minor inconvenience.

My Actual Checklist

Taped to my monitor:

Takes 2 minutes to run through. Saves hours of incident response. The best bugs are the ones you don't ship.

← Back to Debugging & Code Quality

Back to Home