Why Your AI Coding Assistant Needs Your Incident History
A deeply technical exploration using real payment system failure modes
Modern AI coding assistants are extremely good at syntax, APIs, and common patterns. They are not good at the one thing that actually determines whether code survives production: knowledge of how your own system has failed before.
Incident history is not "documentation." It is compressed experience: the record of constraints your system learned only after failing.
This post explores, in depth, why AI-generated code without institutional memory is structurally unsafe, and how COEhub MCP Server changes code generation by injecting incident-derived constraints directly into the prompt → generation loop.
We'll use payment processing as a concrete example because failures are expensive, correctness is subtle, and most teams already have scars here.
TL;DR (for impatient engineers)
- Generic AI writes clean code
- Incident-aware AI writes survivable code
- Survivability comes from invariants learned during outages: idempotency, state machines, reconciliation, concurrency control, observability
- COEhub turns postmortems into first-class design constraints for AI
The Prompt
"Write a function to process payments through our Stripe integration."
Same prompt. Two radically different outcomes.
World One: AI Without Incident History
This AI knows Stripe's API. It knows common tutorials. It does not know how your system fails.
Typical output
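As a rough illustration of what such an assistant tends to produce, here is a minimal sketch. It is not the output of any particular tool. `FakeGateway` and `create_charge` are hypothetical stand-ins for a provider client (they are not Stripe's API), and `db` is a plain dict standing in for a database.

```python
import itertools


class FakeGateway:
    """Hypothetical provider client used only for illustration (not Stripe's API)."""

    def __init__(self):
        self.charges = []
        self._ids = itertools.count(1)

    def create_charge(self, amount_cents, currency, source):
        # Every call creates a brand-new charge; nothing deduplicates retries.
        charge = {
            "id": f"ch_{next(self._ids)}",
            "amount": amount_cents,
            "currency": currency,
            "source": source,
        }
        self.charges.append(charge)
        return charge


def process_payment(gateway, db, user_id, amount, token):
    # Flaw: float dollars are truncated into cents.
    amount_cents = int(amount * 100)
    # Flaw: no idempotency key, so a retry produces a second charge.
    charge = gateway.create_charge(amount_cents, "usd", token)
    # Flaw: the database only learns about the payment after the remote call.
    db[charge["id"]] = {"user_id": user_id, "amount": amount, "status": "paid"}
    return charge["id"]
```

Calling `process_payment` twice with the same arguments, as a retrying client would, leaves two charges in `gateway.charges`.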
Why this code looks fine
- Compiles
- Passes unit tests
- Matches Stripe examples
- Easy to read
Why this code will hurt you
It violates production invariants you only learn after incidents:
- No idempotency (retry = double charge)
- Floating-point money
- DB write happens after remote call
- No reconciliation path
- No concurrency protection
- Assumes webhooks are ordered and delivered exactly once
- Observability is an afterthought
Incident History Changes Everything
Let's assume COEhub contains incidents like:
- INC-2024-0892: Duplicate charges caused by retry storms
- INC-2023-1104: Floating-point rounding created money drift
- INC-2024-0567: Stripe succeeded, DB write timed out
- INC-2024-0719: Refund raced with capture
- INC-2024-0831: Duplicate webhook delivery triggered side effects twice
These incidents encode design constraints. AI with COEhub MCP can retrieve them before writing code.
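To make "incidents encode design constraints" concrete, here is a small sketch of turning incident records into text an assistant could receive as context. This is not COEhub's API or data model. The `Incident` type, the `constraint` wording, and `build_constraint_context` are all assumptions made for illustration; only the incident IDs and summaries come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    incident_id: str
    summary: str
    constraint: str  # illustrative wording, not taken from a real postmortem


INCIDENTS = [
    Incident("INC-2024-0892", "Duplicate charges caused by retry storms",
             "Every charge attempt carries a deterministic idempotency key."),
    Incident("INC-2023-1104", "Floating-point rounding created money drift",
             "Money is computed with exact decimals or integer minor units."),
    Incident("INC-2024-0567", "Stripe succeeded, DB write timed out",
             "A pending record is persisted before any provider call."),
    Incident("INC-2024-0719", "Refund raced with capture",
             "State transitions for a payment are serialized."),
    Incident("INC-2024-0831", "Duplicate webhook delivery triggered side effects twice",
             "Webhook handlers are idempotent."),
]


def build_constraint_context(incidents):
    """Render incidents as a plain-text constraint list for a prompt."""
    lines = ["Constraints derived from past incidents:"]
    for inc in incidents:
        lines.append(f"- [{inc.incident_id}] {inc.constraint} (learned from: {inc.summary})")
    return "\n".join(lines)
```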
World Two: AI With Incident History
The first thing a reliable assistant does is define invariants.
Step 1: Explicit invariants (this is non-negotiable)
- A user action must never cause multiple charges
- Money math must be exact
- DB state must be safe under partial failure
- All external calls must be idempotent
- State transitions must be serialized
- Observability must make future incidents cheaper
Step 2: Idempotency (application-level, not just Stripe)
Stripe idempotency keys help with request retries. They do not protect you from your own system executing twice.
Rule: Every business operation must have a deterministic operation_id.
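A minimal sketch of that rule, assuming the operation is identified by a user, an order, and an action (these field names are hypothetical). `OperationLog` is an in-memory stand-in for a table with a unique constraint on the operation ID; it is not thread-safe and is only meant to show the shape of the check.

```python
import hashlib


def operation_id(user_id: str, order_id: str, action: str) -> str:
    """Derive the same ID for the same business operation, every time."""
    # A separator that cannot appear in normal IDs avoids ambiguous joins.
    raw = "\x1f".join([user_id, order_id, action])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


class OperationLog:
    """In-memory stand-in for a table keyed by operation_id."""

    def __init__(self):
        self._results = {}

    def run_once(self, op_id, fn):
        # If this operation already ran, return the stored result
        # instead of executing the side effect again.
        if op_id in self._results:
            return self._results[op_id]
        result = fn()
        self._results[op_id] = result
        return result
```

Because the ID is derived from business inputs rather than generated per request, a second execution of the same job, queue message, or button click maps to the same entry.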
Step 3: Money correctness (Decimal / BigDecimal only)
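One way to apply this in Python is to parse amounts as `Decimal` from strings and convert to integer minor units at the boundary. The sketch below assumes a two-decimal currency by default; `to_minor_units` is a hypothetical helper name.

```python
from decimal import Decimal


def to_minor_units(amount: str, exponent: int = 2) -> int:
    """Convert a decimal string such as "19.99" to integer minor units (cents)."""
    value = Decimal(amount)
    if value < 0:
        raise ValueError("amount must not be negative")
    quantum = Decimal(1).scaleb(-exponent)  # Decimal("0.01") when exponent is 2
    # Reject inputs that would need rounding rather than silently rounding them.
    if value != value.quantize(quantum):
        raise ValueError(f"{amount} has more than {exponent} decimal places")
    return int(value.scaleb(exponent))
```

Rejecting over-precise input, instead of rounding it, is a design choice in this sketch: it forces the caller to decide how rounding should happen.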
Step 4: Pending state before Stripe call
This prevents the "Stripe succeeded but DB write timed out" failure.
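A sketch of the ordering, using an in-memory SQLite table as a stand-in for the real database. The schema, the status names, and the `gateway_call` callable are assumptions for illustration. The point is only that the `PENDING` row is committed before the remote call, so a timeout leaves evidence behind.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        " operation_id TEXT PRIMARY KEY,"
        " amount_cents INTEGER NOT NULL,"
        " status TEXT NOT NULL,"
        " provider_id TEXT)"
    )
    return conn


def record_pending(conn, op_id, amount_cents):
    """Commit the PENDING row first; return False if the operation already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments (operation_id, amount_cents, status)"
                " VALUES (?, ?, 'PENDING')",
                (op_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def charge(conn, gateway_call, op_id, amount_cents):
    if not record_pending(conn, op_id, amount_cents):
        # Already recorded: report the known status, do not call the provider again.
        row = conn.execute(
            "SELECT status FROM payments WHERE operation_id = ?", (op_id,)
        ).fetchone()
        return row[0]
    provider_id = gateway_call(op_id, amount_cents)  # may raise or time out
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'SUCCEEDED', provider_id = ?"
            " WHERE operation_id = ?",
            (provider_id, op_id),
        )
    return "SUCCEEDED"
```

If `gateway_call` raises, the row stays `PENDING` and a later reconciliation pass can resolve it.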
Step 5: Stripe call with deterministic idempotency
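Stripe's API accepts an idempotency key on requests, so a retry that reuses the key returns the original result instead of creating a new charge. The sketch below shows the retry pattern against a hypothetical `FakeProvider` that mimics that behavior; it does not call Stripe, and `create_payment` is not a Stripe method. A real implementation would also add backoff between attempts.

```python
class FakeProvider:
    """Hypothetical provider that deduplicates on idempotency key."""

    def __init__(self, failures_before_success=0):
        self.by_key = {}
        self.calls = 0
        self._failures_left = failures_before_success

    def create_payment(self, amount_cents, currency, idempotency_key):
        self.calls += 1
        # The same key maps to the original payment rather than a new one.
        if idempotency_key not in self.by_key:
            self.by_key[idempotency_key] = {
                "id": f"pi_{len(self.by_key) + 1}",
                "amount": amount_cents,
                "currency": currency,
            }
        if self._failures_left > 0:
            # Simulate the response being lost after the provider did the work.
            self._failures_left -= 1
            raise TimeoutError("response lost")
        return self.by_key[idempotency_key]


def charge_with_retry(provider, op_id, amount_cents, currency="usd", attempts=3):
    """Retry on timeout, always reusing the deterministic operation ID as the key."""
    last_error = None
    for _ in range(attempts):
        try:
            return provider.create_payment(
                amount_cents, currency, idempotency_key=op_id
            )
        except TimeoutError as exc:
            last_error = exc
    raise last_error
```

The key must come from the business operation, not from the request attempt. A key generated fresh on each retry would defeat the mechanism.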
Step 6: State transitions as a state machine
Allowed transitions:
Anything else is a bug.
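One way to enforce this is an explicit transition table that rejects everything not listed. The states and transitions below are hypothetical and chosen only to illustrate the pattern; a real system would derive them from its own payment lifecycle.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    REFUND_PENDING = "refund_pending"
    REFUNDED = "refunded"


# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    State.PENDING: {State.SUCCEEDED, State.FAILED},
    State.SUCCEEDED: {State.REFUND_PENDING},
    State.REFUND_PENDING: {State.REFUNDED, State.SUCCEEDED},
    State.FAILED: set(),
    State.REFUNDED: set(),
}


class IllegalTransition(Exception):
    """Raised when code attempts a transition that is not in the table."""


def transition(current: State, target: State) -> State:
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target
```

Raising on an unlisted transition turns a silent data corruption into a loud, attributable error.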
Step 7: Webhooks are at-least-once and unordered
Rule: Webhook handlers must be idempotent and side-effect safe.
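A minimal sketch of such a handler. The event shape (`id`, `payment_id`, `status`) is a simplified assumption rather than Stripe's actual payload, and the in-memory sets and dicts stand in for durable storage with a unique key on the event ID.

```python
# Hypothetical ordering of statuses: a later status is never overwritten by an earlier one.
STATUS_RANK = {"pending": 0, "succeeded": 1, "refunded": 2}


class WebhookProcessor:
    def __init__(self):
        self.seen_event_ids = set()
        self.payment_status = {}
        self.receipts_sent = []
        self.duplicates = 0

    def handle(self, event):
        # Duplicate delivery: acknowledge it, but do nothing else.
        if event["id"] in self.seen_event_ids:
            self.duplicates += 1
            return "duplicate"
        self.seen_event_ids.add(event["id"])

        payment_id = event["payment_id"]
        new_status = event["status"]
        current = self.payment_status.get(payment_id, "pending")

        # Out-of-order delivery: ignore events that would move the payment backwards.
        if STATUS_RANK[new_status] <= STATUS_RANK[current]:
            return "stale"

        self.payment_status[payment_id] = new_status
        if new_status == "succeeded":
            self.receipts_sent.append(payment_id)  # side effect runs once per payment
        return "applied"
```

The `duplicates` counter exists so that redelivery becomes a visible metric rather than an invisible hazard.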
Step 8: Concurrency control for refunds and captures
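One common way to keep a refund from racing a capture is a conditional update: the state change only succeeds if the row is still in the state the caller expected. The sketch below uses SQLite, a hypothetical `payments` table, and hypothetical state names. It swaps in compare-and-swap as the technique; row locks or a per-payment queue are alternatives.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    conn.execute("INSERT INTO payments VALUES ('pay_1', 'AUTHORIZED')")
    conn.commit()
    return conn


def try_transition(conn, payment_id, expected, target):
    """Move expected -> target atomically; True only for the caller that won."""
    with conn:
        cur = conn.execute(
            "UPDATE payments SET state = ? WHERE id = ? AND state = ?",
            (target, payment_id, expected),
        )
    # rowcount is 0 when another operation already changed the state.
    return cur.rowcount == 1
```

A capture and a refund that both start from `AUTHORIZED` cannot both win: the second `UPDATE` matches no rows, and the loser can record a detected race instead of proceeding.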
Step 9: Reconciliation is mandatory
A reconciliation job exists because incidents happened.
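A sketch of what such a job might do, assuming payments are stored with a status and a creation time, and that the provider can be queried by operation ID. `lookup_provider`, the 15-minute threshold, and the record shape are all hypothetical. Treating a missing provider record as `FAILED` is also an assumption that only holds once the provider can no longer accept the original request.

```python
from datetime import datetime, timedelta


def reconcile(payments: dict, lookup_provider, now: datetime,
              max_pending_age: timedelta = timedelta(minutes=15)):
    """Resolve PENDING payments older than max_pending_age by asking the provider."""
    repaired = []
    for op_id, record in payments.items():
        if record["status"] != "PENDING":
            continue
        if now - record["created_at"] < max_pending_age:
            continue  # still in flight; leave it to the normal request path
        provider_record = lookup_provider(op_id)  # hypothetical lookup by operation ID
        # The provider is treated as the source of truth for whether money moved.
        record["status"] = "SUCCEEDED" if provider_record else "FAILED"
        repaired.append(op_id)
    return repaired
```

Run on a schedule, this closes the gap left when the provider call succeeded but the local write did not.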
Observability (learn faster next time)
Metrics that matter:
- payments.create.success
- payments.create.timeout
- payments.pending.age
- webhook.duplicates
- refund.race.detected
Why COEhub MCP Changes AI Behavior
Without COEhub:
- AI optimizes for correctness in isolation
- No memory of past failures
- Repeats known mistakes
With COEhub:
- AI optimizes for organizational survivability
- Incident history becomes prompt context
- Postmortems become constraints
- Code reflects lived experience
Mental Model
Generic AI = Intern with perfect syntax
COEhub-aware AI = Staff engineer who remembers outages
Request Flow
Failure Recovery Loop
Final Takeaway
AI does not fail because it lacks intelligence. It fails because it lacks memory of pain.
COEhub turns incidents into:
- constraints
- invariants
- guardrails
- institutional memory
So your AI stops writing plausible code and starts writing production-survivable systems.