The Importance of Idempotent APIs in Distributed Systems

Summary

Cross-service calls fail. Often, your only safe option to preserve consistency is to retry. The question is not whether you can retry, but whether it is safe to replay the same intent later — in one second, in two minutes, in ten hours, from a dead-letter queue (DLQ), or 30 days after an outage.

"Is it OKAY to retry?"

For systems that care about correctness, like banking, you design for idempotency and clear retry policies from the start. 

You rarely achieve reliable, deterministic distributed behavior if your APIs are not idempotent.

Introduction

In distributed systems — particularly in the banking and financial sector, where systems must guarantee accuracy, auditability, and compliance — idempotency is not a luxury; it is a fundamental design requirement.

Idempotency ensures that executing the same request multiple times yields the same result, without unintended side effects such as duplicate transactions, double payments, or inconsistent state.

In this article, we’ll explore why idempotency is critical, how banks design for it, what happens when idempotency keys expire, and how controlled reprocessing and compensation mechanisms ensure system integrity even in complex message-driven architectures.

Why Idempotency Is So Critical

Idempotency protects distributed systems from inevitable failures such as:

  • Network retries or timeouts

  • Client restarts after uncertain responses

  • At-least-once message delivery semantics in messaging systems (Kafka, RabbitMQ)

  • Manual reprocessing or duplicate submissions by operators

Without idempotency, repeated invocations of the same API or message could cause financial inconsistencies, like charging a customer twice or recording multiple ledger entries for one transaction.

In banking, each transaction must be atomic and applied exactly once — no matter how many times a request or message is delivered.

Option #1 - Natural Idempotency in REST APIs



Natural idempotency refers to idempotent behavior that arises inherently from the API’s HTTP method semantics or resource model, without requiring an explicit Idempotency-Key or any special deduplication logic.

In other words — the REST API’s very design ensures that repeating the same request has no additional effect. By contrast, RPC APIs don't bring natural idempotency out of the box.

GET methods

By design, GET methods must not change state and must have no side effects, so they are safe and idempotent by definition.

PUT methods

HTTP semantics define PUT as idempotent by nature.

Example:

      PUT /api/customers/123 { "email": "alice@example.com" } 
    

If the client sends this same request 10 times:

  • The resource /customers/123 still ends up with email = alice@example.com

  • No duplicates, no extra side effects

Result is deterministic and stable — this is natural idempotency.

DELETE methods

DELETE /api/customers/123

Whether you send this once or 10 times, the resource is deleted once. That’s also naturally idempotent.

The first call should return "204 No Content". Repeating it may return "404 Not Found", but that causes no harm from the HTTP-semantics point of view.

⚠️ The problem is that HTTP client libraries often treat any 4xx/5xx as an “error response” for control-flow or exception purposes. Developers have to check statuses manually, often like this:

 if (!response.IsSuccessStatusCode) // any 4xx/5xx 
 { 
     // generic handling for "something went wrong" 
 } 

But a 404 response in the DELETE case may not be handled specifically, and that would break the idempotent flow.

Alternatively, the API MAY return 204 for ANY non-existing entity. It's not elegant, but it is safe when it's too hard to control all the clients.
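To make the intent explicit, a client can classify DELETE responses up front instead of relying on a generic success check. A minimal, language-agnostic sketch in Python (the helper name and the status set are illustrative, not from any specific library):

```python
# Hypothetical helper: treat 404 on a retried DELETE as "intent satisfied",
# not as a failure, so the flow stays idempotent.
RETRY_SAFE_DELETE_STATUSES = {204, 404}

def delete_succeeded(status_code: int) -> bool:
    """A retried DELETE may see 404 because an earlier attempt already
    removed the resource; the caller's intent is still satisfied."""
    return status_code in RETRY_SAFE_DELETE_STATUSES
```

The same classification can live in a shared HTTP client wrapper so individual call sites don't forget it.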

POST methods

⚠️ Non-Idempotent by default.

POST /api/payments 
{ 
  "amount": 100, 
  "to": "acct-456" 
} 

201 Created
Location: /payments/tx-001 

If you repeat the same call, you may get: 

201 Created 
Location: /payments/tx-002

So now you’ve created two payments — that’s not idempotent.

POST methods - Turning Non-Idempotent APIs into Naturally Idempotent Ones

There are several approaches you can use. Instead of:
POST /api/payments { ... }
Use:
PUT /api/payments/{paymentId} { ... } 
Example:

Phase 1 - The client (or even the UI) obtains all generated IDs upfront, before creating any replayable message. Generating a new UUID/GUID on the client side would also work.

POST /api/payments/keys
201 Created /REF-4567 

Phase 2 - Create the replayable message or call with all IDs, and use PUT:

PUT /api/payments/REF-4567  { ... } 
200 OK 

Now the client defines the payment’s unique ID (paymentId=REF-4567), and the server will:

  • Create it once if it doesn’t exist

  • Overwrite or return existing if already processed

No need for an Idempotency-Key — idempotency emerges from the resource model itself.
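The two phases above can be sketched end to end. A minimal in-memory sketch in Python (the store, handler name, and status codes are illustrative, not a specific framework's API):

```python
# Naturally idempotent PUT handler: the client supplies the payment ID,
# so any number of replays converges on a single record.
payments: dict = {}

def put_payment(payment_id: str, body: dict):
    if payment_id in payments:
        return 200, payments[payment_id]      # replay: return the existing record
    payments[payment_id] = {"id": payment_id, **body}
    return 201, payments[payment_id]          # first call: create exactly once
```

A real service would back the dictionary with a database and rely on a unique-key constraint to enforce "create exactly once" under concurrency.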

If relying on an externally provided key is a concern, you still have some options:

  • Persist the list of already generated keys and use them as a foreign key (FK)
  • Have two types of keys:
    • External, like EXT-REF-4567
    • Internal, self-generated, like INT-2025-000123, and resolve one from the other.
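The two-key idea can be sketched as a small resolver. A Python sketch, assuming an in-memory map and a sequential counter (both illustrative; a real system would persist the mapping):

```python
# Map an untrusted external reference to an internally generated key,
# so client-supplied keys never become primary identifiers.
import itertools

_sequence = itertools.count(123)
_external_to_internal: dict = {}

def resolve_internal_key(external_ref: str) -> str:
    """Return the internal key for an external reference, creating it once."""
    if external_ref not in _external_to_internal:
        _external_to_internal[external_ref] = f"INT-2025-{next(_sequence):06d}"
    return _external_to_internal[external_ref]
```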

Option #2 - Forced Idempotency by Key for RPC or REST APIs


This approach works for both REST and RPC APIs. It ensures idempotency by requiring a client-generated unique key per logical operation — often called the Idempotency Key or Request ID.

Note: using a Request ID alone is simple, but sometimes you still want an Idempotency-Key and a Request ID as two separate keys, for example to log and trace each distinct call made with the same Idempotency-Key.

When a client sends:

POST /payments Idempotency-Key: TXN-123456 

The server stores the outcome of that operation keyed by TXN-123456.

If the same request is sent again, the server:

  • Recognizes the key,

  • Returns the original result, and

  • Avoids re-executing the business logic.

This mechanism ensures safety under retries and is essential in high-value transaction systems such as payments, trading, or settlement platforms.
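The store-and-replay mechanism can be sketched in a few lines. A Python sketch with an in-memory outcome store (names and payloads are illustrative):

```python
# Forced idempotency by key: the first call runs the business logic and
# stores the outcome under the client's key; any replay returns the stored
# outcome without re-executing.
stored_outcomes: dict = {}

def handle_payment(idempotency_key: str, body: dict):
    if idempotency_key in stored_outcomes:
        return stored_outcomes[idempotency_key]   # duplicate: replay original result
    # --- business logic executes exactly once ---
    outcome = (201, {"transactionId": idempotency_key, "amount": body["amount"]})
    stored_outcomes[idempotency_key] = outcome
    return outcome
```

In production the check-then-store must be atomic (e.g., a unique-key insert), otherwise two concurrent calls with the same key could both execute the business logic.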

Getting Idempotency-Key from Message

A separate case is how to derive an Idempotency-Key from a message. To get consistent behavior, we need a message field that:

  • Stays the same across message replays, like MessageId
  • Stays the same when re-sending from a Dead Letter Queue (DLQ); most providers preserve MessageId.

Simplest implementations:

string idempotencyKey = message.MessageId; 
string idempotencyKey = $"{message.MessageId}:{message.ApplicationProperties["SourceServiceName"]}";

Note: since MessageId is a technical field, you need to ensure it is preserved when re-sending:

var newMessage = new ServiceBusMessage(deadLetterMessage);
newMessage.MessageId = deadLetterMessage.MessageId; // preserve idempotency

Sometimes, choosing a business-level ID manually is better. It stays consistent even when the message is repackaged or rerouted:

string idempotencyKey = message.Body.PaymentTransactionId;
 

The Challenge of Idempotency Storage Growth

Idempotency storage naturally grows indefinitely — each new key must be stored to detect duplicates.
Banks typically manage this using a retention policy that balances storage cost, performance, and compliance requirements.

Common Strategies:

  1. Short-Term Caching:
    Keep idempotency records in Redis or database cache for 24–72 hours for typical API retries.

  2. Long-Term Storage:
    For financial operations, records are often persisted in durable storage (SQL, Kafka compacted topics) for months or years.

  3. Archiving Policies:
    Expired idempotency records may be archived to a regulatory-compliant store instead of being deleted, ensuring long-term auditability.

Because financial institutions are subject to strict audit rules, even temporary idempotency data might need to be retained for 7–10 years in some cases.
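The short-term caching strategy can be sketched as a TTL-bound store. A Python sketch (the cache class and its API are illustrative; production systems would use Redis with key expiry or a database with a cleanup job):

```python
# Idempotency cache with a retention window (TTL). After the TTL elapses,
# the key is forgotten and duplicates become undetectable; that is exactly
# the expiry risk discussed in the next sections.
import time

class IdempotencyCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}   # key -> (stored_at, outcome)

    def put(self, key, outcome, now=None):
        self._store[key] = (now if now is not None else time.time(), outcome)

    def get(self, key, now=None):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, outcome = entry
        if (now if now is not None else time.time()) - stored_at > self.ttl:
            del self._store[key]   # expired: retention window has passed
            return None
        return outcome
```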


Handling Errors in Idempotency Records

When a request fails due to an application or infrastructure error, the error outcome is still stored against the idempotency key.
This ensures deterministic behavior:

  • If the client retries the same key, the API returns the same error response rather than re-executing the logic.

  • Operators can later manually retry with a new idempotency key after fixing the root cause.

This consistency prevents “partial retries” that could corrupt business data.
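This behavior can be sketched by treating errors as first-class outcomes. A Python sketch (names and the simulated failure flag are illustrative):

```python
# Error outcomes are stored against the key too, so a retry with the same
# key deterministically replays the stored error instead of re-executing
# possibly half-applied logic. Recovery requires a NEW idempotency key.
outcomes: dict = {}

def process(idempotency_key: str, downstream_up: bool):
    if idempotency_key in outcomes:
        return outcomes[idempotency_key]   # same key: same stored outcome
    result = (201, "created") if downstream_up else (500, "settlement unavailable")
    outcomes[idempotency_key] = result
    return result
```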

When Idempotency TTL Expires

The most subtle risk arises when a message is replayed after its idempotency key has expired.

If the original idempotency record is gone, the system may treat the replay as a new transaction, risking duplication.

Banking Countermeasures:

  1. Permanent Business Transaction IDs
    Each message carries a business-level unique identifier (TransactionId) that is stored permanently in the domain database.
    Before executing, the system checks if that ID already exists.

  2. Durable Idempotency Tables
    Instead of transient caches, idempotency records are persisted in a relational database or a compacted Kafka topic with long retention.

  3. Replay Validation Logic
    The Controlled Reprocessing Service (CRS, described below) performs a pre-flight check: before replaying, it queries the target system (or API) to confirm that the transaction does not already exist.

  4. Retry Window Policies
    Replays are blocked if messages are older than a defined window (e.g., 7 days).
    After that, recovery must occur through compensation, not replay.

  5. Audit Enforcement
    Every replay attempt — even rejected ones — is logged with timestamp, operator, and validation results.

This layered design ensures that expired idempotency doesn’t compromise system correctness.
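Countermeasures 3 and 4 can be combined into a single pre-flight check. A Python sketch (the window length mirrors the 7-day example above; the lookup set stands in for a query to the target system):

```python
# Pre-flight replay validation: a replay is allowed only if the message is
# inside the retry window AND the transaction does not already exist downstream.
from datetime import timedelta

RETRY_WINDOW = timedelta(days=7)

def can_replay(message_age: timedelta, transaction_id: str, existing_ids: set):
    if message_age > RETRY_WINDOW:
        return False, "too old: recover via compensation, not replay"
    if transaction_id in existing_ids:
        return False, "transaction already exists: duplicate suppressed"
    return True, "replay allowed"
```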

Messaging Systems and Dead Letter Queues (DLQs)

In distributed banking systems, event-driven messaging is pervasive — often using Kafka, IBM MQ, or Azure Service Bus.
These systems operate under at-least-once delivery semantics, which means duplicates can happen.

When a consumer fails to process a message after multiple retries, it goes to a Dead Letter Queue (DLQ).

Standard Practice:

  • Messages in DLQ are never deleted automatically.

  • They are manually reviewed and reprocessed through controlled workflows.

  • Reprocessing ensures each message’s business impact is validated before reintroduction.

However, simply pushing a DLQ message back to the main queue can be dangerous — especially if its idempotency window has expired.

Optional - Controlled Reprocessing Service (CRS)


Banks handle DLQ messages through a Controlled Reprocessing Service — a specialized subsystem that replays messages safely, under human supervision, and with full traceability.

Core Functions of CRS:

  • Retrieve messages from DLQ or audit topic.

  • Expose them through a secure Ops Portal for review.

  • Allow authorized staff to approve replays.

  • Validate that replays are safe and idempotent before reinjection.

  • Log all replays immutably for audit.

Key Features:

  • UI/Workflow - Approval process integrated with internal systems like ServiceNow or Jira

  • Pre-replay validation - Checks target system for existing transaction before replay

  • Replay rate limiting - Prevents overload on consumers

  • Replay metadata - Includes replay reason, timestamp, and approver

  • Immutable audit log - Ensures end-to-end traceability for compliance

Through this process, banks ensure that no message is reprocessed without explicit authorization and safety verification.
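One of the CRS rules above, that even rejected replay attempts are logged, can be sketched as follows. A Python sketch (field names and the approval check are illustrative):

```python
# Every replay attempt, approved or rejected, is appended to the audit log
# before any message is reinjected.
audit_log: list = []

def request_replay(message_id: str, approver, already_processed: bool) -> bool:
    approved = approver is not None and not already_processed
    audit_log.append({
        "message_id": message_id,
        "approver": approver,
        "approved": approved,     # rejected attempts are logged as well
    })
    return approved
```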

Manual Compensation Transactions API: When Replay Is No Longer Safe

If a message is too old to safely replay, the correct procedure is to issue a compensating transaction — a deliberate, auditable, and inverse business operation.

Concept

A compensating transaction doesn’t retry the failed one; it creates a new transaction that adjusts or negates the previous state.
Examples:

  • Reversing a duplicate debit.

  • Issuing a refund for a missed credit.

  • Adjusting ledger balances after settlement failure.

Compensation API Design:

Banks could expose these as secure REST APIs accessible only to authorized Operations staff.
Example:

    POST /api/compensations/payments 
    Content-Type: application/json 
    { 
      "originalTransactionId": "TXN-2025-001234", 
      "reason": "Reversal after failed settlement (DLQ retry expired)", 
      "amount": 150.00,
      "currency": "USD",
      "authorizedBy": "ops.alice", 
      "approvedBy": "ops.supervisor.bob", 
      "requestedAt": "2025-10-21T09:55:00Z" 
    } 
  

Behavior:

  • The service verifies the original transaction exists and is settled.

  • Ensures no prior compensation was issued.

  • Creates a new ledger entry of type “Compensation”.

  • Links it to the original transaction via RelatedTransactionId.

  • Records full audit details (who, why, when).

This ensures business balance and compliance without unsafe replays.
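The behavior above can be sketched as a guarded compensation function. A Python sketch (the ledger structures and error messages are illustrative):

```python
# Compensation checks: verify the original transaction exists and is settled,
# ensure no prior compensation, then record a linked "Compensation" entry.
ledger = {"TXN-2025-001234": {"type": "Payment", "status": "Settled"}}
compensations: dict = {}   # originalTransactionId -> compensation entry

def compensate(original_id: str, amount: float, reason: str) -> dict:
    original = ledger.get(original_id)
    if original is None or original["status"] != "Settled":
        raise ValueError("original transaction missing or not settled")
    if original_id in compensations:
        raise ValueError("compensation already issued for this transaction")
    entry = {"type": "Compensation", "relatedTransactionId": original_id,
             "amount": -amount, "reason": reason}
    compensations[original_id] = entry
    return entry
```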

Security and Governance:

  • RBAC & SSO - Only authorized Ops and Supervisors can invoke compensation APIs
  • Segregation of Duties - Creator ≠ Approver
  • Audit Trail - Immutable storage of requests, approvals, and results
  • Rate Limiting - Prevents bulk compensations from overwhelming ledgers
  • Policy Engine - Defines limits, eligible transaction types, and retention
  • Digital Signatures - Used in high-value or regulatory-sensitive operations

These controls ensure that compensations are intentional, traceable, and reversible if necessary.

Putting It All Together

  • Duplicate REST calls - Solution: Idempotency Key / Transaction ID
  • Duplicate message delivery - Solution: Idempotent consumers with durable deduplication
  • DLQ replay risk - Solution: Controlled Reprocessing Service (CRS)
  • Replay after TTL expiry - Solution: Replay window policy + pre-replay validation
  • Irreversible stale replay - Solution: Compensation transaction via REST API
  • Regulatory audit - Solution: Immutable logging + segregation of duties

Conclusion

Idempotency is the bedrock of safe distributed operations.
In financial systems, it’s not just a technical feature — it’s a regulatory and business necessity.

Banks achieve it through:

  • Multi-layered persistence (cache + database)

  • Controlled reprocessing pipelines

  • Retry validation policies

  • Explicit compensation APIs

  • Immutable audit trails

Together, these ensure that every request, message, and replay is safe, traceable, and compliant — even years after the original event.

In banking-grade distributed systems, “exactly-once behavior” isn’t achieved by luck — it’s enforced through idempotency by design.
