What is a race condition in microservices?

A race condition occurs when two or more concurrent requests read shared state, make a decision based on that state, and then mutate it — with no guarantee that the state has not changed between the read and the write. In microservices this manifests as double-charges, phantom inventory allocation, or conflicting saga transitions rather than low-level memory corruption.

Does idempotency prevent all race conditions?

Idempotency keys suppress duplicate-request replay but do not serialize concurrent first-time requests that share the same resource. Time-of-check to time-of-use (TOCTOU) races between two distinct requests require either optimistic concurrency control (version checks) or pessimistic locking (distributed locks) in addition to idempotency.

When should I use optimistic versus pessimistic concurrency control?

Use optimistic control when contention is low and conflict retries are cheap — read-heavy analytics or catalogue services are typical fits. Use pessimistic locking (distributed locks or SELECT FOR UPDATE) when contention is high and a failed attempt carries a significant cost, as in payment ledger updates or seat reservation systems.

Preventing Race Conditions in Microservices


In distributed architectures, race conditions rarely manifest as low-level memory-access violations. They emerge as high-level business-logic conflicts: double-charges, phantom inventory allocations, or divergent saga transitions triggered by concurrent API calls that reach two separate service instances within milliseconds of each other. The foundational shift required is from in-process synchronization to deterministic, network-aware state-mutation contracts. The first building block of that shift is idempotency key generation, which guarantees that repeated identical requests produce identical system states without duplicating side effects; as the sections below show, it must be combined with concurrency control to handle distinct requests that contend for the same resource.

Guarantee Model

Before selecting an implementation, you must state the precise contract your service must uphold, because the contract determines which mechanisms are necessary and which are sufficient.

Guarantee	Definition	Breaks Under
At-least-once delivery + exactly-once processing	The operation reaches the service at least once; the side effect executes exactly once regardless of retries.	Deduplication-store loss during failover; key collision under adversarial deterministic hashing.
Linearizability	Every operation appears to take effect at a single, globally agreed instant. Reads reflect all preceding writes.	Network partition that prevents quorum reads; clock skew exceeding lock TTL.
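Causal consistency	Causally related operations (for example, a write that depends on a value the writer previously read) are observed in the same order by all nodes; concurrent operations may be observed in different orders.	Out-of-order delivery across partitions with no causal dependency tracking.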
Optimistic exactly-once	Version-checked writes succeed only when the precondition holds; conflicts are returned as errors to the caller.	High-contention workloads where retry amplification exceeds acceptable latency budgets.

Most payment and inventory services require at-least-once delivery with exactly-once processing. Linearizability is reserved for leader election and distributed-lock coordination, because every linearizable write must be acknowledged by a quorum of replicas; the extra round trips add latency and limit write throughput, and the cost grows with the size of the replica set.

Concurrency Hazard Taxonomy

Three failure modes account for the majority of production race conditions:

Time-of-Check to Time-of-Use (TOCTOU): Validation passes against a read snapshot; state mutates before the write is committed. Classic example: balance check succeeds, concurrent debit reduces balance, original debit commits against a stale read.
Lost Update: Two writers read the same version, compute independent deltas, and both commit — the second write silently discards the first delta.
Double Execution (Duplicate Delivery): A load balancer or client retry delivers the same logical request twice; both instances pass the deduplication check because neither has committed its record yet.

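Each hazard calls for a different remediation: version checks or atomic conditional writes for lost updates, serialization (locking) or a single conditional statement for TOCTOU, and idempotency keys with an atomic check-and-insert for double execution. That is why no single mechanism eliminates all races at once.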
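To make the lost-update hazard concrete, the following deterministic Go sketch replays one fixed interleaving: two writers read the same balance and version, then each writes a debit. `versionedAccount` and `interleave` are hypothetical illustrations, not part of any library. The mutex only makes each individual read or write atomic, the way a single SQL statement is; it does not serialize the read-modify-write sequence, which is exactly where the race lives.

```go
package main

import "sync"

// versionedAccount is a hypothetical in-memory stand-in for a row with a
// version column. The mutex makes each method atomic, like one SQL statement.
type versionedAccount struct {
	mu      sync.Mutex
	balance int64
	version int64
}

// Read returns a snapshot, like a plain SELECT.
func (a *versionedAccount) Read() (balance, version int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.balance, a.version
}

// BlindWrite overwrites the balance unconditionally (lost-update prone).
func (a *versionedAccount) BlindWrite(balance int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.balance = balance
	a.version++
}

// CompareAndWrite applies the write only if the version is unchanged,
// mirroring UPDATE ... WHERE version = $expected.
func (a *versionedAccount) CompareAndWrite(expectedVersion, balance int64) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.version != expectedVersion {
		return false // concurrent modification detected
	}
	a.balance = balance
	a.version++
	return true
}

// interleave replays one schedule: both writers read before either writes.
func interleave(acct *versionedAccount, checked bool) (final int64, rejected int) {
	balA, verA := acct.Read()
	balB, verB := acct.Read()
	writes := []struct{ base, ver, debit int64 }{{balA, verA, 30}, {balB, verB, 50}}
	for _, w := range writes {
		if checked {
			if !acct.CompareAndWrite(w.ver, w.base-w.debit) {
				rejected++
			}
		} else {
			acct.BlindWrite(w.base - w.debit)
		}
	}
	final, _ = acct.Read()
	return final, rejected
}
```

Starting from a balance of 100, blind writes end at 50 instead of the expected 20 because the first debit is silently lost. With the version check, the second writer is rejected and must re-read (balance 70, version 1) and retry, which is the loop Variant 3 below relies on.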

Core Algorithm: Atomic Idempotency with Distributed Lock Serialization

The following sequence is the canonical approach for a stateful mutation endpoint that must be safe to retry:

Step-by-step protocol:

Client generates an Idempotency-Key (UUIDv4 or UUIDv7) and attaches the same key to every attempt of the mutating call, including retries.
API Gateway validates the key format (canonical UUID: 32 hex digits in 8-4-4-4-12 groups) and rejects malformed keys with HTTP 422 before the request reaches the service.
Service acquires a distributed lock with SET idempotency:{key} <owner-token> NX EX 30; this serializes concurrent requests sharing the same key, and the unique owner token identifies the current holder.
Service performs a point read against the deduplication store: SELECT response_body FROM idempotency_results WHERE key = ?.
On a miss (first request): execute the business mutation and insert the deduplication record inside a single database transaction.
Commit the transaction, then release the lock with a Lua script that deletes the key only if it still holds this request's owner token, so a lock that expired and was re-acquired by another holder is never released by mistake.
On a hit (duplicate request): release the lock and return the stored result without re-executing the mutation.

Implementation Variants

Variant 1 — Redis SET NX with Lua Owner Guard

Best for high-throughput APIs where sub-millisecond lock acquisition is required and short TTLs (15–30 s) are acceptable.

-- Atomic release: only delete if we are the lock owner
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
else
  return 0
end

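// Go: acquire lock, check dedup, execute, release.
// Assumes go-redis v9; Result and executeMutation(ctx, *sql.Tx) (Result, error) are application-defined.
package idempotency

import (
    "context"
    "database/sql"
    "errors"
    "time"

    "github.com/redis/go-redis/v9"
)

var ErrLockContention = errors.New("idempotency key is locked by another request")

var releaseScript = redis.NewScript(`if redis.call("GET", KEYS[1]) == ARGV[1] then return redis.call("DEL", KEYS[1]) else return 0 end`) // Lua owner guard above

func releaseLock(ctx context.Context, rdb *redis.Client, lockKey, ownerToken string) {
    _ = releaseScript.Run(ctx, rdb, []string{lockKey}, ownerToken).Err() // on failure, the TTL still frees the lock
}

func processOnce(ctx context.Context, rdb *redis.Client, db *sql.DB, key, ownerToken string) (Result, error) {
    lockKey := "idempotency:" + key
    // Acquire with 30 s TTL; the stored value is this request's owner token.
    ok, err := rdb.SetNX(ctx, lockKey, ownerToken, 30*time.Second).Result()
    if err != nil {
        return Result{}, err // lock store unavailable: fail closed
    }
    if !ok {
        return Result{}, ErrLockContention
    }
    defer releaseLock(ctx, rdb, lockKey, ownerToken)

    // Check dedup store; only "no rows" means this is the first request.
    var cached Result
    err = db.QueryRowContext(ctx,
        "SELECT response_body FROM idempotency_results WHERE key = $1", key,
    ).Scan(&cached.Body)
    if err == nil {
        return cached, nil // duplicate: return stored result
    }
    if !errors.Is(err, sql.ErrNoRows) {
        return Result{}, err // store error: fail closed, never re-execute
    }

    // Execute mutation + persist result in one transaction
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return Result{}, err
    }
    defer tx.Rollback() // no-op once Commit has succeeded
    result, err := executeMutation(ctx, tx)
    if err != nil {
        return Result{}, err
    }
    if _, err := tx.ExecContext(ctx,
        "INSERT INTO idempotency_results(key, response_body, expires_at) VALUES($1,$2,NOW()+INTERVAL '24 hours')",
        key, result.Body,
    ); err != nil {
        return Result{}, err
    }
    return result, tx.Commit()
}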
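The `processOnce` function above takes `ownerToken` as an argument without saying where it comes from. One way to produce it, assuming `crypto/rand` as the randomness source, is a fresh 128-bit random value per lock acquisition; `newOwnerToken` is a hypothetical helper name.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
)

// newOwnerToken returns 128 random bits as 32 hex characters. A new token is
// generated for every acquisition attempt (not per idempotency key), so the
// Lua release guard can tell this holder apart from any later one.
func newOwnerToken() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}
```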

Variant 2 — PostgreSQL Advisory Lock + Unique Constraint

Appropriate when the deduplication store is already PostgreSQL and the loss of a volatile cache (for example, during a Redis failover) must not allow duplicate processing. Uses pg_try_advisory_xact_lock (transaction-scoped), so the lock is released automatically on commit or rollback.

-- Advisory lock keyed to hashtext(), which returns a 32-bit integer, so
-- unrelated keys can occasionally collide (they then serialize needlessly).
BEGIN;

SELECT pg_try_advisory_xact_lock(hashtext('idempotency:' || $1));
-- Returns FALSE if another transaction holds the lock → ROLLBACK; caller retries

-- Insert dedup record; the UNIQUE constraint on key prevents a double insert
INSERT INTO idempotency_results (key, response_body, created_at)
VALUES ($1, $2, NOW())
ON CONFLICT (key) DO NOTHING;

-- Run the mutation only when the INSERT above affected 1 row
-- (0 rows = duplicate: skip the mutation and return the stored result)

COMMIT;

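# Python (psycopg2): wrap the mutation in an advisory-lock transaction.
# Assumes conn.autocommit is False (psycopg2's default); execute_mutation
# (returns a JSON string) and fetch_cached are application-defined.
import hashlib
import struct


class LockContentionError(Exception):
    """Another transaction holds the advisory lock for this idempotency key."""


def process_once(conn, idem_key: str, payload: dict):
    # Signed 64-bit lock id from SHA-256; ">q" fixes the byte order across platforms.
    lock_id = struct.unpack(">q", hashlib.sha256(idem_key.encode()).digest()[:8])[0]
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_try_advisory_xact_lock(%s)", (lock_id,))
            if not cur.fetchone()[0]:
                raise LockContentionError(idem_key)  # rolled back in the handler below

            cur.execute(
                "INSERT INTO idempotency_results(key, response_body) VALUES(%s, %s) "
                "ON CONFLICT (key) DO NOTHING",
                (idem_key, "{}"),
            )
            if cur.rowcount == 0:  # duplicate: a prior request already committed
                cached = fetch_cached(cur, idem_key)
                conn.rollback()
                return cached

            result = execute_mutation(cur, payload)
            cur.execute(
                "UPDATE idempotency_results SET response_body = %s WHERE key = %s",
                (result, idem_key),
            )
        conn.commit()  # also releases the transaction-scoped advisory lock
        return result
    except Exception:
        conn.rollback()  # never leave the connection in an aborted transaction
        raise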

Variant 3 — Optimistic Version Check (Low-Contention Paths)

Skip the explicit lock entirely; instead, rely on a version column and a conditional update. On conflict (0 rows updated), return HTTP 409 and let the client retry with exponential backoff and jitter; each retry must re-read the row to obtain the current version.

-- Conditional update — succeeds only when version matches
UPDATE accounts
   SET balance = balance - $1,
       version  = version + 1
 WHERE account_id = $2
   AND version    = $3
RETURNING balance, version;
-- 0 rows → concurrent modification detected → caller retries

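// Java (JDBC): optimistic update. On a version mismatch the caller re-reads the row and retries.
// TransferResult and OptimisticLockException (unchecked) are application-defined.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class AccountRepository {
    public TransferResult transferWithOptimisticLock(
            Connection conn, long accountId, long amount, long expectedVersion) throws SQLException {
        String sql = "UPDATE accounts SET balance = balance - ?, version = version + 1 " +
                     "WHERE account_id = ? AND version = ? RETURNING balance, version";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, amount);
            ps.setLong(2, accountId);
            ps.setLong(3, expectedVersion);
            // With PostgreSQL's driver, RETURNING makes the UPDATE produce a ResultSet.
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    // 0 rows: a concurrent writer bumped the version first (map to HTTP 409).
                    throw new OptimisticLockException("version mismatch for account " + accountId);
                }
                return new TransferResult(rs.getLong("balance"), rs.getLong("version"));
            }
        }
    }
}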

Variant 4 — Saga Compensation for Multi-Service TOCTOU

When a TOCTOU race spans two or more services (e.g., inventory reservation followed by payment capture), no single database transaction can span the service boundary, and holding a lock across remote calls ties its safety to every downstream latency. The saga pattern sequences the steps and registers a compensating transaction for each:

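// Go: saga orchestrator with compensation stack (Go 1.21+ for context.WithoutCancel).
package saga

import (
    "context"
    "errors"
)

type SagaStep struct {
    Execute    func(ctx context.Context) error
    Compensate func(ctx context.Context) error
}

// runSaga executes steps in order. On the first failure it compensates the
// already-completed steps in reverse order and returns every error joined,
// so compensation failures are surfaced rather than swallowed.
func runSaga(ctx context.Context, steps []SagaStep) error {
    completed := make([]SagaStep, 0, len(steps))
    for _, step := range steps {
        if err := step.Execute(ctx); err != nil {
            // Compensations must run even if the request context was cancelled.
            cctx := context.WithoutCancel(ctx)
            errs := []error{err}
            for i := len(completed) - 1; i >= 0; i-- {
                if cerr := completed[i].Compensate(cctx); cerr != nil {
                    errs = append(errs, cerr)
                }
            }
            return errors.Join(errs...)
        }
        completed = append(completed, step)
    }
    return nil
}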

Variant Comparison

Variant	Contention profile	Failure recovery	Latency overhead	Best for
Redis SET NX + Lua release	High-throughput, short critical sections	Re-acquire after TTL expiry (30 s)	~1–3 ms per lock round trip	Payment APIs, seat reservation
PostgreSQL advisory lock	Medium throughput, durability required	Auto-released on transaction end	~2–6 ms (local), ~8–20 ms (remote)	Fintech ledgers, audit-required flows
Optimistic version check	Low contention, read-heavy	Caller retries on 409	~0 ms coordination overhead	Catalogue updates, non-critical counters
Saga + compensation	Cross-service, long-running	Compensating transactions roll back state	Variable — one network hop per step	Multi-service checkout, booking flows

Edge Cases & Failure Scenarios

Failure Scenario	Remediation Steps	Observability Hooks
Dedup store unavailable at lock-check time	Fail closed: reject the request with HTTP 503 rather than proceeding without deduplication. Queue the retry for when the store recovers. Do not silently degrade to non-idempotent behavior.	`idempotency_store_errors_total` counter; alert at >0.1% of requests over 1-minute window.
Lock TTL expires during slow mutation	Use a background heartbeat goroutine to refresh the lock TTL every TTL/3 seconds with an owner-guarded script that extends the TTL only while the stored token still matches. On heartbeat failure (lock already released or taken over), abort the mutation and roll back or trigger a compensating transaction. See lock timeout and lease management for full lease-renewal patterns.	`lock_lease_renewals_total`, `lock_expired_mid_mutation_total`; alert on any non-zero value for the latter.
Split-brain: two service replicas both acquire lock	Enforce fencing tokens: the lock store returns a monotonically increasing token with each grant, and the DB write is conditional on that token being newer than the last one recorded (`WHERE last_fencing_token < ?`, then store the new token), so a writer holding a stale token is rejected.	`fencing_token_conflicts_total`; correlate with `network_partition_events` in the service mesh.
Clock skew causes premature lock release	Synchronise hosts to NTP/PTP (target drift <1 ms); set lock TTL ≥ 3× the p99 mutation latency. Treat lock TTL < 10 s as a configuration error.	`ntp_offset_ms` gauge per node; alert when offset >500 ms.
Thundering herd on retry storm	Rate-limit retries per idempotency key with a token bucket (for example, a burst capacity of 3) so excess retries are shed early. Use request coalescing to return the in-flight result to later arrivals. See mitigating thundering herd during retry storms for token-bucket implementation.	`retry_queue_depth` per key; `coalesced_requests_total` to confirm coalescing is active.
Saga step fails after partial commits	Compensating transactions must be idempotent themselves — wrap each compensation in its own idempotency key derived from the original saga ID. Persist compensation status to a saga log before executing.	`saga_compensation_invocations_total`, `saga_compensation_failures_total`; alert when compensation failure rate >0.

Operational Concerns

TTL Management

Idempotency-record TTL must exceed the maximum observable retry window, including any queuing delays at the client or message broker. A safe default is 24 hours for synchronous REST APIs. Reduce it to 4 hours only when the idempotency store has strict memory budgets and the client SLA guarantees no retries after 1 hour. For Redis-based deduplication, run the deduplication keyspace on a dedicated Redis instance configured with maxmemory-policy noeviction (the policy applies to the whole instance, not to a key prefix); never use allkeys-lru, which can silently evict live dedup records.

Index Strategy

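-- PostgreSQL: covering index for point reads by key. INCLUDE response_body only if
-- responses are small; a B-tree entry cannot exceed roughly 1/3 of a page.
CREATE UNIQUE INDEX CONCURRENTLY idx_idem_key
    ON idempotency_results (key)
    INCLUDE (response_body, created_at);

-- Plain index on created_at to accelerate the TTL-based cleanup sweep. (A partial
-- index WHERE created_at < NOW() - INTERVAL '24 hours' is rejected: index
-- predicates may only use IMMUTABLE functions, and NOW() is not.)
CREATE INDEX CONCURRENTLY idx_idem_expires
    ON idempotency_results (created_at);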

Run the cleanup sweep as a background job every 15 minutes, deleting at most 10,000 rows per batch so that each delete transaction stays short, limiting lock hold times, WAL volume, and replication lag during peak hours.

Memory and Storage Budgeting

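Redis: each dedup record (UUID key + JSON response) averages 400–800 bytes with encoding overhead. At 50,000 requests per minute and a 24-hour TTL, up to 72 million records are live at once (50,000 × 1,440 minutes, assuming every request carries a new key), so budget roughly 29–58 GB of active dedup state, or about 35–69 GB after adding 20% headroom for keyspace metadata.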
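PostgreSQL: the same volume occupies storage of the same order as the Redis estimate once TOAST-stored response_body values and the index are included. Partition by created_at (daily, to match the 24-hour TTL) so expired partitions can be dropped without VACUUM pressure. Note that PostgreSQL requires a unique constraint on a partitioned table to include the partition key, so UNIQUE (key) must become UNIQUE (key, created_at), which no longer guarantees that key is unique across partitions.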
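The budget arithmetic can be checked with a short helper. It assumes the worst case, in which every request carries a new idempotency key, so live records equal requests per minute times the TTL in minutes; `dedupBudgetBytes` is a hypothetical name.

```go
package main

// dedupBudgetBytes estimates live deduplication state in bytes:
// (requests per minute × TTL in minutes) records × bytes per record × (1 + headroom).
func dedupBudgetBytes(requestsPerMinute, ttlMinutes, bytesPerRecord int64, headroom float64) float64 {
	records := requestsPerMinute * ttlMinutes
	return float64(records*bytesPerRecord) * (1 + headroom)
}
```

For 50,000 requests per minute, a 1,440-minute TTL, and 400 or 800 bytes per record with 20% headroom, it yields about 34.6 GB and 69.1 GB respectively.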

SRE Alert Thresholds

Metric	Warning	Critical	Action
`lock_acquisition_p99_ms`	>50 ms	>200 ms	Investigate Redis/Postgres saturation; scale lock store.
`duplicate_requests_ratio`	>5%	>15%	Client retry logic is too aggressive; tighten backoff parameters.
`idempotency_store_errors_total`	>0.05%/min	>0.5%/min	Fail closed; page on-call; check store health.
`lock_expired_mid_mutation_total`	>0	>5/min	Increase TTL or reduce mutation latency; check GC pause times.
`saga_compensation_failures_total`	>0	>1/min	Manual investigation required; compensation failure risks inconsistent state.

Stack-Specific Constraints

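JVM: ReentrantLock is in-process only and gives no protection across replicas. GC pauses also undermine distributed locks: a stop-the-world pause can outlast the lease, after which the process resumes and writes as if it still held the lock (fencing tokens catch this). Use distributed lock TTLs of at least 3× the p99 GC pause duration, monitor jvm_gc_pause_seconds_max, and alert when it approaches 10 s.
Go: sync.Mutex is in-process only. Under high CPU contention, scheduling delays can postpone heartbeat renewal; run the heartbeat in its own dedicated goroutine rather than behind a worker pool or queue that hot-path work can saturate, and keep the renewal interval (TTL/3) well below the TTL.
Node.js: The single-threaded event loop eliminates thread-level races, but every await is a point where another request's handler can run, so two in-flight requests can both pass a deduplication check before either commits. Make the check-and-insert a single atomic statement (for example, INSERT ... ON CONFLICT DO NOTHING against a unique key) instead of a SELECT and an INSERT issued in two separate await calls.
Database isolation levels: READ COMMITTED prevents dirty reads but allows non-repeatable reads, phantoms, and lost updates. PostgreSQL's REPEATABLE READ uses snapshot isolation and aborts a transaction that tries to update a row changed by a concurrently committed transaction, which prevents lost updates (the caller must retry on serialization failure); SERIALIZABLE adds SSI to also catch write skew. Use SERIALIZABLE only on high-value, low-throughput paths (e.g., end-of-day settlement), because it can increase transaction abort rates by 5–30× under contention. Transaction-scoped advisory locks have no TTL and are held until commit or rollback, so bound long-running transactions with statement_timeout and idle_in_transaction_session_timeout, and apply lock timeout and lease management patterns to any distributed leases.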

Distributed Coordination & Locking Strategies — parent section covering the full spectrum of coordination primitives for distributed systems.
Distributed Lock Acquisition Patterns — deep dive into Redlock, single-node Redis locking, and ZooKeeper-based leader election.
Lock Timeout & Lease Management — monotonic TTL enforcement, lease heartbeat renewal, and stale-lock detection.
Consensus Algorithms for Deduplication — Raft and Paxos-based approaches for quorum agreement on deduplication state.
Mitigating Thundering Herd During Retry Storms — token-bucket limiters, request coalescing, and backpressure propagation for high-contention periods.
Retry Logic & Backoff Fundamentals — exponential backoff with jitter to safely retry idempotent requests without amplifying load.
Redis Cache-Based Deduplication — implementation guide for the Redis SET NX deduplication store used in Variant 1 above.