Outbox Pattern Explained: Reliable Event Publishing Without Transaction Rollbacks

Learn how the Outbox Pattern atomically stores updates, guarantees event delivery, retries on broker failures, and routes errors to a dead‑letter queue.

Introduction

In distributed systems, reliably publishing events to a message broker while keeping the underlying business data consistent is a classic challenge.
The Outbox Pattern solves this by persisting events in the same database transaction that modifies the business data, then delivering those events asynchronously.

This post walks through how the pattern guarantees delivery, why you never need to roll back a committed transaction when publishing fails, and how to handle truly unrecoverable errors.


How the Outbox Pattern Works

Step | What Happens
1️⃣ Start a DB transaction | Open a transaction in your service.
2️⃣ Update business state | e.g., create an order, change inventory, etc.
3️⃣ Insert an outbox record | Write a row to an outbox_events table containing the event type, payload, and any correlation IDs. This write happens inside the same transaction as the business update.
4️⃣ Commit the transaction | Both the business data and the outbox entry become durable together.
5️⃣ Asynchronous processor polls | A background job reads rows where sent_at IS NULL. Alternatively, a change‑data‑capture stream tails the outbox table's inserts instead of polling.
6️⃣ Publish to the broker | The processor sends the payload to Kafka, RabbitMQ, etc.
7️⃣ Mark the row as sent | On success, update sent_at (or delete the row). If publishing fails, the row stays untouched for a later retry.

Result: The database guarantees atomicity between the business change and the "intent to publish". The separate processor guarantees at‑least‑once delivery despite broker outages.
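
Steps 1 to 4 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production setup: the in-memory database, the trimmed-down table layouts, and the fail_before_commit flag are assumptions made for the example.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a business table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, total REAL NOT NULL
        );
        CREATE TABLE outbox_events (
            id INTEGER PRIMARY KEY,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            sent_at TEXT
        );
        """
    )
    return conn


def create_order(conn, user_id, total, fail_before_commit=False):
    """Write the order and its outbox row in one transaction (steps 1-4)."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total)
        )
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox_events (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "total": total})),
        )
        if fail_before_commit:
            # Hypothetical failure injected to show that both inserts vanish together.
            raise RuntimeError("simulated crash before commit")
    return order_id


def count(conn, table):
    """Number of rows in `table` (table name is trusted; example only)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Calling create_order(conn, 1, 9.5) leaves one row in each table. Calling it with fail_before_commit=True raises and leaves both tables unchanged, which is the "either both succeed or both fail" property the rest of the pattern relies on.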


Guarantees of Service Delivery

  1. Atomic persistence – because the outbox entry is written in the same transaction as the domain data, either both succeed or both fail.
  2. Durable storage – the outbox table lives in the same relational store that already guarantees durability and recovery.
  3. Retry‑until‑success – the poller keeps trying until the broker acknowledges the message.
  4. Decoupling – your core service code doesn’t need to know whether the broker is up; it only cares about committing its transaction.

The pattern therefore provides eventual consistency and at‑least‑once semantics without forcing the service to block on external systems.
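
To see why the guarantee is at‑least‑once rather than exactly‑once, consider a poller that dies after the broker accepts a message but before the row is marked as sent. The toy simulation below uses plain lists in place of the table and the broker; publish_pass and crash_after are names invented for this sketch.

```python
def publish_pass(outbox, broker, crash_after=None):
    """Run one poller pass over unsent rows.

    outbox: list of dicts with "id" and "sent" keys (stands in for the table).
    broker: list recording every delivered event id (stands in for the topic).
    crash_after: id after whose publish the poller dies before marking the row.
    """
    for row in outbox:
        if row["sent"]:
            continue
        broker.append(row["id"])      # the broker accepted the message
        if row["id"] == crash_after:
            return                    # crash: the row is still marked unsent
        row["sent"] = True            # normal path: record the success


outbox = [{"id": 1, "sent": False}, {"id": 2, "sent": False}]
broker = []
publish_pass(outbox, broker, crash_after=1)   # dies right after delivering event 1
publish_pass(outbox, broker)                  # restart: event 1 is delivered again
print(broker)                                 # prints [1, 1, 2]
```

No event is lost, but event 1 reaches the broker twice. That duplicate is the price of never dropping a message, and it is why consumers have to be idempotent.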


What Happens When Publishing Fails?

  • The transaction has already been committed. The business operation (e.g., “order created”) is now part of the system’s state.
  • The outbox row remains unprocessed, so the poller will retry.
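  • No rollback is required or possible – a committed transaction cannot be rolled back. Reversing it would take a separate compensating transaction, and a delayed publish does not justify one because the event is still safely stored in the outbox.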

Why Not Roll Back?

Situation | Correct Action
Broker down before you attempt to publish | Still commit the transaction; let the background processor retry later.
Broker down after you’ve committed | The outbox row stays in the table; the processor will resend when the broker recovers.
Permanent publishing error (malformed payload, schema mismatch) | Move the message to a Dead Letter Queue (DLQ), flag it as failed, and alert ops. Do not roll back the business data.

Handling Irrecoverable Failures

  1. Detect the error – the poller catches exceptions that are not transient (e.g., validation errors).
  2. Mark the event as failed – set status = 'failed' on the row (recording failed_at and the error message), or move the row to a separate dead_letter_outbox table.
  3. Log and alert – feed the error into monitoring/alerting pipelines.
  4. Manual or automated re‑processing – after fixing the payload or broker config, replay the event from the DLQ.

Because the original business transaction is already committed, you never need to “undo” it.
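
The four steps above could look roughly like this with a separate dead_letter_outbox table. It is a sketch under assumptions: sqlite3 stands in for your database, the two exception classes are hypothetical markers your publisher would raise, and publish is any callable that sends one payload.

```python
import sqlite3


class TransientError(Exception):
    """Hypothetical marker: the same message may succeed on a later attempt."""


class PermanentError(Exception):
    """Hypothetical marker: retrying the same payload can never succeed."""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE outbox_events (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
        CREATE TABLE dead_letter_outbox (
            id INTEGER PRIMARY KEY, payload TEXT NOT NULL, error_message TEXT NOT NULL
        );
        """
    )
    return conn


def process_row(conn, row_id, publish):
    """Try to publish one outbox row and return 'sent', 'pending', or 'failed'."""
    payload = conn.execute(
        "SELECT payload FROM outbox_events WHERE id = ?", (row_id,)
    ).fetchone()[0]
    try:
        publish(payload)
    except TransientError:
        return "pending"              # leave the row in place for the next pass
    except PermanentError as exc:
        with conn:                    # move the row to the DLQ atomically
            conn.execute(
                "INSERT INTO dead_letter_outbox VALUES (?, ?, ?)",
                (row_id, payload, str(exc)),
            )
            conn.execute("DELETE FROM outbox_events WHERE id = ?", (row_id,))
        return "failed"
    with conn:
        conn.execute("DELETE FROM outbox_events WHERE id = ?", (row_id,))
    return "sent"


def replay(conn, row_id, fixed_payload):
    """Step 4: put a repaired event back into the outbox for normal delivery."""
    with conn:
        conn.execute(
            "INSERT INTO outbox_events (id, payload) VALUES (?, ?)",
            (row_id, fixed_payload),
        )
        conn.execute("DELETE FROM dead_letter_outbox WHERE id = ?", (row_id,))
```

Nothing in this code touches the business tables: a failed event is parked, fixed, and replayed while the original order stays exactly as it was committed.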


Building a Robust Outbox Implementation

Core Table Schema

CREATE TABLE outbox_events (
    id               BIGSERIAL PRIMARY KEY,
    aggregate_type   TEXT NOT NULL,            -- e.g., 'Order'
    aggregate_id     BIGINT NOT NULL,          -- primary key of the domain entity
    event_type       TEXT NOT NULL,            -- e.g., 'OrderCreated'
    payload          JSONB NOT NULL,
    created_at       TIMESTAMPTZ DEFAULT now(),
    sent_at          TIMESTAMPTZ NULL,
    failed_at        TIMESTAMPTZ NULL,
    error_message    TEXT NULL,
    status           TEXT NOT NULL DEFAULT 'pending'   -- pending | sent | failed
);

Transactional Write (Pseudo‑code)

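import json

# `db`, `Order`, and `OutboxEvent` stand in for your ORM or data-access layer.
with db.transaction():
    # The insert must assign order.id here, because it is read just below.
    order = Order.create(user_id=uid, total=price)
    outbox = OutboxEvent(
        aggregate_type='Order',
        aggregate_id=order.id,
        event_type='OrderCreated',
        payload=json.dumps({'order_id': order.id, 'total': price}),
    )
    db.save(outbox)        # both inserts happen atomically
# Leaving the block commits; an exception inside it rolls back both inserts.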

Polling Publisher (Simplified)

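import time

# `db`, `kafka.producer`, `TransientError`, `PermanentError`, `alert_ops`,
# `logger`, and `POLL_INTERVAL` are placeholders for your own infrastructure.
def publish_loop():
    while True:
        batch = db.fetch(
            "SELECT * FROM outbox_events "
            "WHERE sent_at IS NULL AND status = 'pending' "
            "ORDER BY id LIMIT 50"   # oldest first; without ORDER BY the order is undefined
        )
        for ev in batch:
            try:
                # Assumes send() returns a future (as in kafka-python). Wait for the
                # broker's acknowledgement before marking the row as sent.
                kafka.producer.send(ev.event_type, ev.payload.encode()).get(timeout=10)
                db.execute(
                    "UPDATE outbox_events SET sent_at = now(), status = 'sent' WHERE id = %s",
                    (ev.id,)
                )
            except TransientError as e:
                logger.warning(f'Transient failure for {ev.id}: {e}')
                # leave row untouched – will be retried on the next pass
            except PermanentError as e:
                db.execute(
                    "UPDATE outbox_events SET failed_at = now(), status = 'failed', "
                    "error_message = %s WHERE id = %s",
                    (str(e), ev.id)
                )
                alert_ops(ev, e)
        time.sleep(POLL_INTERVAL)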
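
The loop above retries at a fixed interval. During a long broker outage you may prefer to slow down between attempts. One common option, shown here as a sketch rather than part of the pattern itself, is capped exponential backoff with optional jitter; backoff_delay and its parameters are illustrative names.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=None):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles with each attempt and never exceeds `cap`. If `rng`
    (a random.Random instance) is given, a random value between 0 and that
    ceiling is returned so that several pollers do not retry in lockstep.
    """
    exponent = min(attempt, 30)               # keeps 2 ** exponent small
    ceiling = min(cap, base * (2 ** exponent))
    if rng is None:
        return ceiling
    return rng.uniform(0, ceiling)
```

With the defaults, attempts 0, 1, 2, and 3 wait 1, 2, 4, and 8 seconds, and every attempt from the sixth retry onward waits the 60 second cap. The poller would reset the attempt counter after the first successful publish.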

Monitoring & Alerts

  • Metrics: number of pending events, failed events, average latency from created_at to sent_at.
  • Dashboards: visualize backlog spikes indicating broker issues.
  • Alert thresholds: e.g., “> 5 min backlog” → page on‑call.
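
These metrics can be derived directly from the outbox table. The sketch below assumes a simplified table in which created_at and sent_at are stored as Unix epoch seconds (the schema above uses TIMESTAMPTZ), and it uses sqlite3 only to stay self-contained; outbox_metrics and should_page are hypothetical helper names.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox_events ("
        " id INTEGER PRIMARY KEY,"
        " created_at REAL NOT NULL,"   # epoch seconds (assumption for this sketch)
        " sent_at REAL,"
        " status TEXT NOT NULL)"
    )
    return conn


def outbox_metrics(conn, now):
    """Return backlog counts, average publish latency, and the oldest pending age."""
    pending, failed = conn.execute(
        "SELECT COALESCE(SUM(status = 'pending'), 0),"
        "       COALESCE(SUM(status = 'failed'), 0)"
        " FROM outbox_events"
    ).fetchone()
    avg_latency = conn.execute(
        "SELECT AVG(sent_at - created_at) FROM outbox_events WHERE status = 'sent'"
    ).fetchone()[0]
    oldest = conn.execute(
        "SELECT MIN(created_at) FROM outbox_events WHERE status = 'pending'"
    ).fetchone()[0]
    return {
        "pending": pending,
        "failed": failed,
        "avg_latency_s": avg_latency,          # None if nothing has been sent yet
        "oldest_pending_age_s": None if oldest is None else now - oldest,
    }


def should_page(metrics, max_age_s=300):
    """True when the oldest pending event has waited longer than max_age_s."""
    age = metrics["oldest_pending_age_s"]
    return age is not None and age > max_age_s
```

The default of 300 seconds mirrors the "> 5 min backlog" threshold. Measuring the age of the oldest pending row, rather than only the row count, separates a brief burst of traffic from a broker that has been down for a while.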

Idempotent Consumers – Closing the Loop

Since the Outbox pattern delivers at‑least‑once, downstream services must tolerate duplicates:

  • Include a deduplication key (the id of the outbox row) in the message.
  • Consumers store processed IDs in a fast lookup (Redis, DB) and ignore repeats.
  • Alternatively, design business logic to be idempotent (e.g., “create order if not exists”).
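
A deduplicating consumer might look like the sketch below. It assumes the message carries the outbox row id as event_id, and it records that id in the same transaction as the side effect so the two cannot drift apart. The table names and handle_order_created are invented for the example.

```python
import sqlite3


def open_consumer_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_events (event_id INTEGER PRIMARY KEY);
        CREATE TABLE shipments (order_id INTEGER PRIMARY KEY);
        """
    )
    return conn


def handle_order_created(conn, message):
    """Apply the event once. Return True if applied, False if it was a duplicate."""
    with conn:  # the dedup record and the side effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (message["event_id"],),
        )
        if cur.rowcount == 0:
            return False              # already seen: skip the side effect
        conn.execute(
            "INSERT INTO shipments (order_id) VALUES (?)", (message["order_id"],)
        )
    return True
```

Delivering the same message twice creates one shipment: the first call returns True and the second returns False. If the deduplication store is separate from the business database (Redis, for example), the check and the side effect are no longer atomic, so the business logic still benefits from being idempotent on its own.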

TL;DR Checklist

  • Write domain changes and outbox record inside a single DB transaction.
  • Commit the transaction; never try to roll it back after a publish failure.
  • Run an asynchronous poller that reliably pushes outbox rows to the broker, retrying on transient errors.
  • Move permanently failing rows to a DLQ, log, and alert.
  • Make consumers idempotent to handle at‑least‑once delivery safely.

By embracing eventual consistency and decoupling persistence from delivery, the Outbox pattern gives you reliable event propagation without tightly coupling your service to external messaging systems.
