Outbox pattern

The Outbox Pattern is a design pattern commonly used in distributed systems to ensure reliable delivery of messages (typically events) while keeping data consistent across service boundaries. It is most useful when you want to guarantee that an event is published to a message broker (e.g. Kafka, RabbitMQ) if and only if the corresponding database transaction commits successfully.

Let’s break it down into your two key concerns:


1. How the Outbox Pattern Guarantees Message Delivery

Core Idea

Instead of publishing messages directly to a message broker, you write them to a local database table (the Outbox table) within the same transaction as your business data. Then, a separate outbox processor (or polling publisher) reads those messages and forwards them to the message broker asynchronously.

This makes the business change and the record of the event atomic at the database level: either both are persisted or neither is. Publishing to the broker happens afterwards, as a separate step.
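To make the atomicity claim concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the create_order helper and the fail_before_commit flag are assumptions made up for this illustration, not part of any particular framework.

```python
import json
import sqlite3

def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical schema, for illustration only
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER)")
    conn.execute("CREATE TABLE outbox_events (id INTEGER PRIMARY KEY, event_type TEXT, payload TEXT)")
    conn.commit()
    return conn

def create_order(conn, user_id, total, fail_before_commit=False):
    # "with conn" commits on success and rolls back if an exception escapes
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total)
        )
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox_events (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id, "total": total})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated failure before commit")
    return order_id

def counts(conn):
    # Returns (number of orders, number of outbox events)
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox_events").fetchone()[0]
    return orders, events
```

A successful call leaves one order and one event; a call with fail_before_commit=True raises and leaves the counts unchanged, because both inserts are rolled back together.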

Workflow

  1. Start Transaction
  2. Update Business Data (e.g., create an order)
  3. Insert Event into Outbox Table (same transaction)
  4. Commit Transaction → both business data and outbox message are persisted together
  5. Outbox Processor Polls for New Messages
  6. Publish to Message Broker (e.g. Kafka)
  7. Delete or Mark Outbox Message as Published

Guaranteeing Delivery

  • Even if the message broker is down when the transaction commits, the message is safely stored in the outbox.
  • The outbox processor retries until successful.
  • Because the outbox message is stored durably, no messages are lost.

✅ This guarantees at-least-once message delivery.
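Why "at least once" rather than "exactly once"? If the processor crashes after publishing but before marking the row as sent, the next poll publishes the same event again. The sketch below simulates that with an in-memory list and a FakeBroker class; both are stand-ins invented for this example.

```python
class FakeBroker:
    """Stand-in for a real broker; records every message it receives."""

    def __init__(self):
        self.received = []

    def publish(self, event_id, payload):
        self.received.append(event_id)

def poll_once(outbox, broker, crash_before_mark=False):
    # outbox: list of dicts with keys "id", "payload", "sent"
    for event in outbox:
        if event["sent"]:
            continue
        broker.publish(event["id"], event["payload"])
        if crash_before_mark:
            return  # simulated crash: published, but never marked as sent
        event["sent"] = True

outbox = [{"id": 1, "payload": "OrderCreated", "sent": False}]
broker = FakeBroker()
poll_once(outbox, broker, crash_before_mark=True)  # publish succeeds, mark is lost
poll_once(outbox, broker)                          # retry publishes the same event again
print(broker.received)  # [1, 1]
```

The event is never lost, but the broker sees it twice, which is why consumers need to tolerate duplicates.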


2. If the Call Fails, How to "Roll Back" the First Transaction?

A common misconception is that the first transaction must be rolled back when delivery of the outbox message fails. It does not need to be: the transaction has already committed successfully.

Let’s clarify:

  • You committed the transaction successfully.
  • The business data and the outbox message are both saved.
  • Now, a separate background process tries to send that message to, say, Kafka.

➡️ If publishing fails:

  • The outbox processor does not delete or mark the message as sent.
  • It retries with exponential backoff.
  • Eventually, it will succeed (assuming the network comes back, Kafka recovers, etc.).
  • No rollback is needed.
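The retry schedule mentioned above could look like the following sketch. The function name backoff_delay and the default values (1 second base, 60 second cap, optional random jitter) are illustrative assumptions; tune them for your broker and workload.

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Double the delay on every attempt, but never exceed the cap
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Pick uniformly in [0, delay] so many retrying workers do not synchronise
        return random.uniform(0, delay)
    return delay

# Deterministic schedule without jitter:
print([backoff_delay(n, jitter=False) for n in range(8)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The cap keeps a long outage from pushing the wait time to hours, and the jitter spreads retries out when several processors recover at once.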

But What About the Business Data?

The business data remains valid — the user did place an order, even if we couldn't notify other services immediately. Once the outbox processor succeeds, notifications will eventually be delivered (eventual consistency).

❌ You do not roll back the business transaction after it has committed.

When Might You Want a Rollback?

Only before committing the transaction:

  • If you detect that the message broker is unreachable before persisting the outbox message, you might choose to not commit the transaction.
  • But this couples the business transaction tightly to the broker's availability, which is exactly what the pattern is meant to avoid, so it is not recommended.

✅ Instead, follow this principle:

Commit the transaction first (ensuring consistency locally), then rely on retries to deliver messages eventually.


How to Handle True Failures (Irrecoverable Cases)

Sometimes a message may be permanently undeliverable (e.g., malformed payload, broker-side configuration error).

In those cases:

  1. Do not roll back the original transaction; it is already committed.
  2. Move the failed message to a "Dead Letter Queue" (DLQ) or mark it as "failed" with error details.
  3. Alert the operations team.
  4. Manually resolve or reprocess.
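One way to implement the "mark it as failed" step is sketched below with sqlite3. The attempts and last_error columns, the MAX_ATTEMPTS threshold and the record_failure helper are all assumptions added for this example; a real setup might instead move the row to a separate dead-letter table or topic.

```python
import sqlite3

MAX_ATTEMPTS = 5  # hypothetical threshold before an event is parked

def record_failure(conn: sqlite3.Connection, event_id: int, error: str) -> str:
    """Count a failed attempt; park the event as 'failed' once the limit is hit."""
    with conn:
        conn.execute(
            "UPDATE outbox_events SET attempts = attempts + 1, last_error = ? WHERE id = ?",
            (error, event_id),
        )
        conn.execute(
            "UPDATE outbox_events SET status = 'failed' WHERE id = ? AND attempts >= ?",
            (event_id, MAX_ATTEMPTS),
        )
    row = conn.execute(
        "SELECT status FROM outbox_events WHERE id = ?", (event_id,)
    ).fetchone()
    return row[0]

# Demo with an in-memory table (illustrative schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox_events (id INTEGER PRIMARY KEY, payload TEXT,"
    " status TEXT DEFAULT 'pending', attempts INTEGER DEFAULT 0, last_error TEXT)"
)
conn.execute("INSERT INTO outbox_events (payload) VALUES ('not-json')")
conn.commit()
statuses = [record_failure(conn, 1, "malformed payload") for _ in range(MAX_ATTEMPTS)]
print(statuses)  # ['pending', 'pending', 'pending', 'pending', 'failed']
```

The polling query would then skip rows whose status is 'failed', and an alert can be raised whenever such a row appears.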

In addition:

  • Design consumers to be idempotent, so that re-sending a corrected event later does not create duplicate effects.

Key Components for a Robust Outbox Pattern

Component             Purpose
--------------------  ----------------------------------------------------------------
outbox_events table   Stores messages with id, event_type, payload, status, created_at, sent_at
Transactional write   Insert event into outbox in same DB transaction as business data
Polling Publisher     Periodically reads unprocessed messages and publishes them
Retry mechanism       Handles transient failures (network, broker down)
Idempotent consumers  Handle duplicate messages safely
Monitoring & DLQ      Handle permanent failures
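As a rough sketch, the outbox_events table listed above could be created as follows. This uses SQLite syntax through Python's sqlite3 module; the column types, defaults and the index name are assumptions, and other databases will need slightly different DDL.

```python
import sqlite3

OUTBOX_DDL = """
CREATE TABLE IF NOT EXISTS outbox_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    sent_at    TEXT
);
-- Partial index so the poller only scans rows that are still unsent
CREATE INDEX IF NOT EXISTS idx_outbox_unsent
    ON outbox_events (created_at) WHERE sent_at IS NULL;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(OUTBOX_DDL)

# List the resulting column names
columns = [row[1] for row in conn.execute("PRAGMA table_info(outbox_events)")]
print(columns)
# ['id', 'event_type', 'payload', 'status', 'created_at', 'sent_at']
```

The partial index is optional, but it keeps the polling query cheap as the table accumulates already-sent rows.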

Alternatives for Exactly-Once Delivery

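On its own, the outbox guarantees at-least-once delivery, so duplicates are possible. To get effectively-once processing, you can combine it with:

  • Message deduplication IDs (store each message ID in a processed_messages table)
  • Idempotent consumers
  • Kafka's transactional producer, which provides exactly-once semantics only within Kafka; it does not by itself include the database transaction, so coordinating the two is considerably more complex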
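The deduplication approach can be sketched on the consumer side as below. The handle_once helper, the emails_sent table and the message ID "evt-1" are hypothetical; the key idea is that the processed_messages row and the side effect are committed in one transaction.

```python
import sqlite3

def handle_once(conn: sqlite3.Connection, message_id: str, apply_effect) -> bool:
    """Run apply_effect(conn) at most once per message_id; return True if it ran."""
    try:
        with conn:  # the dedup record and the effect commit together
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)", (message_id,)
            )
            apply_effect(conn)
        return True
    except sqlite3.IntegrityError:
        # Primary-key conflict: this message was already processed
        return False

# Demo with an in-memory database (illustrative schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE emails_sent (order_id INTEGER)")
conn.commit()

def send_confirmation(c):
    c.execute("INSERT INTO emails_sent (order_id) VALUES (42)")

results = [handle_once(conn, "evt-1", send_confirmation) for _ in range(2)]
sent = conn.execute("SELECT COUNT(*) FROM emails_sent").fetchone()[0]
print(results, sent)  # [True, False] 1
```

This only deduplicates effects that live in the same database as processed_messages; external side effects (such as actually sending an email) need their own idempotency mechanism.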

Summary

Goal                           Solution
-----------------------------  ------------------------------------------------
Guarantee event delivery       Store event in outbox table in same transaction
Handle broker downtime         Retry asynchronously
Avoid data inconsistency       Never roll back after commit; rely on retries
Support true rollbacks         Only if failure happens before commit
Handle irrecoverable failures  DLQ + monitoring + manual resolution

✅ The Outbox Pattern wins by embracing eventual consistency and resilience, not by trying to roll back committed transactions.


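Example: Python Sketch

# Assumes sqlite3 tables "orders" and "outbox_events" already exist, and that
# kafka_producer is a kafka-python KafkaProducer. Adapt both to your own stack.
import json
import logging
import sqlite3
import time

from kafka.errors import KafkaError  # from the kafka-python package

log = logging.getLogger("outbox")

# Inside a service method
def place_order(conn: sqlite3.Connection, user_id: int, total: int) -> int:
    # One transaction: the order row and its outbox event commit or roll back together
    with conn:
        # 1. Update business state
        cur = conn.execute(
            "INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, total))
        order_id = cur.lastrowid
        # 2. Save event to outbox (same transaction)
        conn.execute(
            "INSERT INTO outbox_events (aggregate_type, aggregate_id, event_type, payload)"
            " VALUES (?, ?, ?, ?)",
            ("Order", order_id, "OrderCreated",
             json.dumps({"order_id": order_id, "total": total})))
    # Transaction committed: the message is durable
    return order_id

# Background process (separate)
def deliver_outbox_messages(conn, kafka_producer, poll_interval=1.0):
    while True:
        # Oldest unsent events first, in small batches
        rows = conn.execute(
            "SELECT id, event_type, payload FROM outbox_events"
            " WHERE sent_at IS NULL ORDER BY id LIMIT 10").fetchall()
        for msg_id, event_type, payload in rows:
            try:
                # send() is asynchronous; get() blocks until the broker acknowledges
                kafka_producer.send(event_type, value=payload.encode("utf-8")).get(timeout=10)
            except KafkaError:
                # Row stays unsent, so it will be retried on the next poll
                log.exception("Failed to send %s, retrying...", msg_id)
            else:
                # Mark as sent only after a confirmed publish
                with conn:
                    conn.execute(
                        "UPDATE outbox_events SET sent_at = datetime('now') WHERE id = ?",
                        (msg_id,))
        time.sleep(poll_interval)  # avoid a busy loop between polls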

Final Note:

You cannot and should not roll back a transaction after it has committed, least of all because of downstream messaging failures. The Outbox Pattern removes that need by decoupling persistence from delivery.

Instead: make failure recovery automatic and robust.