Designing a Scalable Payment System

A payment system must be strongly consistent, reliable, and secure. In this article, we'll design a high-level architecture for processing payments at enterprise scale, focusing on idempotency, transactional integrity, and asynchronous processing.

Core Engineering Principles

[!IMPORTANT] In financial systems, Reliability > Latency. It is better to wait 500ms for a confirmed transaction than to have a 50ms response that might lead to double charging or lost records.

Idempotency: Every payment request must carry a unique idempotency_key. If a network timeout occurs and the client retries, the key lets the server recognize the duplicate, so the customer is not charged twice.
ACID Transactions: Financial records must be atomic and consistent. We use relational databases (PostgreSQL/MySQL) with strict locking for ledger updates.
Scalable State Machine: A payment goes through several states: PENDING → PROCESSING → SUCCEEDED / FAILED.
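
The state machine above can be enforced in code by rejecting any transition that is not explicitly allowed. The following is a minimal sketch, assuming that SUCCEEDED and FAILED are terminal and that a payment cannot skip PROCESSING; the type and function names are hypothetical.

```go
package main

import "fmt"

// PaymentState is one of the lifecycle states named in the article.
type PaymentState string

const (
	StatePending    PaymentState = "PENDING"
	StateProcessing PaymentState = "PROCESSING"
	StateSucceeded  PaymentState = "SUCCEEDED"
	StateFailed     PaymentState = "FAILED"
)

// allowedTransitions lists the legal next states for each state.
// Assumption: SUCCEEDED and FAILED are terminal, so they have no entries.
var allowedTransitions = map[PaymentState][]PaymentState{
	StatePending:    {StateProcessing},
	StateProcessing: {StateSucceeded, StateFailed},
}

// Transition returns the new state, or the unchanged state and an error
// if the move is not in allowedTransitions.
func Transition(from, to PaymentState) (PaymentState, error) {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal transition %s -> %s", from, to)
}
```

Keeping the rules in one table makes it harder for a retry or a late webhook to move a finished payment back into an earlier state.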

High-Level Architecture

The architecture follows a hexagonal pattern to decouple our core logic from external payment providers and downstream consumers.
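
In Go, the hexagonal pattern usually takes the form of small interfaces (ports) owned by the core, with provider and messaging code implementing them as adapters. The sketch below illustrates the idea only; every name in it (PaymentProvider, EventPublisher, PaymentCore, the "payment.charged" topic) is an assumption, not part of the original design.

```go
package main

import (
	"context"
	"errors"
)

// PaymentProvider is an outbound port (hypothetical name). The core depends
// on this interface, never on a concrete gateway SDK.
type PaymentProvider interface {
	Charge(ctx context.Context, idempotencyKey string, amountMinor int64) (providerRef string, err error)
}

// EventPublisher is an outbound port for downstream consumers.
type EventPublisher interface {
	Publish(ctx context.Context, topic string, payload []byte) error
}

// PaymentCore holds the business rules and knows only the ports.
type PaymentCore struct {
	Provider PaymentProvider
	Events   EventPublisher
}

// Charge validates the request, calls the provider port, then publishes the result.
func (c *PaymentCore) Charge(ctx context.Context, key string, amountMinor int64) (string, error) {
	if amountMinor <= 0 {
		return "", errors.New("amount must be positive")
	}
	ref, err := c.Provider.Charge(ctx, key, amountMinor)
	if err != nil {
		return "", err
	}
	return ref, c.Events.Publish(ctx, "payment.charged", []byte(ref))
}

// fakeProvider and memoryPublisher are in-memory adapters, useful in tests.
type fakeProvider struct{}

func (fakeProvider) Charge(_ context.Context, key string, _ int64) (string, error) {
	return "ref-" + key, nil
}

type memoryPublisher struct{ topics []string }

func (m *memoryPublisher) Publish(_ context.Context, topic string, _ []byte) error {
	m.topics = append(m.topics, topic)
	return nil
}

// exampleCharge wires the core to the fake adapters and runs one charge.
func exampleCharge(key string, amountMinor int64) (string, []string, error) {
	pub := &memoryPublisher{}
	core := &PaymentCore{Provider: fakeProvider{}, Events: pub}
	ref, err := core.Charge(context.Background(), key, amountMinor)
	return ref, pub.topics, err
}
```

Because the core sees only interfaces, swapping a payment provider or a message broker means writing a new adapter rather than touching the business rules.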

Database Schema (ERD)

A robust payment system starts with a well-designed schema for auditability.
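
The sketch below illustrates, as Go types, what an auditable schema might look like. The tables and fields are hypothetical, not taken from the original diagram. It assumes a payments table with a unique idempotency key and an append-only, double-entry ledger in which the entries for each payment sum to zero.

```go
package main

import "time"

// Payment mirrors a row in a hypothetical payments table.
type Payment struct {
	ID             string
	IdempotencyKey string // would carry a UNIQUE constraint in the database
	AmountMinor    int64  // minor units (for example cents) avoid floating-point rounding
	Currency       string
	Status         string
	CreatedAt      time.Time
}

// LedgerEntry mirrors a row in a hypothetical append-only ledger_entries table.
// Rows are inserted and never updated, so the history stays auditable.
type LedgerEntry struct {
	PaymentID   string
	Account     string
	AmountMinor int64 // positive = debit, negative = credit
	CreatedAt   time.Time
}

// Balanced reports whether the entries of every payment sum to zero,
// which is the double-entry invariant assumed in this sketch.
func Balanced(entries []LedgerEntry) bool {
	sums := make(map[string]int64)
	for _, e := range entries {
		sums[e.PaymentID] += e.AmountMinor
	}
	for _, total := range sums {
		if total != 0 {
			return false
		}
	}
	return true
}
```

A periodic job that runs a check like Balanced over recent entries is one way to detect ledger drift early.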

Implementation: Idempotency in Golang

Redis stores and validates idempotency keys quickly, before the request reaches the relational database.

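import (
    "context"
    "database/sql"
    "fmt"
    "time"
)

func (s *PaymentService) ProcessPayment(ctx context.Context, req *PaymentRequest) (*PaymentResponse, error) {
    // 1. Atomically claim the idempotency key in Redis.
    // SETNX (set if not exists) succeeds only for the first caller, so it doubles as a lock.
    acquired, err := s.redis.SetNX(ctx, req.IdempotencyKey, "PROCESSING", 30*time.Minute).Result()
    if err != nil {
        return nil, fmt.Errorf("idempotency check failed: %w", err)
    }
    if !acquired {
        // The key is already claimed, so this is a retry or a concurrent duplicate.
        // Log the attempt and return the previously stored result if any.
        return nil, ErrDuplicateRequest
    }

    // 2. Wrap the database writes in a transaction
    err = s.db.WithTransaction(func(tx *sql.Tx) error {
        // a. Create PENDING record
        // b. Record audit log
        return nil
    })
    if err != nil {
        // Release the key so the client can retry with the same idempotency key.
        if delErr := s.redis.Del(ctx, req.IdempotencyKey).Err(); delErr != nil {
            return nil, fmt.Errorf("%w (releasing idempotency key also failed: %v)", err, delErr)
        }
        return nil, err
    }

    // Only the PENDING record exists at this point. The payment reaches
    // SUCCEEDED or FAILED later, once the provider has responded.
    return &PaymentResponse{Status: "PENDING"}, nil
}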
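
The function above rejects duplicates with an error, while its comment mentions returning the previously stored result. One way to do that is sketched below with an in-memory map standing in for Redis. IdempotencyStore, Begin, Complete, and ErrInProgress are hypothetical names, and a production version would also need key expiry and shared storage across instances.

```go
package main

import (
	"errors"
	"sync"
)

// ErrInProgress signals that the first request for a key has not finished yet.
var ErrInProgress = errors.New("request with this idempotency key is still in progress")

// entry records the state of one idempotency key.
type entry struct {
	done   bool
	result string
}

// IdempotencyStore is an in-memory stand-in for Redis (assumption for this sketch).
type IdempotencyStore struct {
	mu      sync.Mutex
	entries map[string]*entry
}

// NewIdempotencyStore returns an empty store.
func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{entries: make(map[string]*entry)}
}

// Begin claims key. If first is true, the caller owns the request and must
// call Complete when done. Otherwise Begin returns the stored result, or
// ErrInProgress if the first request is still running.
func (s *IdempotencyStore) Begin(key string) (first bool, result string, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[key]
	if !ok {
		s.entries[key] = &entry{}
		return true, "", nil
	}
	if !e.done {
		return false, "", ErrInProgress
	}
	return false, e.result, nil
}

// Complete stores the final result so later retries can replay it.
func (s *IdempotencyStore) Complete(key, result string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if e, ok := s.entries[key]; ok {
		e.done = true
		e.result = result
	}
}
```

With this shape, a client that retries after a timeout receives the same response as the original request instead of an error it has to interpret.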

Failure Mode Analysis

Scenario	Impact	Mitigation Strategy
Provider Timeout	Unknown State	Polling/Webhook Reconciliation. Call the provider's status API before retrying.
Database Down	Service Outage	Local Buffering/Outbox Pattern. Store requests in a persistent queue temporarily.
Kafka Delay	Stale Data	Eventual Consistency. Use unique transaction IDs for consumer-side idempotency.
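
To make the Provider Timeout row concrete, the sketch below maps the answer from a provider's status API to the next action. The status and action values are hypothetical and real providers name them differently. The sketch also assumes a retry is safe only when the provider confirms it never saw the request.

```go
package main

// ProviderStatus is what a provider's status API reports for a payment
// (hypothetical values).
type ProviderStatus string

const (
	ProviderCharged  ProviderStatus = "CHARGED"
	ProviderDeclined ProviderStatus = "DECLINED"
	ProviderNotFound ProviderStatus = "NOT_FOUND"
	ProviderUnknown  ProviderStatus = "UNKNOWN"
)

// Action is what the reconciler should do next.
type Action string

const (
	ActionMarkSucceeded Action = "MARK_SUCCEEDED"
	ActionMarkFailed    Action = "MARK_FAILED"
	ActionRetryCharge   Action = "RETRY_CHARGE"
	ActionCheckLater    Action = "CHECK_LATER"
)

// Reconcile decides the next step for a payment whose charge call timed out.
func Reconcile(status ProviderStatus) Action {
	switch status {
	case ProviderCharged:
		return ActionMarkSucceeded
	case ProviderDeclined:
		return ActionMarkFailed
	case ProviderNotFound:
		// The provider never saw the request, so retrying with the same
		// idempotency key cannot double charge.
		return ActionRetryCharge
	default:
		// Still ambiguous: poll again later or wait for the webhook.
		return ActionCheckLater
	}
}
```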

[!TIP] Consultant's Choice: For startups, start with a synchronous flow for simplicity. At enterprise scale (1,000+ TPS), adopt an asynchronous orchestration pattern so that threads are not blocked on external API calls.
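
The asynchronous approach can be illustrated with a fixed pool of workers draining a queue, so that the code accepting requests does not wait on the provider. This is a sketch under simplifying assumptions: an in-process channel stands in for a durable queue such as Kafka, and Job, Result, ProcessAsync, and ErrDeclined are hypothetical names.

```go
package main

import (
	"errors"
	"sync"
)

// ErrDeclined is a sample provider error used by callers of this sketch.
var ErrDeclined = errors.New("card declined")

// Job is one payment waiting to be charged.
type Job struct{ PaymentID string }

// Result is the outcome of one job.
type Result struct {
	PaymentID string
	Status    string
}

// ProcessAsync fans jobs out to a fixed number of workers and collects the
// final status of each payment. charge stands in for the external API call.
func ProcessAsync(jobs []Job, workers int, charge func(Job) error) map[string]string {
	if workers < 1 {
		workers = 1
	}
	queue := make(chan Job)
	results := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				status := "SUCCEEDED"
				if err := charge(job); err != nil {
					status = "FAILED"
				}
				results <- Result{PaymentID: job.PaymentID, Status: status}
			}
		}()
	}

	// Feed the queue, then close results once every worker has finished.
	go func() {
		for _, j := range jobs {
			queue <- j
		}
		close(queue)
		wg.Wait()
		close(results)
	}()

	out := make(map[string]string, len(jobs))
	for r := range results {
		out[r.PaymentID] = r.Status
	}
	return out
}
```

The worker count caps how many provider calls are in flight at once, which also protects the provider from bursts.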

Observability & Monitoring

To maintain 99.99% availability, track these core metrics:

Payment Success Rate (PSR): Percentage of attempted transactions that succeed.
Mean Time to Reconcile (MTTR): How long it takes for a "lost" transaction to be recovered.
Provider Latency: P99 response times of external gateways.
Error Distribution: Watch for spikes in IDEMPOTENCY_MISMATCH or CARD_DECLINED errors.
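
Two of these metrics reduce to simple arithmetic, sketched below. The function names are hypothetical, latencies are assumed to be in milliseconds, and the percentile uses the nearest-rank method, which is one of several common definitions. A real deployment would normally compute these in the metrics backend rather than in application code.

```go
package main

import (
	"math"
	"sort"
)

// SuccessRate returns the percentage of attempts that succeeded.
// It returns 0 when there were no attempts.
func SuccessRate(succeeded, failed int) float64 {
	total := succeeded + failed
	if total == 0 {
		return 0
	}
	return 100 * float64(succeeded) / float64(total)
}

// Percentile returns the p-th percentile (0 < p <= 100) of latency samples
// in milliseconds, using the nearest-rank method. It returns 0 for no samples.
func Percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)

	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}
```

For example, SuccessRate(3, 1) is 75, and Percentile over the ten samples 10, 20, ..., 100 with p = 99 returns the slowest sample, 100.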

This architecture ensures that even if the server crashes mid-transaction, we can reconcile the state later using the pending record and idempotency key.