Designing a Scalable Payment System
Designing a payment system requires strong consistency, reliability, and security. In this article, we'll design a high-level architecture for processing payments at enterprise scale, focusing on Idempotency, Transactional Integrity, and Asynchronous Processing.
Core Engineering Principles
[!IMPORTANT] In financial systems, Reliability > Latency. It is better to wait 500ms for a confirmed transaction than to have a 50ms response that might lead to double charging or lost records.
- Idempotency: Every payment request must carry a unique idempotency_key. If a network timeout occurs and the client retries, the key ensures the customer is not charged twice.
- ACID Transactions: Financial records must be atomic and consistent. We use relational databases (PostgreSQL/MySQL) with strict locking for ledger updates.
- Scalable State Machine: A payment goes through several states: PENDING → PROCESSING → SUCCEEDED/FAILED.
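The state progression in the last bullet can be enforced with a small transition table. The following is a minimal sketch, not a definitive implementation: the State type, the allowed map, and the Transition function are hypothetical names, and it assumes SUCCEEDED and FAILED are terminal states.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a payment lifecycle state, named after the states in the article.
type State string

const (
	Pending    State = "PENDING"
	Processing State = "PROCESSING"
	Succeeded  State = "SUCCEEDED"
	Failed     State = "FAILED"
)

// ErrInvalidTransition is returned when a move is not in the transition table.
var ErrInvalidTransition = errors.New("invalid state transition")

// allowed maps each state to the states it may move to.
// Assumption: SUCCEEDED and FAILED are terminal, so they have no entries.
var allowed = map[State][]State{
	Pending:    {Processing},
	Processing: {Succeeded, Failed},
}

// Transition returns the new state if the move is legal. Otherwise it
// returns the unchanged state and an error wrapping ErrInvalidTransition.
func Transition(from, to State) (State, error) {
	for _, next := range allowed[from] {
		if next == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, from, to)
}

func main() {
	s, err := Transition(Pending, Processing)
	fmt.Println(s, err)

	// A terminal state cannot be reopened.
	_, err = Transition(Succeeded, Pending)
	fmt.Println(err)
}
```

Keeping the table in one place means a skipped step (for example PENDING straight to SUCCEEDED) is rejected in code rather than by convention.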
High-Level Architecture
The architecture follows a hexagonal pattern to decouple our core logic from external payment providers and downstream consumers.
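One way to picture the hexagonal decoupling is as a core that depends only on interfaces (ports), with providers and consumers plugged in as adapters. The sketch below is illustrative only: PaymentProvider, EventPublisher, Core, the "payment.succeeded" topic, and the in-memory adapters are all hypothetical names, not definitions from the architecture diagram.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// PaymentProvider is a hypothetical outbound port. The core depends on this
// interface, never on a concrete gateway SDK.
type PaymentProvider interface {
	Charge(ctx context.Context, idempotencyKey string, amountCents int64) (string, error)
}

// EventPublisher is a hypothetical outbound port for downstream consumers.
type EventPublisher interface {
	Publish(ctx context.Context, topic, payload string) error
}

// Core holds the business logic and only knows about the two ports.
type Core struct {
	provider PaymentProvider
	events   EventPublisher
}

// Pay validates the amount, charges through the provider port, and then
// publishes an event carrying the provider reference.
func (c *Core) Pay(ctx context.Context, key string, amountCents int64) (string, error) {
	if amountCents <= 0 {
		return "", errors.New("amount must be positive")
	}
	ref, err := c.provider.Charge(ctx, key, amountCents)
	if err != nil {
		return "", fmt.Errorf("charge failed: %w", err)
	}
	if err := c.events.Publish(ctx, "payment.succeeded", ref); err != nil {
		return ref, fmt.Errorf("publish failed: %w", err)
	}
	return ref, nil
}

// fakeProvider is an in-memory adapter standing in for a real gateway.
type fakeProvider struct{}

func (fakeProvider) Charge(_ context.Context, key string, _ int64) (string, error) {
	return "ref-" + key, nil
}

// memoryPublisher is an in-memory adapter standing in for a message broker.
type memoryPublisher struct{ sent []string }

func (m *memoryPublisher) Publish(_ context.Context, topic, payload string) error {
	m.sent = append(m.sent, topic+":"+payload)
	return nil
}

// runDemo wires the core to the in-memory adapters and makes one payment.
func runDemo(amountCents int64) (string, []string, error) {
	pub := &memoryPublisher{}
	core := &Core{provider: fakeProvider{}, events: pub}
	ref, err := core.Pay(context.Background(), "key-123", amountCents)
	return ref, pub.sent, err
}

func main() {
	ref, events, err := runDemo(1999)
	fmt.Println(ref, events, err)
}
```

Swapping a gateway or a broker then means writing a new adapter that satisfies the same interface, while Core stays unchanged.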
Database Schema (ERD)
A robust payment system starts with a well-designed schema for auditability.
Implementation: Idempotency in Golang
We use Redis to store and validate idempotency keys quickly, before the request reaches the relational database.
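    // Assumes the standard library packages context, database/sql, fmt, and
    // time, plus a go-redis style client behind s.redis. WithTransaction is
    // the service's own helper.
    func (s *PaymentService) ProcessPayment(ctx context.Context, req *PaymentRequest) (*PaymentResponse, error) {
        // 1. Claim the idempotency key in Redis.
        // SETNX (Set if Not Exists) is atomic, so only one request can claim a key.
        acquired, err := s.redis.SetNX(ctx, req.IdempotencyKey, "PROCESSING", 30*time.Minute).Result()
        if err != nil {
            return nil, fmt.Errorf("idempotency check failed: %w", err)
        }
        if !acquired {
            // The key already exists: log the duplicate attempt and return
            // the previously stored result, if any.
            return nil, ErrDuplicateRequest
        }

        // 2. Wrap the ledger writes in a DB transaction.
        err = s.db.WithTransaction(func(tx *sql.Tx) error {
            // a. Create PENDING record
            // b. Record audit log
            return nil
        })
        if err != nil {
            // Release the key so the client can retry. If the delete fails,
            // the key still expires after its 30-minute TTL.
            if delErr := s.redis.Del(ctx, req.IdempotencyKey).Err(); delErr != nil {
                return nil, fmt.Errorf("%w (key release failed: %v)", err, delErr)
            }
            return nil, err
        }

        // Only the PENDING record exists at this point, so report PENDING
        // rather than a final outcome.
        return &PaymentResponse{Status: "PENDING"}, nil
    }
Failure Mode Analysis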
| Scenario | Impact | Mitigation Strategy |
|---|---|---|
| Provider Timeout | Unknown State | Polling/Webhook Reconciliation. Call provider status API before retrying. |
| Database Down | Service Outage | Local Buffering/Outbox Pattern. Store requests in a persistent queue temporarily. |
| Kafka Delay | Stale Data | Eventual Consistency. Use unique transaction IDs for consumer-side idempotency. |
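The Provider Timeout row says to call the provider's status API before retrying. A rough sketch of that decision follows. The ProviderStatus values, the Action names, and NextAction are hypothetical, and it assumes the provider can report whether it ever saw a request for a given idempotency key; real gateways differ in what their status endpoints return.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderStatus is what a hypothetical status API reports for a payment.
type ProviderStatus string

const (
	ProviderSucceeded ProviderStatus = "SUCCEEDED"
	ProviderFailed    ProviderStatus = "FAILED"
	ProviderNotFound  ProviderStatus = "NOT_FOUND" // provider never saw the request
)

// Action is what the reconciler should do with the local record.
type Action string

const (
	MarkSucceeded Action = "MARK_SUCCEEDED"
	MarkFailed    Action = "MARK_FAILED"
	RetryCharge   Action = "RETRY_CHARGE"
	CheckLater    Action = "CHECK_LATER"
)

// ErrStatusUnavailable models a failed status lookup.
var ErrStatusUnavailable = errors.New("provider status API unavailable")

// NextAction decides how to resolve a payment whose charge call timed out.
// The charge is retried only when the provider confirms it has no record
// of the request. Any uncertainty defers the decision to a later poll.
func NextAction(status ProviderStatus, lookupErr error) Action {
	if lookupErr != nil {
		return CheckLater
	}
	switch status {
	case ProviderSucceeded:
		return MarkSucceeded
	case ProviderFailed:
		return MarkFailed
	case ProviderNotFound:
		return RetryCharge
	default:
		return CheckLater
	}
}

func main() {
	fmt.Println(NextAction(ProviderNotFound, nil))
	fmt.Println(NextAction("", ErrStatusUnavailable))
}
```

Defaulting to CheckLater on any unrecognized status keeps the unknown state from turning into a blind retry.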
[!TIP] Consultant's Choice: For startups, start with a synchronous flow for simplicity. For enterprise scale (1000+ TPS), adopt an Asynchronous Orchestration pattern to avoid blocking threads on external API calls.
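To make the asynchronous option more concrete, here is a small in-process sketch in which accepted payments are queued and a fixed pool of workers makes the external calls. It is an illustration under assumptions: Job, Result, RunWorkers, and ErrProviderDown are hypothetical names, and a Go channel stands in for the durable queue a production system would need.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Job is a payment waiting for its external provider call.
type Job struct{ PaymentID string }

// Result is the outcome a worker reports for one job.
type Result struct {
	PaymentID string
	Status    string
}

// ErrProviderDown is a sample error for the demo provider call.
var ErrProviderDown = errors.New("provider down")

// RunWorkers pushes jobs through a queue consumed by a fixed number of
// workers, so callers never block on the external call themselves.
func RunWorkers(jobs []Job, workers int, call func(Job) error) map[string]string {
	queue := make(chan Job)
	results := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				status := "SUCCEEDED"
				if err := call(j); err != nil {
					status = "FAILED"
				}
				results <- Result{PaymentID: j.PaymentID, Status: status}
			}
		}()
	}

	// Feed the queue, then close results once every worker has finished.
	go func() {
		for _, j := range jobs {
			queue <- j
		}
		close(queue)
		wg.Wait()
		close(results)
	}()

	out := make(map[string]string, len(jobs))
	for r := range results {
		out[r.PaymentID] = r.Status
	}
	return out
}

func main() {
	jobs := []Job{{"p1"}, {"p2"}, {"p3"}}
	statuses := RunWorkers(jobs, 2, func(j Job) error {
		if j.PaymentID == "p2" {
			return ErrProviderDown
		}
		return nil
	})
	fmt.Println(statuses)
}
```

The worker count caps how many provider calls are in flight at once, which is the property that keeps request threads free under load.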
Observability & Monitoring
To maintain 99.99% availability, track these core metrics:
- Payment Success Rate (PSR): Percentage of attempted transactions that succeed.
- Mean Time to Reconcile (MTTR): How long it takes for a "lost" transaction to be recovered.
- Provider Latency: P99 response times of external gateways.
- Error Distribution: Monitor for spikes in IDEMPOTENCY_MISMATCH or CARD_DECLINED.
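As a rough illustration of how two of these metrics could be computed from raw counts and samples, the sketch below derives a success rate and a P99 latency. SuccessRate and Percentile are hypothetical helpers, and the percentile uses the nearest-rank method, which is one common definition among several; in practice these numbers usually come from a metrics system rather than hand-rolled code.

```go
package main

import (
	"fmt"
	"sort"
)

// SuccessRate returns the percentage of attempts that succeeded.
// It returns 0 when there were no attempts.
func SuccessRate(succeeded, failed int) float64 {
	total := succeeded + failed
	if total == 0 {
		return 0
	}
	return 100 * float64(succeeded) / float64(total)
}

// Percentile returns the p-th percentile (1 to 100) of the samples using
// the nearest-rank method: the value at rank ceil(p/100 * n) in sorted order.
// It returns 0 for an empty sample set.
func Percentile(samples []float64, p int) float64 {
	n := len(samples)
	if n == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy, keep the input intact
	sort.Float64s(sorted)

	rank := (p*n + 99) / 100 // integer ceiling of p*n/100
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return sorted[rank-1]
}

func main() {
	latenciesMs := []float64{120, 95, 310, 101, 88, 2400, 97, 105, 99, 110}
	fmt.Println("PSR %:", SuccessRate(9, 1))
	fmt.Println("P99 ms:", Percentile(latenciesMs, 99))
	fmt.Println("P50 ms:", Percentile(latenciesMs, 50))
}
```

With only ten samples the P99 is simply the slowest call, which is why tail-latency percentiles are normally computed over much larger windows.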
This architecture ensures that even if the server crashes mid-transaction, we can reconcile the state later using the pending record and idempotency key.