Consistency in Distributed Systems: The Data Truth

In our last post on Availability, we saw that Replication is key to keeping systems alive. But Replication introduces a new problem: Consistency.

If you keep two copies of a database and update one, the other immediately becomes “stale.” Ideally both copies would change at the same moment, but in the real world, sending the update across the network takes time.

What is Consistency?

Consistency (in the sense used by the CAP theorem) means that every read receives the most recent write or an error. In simple terms: all clients see the same data at the same time.

Real-Life Example: The ATM Problem

Imagine you have $500 in your bank account.

You withdraw $500 from an ATM in Delhi.
At the same moment, your spouse tries to withdraw $500 from an ATM in Pune.

In a consistent system, whichever withdrawal is processed second fails because the balance is already $0. In an inconsistent system, both withdrawals might succeed, paying out $1,000 from a $500 account, and the bank loses money.
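The ATM race can be sketched in a few lines of Python. This is a toy model, not how a bank works: the `Account` class, the `consistent_run` and `inconsistent_run` helpers, and the two replica dictionaries are all hypothetical. The consistent case uses a single balance guarded by a lock (one source of truth); the inconsistent case lets each ATM check its own unsynced copy.

```python
import threading


class Account:
    """Hypothetical account with a single balance: the one source of truth."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount: int) -> bool:
        # Check and update happen together under the lock, so two ATMs
        # can never both see the full $500 and both pay out.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def consistent_run() -> list:
    """Delhi and Pune ATMs hit the same account concurrently."""
    account = Account(500)
    results = []
    results_lock = threading.Lock()

    def atm() -> None:
        ok = account.withdraw(500)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=atm) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def inconsistent_run() -> tuple:
    """Each ATM checks its own unsynced replica, which still shows $500."""
    delhi_replica = {"balance": 500}
    pune_replica = {"balance": 500}

    def local_withdraw(replica: dict, amount: int) -> bool:
        if replica["balance"] >= amount:
            replica["balance"] -= amount
            return True
        return False

    delhi_ok = local_withdraw(delhi_replica, 500)
    pune_ok = local_withdraw(pune_replica, 500)
    paid_out = 500 * (int(delhi_ok) + int(pune_ok))
    return delhi_ok, pune_ok, paid_out


print(sorted(consistent_run()))  # [False, True]
print(inconsistent_run())        # (True, True, 1000)
```

With one source of truth, exactly one withdrawal succeeds no matter which thread runs first. With two stale copies, both succeed and $1,000 leaves a $500 account.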

Monoliths vs. Distributed Systems

System Type	Consistency Level	Why?
Monolithic	Naturally High	Single database (Source of Truth). No sync needed.
Distributed	Hard to Maintain	Data is spread across nodes. Sync takes time (Network Latency).

Types of Consistency Models

In distributed systems, you often have to choose between speed and accuracy.

1. Strong Consistency

Definition: Once a write is successful, all subsequent reads return the new value.
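Mechanism: The system locks the data during an update, and the write is only confirmed (and reads of that data only proceed) once all replicas have acknowledged it, so no client can read the old value after the write succeeds.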
Trade-off: Higher Latency (slower).
Use Case: Banking Systems, Stock Trading, Train Ticket Booking (IRCTC). You cannot sell the same seat twice.

  sequenceDiagram
    participant Client
    participant Primary
    participant Replica
    Client->>Primary: Write(X=10)
    Primary->>Replica: Sync(X=10)
    Replica-->>Primary: Acknowledge
    Primary-->>Client: Success
    Note right of Client: Now all reads see X=10
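The write path in the diagram can be sketched as synchronous replication in Python. The `Replica` and `Primary` classes are hypothetical, in-memory stand-ins; a real system would talk over the network and would need to retry or roll back when an acknowledgement never arrives.

```python
class Replica:
    """Hypothetical in-memory replica."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}

    def apply(self, key: str, value: int) -> bool:
        self.data[key] = value
        return True  # the acknowledgement sent back to the primary


class Primary(Replica):
    def __init__(self, replicas: list) -> None:
        super().__init__("primary")
        self.replicas = replicas

    def write(self, key: str, value: int) -> str:
        self.apply(key, value)
        # Synchronous replication: collect an acknowledgement from every
        # replica before answering the client.
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            # A real system would also retry or roll back here.
            raise RuntimeError("not every replica acknowledged the write")
        return "Success"


replica = Replica("replica-1")
primary = Primary([replica])
print(primary.write("X", 10))  # Success
print(replica.data["X"])       # 10
```

Because "Success" is only returned after the replica has applied the write, a read from either node afterwards sees X=10. The cost is that every write waits for the slowest replica.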

2. Eventual Consistency

Definition: If no new updates are made, eventually all accesses will return the last updated value.
Mechanism: The system returns “Success” to the client immediately after updating the primary node. Replicas update in the background.
Trade-off: Lower Latency (faster), but risk of stale data.
Use Case: Social Media Feeds (Instagram/Twitter). If you change your profile picture, it’s okay if your friend sees the old one for a few more seconds.
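Here is a minimal sketch of the asynchronous path described above, assuming one primary, one replica, and a queue of pending writes. `EventualStore` and its methods are hypothetical names, and `replicate()` stands in for whatever background process actually ships the updates.

```python
from collections import deque
from typing import Optional


class EventualStore:
    """Hypothetical primary + one replica with asynchronous replication."""

    def __init__(self) -> None:
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet shipped to the replica

    def write(self, key: str, value: str) -> str:
        self.primary[key] = value
        self.pending.append((key, value))
        return "Success"  # acknowledged before the replica has the update

    def read_from_replica(self, key: str) -> Optional[str]:
        return self.replica.get(key)

    def replicate(self) -> None:
        # The "background" step: ship queued writes to the replica in order.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventualStore()
store.write("profile_pic", "old.jpg")
store.replicate()
store.write("profile_pic", "new.jpg")
print(store.read_from_replica("profile_pic"))  # old.jpg (stale)
store.replicate()
print(store.read_from_replica("profile_pic"))  # new.jpg (converged)
```

Between the write and the next `replicate()` call, your friend reading from the replica still sees the old profile picture. Once no new writes arrive and the queue drains, both copies agree.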

3. Weak Consistency

Definition: There is no guarantee that a read will return the most recent write. It relies on “best effort.”
Use Case: Live Video Streaming, VoIP. If you miss a frame of video, it’s gone. The system doesn’t pause to sync.
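The "best effort" behaviour of a live stream can be sketched as a player that never waits for missing data. The `play` function is a hypothetical illustration: it receives frame sequence numbers in arrival order, plays anything newer than the last frame shown, and simply drops late or lost frames.

```python
def play(arrivals: list) -> list:
    """Hypothetical best-effort player.

    `arrivals` lists frame sequence numbers in the order the network
    delivered them; lost frames simply never appear.
    """
    played = []
    last_played = -1
    for seq in arrivals:
        if seq > last_played:
            played.append(seq)
            last_played = seq
        # Otherwise the frame arrived too late: drop it instead of
        # pausing or rewinding to stay in sync.
    return played


# Frame 2 arrives after frame 3, and frame 5 is lost entirely.
print(play([0, 1, 3, 2, 4, 6]))  # [0, 1, 3, 4, 6]
```

Nothing is retransmitted and nothing blocks; the viewer just sees a brief glitch.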

How to Improve Consistency

Stop Read Operations: During a major update, put the system in “Maintenance Mode” so no client can read half-updated data. This is the brute-force approach.
Reduce Replica Distance: Place replicas closer to each other to minimize sync time.
Application Coordination: Use consensus algorithms such as Paxos or Raft (Raft is used by etcd, the key-value store behind Kubernetes) so that nodes agree on the truth.
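Paxos and Raft both rely on majority quorums: a decision counts only once a majority of nodes has accepted it, and any two majorities share at least one node, so no two disjoint groups can ever "agree" on different truths. The sketch below checks only that overlap property by brute force; it is not an implementation of either algorithm, and `quorums_always_overlap` is a hypothetical helper.

```python
from itertools import combinations


def quorums_always_overlap(n: int, size: int) -> bool:
    """True if every pair of `size`-node groups out of `n` nodes shares a node."""
    groups = [set(group) for group in combinations(range(n), size)]
    return all(a & b for a in groups for b in groups)


# 5-node cluster: a majority is 3 nodes.
print(quorums_always_overlap(5, 3))  # True:  3 + 3 > 5
print(quorums_always_overlap(5, 2))  # False: e.g. {0, 1} and {2, 3}
```

Groups of 3 out of 5 always overlap because 3 + 3 > 5; groups of 2 do not, which is why a minority of nodes can never commit a value on its own.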

Conclusion

Consistency is a spectrum.

Need Accuracy? Choose Strong Consistency (but accept slower performance).
Need Speed? Choose Eventual Consistency (but accept temporary staleness).

As a system architect, your job is to know which part of your app needs which model. Billing must be Strong; Recommendations can be Eventual.
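One way to encode this rule of thumb is a per-feature policy that decides where reads go. The sketch assumes single-primary replication, where the primary always holds the latest acknowledged write; `Model`, `POLICY`, `choose_node`, and the node names are all hypothetical.

```python
from enum import Enum


class Model(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"


# Hypothetical per-feature policy following the rule of thumb above.
POLICY = {
    "billing": Model.STRONG,
    "recommendations": Model.EVENTUAL,
}


def choose_node(feature: str, primary: str, nearest_replica: str) -> str:
    # Strong features read from the primary, which (under the single-primary
    # assumption) holds the latest acknowledged write; eventual features may
    # read a nearby replica that could be slightly stale. Unknown features
    # default to the safe choice.
    model = POLICY.get(feature, Model.STRONG)
    return primary if model is Model.STRONG else nearest_replica


print(choose_node("billing", "db-primary", "db-replica-pune"))          # db-primary
print(choose_node("recommendations", "db-primary", "db-replica-pune"))  # db-replica-pune
```

Defaulting unknown features to strong consistency trades some latency for safety until someone decides the feature can tolerate staleness.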