CAP Theorem for System Design Interviews

The CAP theorem is an important database concept to understand in system design interviews. CAP stands for consistency, availability and partition tolerance hence the name CAP theorem.
id:: 63eb007d-e404-4950-aa61-3e20077a5856
- Consistency refers to all the users being able to see the same data at the same time.
- Availability means that the system is always available for reads or writes even with node failures.
- Partition tolerance means that the system continues to function even if communication fails between nodes.
The theorem states only two of these three properties can be ensured in a distributed system.
An assumption we make is that network partitions will happen, so to offer any kind of reliable services, partition tolerance is necessary. So you can't choose to forfeit the P in the CAP. That make leaves us to make the trade-off between ensuring consistency and availability.
id:: 63eb0459-f049-4ba1-9ce7-5449b9144de8
Consistency
- Consistency is the property that after a write is sent to a database, all read requests send to any node should return that updated data.
- Imagine a scenario where there's a network partition in case both node A and node B would reject any write request sent to them. This would ensure the state of the data of that two nodes are the same. Otherwise, only node A have the updated data and node B have stale data.
- Sometimes you'll hear the term strong consistency. The consistency in CAP theorem does not necessarily refer to strong consistency. In a strongly consistent database, if data is written and then immediately read after, in theory, it should always return the updated data.
- However, in a distributed system, network communication doesn't happen instantly since nodes are physically separated from one another and transferring data requires some amount of time.
- This is the reason why it's not possible to have a perfect strongly consistent distributed database.
  id:: 63eb08ce-93b6-45b2-84c7-252f2d2afee3
- In the real world, when we talk about databases that prioritize consistency, we usually refer to databases that are eventually consistent with a very short unnoticeable lag time between nodes.
  id:: 63eb0a3e-118c-4dae-8d3e-fca205c2f385
- So to summarize again, consistency in the CAP theorem means that all users would be able to see the same data at the same time.
Availability
- In a database that prioritize availability is acceptable to have inconsistent data across the nodes where one node may contain stale data and another has the more recent data.
- Availability means that we prioritize nodes to complete requests sent to them. Available database tend to have eventual consistency which means that after some amount of time with a network partition is resolved, all nodes will eventually sync with one another to have the same updated data. In this case, node A will receive the update at first, and after some time, node B will be updated as well.
Consistency or Availability?
- So when should consistency or availability be prioritized? If you're working with data that needs to be up to date, then you'd want to prioritize consistency. On the other hand, if it's okay that the queried data can be slightly out of date, then storing it in an available database may be a better choice.
- Here are some examples for a better understanding.
  - Imagine you're building an app like Amazon where shoppers can browse a catalog of products and then purchase them if they're in stock. You want to make sure that the products are actually in the stock or you'd have to refund the shoppers for unavailable items.
  - Should your database that stores product information prioritize consistency or availability? Here you should choose consistency. In the case of network partition that nodes can't sync with one another, you'd rather not allow any shoppers to purchase any products than have two or more shoppers purchase the same product when there is only one item available.
  - Available database would allow for the latter and then at least one of the shoppers would have to have their order canceled or refunded.
  - Let's imagine another scenario where the the same Amazon-esque product, let's say a PM decides that it's much more cost effective to refund shoppers rather than to show them as out of stock during a network failure. In this case, you'd want to prioritize availability since canceling and refunding orders will be preferable to not allowing shoppers to purchase a product at all during a network failure.
    id:: 63eb1f8c-43d8-40a2-94fa-9fc99a458545
From the discussion above, you would have noticed that only write requests are discussed here, this is because read requests don't change the state of data and don't require re-syncing between nodes. Read requests are typically fine during network partitions for both consistent and available databases.