Distributed Systems
More advanced: see the Microservice Design Patterns course (external)
DNS Service
- DNS takes a domain name (the host portion of a URL) and returns an IP address
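A quick sketch of what a resolver lookup does, using Python's standard library (the hostname here is just a placeholder):

```python
# Minimal sketch: resolve a hostname to an IPv4 address via the OS resolver.
import socket

ip = socket.gethostbyname("example.com")  # hostname is illustrative
print(ip)  # prints an IPv4 address string
```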
Database
- NoSQL doesn't offer joins
- Use NoSQL for unstructured data, data flows where you're only serializing and deserializing data, or storing massive amounts of data
Database Replication
- It's common to have multiple copies of your db to enable failover
Master-slave replication
- A master db will only support writes. Slave dbs will get copies of the data from the master and only support reads
- You usually have more slaves than masters since reads are more common
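A minimal sketch of how an application might route queries under this setup; the connection objects and their execute() method are assumptions, not any specific library's API:

```python
import itertools

class ReplicatedDB:
    """Send writes to the master, spread reads round-robin across slaves."""

    def __init__(self, master, slaves):
        # `master` and `slaves` are assumed to be connection-like objects
        # exposing an execute(sql) method (illustrative only).
        self.master = master
        self._read_cycle = itertools.cycle(slaves or [master])

    def write(self, sql):
        return self.master.execute(sql)

    def read(self, sql):
        # Falls back to the master when no slaves are configured.
        return next(self._read_cycle).execute(sql)
```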
Failures
- If only one slave is available and it goes offline, reads will be temporarily directed to the master until a new slave can be found
- If the master goes offline, a slave will be promoted
Sharding
- Horizontal scaling of databases
- Each database is known as a shard. Instead of replicating the db, it's split up over multiple databases
- We can use a basic hash function to decide which database holds a given user's data, e.g. user_id % num_dbs (see the sketch below)
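A minimal sketch of that modulo-based shard lookup; the shard count and function name are illustrative:

```python
NUM_SHARDS = 4  # number of database shards (illustrative)

def shard_for_user(user_id: int) -> int:
    """Return the index of the shard that stores this user's data."""
    return user_id % NUM_SHARDS

print(shard_for_user(123))  # 3
print(shard_for_user(124))  # 0
```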
Issues
- Resharding - needed when a single shard can no longer hold any more data
- Celebrity problem - ex. multiple celebrities' Twitter accounts end up on the same shard, overloading it
- It makes it hard to perform joins
Vertical vs. Horizontal Scaling
- Vertical if you have low traffic
- Vertical scaling doesn't allow for failover and has a hard limit
Load Balancer
- Lets you distribute traffic among servers
- The load balancer communicates with the servers via a private IP
Cache
CDN
CDNs cache static content like CSS, JS files
Ex. a user in Hong Kong requests data from the CDN. It isn't cached there, so the CDN pulls it from the origin (e.g. S3), caches it, and returns it to the user
Considerations
- Set an expiration time
Stateless Servers
- It's recommended to keep user state data in a shared DB
Geo Routing
You can split your whole web server operation between geographical regions
Ex. you have copies of your web servers, dbs, and caches over different geographical regions
Considerations
- You may need copies of your data in each region in case one data center goes out
Message Queues
Message queues support asynchronous communication: a producer publishes messages to the queue, and a consumer subscribes to the queue and consumes them (see the sketch below)
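A minimal in-process sketch of the producer/consumer pattern using Python's standard library; a real system would use a broker such as Kafka or RabbitMQ, and the message format here is made up:

```python
import queue
import threading

message_queue = queue.Queue()

def producer():
    for i in range(5):
        message_queue.put(f"job-{i}")   # publish a message
    message_queue.put(None)             # sentinel: nothing more to publish

def consumer():
    while True:
        msg = message_queue.get()       # blocks until a message is available
        if msg is None:
            break
        print("processing", msg)

threading.Thread(target=producer).start()
consumer()
```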
Back of the Envelope Estimation
Power of 2 | Value | Name |
---|---|---|
10 | ~1 Thousand | Kilobyte |
20 | ~1 Million | Megabyte |
30 | ~1 Billion | Gigabyte |
40 | ~1 Trillion | Terabyte |
50 | ~1 Quadrillion | Petabyte |
L1 cache - smallest and fastest, inside each CPU core
L2 cache - larger and slower, usually per core
L3 cache - largest and slowest of the three, shared across cores
Latency numbers - page 36
Framework for System Design Interview Questions
- Don't over-engineer
- Always keep the interviewer in the loop of what you're thinking
Understand the problem and establish design scope
- Don't give quick answers. Think through and fully understand the requirements
- What features are we going to build?
- How many users?
- How fast does the company anticipate scaling?
- What's the tech stack?
- What are the most important features?
- What's the traffic volume?
Propose a high-level design and get their feedback
Design deep dive
Focus on bottlenecks. Some interviewers want you to focus on high-level design
Wrap Up
- Discuss potential improvements, give a recap
Design a Rate-Limiter
Controls the rate of traffic sent by a client or a service
Ex. number of accounts created from the same IP, number of writes per second
Client-side rate limiting is unreliable because requests can be forged by malicious actors
- So we should do it server side
- The Rate-Limiter should sit between the client and servers and throw HTTP errors
API Gateways are managed services that provide rate-limiting
We can use Redis to store the counters, since it's fast and supports INCR (atomic increment) and EXPIRE (set a time-to-live) - see the sketch below
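A minimal fixed-window counter sketch using redis-py's INCR and EXPIRE as described above; the key format, limit, and window length are assumptions:

```python
import redis

r = redis.Redis()        # assumes a Redis instance on localhost:6379

LIMIT = 100              # max requests per window (assumed)
WINDOW_SECONDS = 60      # window length (assumed)

def is_allowed(client_id: str) -> bool:
    """Fixed-window rate limit: at most LIMIT requests per client per window."""
    key = f"rate:{client_id}"
    count = r.incr(key)                # atomic increment; creates the key at 1
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # start the window on the first request
    return count <= LIMIT
```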
Distributed Rate-Limiter
- If two requests read the counter concurrently before either writes back, the counter ends up incremented only once instead of twice (a race condition)
- Or we may need multiple rate-limiter servers
- We can have two clients with two rate limiters, both using a shared Redis store
- Synchronize data with an eventual consistency model
Performance Optimization
- Multi-data centers are crucial because latency will be high for users far geographically from the data center
Design Consistent Hashing
A technique to hash requests evenly across servers
Basic technique: hash(key) % n, where n is the number of servers. This fails, however, when servers are added or removed, because most keys get remapped. Instead, picture a hash ring, where the hash space from 0 to 2^160 - 1 is connected in a circle
Now, we put our servers evenly spaced out on the ring
To determine which server a key goes to, start at the key's hash position and move clockwise until a server is found
Adding a Server
- Ex. add a new server (say s4) between s0 (server 0) and s1 - only the keys that fall between s0 and s4 need to be remapped (they move from s1 to s4)
Problems with this Approach
- It's impossible to keep the partitions (the gaps between adjacent servers) the same size as servers are added and removed
- It's possible to have non-uniform key distribution - lots of data mapped to the same server
Virtual Nodes - a Better Approach
Virtual nodes - each server is placed at multiple positions (virtual nodes) on the ring. Because each server has many positions, the spacing and key distribution become much more even
Each server then owns many small partitions of the ring instead of one large one - see the sketch below
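A small sketch of a hash ring with virtual nodes: SHA-1 gives the 0 to 2^160 - 1 space mentioned above, and bisect finds the next node clockwise. The virtual-node count and naming scheme are assumptions:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, servers, vnodes=100):
        self.ring = {}          # ring position -> server name
        self.sorted_keys = []   # sorted ring positions
        for server in servers:
            self.add_server(server, vnodes)

    def _hash(self, key: str) -> int:
        # SHA-1 gives a 160-bit value, i.e. a position in [0, 2^160 - 1]
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def add_server(self, server: str, vnodes: int = 100):
        for i in range(vnodes):
            pos = self._hash(f"{server}#{i}")   # one position per virtual node
            self.ring[pos] = server
            bisect.insort(self.sorted_keys, pos)

    def get_server(self, key: str) -> str:
        # Walk clockwise: the first node at or after the key's position owns it
        pos = self._hash(key)
        idx = bisect.bisect_left(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = HashRing(["s0", "s1", "s2", "s3"])
print(ring.get_server("user:123"))   # one of s0..s3, stable across calls
```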
Design a Key-Value Store
Single-Server approach: Store key-value pairs in a hash table that keeps everything in RAM
Optimize by: Storing the most-used data on RAM and the rest on disk
- Data compression
We thus need a consistent hashing algorithm to spread the traffic
- We should spread our replicas over various data centers in different geographic regions
Consistency
We need to keep data in sync over various replicas
N - the number of replicas
W - write quorum: a write is only considered successful once W replicas acknowledge it (W = 1 means the coordinator waits for just one acknowledgment)
R - read quorum: the number of replica responses a read must wait for
If W + R > N, strong consistency is guaranteed, because the set of replicas read always overlaps the set written
- Strong consistency - any read is the most recent write
- Weak consistency
- Eventual consistency - give it time
If we have two writes on different servers that modify the same data:
- we use a vector clock to determine which came first - this stores server id and version
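A minimal vector-clock sketch showing how to tell whether one write descends from another or whether the two conflict; the dict-of-counters representation and function names are assumptions:

```python
def increment(clock: dict, server_id: str) -> dict:
    """Record a write handled by `server_id`."""
    clock = dict(clock)
    clock[server_id] = clock.get(server_id, 0) + 1
    return clock

def descends_from(a: dict, b: dict) -> bool:
    """True if clock `b` is at least as advanced as `a` for every server."""
    return all(v <= b.get(server, 0) for server, v in a.items())

def conflict(a: dict, b: dict) -> bool:
    # Neither clock dominates the other: concurrent writes, needs resolution
    return not descends_from(a, b) and not descends_from(b, a)

v1 = increment({}, "s0")                      # {"s0": 1}
v2 = increment(v1, "s1")                      # {"s0": 1, "s1": 1}
print(descends_from(v1, v2))                  # True: v2 came after v1
print(conflict(increment(v1, "s0"), v2))      # True: concurrent edits
```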
Handling Failures
- If a server goes down, the system must first detect the failure, then handle it (temporary vs. permanent - see below)
Detecting failures
- If two servers say that a server is down, then we trust it
Gossip Protocol
- Each node maintains a node membership list - contains member IDs (other nodes) and heartbeat counts
- Each node periodically sends heartbeats. If a node's heartbeat counter hasn't increased for longer than a predefined period, that node is considered offline
- Once a node notices another node is offline, it sends heartbeats containing that node's info to a set of random nodes; once they confirm the counter is stale, the node is marked down (see the sketch below)
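A small sketch of the membership list described above: merge heartbeat counters received via gossip, and suspect any node whose counter hasn't advanced within a timeout. The timeout value and data layout are assumptions:

```python
import time

FAILURE_TIMEOUT = 10.0  # seconds without heartbeat progress (assumed)

class Membership:
    def __init__(self):
        # node_id -> (heartbeat counter, local time we last saw it advance)
        self.members = {}

    def merge(self, gossip: dict):
        """Merge a {node_id: heartbeat} list received from another node."""
        now = time.time()
        for node_id, heartbeat in gossip.items():
            current = self.members.get(node_id)
            if current is None or heartbeat > current[0]:
                self.members[node_id] = (heartbeat, now)  # counter advanced

    def suspected_down(self):
        """Nodes whose heartbeat counter has been lagging for too long."""
        now = time.time()
        return [node for node, (_, last_seen) in self.members.items()
                if now - last_seen > FAILURE_TIMEOUT]
```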
Sloppy Quorum - Temporary Failures
A technique for high availability
The system chooses the first W healthy servers for writes and the first R healthy servers for reads, skipping any that are offline
If a server is down, another will temporarily process requests
- When the original server comes back online, the temporary server hands that data back to it (hinted handoff)
Handling Permanent Failures
Compare each piece of data on the replicas and update each replica to have the newest version
Reads & Writes
Reads
- Reads go through a memory cache first; if the key isn't there, a bloom filter is checked to determine which SSTables might hold the data
Writes
- Writes go into the WAL (write-ahead log) first, then the in-memory cache (memtable); when the memtable is full, it's flushed to an SSTable on disk - see the sketch below
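A minimal sketch of that write path: append to a write-ahead log for durability, update the in-memory table, and flush to a sorted on-disk file once the memtable is full. The file formats and flush threshold are simplifications:

```python
import json

MEMTABLE_LIMIT = 1000  # flush threshold (assumed)

class KVStore:
    def __init__(self, wal_path="wal.log", sst_path="sstable-0.json"):
        self.wal_path = wal_path
        self.sst_path = sst_path
        self.memtable = {}

    def put(self, key, value):
        # 1. Append to the write-ahead log so the write survives a crash
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps({"k": key, "v": value}) + "\n")
        # 2. Update the in-memory table
        self.memtable[key] = value
        # 3. Flush to a sorted on-disk table (SSTable) when the memtable is full
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        with open(self.sst_path, "w") as sst:
            json.dump(dict(sorted(self.memtable.items())), sst)
        self.memtable.clear()
```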
Full Design
- A coordinator node acts as a proxy between the client and the storage nodes, routing requests via consistent hashing
- Nodes exchange heartbeats (gossip) with each other so failures are detected
Goals - Techniques
Goal | Technique |
---|---|
Big data | Consistent hashing |
High-availability reads | Data replication |
High-availability writes | Vector clocks (conflict resolution) |
Dataset partitioning | Consistent hashing |
Tunable consistency | Quorum consensus |