System Design Primer
System Design Interview ext: Article Link
Database
ACID
- Atomicity - each transaction is all or nothing
- Consistency - the database stays valid between transactions
- Isolation - concurrent transactions have the same results as if they ran serially
- Durability - once a transaction completes, it remains complete
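Atomicity can be illustrated with a small sketch using Python's built-in sqlite3 module. The accounts table, the names and the transfer helper are assumptions made up for this example; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                # Raising here undoes the debit above
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical data for the example
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()
```

Calling transfer(conn, "alice", "bob", 500) returns False and both balances stay as they were, because the debit is rolled back together with the rest of the transaction.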
Replication
Master-Slave
- The master serves reads and writes, the slaves only serve reads
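A minimal sketch of request routing under master-slave replication, assuming writes go to the master and reads are spread over the replicas. Plain dicts stand in for database nodes, and the ReplicatedStore class is hypothetical.

```python
import itertools


class ReplicatedStore:
    """Toy master-slave setup: dicts stand in for database nodes."""

    def __init__(self, replica_count=2):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go through the master
        self.master[key] = value
        # Copied to replicas immediately here for simplicity;
        # real systems often replicate asynchronously
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate over the replicas to spread the load
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)
```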
Master-Master
- Both masters serve reads and writes and coordinate with each other on writes
Disadvantages:
- Requires a load balancer
- Either violates ACID (loosely consistent) or requires slow synchronization logic
Federation (Functional Partitioning)
- Splitting up the database by function - ex. forums, users, products
Disadvantages
- Complex joins
- Complex queries over multiple dbs
Sharding
- Split the data over multiple dbs
Disadvantages
- Complex queries
- Complex joins
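One common way to decide which db holds a row is to hash its key. The sketch below assumes hash-based sharding with a fixed number of shards; shard_for and ShardedStore are hypothetical names, and dicts stand in for the separate dbs.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy sharded store: each dict stands in for a separate db."""

    def __init__(self, shard_count: int):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def count_all(self) -> int:
        # A query over all the data has to visit every shard
        return sum(len(shard) for shard in self.shards)
```

A single-key lookup touches one shard, while count_all has to visit all of them, which is where the complex-query disadvantage comes from.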
Denormalization
- Improve reads, hinder writes
Redundant copies of data are written in multiple tables to avoid expensive joins
This strategy works in conjunction with federation and sharding
SQL Tuning
- Use CHAR instead of VARCHAR for fixed-length fields
- INT for large numbers up to 2^32 (about 4 billion)
- DECIMAL for currency
- Set the NOT NULL constraint wherever possible
- Use good indices
- Denormalize data that will be frequently joined
- Partition hot spots
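Two of the tips above can be checked with a little arithmetic. The sketch assumes a 4-byte integer column and uses Python's decimal module as a stand-in for a DECIMAL column; the variable names are made up for the example.

```python
from decimal import Decimal

# A 4-byte integer column can hold 2**32 distinct values
INT_DISTINCT_VALUES = 2 ** 32

# Binary floating point cannot represent most decimal fractions exactly,
# which is why currency is better stored as DECIMAL
float_total = 0.1 + 0.2
decimal_total = Decimal("0.10") + Decimal("0.20")

float_is_exact = float_total == 0.3
decimal_is_exact = decimal_total == Decimal("0.30")
```

Here float_is_exact is False while decimal_is_exact is True.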
Indices
- Represented as self-balancing B-trees
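The effect of an index can be seen by asking the database for its query plan. The sketch below uses SQLite through Python's sqlite3 module; the users table, the index name and the plan helper are assumptions for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, name TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row
    return " | ".join(row[-1] for row in rows)


query = "SELECT name FROM users WHERE email = ?"
before = plan(conn, query, ("user42@example.com",))  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query, ("user42@example.com",))  # lookup via the index
```

Printing before and after shows the plan changing from a scan of the whole table to a search that uses idx_users_email.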
Caching
- Whenever your application tries to read data, it should first look through the cache
Two Approaches:
Cache the result of database queries
(Recommended) Cache objects
- Store the complete class instance in the cache, or
- Store the assembled arrays, etc. in the cache
Cache:
- User sessions
- Blog articles
- user-friend relationships
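The read path described above (look in the cache first, fall back to the database on a miss) can be sketched as follows. The CacheAside class and the load_from_db callback are hypothetical; an in-process dict stands in for a real cache.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the db and remember it."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical db lookup function
        self._cache = {}
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no db access
        self.db_reads += 1
        value = self._load_from_db(key)  # cache miss: go to the db
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this when the underlying data changes
        self._cache.pop(key, None)
```

Usage: with articles = CacheAside(lambda key: fake_db[key]), two consecutive calls to articles.get("post:1") reach the db only once.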
Asynchronism
Have worker nodes that constantly poll a message queue for jobs
Once a worker finishes a job, it sends a completion message
Do anything time-consuming asynchronously
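A minimal sketch of the worker pattern, assuming a single worker thread and in-process queues in place of a real message queue. The worker and run_jobs functions are hypothetical, and upper-casing a string stands in for the time-consuming work.

```python
import queue
import threading


def worker(jobs, results):
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, payload = job
        # Stand-in for slow work, followed by the completion message
        results.put((job_id, payload.upper()))


def run_jobs(payloads):
    """Enqueue the payloads, let the worker process them, collect results."""
    jobs, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(jobs, results))
    thread.start()
    for job_id, payload in enumerate(payloads):
        jobs.put((job_id, payload))  # the producer returns immediately
    jobs.put(None)  # tell the worker to stop
    thread.join()
    done = {}
    while not results.empty():
        job_id, output = results.get()
        done[job_id] = output
    return done
```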
Latency vs. Throughput
- Latency - the time to perform an action
- Throughput - the number of actions per unit of time. Ex. 120 cars per day
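As a small worked example of the throughput figure, 120 cars per day is 5 cars per hour, so a car is finished every 0.2 hours. That interval is not the latency: the time to build one car can be much longer when many cars are in progress at once. The throughput helper is hypothetical.

```python
def throughput(actions: int, elapsed_hours: float) -> float:
    """Actions completed per hour."""
    return actions / elapsed_hours


cars_per_hour = throughput(120, 24)      # 120 cars per day
hours_between_cars = 1 / cars_per_hour   # gap between finished cars, not latency
```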
Availability vs. Consistency
A system can only support two of the following:
- Consistency - Every read receives the most recent write
- Availability - Every request receives a response
- Partition Tolerance - The system keeps operating despite network partitions between nodes
Consistency Patterns
- Weak Consistency - After a write, reads may or may not see it
- Eventual Consistency - After a write, reads will eventually see it
- Strong Consistency - After a write, reads will see it
Availability Patterns
Fail-over - Switching reliably between backup systems
- Active-passive - Heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over
- Aka master-slave failover
- Active-Active - Both servers manage traffic
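The active-passive heartbeat check can be sketched as below. The FailoverMonitor class and its timeout are assumptions for this example; timestamps are passed in explicitly so the logic is easy to follow and test.

```python
class FailoverMonitor:
    """Promote the passive server when the active server's heartbeat stops."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None
        self.serving = "active"

    def heartbeat(self, now: float):
        # Called whenever a heartbeat from the active server arrives
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        # Fail over once the heartbeat has been silent for too long
        if (
            self.serving == "active"
            and self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_seconds
        ):
            self.serving = "passive"
        return self.serving
```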
Delivery of Information
DNS - Translates a domain name to an IP address
DNS servers can come under DDoS attack
More laggy than CDN
CDN - A global network of proxy servers. These serve content to users from locations closer to them
Rewrite your URL to point to the CDN
- Push CDNs - Receive new content when changes occur on the server
- Pull CDNs - Grab new content from the server when the user requests it
- The first request for a piece of content is slower, since the CDN must fetch it from the server
Load Balancer
- Distributes user requests among clusters of servers
- Servers should hold no session info; store it in Redis or a db
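A minimal sketch of round-robin load balancing over stateless servers. Round-robin is only one possible distribution strategy and is assumed here; the RoundRobinBalancer class and make_server helper are hypothetical, and a dict stands in for Redis or a db as the shared session store.

```python
import itertools

# Shared session store; stands in for Redis or a db
session_store = {}


def make_server(name):
    """Build a stateless handler: all session data lives in session_store."""

    def handle(session_id):
        visits = session_store.get(session_id, 0) + 1
        session_store[session_id] = visits
        return name, visits

    return handle


class RoundRobinBalancer:
    """Send each request to the next server in turn."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def route(self, session_id):
        server = next(self._servers)
        return server(session_id)


balancer = RoundRobinBalancer([make_server("web-1"), make_server("web-2")])
```

Because the visit count lives in the shared store, it keeps increasing correctly even though consecutive requests from the same session land on different servers.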
Reverse Proxy
- Centralizes client requests. It sits between the clients and the web server
- Lets you add security: limit connections per client, blacklist IPs
- Nginx is an example of a reverse proxy server
- Make your servers portable
- Compress server responses
Load Balancer vs. Reverse Proxy
- Load balancer is good for horizontal scaling with multiple servers
- Reverse proxies are useful even with one server
Platform Layer
- Sits between the web servers and the Database
NoSQL
Cache
Asynchronism
RPC
A client causes a procedure to execute on a remote server. RPC abstracts the call so that it looks like a local method call on the client
RPC typically sends data using a binary codec (e.g. protobuf, thrift, avro)
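The stub idea can be sketched in a few lines. This is an in-process toy, not a real RPC framework: Server and ClientStub are hypothetical classes, a direct function call stands in for the network, and Python's pickle is used as the binary codec purely for illustration in place of protobuf, thrift or avro.

```python
import pickle


class Server:
    """Holds the registered procedures and executes incoming requests."""

    def __init__(self):
        self._procedures = {}

    def register(self, func):
        self._procedures[func.__name__] = func
        return func

    def handle(self, request_bytes: bytes) -> bytes:
        # Decode the request, run the procedure, encode the result.
        # pickle is unsafe on untrusted input; it is only a stand-in here.
        name, args = pickle.loads(request_bytes)
        result = self._procedures[name](*args)
        return pickle.dumps(result)


class ClientStub:
    """Makes remote procedures look like local method calls."""

    def __init__(self, transport):
        self._transport = transport  # callable: request bytes -> response bytes

    def __getattr__(self, name):
        def call(*args):
            request = pickle.dumps((name, args))
            return pickle.loads(self._transport(request))

        return call


def add(a, b):
    return a + b


server = Server()
server.register(add)
client = ClientStub(server.handle)  # direct call stands in for the network
```

With this setup, client.add(2, 3) reads like a local call, but the arguments are serialized, dispatched by name on the server side, and the result is deserialized on the way back.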