System Design Primer

System Design Interview ext: Article Link

Database

ACID

  • Atomicity - each transaction is all or nothing
  • Consistency - the database stays valid between transactions
  • Isolation - concurrent transactions produce the same results as if they ran serially
  • Durability - once a transaction commits, it remains committed
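
A minimal sketch of atomicity, using Python's built-in sqlite3 module. The accounts table, the names, and the transfer helper are made up for illustration. The second transfer fails halfway through, and the half that already ran is rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if src would go negative
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # credit runs, debit fails, both undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, the balances reflect only the first transfer; the credit from the failed one never becomes visible.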

Replication

Master-Slave

  • The master serves reads and writes; the slaves serve only reads and replicate the master's writes
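
A toy sketch of how an application might route traffic in this setup: writes go to the master, reads rotate across the replicas. The ReplicatedStore class is hypothetical, plain dicts stand in for database nodes, and replication is done synchronously for simplicity (real systems often replicate asynchronously).

```python
import itertools


class ReplicatedStore:
    """Toy router: the master takes writes, reads rotate across replicas."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.master[key] = value
        # Copy the write to every replica right away (simplifying assumption)
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Spread read load over the replicas in round-robin order
        return self.replicas[next(self._next_replica)].get(key)


store = ReplicatedStore(replica_count=2)
store.write("user:1", "Ada")
first_read = store.read("user:1")   # served by replica 0
second_read = store.read("user:1")  # served by replica 1
```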

Master-Master

Both serve reads and writes and coordinate on writes

Disadvantages:

  • Requires a load balancer
  • Either violates ACID (loosely consistent) or requires slow synchronization logic

Federation (Functional Partitioning)

  • Splitting up the database by function - ex. forums, users, products

Disadvantages

  • Complex joins
  • Complex queries over multiple dbs

Sharding

  • Split the data over multiple dbs

Disadvantages

  • Complex queries
  • Complex joins
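
One common way to decide which database holds a given row is to hash its key. The sketch below assumes four shards, with plain dicts standing in for databases; shard_for, put, and get are hypothetical helper names.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Four in-memory dicts stand in for four separate databases
shards = [dict() for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)
```

A query that needs rows from several keys may have to visit several shards, which is where the complex queries and joins listed above come from. Note that changing the shard count with this simple modulo scheme remaps most keys.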

Denormalization

  • Improve reads, hinder writes

Redundant copies of data are written in multiple tables to avoid expensive joins

This strategy works in conjunction with federation and sharding

SQL Tuning

  • Use CHAR instead of VARCHAR for fixed-length fields
  • INT for large numbers up to 2^32 (about 4 billion)
  • DECIMAL for currency
  • Set the NOT NULL constraint wherever possible
  • Use good indices
  • Denormalize data that will be frequently joined
  • Partition hot spots

Indices

  • Represented as self-balancing B-trees
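
To see the effect of an index, the sketch below asks SQLite for its query plan before and after creating one. The users table and the idx_users_email index are made-up names, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


query = "SELECT id FROM users WHERE email = ?"
args = ("user42@example.com",)

before = plan(query, args)  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query, args)   # lookup through the index
```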

Caching

  • Whenever your application tries to read data, it should first look through the cache

Two Approaches:

  1. Cache the result of database queries

  2. (Recommended) Cache objects

    • Store the complete class instance in the cache, or
    • Store the assembled arrays, etc. in the cache

Cache:

  • User sessions
  • Blog articles
  • User-friend relationships
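
The "look in the cache first" rule can be sketched as follows. UserRepository is a hypothetical class, and plain dicts stand in for both the database and a cache such as Redis. Writes invalidate the cached object so the next read reloads it.

```python
class UserRepository:
    """Read-through object cache in front of a slower data store."""

    def __init__(self, database):
        self.database = database  # stand-in for the real database
        self.cache = {}           # stand-in for Redis or Memcached
        self.db_reads = 0         # counts how often we fall through to the db

    def get_user(self, user_id):
        if user_id in self.cache:           # 1. look in the cache first
            return self.cache[user_id]
        self.db_reads += 1                  # 2. on a miss, read from the db
        user = self.database.get(user_id)
        if user is not None:
            self.cache[user_id] = user      # 3. store the whole object
        return user

    def update_user(self, user_id, user):
        self.database[user_id] = user
        self.cache.pop(user_id, None)       # invalidate the stale copy


repo = UserRepository({1: {"name": "Ada"}})
repo.get_user(1)  # miss: reads the database
repo.get_user(1)  # hit: served from the cache
```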

Asynchronism

  • Have worker nodes that constantly poll a message queue for jobs

  • Once a worker finishes a job, it sends a completion message

  • Do anything time-consuming asynchronously
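
A small sketch of the worker pattern, using threads and in-process queues in place of worker nodes and a real message broker. The run_jobs function and the sum-of-squares "job" are placeholders for whatever slow work the application has.

```python
import queue
import threading


def run_jobs(jobs, worker_count=2):
    """Process (job_id, n) jobs on background workers and collect results."""
    tasks = queue.Queue()  # the message queue the workers poll
    done = queue.Queue()   # completion messages sent back by the workers

    def worker():
        while True:
            job = tasks.get()
            if job is None:  # sentinel: no more work
                break
            job_id, n = job
            result = sum(i * i for i in range(n))  # stand-in for slow work
            done.put((job_id, result))             # completion message

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()

    results = {}
    while not done.empty():
        job_id, result = done.get()
        results[job_id] = result
    return results


results = run_jobs([("a", 4), ("b", 3)])
```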

Latency vs. Throughput

  • Latency - the time to perform an action
  • Throughput - the number of actions per unit of time. Ex. 120 cars per day
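
The two measures are independent. As a worked example with the 120 cars per day figure, assume (hypothetically) that each car takes 8 hours to build. Little's law then gives the number of cars being built at once.

```python
def throughput_per_hour(actions, hours):
    """Throughput: actions completed per hour."""
    return actions / hours


def in_flight(throughput, latency_hours):
    """Little's law: items in progress = throughput x latency."""
    return throughput * latency_hours


cars_per_hour = throughput_per_hour(120, 24)    # 120 cars per day
cars_in_progress = in_flight(cars_per_hour, 8)  # assumed 8-hour build time
```

Under these assumptions the factory finishes 5 cars per hour while 40 cars are in progress at any moment, so high throughput does not require low latency.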

Availability vs. Consistency

A system can only support two of the following:

  • Consistency - Every read receives the most recent write
  • Availability - Every request receives a response
  • Partition Tolerance - The system keeps operating despite network failures between nodes

Consistency Patterns

  • Weak Consistency - After a write, reads may or may not see it
  • Eventual Consistency - After a write, reads will eventually see it
  • Strong Consistency - After a write, reads will see it immediately

Availability Patterns

Fail-over - Switching reliably between backup systems

  • Active-passive - Heartbeats are sent between the active and passive servers. If the heartbeat is interrupted, the passive server takes over
    • Aka master-slave failover
  • Active-Active - Both servers manage traffic
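
Active-passive failover can be sketched as a heartbeat timeout check. FailoverMonitor is a hypothetical class, and timestamps are passed in explicitly (rather than read from a clock) to keep the example deterministic.

```python
class FailoverMonitor:
    """Promote the passive server when the active one stops sending heartbeats."""

    def __init__(self, timeout_seconds, now):
        self.timeout = timeout_seconds
        self.last_heartbeat = now
        self.serving = "active"

    def record_heartbeat(self, now):
        self.last_heartbeat = now

    def check(self, now):
        # If the heartbeat has been silent for too long, fail over
        if self.serving == "active" and now - self.last_heartbeat > self.timeout:
            self.serving = "passive"
        return self.serving


monitor = FailoverMonitor(timeout_seconds=3.0, now=0.0)
monitor.record_heartbeat(now=1.0)
status_while_healthy = monitor.check(now=2.0)  # heartbeat is 1s old
status_after_outage = monitor.check(now=5.0)   # heartbeat is 4s old
```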

Delivery of Information

DNS - Translates a domain name to an IP address

  • DNS servers can come under DDoS attack

  • Adds more lag than a CDN

CDN - A global network of proxy servers. These serve content to users from locations closer to them

  • Rewrite your URLs to point to the CDN

    • Push CDNs - Receive new content when changes occur on the server
    • Pull CDNs - Grab new content from the server when a user first requests it
      • That first request is slower

Load Balancer

  • Distributes user requests among clusters of servers
  • Servers hold no session info; sessions should live in a shared store such as Redis or a database
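
A minimal sketch of round-robin balancing with sessions kept outside the servers. RoundRobinBalancer and handle are hypothetical names, the server names are placeholders, and a dict stands in for Redis. Because the session lives in the shared store, it survives being routed to a different server each time.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in rotation."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def route(self):
        return next(self._servers)


session_store = {}  # shared store (stand-in for Redis); servers keep no state


def handle(balancer, session_id):
    """Route one request and bump the visit counter in the shared session."""
    server = balancer.route()
    visits = session_store.get(session_id, 0) + 1
    session_store[session_id] = visits
    return server, visits


balancer = RoundRobinBalancer(["web-1", "web-2"])
first = handle(balancer, "session-abc")
second = handle(balancer, "session-abc")
```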

Reverse Proxy

  • Centralizes client requests. It sits between the clients and the web servers
  • Lets you add protections such as limiting connections per client and blacklisting IPs
  • Makes your servers portable
  • Compresses server responses

Nginx is an example of a reverse proxy server

Load Balancer vs. Reverse Proxy

  • A load balancer is useful for horizontal scaling across multiple servers
  • A reverse proxy is useful even with a single server

Platform Layer

  • Sits between the web servers and the Database

NoSQL

Cache

Asynchronism

RPC

A client causes a procedure to execute on a remote server. RPC abstracts the call so that, on the client, it looks like a local method call

RPC typically sends data using a binary codec (e.g. protobuf, thrift, avro)
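
The idea can be sketched with a client stub that packs a call into bytes and a server function that unpacks and runs it. Everything here is an illustrative assumption: Python's struct module stands in for a real codec such as protobuf, the procedure ids are invented, and a direct function call stands in for the network.

```python
import struct

# Server side: procedure ids mapped to implementations (ids are made up)
PROCEDURES = {
    1: lambda a, b: a + b,
    2: lambda a, b: a * b,
}


def server_handle(request: bytes) -> bytes:
    """Decode a request, run the procedure, and encode the result."""
    proc_id, a, b = struct.unpack("!Bii", request)  # 1-byte id, two int32s
    return struct.pack("!q", PROCEDURES[proc_id](a, b))  # int64 result


class CalculatorStub:
    """Client stub: callers see ordinary methods, not the wire format."""

    def __init__(self, transport):
        self._transport = transport  # callable taking and returning bytes

    def add(self, a, b):
        return self._call(1, a, b)

    def multiply(self, a, b):
        return self._call(2, a, b)

    def _call(self, proc_id, a, b):
        request = struct.pack("!Bii", proc_id, a, b)
        (result,) = struct.unpack("!q", self._transport(request))
        return result


calculator = CalculatorStub(transport=server_handle)
total = calculator.add(2, 3)
product = calculator.multiply(4, 5)
```

The caller writes calculator.add(2, 3) as if it were local; the stub hides the encoding and the round trip.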

REST