System Design Guide

Beginner Guide to System Design¶

Characteristics of Distributed System¶

Scalability, Reliability, Availability, Efficiency, and Manageability.

Design a large system

different architectural pieces can be used
how these pieces work with each other
how to best utilize these pieces, what are trade-offs

Scalability

capability of a system, process, or network to grow and manage increased demand.
- increased data vol, increased work amount, increased work load...
- ensure performance
horizontal scaling: adding more servers into pool of resources. easier and dynamical
vertical scaling: adding more power (CPU, RAM, Storage, etc.) to existing servers. requires downtime and having an upper limit

Reliability

probability a system will fail in a given period. keeping delivering services even when some components fail.
achieves thru redundancy, both software and data
comes at a cost on removing single point of failure.

Availability

the time/percentage a system remains operational to perform required function.
- when taking down, or failed, considered unavailable during that time
Reliability is availability over time
availability - not necessarily reliable. reliable -> available

Efficiency

Two standard measures of efficiency:
- response time (latency), delay to obtain resp
- throughput (bandwidth), numb of delivery in a given time unit

Serviceability or Manageability

how to operate and maintain
- ease of diagnosing problems, ease of updating, how to operate, etc.
- auto failure detection can decrease or avoid system downtime

CAP Theorem¶

It is impossible for distributed system to simultaneously provide more than two of: Consistency, Availability, and Partition tolerance.

Consistency: all nodes see the same data at the same time
- achieved by updating nodes before further reads
Availability: every request gets a response on success/failure; system continues function even with node failures
- by replicating data across servers
Partition tolerance: system continues to work despite message loss or partial failure.
- survive network failures that doesn't fail entire network

Caching¶

locality of ref principle: recently request data likely accessed again
can be at all levels in an architecture, mostly found close to front-end

App Server cache

place local storage on a request layer node
quickly return local cached data if exists, else query db
for the case of LB, use global caches or distributed caches

Distributed cache

cache is divided using consistent hashing function, so a node can quickly know where to look for data existence in a dist cache.
advantage: ease of adding nodes to expand cache space
disadvantage: missing node; sol multiple copies of data on diff nodes

Global cache

all nodes use single cache space. need a server dedicated for this. no good as # of requests inc
two forms, on cache miss either: server fetch from DB, or request nodes fetch from DB

Content Distribution Network (CDN)

for sites serving large amounts of static media
server does query on miss
can use Nginx (a light w HTTP server) for your app

Cache Invalidation

data modified in DB should be invalidated in cache:
Write-through cache: data written into cache & DB at same time. (higher latency, write 2ice)
Write-around cache: data written to DB only. When accessed, must query DB
Write-back cache: data written to cache first, then write back to DB in an interval/cond. risk of data loss in crash of cache node

Cache eviction policies

First In First Out
Last In First Out
Least Recently Used
Most Recently Used
Least Frequently Used
Random Replacement

Indexes¶

well-known DB indexing, improves queries performance
- index on a table makes it faster to search through the table and find the rows desired
provides sorted list of data that is easily searchable by relevant info
- index is a data structure that points to location where data lives
disadvantage: slower to add rows or updates/delete rows, because the need to update the index.
- better read performance, worse write performance
- unnecessary indexes should be avoided/removed

Consistent Hashing¶

Distributed Hash Table (DHT): key, value, hash function
- index = hash_function(key)

'key % n' hashing approach

drawbacks:
not horizontally scalability. new cache host adding to system requires re-hash all existing mappings
may not be load balanced, especially non-uniformly distributed data (some caches busy while others idle)

Consistent Hashing definition

allows distributing data across a cluster and minimize reorganization when nodes are added/removed
- easier scale up/down caching system
when hash table resized, only k/n keys need to be remapped (k = total keys, n = total numb of servers)
Given a list of cache servers, hash them to integers in the range
map a key to a server - Hash it to a single integer - Move clockwise on the ring until finding the first cache it encounters - That cache is the one contains the key
add a cache server on the ring, causing the portion of keys previously mapping to next server map to this new server, and removed from the next server
removing a cache server on the ring, causing the keys mapping to this one mapping to next server on the ring
Virtual replicas of the servers on the ring, enables more evenly distributed keys to each server, and remains distributed when servers being added/removed from the ring

Long-Polling vs. Web Sockets vs. Server-Sent Events¶

all are popular and common protocols between client and web server.

Ajax Polling

client repeatedly polls a server for data.
client opens a connection & request data from server
request page sends requests to server at regular intervals
server calculate resp and sends back
client repeated 1-3 periodically to get updates from server
problem: many requests could be empty data and creating HTTP overhead

HTTP Long-Polling

aka. "Hanging GET", client requests expected that server may not resp immediately
- server holds client request if no data available, then sends it when it becomes available.
- client sends new request immediately follows receiving the resp, to make sure server can always push updated data to client immediately when its available.
- Long-Polling request can be timeout so new request has to be sent when previous one timeouts.

Web Sockets

use handshake to establish persistent connection so both can exchange data in both direction any time.
two-way ongoing conversation can take place b/w two machines.

Server-Sent Events (SSEs)

use a persistent long-term connection b/w client and server. Server can send data to client any time, but not client to server, which requires separate Http connection.
best when need real-traffic from server to client or when server is generating data in a loop to be sent to client.

Queues¶

manage requests in large-scale distributed system. high performance requires different components work asynchronously thru queues.
provide some protection from service outages and failures. retry service requests that failed
implemented on asynchronous common protocol, client get Acknowledgement when request received, serves as a reference for the results of work when client requires it.
open source tools like (RabbitMQ, ZeroMQ, Active MQ, & BeanstalkD)

Proxy¶

intermediary piece of hardware/software b/w client & back-end server.
- receives requests, relays to servers
typically used to filter requests, log requests, transform requests (add/remove headers, encryption/decryption, compression).
cache can serve lots of requests.
coordinating requests, optimize request traffic. (ex. collapse similar data access into one "collapsed forwarding")
particularly useful when under high load or have limited caching available (batch several requests into one)

Load Balancing (LB)¶

spread traffic across a cluster of servers
- improves responsiveness & availability of app, web, or db
keeps track of resource status
- stop distribute to failed server immediately
sits between client & server, balancing app requests
- reduces individual server load, prevents single pt of failure
- detects bottlenecks before they happen

Three places for LB

b/w user & web servers
b/w web servers & internal platform layer (ex. app servers or cache servers)
b/w internal platform layer and db

LB algorithms

Prerequisite: Health Checks: keep list of alive and healthy server. remove unresponsive servers
- Least Connection: fewest active connections
- Least Response Time: fewest active connections & lowest avg resp time
- Least bandwidth: serving least amount of traffic in Mbps
- Round Robin: cycles thru list of servers, good for equal spec servers & non-persist connections
- Weighted Round Robin: each server assigned a weight (processing capacity), higher weight get new connections first and more connections
- IP Hash: calculate hash of client IP to assign server

Redundant LB

multi-LB setup, each monitors health of the others, one active, one passive(s)
- in case of active LB failure, another one takes over and becomes active one

Ways to implement LB

Smart Clients:
- client take a pool of service hosts and balances load across them
- detect failed hosts and recovered hosts, adding new hosts
Hardware LB:
- most expensive, high performance (ex. Citrix NetScaler)
Software LB:
- (ex. HAProxy)

SQL vs. NoSQL¶

SQL

Relational DB has predefined schemas, structured
stores data in rows and columns, each row about one entity, columns being data points

NoSQL

Non-relational DB are unstructured, distributed, and have dynamic schema.
Key-Value Stores:
- data stored in an array of key-value pairs (ex. Redis, Voldemort, Dynamo)
Document Databases:
- data stored in documents, & docs are grouped together in collections
- each doc can have different structure (ex. CouchDB, MongoDB)
Wide-Column Databases:
- column families, containers for rows.
- no need to know all columns up front, each row unnecessary same numb of columns.
- good for aanalyzing large daatasets (ex. Cassandra, HBase)
Graph Databases:
- data relations represented in a graph
- nodes (entities), properties (entity info), & lines (entities connections). (ex. Neo4J, InfiniteGraph)

High level differences

Storage approach
Schema, fixed/dynamic
Querying
Scalability, SQL(vertical); NoSQL(horizontal)
Reliability or ACID Compliance (Atomicity, Consistency, Isolation, Durability). SQL still better at data reliability and transactions performance, and NoSQL sacrifice ACID compliance for performance and scalability

When to use which

SQL:
need to ensure ACID, which reduces anomalies and protects db integrity
data is structured and unchanging. no massive growth expected
NoSQL:
prevent data from being bottleneck by distributing DBs
large volumes of data storage having little or no structure, no need to define type in advance
use of cloud computing and storage for scaling up
rapid development, little prep ahead of time, little downtime b/w versions

Redundancy and Replication¶

duplicate critical data or services to increase reliability
remove single pt of failure, fail-overs when one instance down
shared-nothing architecture, where each node can operate independent of one another
- new servers can be added without conditions

Sharding (Data Partitioning)¶

break up big DB into smaller parts
- improve manageability, performance, availability, & LB
- after certain scale pt, cheaper to scale horizontally (more same machines) than grow vertically (upgrade server gears)

Partitioning Methods

horizontal partitioning:
- range based sharding. best for evenly distributed ranges
- disadvantage: can lead to unbalanced servers for bad range selection
vertical partitioning:
- divide into tables for data related to a specific feature to their own server
- disadvantage: when app grows, may need to partition feature specific DB across various servers
dictionary based partitioning:
- loosely coupled approach
- lookup service which knows partitioning scheme, holds mapping between each tuple key to its DB server

Partitioning Criteria

Key or Hash-based partitioning:
- apply a hash function to some key attribute of the entries
- need redistribution of data when adding new DB servers
List partitioning:
- each partition assigned list of values
- look up partition contains our key and store in its list
Round-robin partitioning:
- ensures uniform data dist. i mod n
Composite partitioning:
- combine above schemes.
- a consistent hashing is hash and list: hash reduces key space to a smaller size

Common Problems of Sharding

operations across multiple tables or multiple rows in the same table, no longer run on the same server
Joins and Denormalization: joins is inefficient across multiple servers. Need to denormalize the database so that previous operation requires joins can perform from a single table
Referential integrity: foreign keys in sharded db can be difficult. need to enforce it in application code
Rebalancing: when distribution is no longer uniform, or lots of load on one shard, either have to create more shards, or rebalance existing shards.
these are cost of using sharding. using directory based partitioning can make rebalancing easier at the cost of increasing system complexity and creating new pt of failure.

System Design Interviews, how to approach¶

Scoping the problem
- ask questions to understand constraints and use cases
Sketching up an abstract design
- illustrate building blocks and relationships b/w them
Identifying & addressing bottlenecks
no absolute answers, open-ended

As a candidate

learn from existing systems
- prepare ahead, learn based on real-life products, issues, & challenges
- foster analytical ability and questioning on the problem
lead the conversation
- communicate your idea to interviewer, your thought process, what you are considering
solve by breaking it down
- top-down, modularize into modules, tackle each independently
deal with bottlenecks
- talk about possible solutions to these, and trade-offs and their impacts on the sys.
try understand interviewer's intention and direction