System Design
Concept — Logic — Implementation
The process of defining the architecture, components, modules, interfaces and data for a system to satisfy specified requirements.
Design Netflix
SNAKE Analysis
1. Scenario: enumerate & sort
1. Enum:
1. Register/login
2. play movie stream
3. Recommendation
2. Sort:
1. Basic need: play movie stream
1. Get the channel
2. Get the movie in the channel
3. Play the movie in the channel
2. Needs/Necessary: constraints
1. Inquire: DAU = 5M
2. Predict:
1. *Avg Concurrent Users = DAU * QPS / 86400
2. Avg Concurrent Users = DAU * Avg Online Time / 86400 = 5M * 30min * 60s / 86400s ≈ 104167
3. Peak Concurrent Users = Avg * 6 = 625000
4. Peak Concurrent Users after 3 months = Peak * 2 = 1250000
5. Throughput per user = 3Mbps (assumption)
6. Throughput after 3 months = 1250000 * 3Mbps = 3.75Tbps
7. Memory per user = 10KB
8. Memory in total after 3 months = 5M * 2 * 10KB = 100GB
9. Storage for movie: 1000TB using Amazon Cloud Service
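The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the constant names are made up for illustration, and decimal units (1 GB = 10^6 KB, 1 Tbps = 10^6 Mbps) are assumed.

```python
# Back-of-envelope capacity estimate for the streaming scenario.
DAU = 5_000_000                 # daily active users (given)
AVG_ONLINE_SECONDS = 30 * 60    # 30 minutes online per user per day
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 6                 # peak is assumed to be 6x the average
GROWTH_FACTOR = 2               # users are assumed to double in 3 months
MBPS_PER_USER = 3               # assumed stream bitrate per user
MEMORY_PER_USER_KB = 10

avg_concurrent = DAU * AVG_ONLINE_SECONDS / SECONDS_PER_DAY
peak_concurrent = avg_concurrent * PEAK_FACTOR
peak_in_3_months = peak_concurrent * GROWTH_FACTOR
throughput_tbps = peak_in_3_months * MBPS_PER_USER / 1_000_000
memory_gb = DAU * GROWTH_FACTOR * MEMORY_PER_USER_KB / 1_000_000

print(f"avg concurrent users:  {avg_concurrent:,.0f}")
print(f"peak concurrent users: {peak_concurrent:,.0f}")
print(f"peak after 3 months:   {peak_in_3_months:,.0f}")
print(f"throughput:            {throughput_tbps:.2f} Tbps")
print(f"memory:                {memory_gb:.0f} GB")
```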
3. Application/Kilobyte: revisit & combine / add & select
1. Receptionist
2. Services:
1. User Service — Accounts — MySQL
2. Channel Service — Channel List — MongoDB
3. Movie Service — Movies — Files
4. Evolve: analyze & backtrack
1. Direction:
1. better: more users
2. broader: transaction
3. deeper: CDN?
2. Angle:
1. performance: algorithm
2. scalability: more users, more machines
3. robustness: what happens when a server breaks down?
Design Recommendation Module:
1. Scenario: class Recommender {}
2. Necessary: 2000 QPS
3. Application/Kilobytes: inverted index, u1: m1, m3, m7 -> m1: u1, u3; about 50 QPS per machine; improving the algorithm raises the QPS each machine can serve
4. Evolve:
1. Preparer -> raw data/index -> recommenders -> dispatcher x 40 (2000 QPS / 50 QPS); 40 dispatchers meet the QPS requirement
2. Feed cache — (Dispatcher) — Data Manager
3. Loggers
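The inverted index in the recommendation module can be sketched as follows. This is only an illustration: the scoring rule (count how often a movie appears among users who share a movie with the target user), the function names, and the data for u2 and u3 are assumptions, not part of the notes.

```python
from collections import Counter, defaultdict

def build_inverted_index(user_movies):
    """Turn user -> movies into movie -> users."""
    movie_users = defaultdict(set)
    for user, movies in user_movies.items():
        for movie in movies:
            movie_users[movie].add(user)
    return movie_users

def recommend(user, user_movies, movie_users, top_k=3):
    """Suggest unseen movies watched by users who share a movie with `user`."""
    seen = set(user_movies[user])
    scores = Counter()
    for movie in seen:
        for other in movie_users[movie]:
            if other == user:
                continue
            for candidate in user_movies[other]:
                if candidate not in seen:
                    scores[candidate] += 1
    # Highest score first; ties broken by movie id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_k]]

# Hypothetical watch history; only u1's list and m1's users come from the notes.
user_movies = {
    "u1": ["m1", "m3", "m7"],
    "u2": ["m3", "m8"],
    "u3": ["m1", "m8", "m9"],
}
movie_users = build_inverted_index(user_movies)
print(sorted(movie_users["m1"]))
print(recommend("u1", user_movies, movie_users))
```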
Design User System
SNAKE Analysis
1. Scenario
1. Register/Update/Remove
2. Login/Logout
3. Balance/Membership
2. Necessary — Register
1. Ask:
1. Total users: 100M
2. DAU: 1M
2. Predict:
1. DAU in three months: 1M * 2 = 2M
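2. Register percentage: 1% (new users)
3. Daily new registered users = 2M * 1% = 20K
4. Login percentage: 15% (share of users who log in on a given day)
5. Average login attempts per login: 1.2 (mistyped verification codes, repeated logins from several devices)
6. Daily login attempts: 2M * 15% * 1.2 = 360K
7. Average login frequency: 360K / 86400s = 4.2/s
8. Peak login frequency: 4.2/s * 10 = 42/s (the evening login peak is roughly 10x the average)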
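The login load prediction is again plain arithmetic. A small sketch with illustrative variable names; the computed peak is about 41.7/s, which the notes round to 42/s.

```python
# Login load estimate for the user system.
DAU_NOW = 1_000_000
dau = DAU_NOW * 2                       # assumed growth over three months
new_users_per_day = dau * 0.01          # 1% of DAU register each day
login_attempts = dau * 0.15 * 1.2       # 15% log in, 1.2 attempts each
avg_login_qps = login_attempts / 86_400
peak_login_qps = avg_login_qps * 10     # evening peak assumed to be 10x

print(f"new users per day: {new_users_per_day:,.0f}")
print(f"login attempts:    {login_attempts:,.0f}")
print(f"avg login QPS:     {avg_login_qps:.1f}")
print(f"peak login QPS:    {peak_login_qps:.1f}")
```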
3. Application
1. Receptionist
2. Account Service
1. User model
1. userId: compact primary key, easy to compare and validate
2. username
3. password (hidden)
4. state (active/inactive/banned)
2. table<User>: CRUD (create, read, update, delete)
3. verification & ban — State Lifecycle Graph
1. Register — verified ? approved — (active) : Rejected — (inactive)
2. (active) — ban — (banned) — Unban — (active)
3. (active) — Deactivate — (Inactive) — Activate — (active)
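4. Deleting or closing an account: to keep data consistent, removing an account is generally not supported; alternatively the username is changed to "deregistered"
5. logout automatically: for bank accounts, log out after 15 min without activity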
6. find userId: userId = 21? userId within [4, 20]?
7. login on multiple devices
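The state lifecycle graph above can be written as a transition table. A minimal sketch: the names `State`, `TRANSITIONS`, `register` and `transition` are hypothetical, and any action not listed in the graph is assumed to be rejected.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    BANNED = "banned"

# (current state, action) -> next state, taken from the lifecycle graph.
TRANSITIONS = {
    (State.ACTIVE, "ban"): State.BANNED,
    (State.BANNED, "unban"): State.ACTIVE,
    (State.ACTIVE, "deactivate"): State.INACTIVE,
    (State.INACTIVE, "activate"): State.ACTIVE,
}

def register(verified):
    """A verified registration starts active, a rejected one inactive."""
    return State.ACTIVE if verified else State.INACTIVE

def transition(state, action):
    """Return the next state, or raise if the graph has no such edge."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a user in state {state.value}") from None

state = register(verified=True)
state = transition(state, "ban")
print(state)
```

Keeping the edges in one table makes illegal moves, such as deactivating a banned user, fail loudly instead of silently changing state.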
3. Session: a conversation between a user and a server
1. sessionId: data sync
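2. Each user could hold a sessionList, but that is complicated: the list has variable length and is modified very often
3. So split the data: use sessionId as the primary key and userId as a foreign key, which is the MySQL (relational) way of thinking
4. The result is a UserTable and a SessionTable, which introduces inheritance: both are subclasses of class Table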
4. How to find
1. find userId = 21:
1. Index with hash: O(1) average time
2. Index with balanced BST: O(logN) time, O(N) space
2. find userId in a range: range query
1. Index with BST: O(logN+k), k is the length of range
2. B+ Tree: O(logN) with base 2 —> O(logN) with base B, B can be 8, 32, 1024…
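To make the two lookup styles concrete, here is a small sketch. It uses a dict as the hash index and a sorted list with binary search as a stand-in for the BST or B+ tree (an assumption made for brevity; a real database index is a tree on disk). The sample ids are invented.

```python
import bisect

# Hypothetical user table keyed by userId.
users_by_id = {uid: f"user{uid}" for uid in (3, 4, 9, 15, 20, 21, 42)}

# Ordered index: a sorted list plays the role of the tree here.
sorted_ids = sorted(users_by_id)

def find_user(user_id):
    """Point query through the hash index: O(1) on average."""
    return users_by_id.get(user_id)

def find_range(low, high):
    """Range query: O(log N) to locate the bounds, plus k to copy the hits."""
    left = bisect.bisect_left(sorted_ids, low)
    right = bisect.bisect_right(sorted_ids, high)
    return sorted_ids[left:right]

print(find_user(21))
print(find_range(4, 20))
```

A hash index cannot answer the range query without scanning every key, which is why ordered indexes are preferred when range lookups matter.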
5. Evolve: How to support payment on membership?
1. class Membership {int userId, time endTime, double money, addMoney(), buyMembership()}
2. Problem
1. transaction & log: money and membership end time must be updated together and recorded in a log; if something goes wrong, the log can be used to recover
2. ACID principle
1. atomicity: all or nothing (lock and log)
2. consistency: valid according to all defined rules (checker)
3. isolation: independence between transactions (lock or CIS)
4. durability: stored permanently (backup: one master and two slave machines)
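The all-or-nothing requirement can be illustrated with SQLite from the Python standard library. This is a sketch under assumptions: the table layout, the `CHECK (balance >= 0)` rule and the function name `buy_membership` are invented for the example, and a production system would not store money as a float.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership ("
    " user_id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance >= 0),"   # consistency rule
    " end_time INTEGER NOT NULL)"
)
conn.execute("INSERT INTO membership VALUES (21, 15.0, 0)")
conn.commit()

def buy_membership(conn, user_id, price, seconds):
    """Extend the membership and charge the user in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE membership SET end_time = end_time + ? WHERE user_id = ?",
                (seconds, user_id),
            )
            conn.execute(
                "UPDATE membership SET balance = balance - ? WHERE user_id = ?",
                (price, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The balance would go negative: the extension above is undone too.
        return False

MONTH = 30 * 24 * 3600
first = buy_membership(conn, 21, 10.0, MONTH)
second = buy_membership(conn, 21, 10.0, MONTH)
row = conn.execute(
    "SELECT balance, end_time FROM membership WHERE user_id = 21"
).fetchone()
print(first, second, row)
```

The second purchase fails on the balance check after the end time was already extended, and the rollback restores the end time, which is the atomicity property from the list above.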
Design Rate Limiter
Requirement
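Prevent crawlers; for example, in Kafka an upstream producer that produces too fast restricts the resources available to other producers.
Basic limit: QPS limit = 5
Four algorithms
1. Interval algorithm:
Make sure the gap between two requests >= 1s/5 = 0.2s
1. Fully satisfies the limit;
2. But it wrongly rejects many legitimate requests, e.g. the request at 0.9 in the sequence 0.2, 0.4, 0.6, 0.8, 0.9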
Acquire()
Time now = Time.getCurrentSecond();
if (now - lastReq < 0.2) return false; // gap shorter than 0.2s: reject
else {
lastReq = now;
return true;
}
2. Bucket algorithm: time-bucket
Acquire()
Time now = Time.getCurrentSecond();
if (counter[now] >= 5) return false;
else {
counter[now]++;
return true;
}
Time-bucket with Database: O(1) space
Acquire()
Time now = Time.getCurrentSecond();
counter = DB.get(now);
if (counter != null && counter >= 5) return false;
else {
DB.increase(now, 1); // count this request in the current second's bucket
DB.expire(now, 1); // drop the bucket one second later
return true;
}
Time-bucket: one bucket O(1) space
Acquire()
Time curSecond = Time.getCurrentSecond();
if (curSecond != preSecond) {
counter = 0;
preSecond = curSecond;
}
if (counter >= 5) return false;
else {
counter++;
return true;
}
Bad case: suppose 0 - 1s holds exactly 5 requests, yet 6 requests fall within 0.5s - 1.5s; fixed buckets cannot guarantee the limit over an arbitrary 1s interval
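The bad case can be reproduced with a small single-bucket limiter. In this sketch the class name is hypothetical and the timestamp is passed in explicitly (instead of reading a clock) so the run is deterministic.

```python
class FixedWindowLimiter:
    """One counter per whole second, as in the single-bucket version."""

    def __init__(self, limit=5):
        self.limit = limit
        self.window = None   # the second the counter belongs to
        self.counter = 0

    def acquire(self, now):
        window = int(now)
        if window != self.window:   # a new second started: reset the counter
            self.window = window
            self.counter = 0
        if self.counter >= self.limit:
            return False
        self.counter += 1
        return True

limiter = FixedWindowLimiter(limit=5)
# Six requests within half a second, straddling the boundary at t = 1.0.
burst = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
results = [limiter.acquire(t) for t in burst]
print(results)
```

All six requests are accepted because the counter resets at t = 1.0, even though they all fall inside one 1s interval.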
3. Solution: Queue algorithm (Algorithm of Request List)
Acquire()
curSecond = getCurrentSecond();
if (requestList.size() >= 5) { // fewer than 5 recorded requests can never exceed the limit
preFifthSecond = requestList.get(requestList.size()-5);
if (curSecond-preFifthSecond < 1) return false; // this would be the 6th request within one second
}
requestList.add(curSecond);
return true;
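Still a problem: if there are very many requests per second, say one million, must they all be recorded in the request list?
Nah...
Use a fixed-size (5) request list as a ring buffer: record only the 5 most recent requests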
Acquire()
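// requestList holds 5 timestamps initialized far in the past; index starts at 0 and points at the oldest one
curSecond = getCurrentSecond();
if (curSecond-requestList.get(index) < 1) return false;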
else {
requestList.set(index, curSecond);
index = (index+1)%5;
return true;
}
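A runnable sketch of the fixed-size request list follows. The class name is hypothetical, timestamps are injected for determinism, and the slots are assumed to start at negative infinity so the first five requests always pass.

```python
import math

class SlidingWindowLimiter:
    """Keeps only the `limit` most recent accepted timestamps in a ring."""

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self.times = [-math.inf] * limit  # oldest accepted requests
        self.index = 0                    # position of the oldest timestamp

    def acquire(self, now):
        # The slot at `index` is the limit-th most recent accepted request.
        if now - self.times[self.index] < self.window:
            return False
        self.times[self.index] = now
        self.index = (self.index + 1) % self.limit
        return True

limiter = SlidingWindowLimiter(limit=5, window=1.0)
burst = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5]
results = [limiter.acquire(t) for t in burst]
print(results)
```

Unlike the fixed bucket, the request at t = 1.0 is rejected because five requests were already accepted since t = 0.5; the one at t = 1.5 passes again.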
Follow up 1: how to save space with 10^9 queries per hour?
Batch queries: trade precision for space.
Pack queries that are close in time into batches of 10^6 queries each; the request list then stores only 10^3 entries per hour, i.e. one every 3.6 seconds.
Follow up 2: how to support multiple threads?
Lock.
Follow up 3: how to support limiter on users?
<userId, requestList>: each user has their own request list
Follow up 4: how to support queries with different quotas?
acquire(quota)
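Follow ups 2 and 3 can be combined in one sketch: a map from userId to a request list, guarded by a lock. The class name and the single coarse lock are assumptions; a busier system might lock per user instead.

```python
import math
import threading
from collections import defaultdict

class PerUserLimiter:
    """One ring of recent timestamps per user, protected by a lock."""

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self._lock = threading.Lock()
        # user_id -> [timestamps, index of the oldest timestamp]
        self._state = defaultdict(lambda: [[-math.inf] * limit, 0])

    def acquire(self, user_id, now):
        with self._lock:  # makes check-and-update atomic across threads
            state = self._state[user_id]
            times, index = state
            if now - times[index] < self.window:
                return False
            times[index] = now
            state[1] = (index + 1) % self.limit
            return True

limiter = PerUserLimiter(limit=5)
alice = [limiter.acquire("alice", 0.1) for _ in range(6)]
bob = limiter.acquire("bob", 0.1)
print(alice, bob)
```

Each user exhausts only their own list, so a flood from one userId does not block the others.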
4. Token bucket algorithm
A token is generated every 0.2s; unused tokens accumulate and can be spent by concurrent requests. Each acquire takes O(1) time.
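A minimal token bucket sketch. Several details are assumptions not stated in the notes: the bucket is capped at 5 tokens, it starts full, tokens are refilled lazily on each call rather than by a timer, and the class name is made up.

```python
class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; O(1) per call."""

    def __init__(self, rate=5.0, capacity=5.0, now=0.0):
        self.rate = rate            # 5 tokens/s, i.e. one every 0.2s
        self.capacity = capacity    # assumed cap on accumulated tokens
        self.tokens = capacity      # assumed to start full
        self.last = now

    def acquire(self, now):
        # Lazy refill: add the tokens earned since the previous call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens < 1:
            return False
        self.tokens -= 1
        return True

bucket = TokenBucket(rate=5.0, capacity=5.0, now=0.0)
burst = [bucket.acquire(0.0) for _ in range(6)]      # spends the saved tokens
later = [bucket.acquire(0.5) for _ in range(3)]      # 2.5 tokens earned
print(burst, later)
```

The saved-up tokens let a burst of five concurrent requests through at once, after which requests are admitted only as fast as tokens are regenerated.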