Design Netflix

SNAKE Analysis

1. Scenario: enumerate & sort

1. Enum: 
    1. Register/login
    2. play movie stream
    3. Recommendation
2. Sort:
    1. Basic need: play movie stream
        1. Get the channel
        2. Get the movie in the channel
        3. Play the movie in the channel

2. Needs/Necessary: constraints

1. Inquire: DAU = 5M
2. Predict: 
    1. Avg Concurrent Users = DAU * Avg Online Time / 86400s
    2. Avg Concurrent Users = 5M * (30min * 60s) / 86400s ≈ 104000
    3. Peak Concurrent Users = Avg * 6 = 625000
    4. Peak Concurrent Users after 3 months = Peak * 2 = 1250000
    5. Throughput per user = 3Mbps (assumption)
    6. Throughput after 3 months = 1250000 * 3Mbps = 3.75Tbps
    7. Memory per user = 10KB
    8. Memory in total after 3 months = 5M * 2 * 10KB = 100GB
    9. Storage for movies: 1000TB, using Amazon's cloud services
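The estimates above reduce to a few lines of arithmetic. A minimal sketch; the inputs (30 min online per day, 6x peak, 2x growth in 3 months, 3 Mbps and 10 KB per user) are the assumptions from the list, and the class name `NetflixCapacity` is hypothetical.

```java
// Back-of-the-envelope capacity estimate (hypothetical class name).
// Every constant is one of the assumptions listed above.
class NetflixCapacity {
    static final long DAU = 5_000_000L;
    static final long AVG_ONLINE_SECONDS = 30 * 60;   // 30 min per user per day
    static final long SECONDS_PER_DAY = 86_400;
    static final int PEAK_FACTOR = 6;                 // peak = 6x average
    static final int GROWTH_3_MONTHS = 2;             // users double in 3 months
    static final double MBPS_PER_USER = 3;
    static final double KB_PER_USER = 10;

    // Average concurrent users = DAU * avg online time / seconds per day
    static double avgConcurrent() {
        return (double) DAU * AVG_ONLINE_SECONDS / SECONDS_PER_DAY;
    }

    // Peak concurrent users after 3 months of growth
    static double peakConcurrentIn3Months() {
        return avgConcurrent() * PEAK_FACTOR * GROWTH_3_MONTHS;
    }

    // Total streaming bandwidth at peak, converted from Mbps to Tbps
    static double throughputTbps() {
        return peakConcurrentIn3Months() * MBPS_PER_USER / 1_000_000;
    }

    // Memory for doubled DAU at 10 KB each, converted from KB to GB (decimal)
    static double memoryGB() {
        return DAU * GROWTH_3_MONTHS * KB_PER_USER / 1_000_000;
    }
}
```

This yields roughly 104K average and 1.25M peak concurrent users, 3.75 Tbps and 100 GB, matching the list.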

3. Application/Kilobyte: revisit & combine / add & select

1. Receptionist
2. Services: 
    1. User Service — Accounts — MySQL
    2. Channel Service — Channel List — MongoDB
    3. Movie Service — Movies — Files
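The Application step can be pictured as a Receptionist that walks the basic need in order: get the channel, get the movie in it, play it. A minimal sketch, assuming hypothetical `ChannelService` and `MovieService` interfaces that hide the MongoDB and file storage; the User Service is left out for brevity.

```java
import java.util.List;

// Hypothetical service interfaces; real ones would sit in front of MongoDB and file storage.
interface ChannelService {
    List<String> channelsFor(String userId);
}

interface MovieService {
    List<String> moviesIn(String channelId);
    String streamUrl(String movieId);
}

// The Receptionist only routes; each service owns its own data.
class Receptionist {
    private final ChannelService channels;
    private final MovieService movies;

    Receptionist(ChannelService channels, MovieService movies) {
        this.channels = channels;
        this.movies = movies;
    }

    // Basic need, in order: get the channel, get the movie in it, play it
    String play(String userId, int channelIndex, int movieIndex) {
        String channel = channels.channelsFor(userId).get(channelIndex);
        String movie = movies.moviesIn(channel).get(movieIndex);
        return movies.streamUrl(movie);
    }
}
```

Keeping the Receptionist free of storage details is what lets each service pick its own store (MySQL, MongoDB, files) as listed above.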

4. Evolve 分析&回溯

1. Direction: 
    1. better: more users
    2. broader: transaction
    3. deeper: CDN?
2. Angle:
    1. performance: algorithm
    2. scalability: more users, more machines
    3. robustness: what happens when a server breaks down?

Design Recommendation Module:

1. Scenario: class Recommender {}
2. Necessary: 2000 QPS
3. Application/Kilobytes: inverted index (u1: m1, m3, m7 → m1: u1, u3), about 50 QPS per machine; improving the algorithm raises QPS
4. Evolve: 
    1. Preparer → raw data/index → recommenders → dispatcher x 40 (2000 QPS / 50 QPS = 40); 40 dispatchers meet the QPS requirement
    2. Feed cache — (Dispatcher) — Data Manager
    3. Loggers
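The inverted index in step 3 flips a user → movies map into movie → users, and the dispatcher count in step 4 is a ceiling division. A sketch under those assumptions; `InvertedIndex`, `invert` and `machinesNeeded` are hypothetical names, and how the recommenders query the index is not specified in the notes.

```java
import java.util.*;

// Hypothetical helper for the recommendation module.
class InvertedIndex {
    // Invert user -> movies into movie -> users (e.g. u1: m1, m3, m7 gives m1: u1, ...)
    static Map<String, List<String>> invert(Map<String, List<String>> userToMovies) {
        Map<String, List<String>> movieToUsers = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : userToMovies.entrySet()) {
            for (String movie : e.getValue()) {
                movieToUsers.computeIfAbsent(movie, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return movieToUsers;
    }

    // Machines needed = ceil(target QPS / QPS per machine), e.g. 2000 / 50 = 40
    static int machinesNeeded(int targetQps, int qpsPerMachine) {
        return (targetQps + qpsPerMachine - 1) / qpsPerMachine;
    }
}
```

With the inverted index, "users who watched m1" is a single lookup instead of a scan over every user's list.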

Design User System

SNAKE Analysis

1. Scenario
    1. Register/Update/Remove
    2. Login/Logout
    3. Balance/Membership
2. Necessary — Register
    1. Ask: 
        1. Total users: 100M
        2. DAU: 1M
    2. Predict: 
        1. DAU in three months: 1M * 2 = 2M
        2. Registration rate: 1% of DAU are new users
        3. Daily new registered users = 2M * 1% = 20K
        4. Login rate: 15% of DAU log in each day
        5. Average logins per logging-in user: 1.2 (mistyped verification codes, logging in again on another device)
        6. Daily logins: 2M * 15% * 1.2 = 360K
        7. Average login frequency: 360K / 86400s ≈ 4.2/s
        8. Peak login frequency: 4.2/s * 10 = 42/s (the evening login peak is roughly 10x the average)
3. Application
    1. Receptionist
    2. Account Service
        1. User model
            1. userId: primary key; saves space and is easy to compare and look up
            2. username
            3. password (hidden)
            4. state (active/inactive/banned)
        2. table<User>: CRUD (create, read, update, delete)
        3. verification & ban — State Lifecycle Graph
            1. Register — verified ? approved — (active) : Rejected — (inactive)
            2. (active) — ban — (banned) — Unban — (active)
            3. (active) — Deactivate — (Inactive) — Activate — (active)
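        4. Deleting or closing an account: to keep data consistent, removing the account is usually not supported; instead the username is changed to 'Closed'
        5. Automatic logout: e.g. a bank account is logged out automatically after 15 min of inactivity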
        6. find userId: userId = 21? userId within [4, 20]?
        7. login on multiple devices
    3. Session: a conversation between a user and a server
        1. sessionId: data sync
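        2. Each user could hold a sessionList, but that is awkward: the list has variable length and changes frequently
        3. So split the data: sessionId as the primary key, userId as a foreign key (the relational approach, as in MySQL)
        4. This yields a UserTable and a SessionTable, which introduces inheritance: both extend class Table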
    4. How to find
        1. find userId = 21: 
            1. Index with hash: O(1) time
            2. Index with BST: O(logN) time, O(N) space
        2. find userId in a range: range query
            1. Index with BST: O(logN+k), k is the length of range
            2. B+ Tree: O(log_2 N) → O(log_B N), where the fan-out B can be 8, 32, 1024…
    5. Evolve: How to support payment on membership?
        1. class Membership {int userId, time endTime, double money, addMoney(), buyMembership()}
        2. Problem
            1. transaction & log: update the balance and the membership end time together and record the change in a log; if something goes wrong, recover from the log
            2. ACID principle
                1. atomicity: all or nothing (lock and log)
                2. consistency: valid according to all defined rules (checker)
                3. isolation: independence between transactions (lock or CIS)
                4. durability: stored permanently (backup: one master and two slave machines)
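The login-load estimate in the Necessary step can be checked with a few lines of arithmetic. A minimal sketch; the constants are the assumptions listed there (2M DAU after growth, 1% new users, 15% log in, 1.2 logins each, 10x evening peak), and `LoginLoad` is a hypothetical name.

```java
// Back-of-the-envelope login load (hypothetical class name).
class LoginLoad {
    static final double DAU_IN_3_MONTHS = 1_000_000 * 2;  // 1M today, doubled

    static double dailyNewUsers()    { return DAU_IN_3_MONTHS * 0.01; }        // 1% new users
    static double dailyLogins()      { return DAU_IN_3_MONTHS * 0.15 * 1.2; }  // 15% log in, 1.2x each
    static double avgLoginsPerSec()  { return dailyLogins() / 86_400; }        // spread over a day
    static double peakLoginsPerSec() { return avgLoginsPerSec() * 10; }        // evening peak ~10x
}
```

Note that 1% of 2M DAU is 20K new users a day; the login rate works out to about 4.2/s on average and about 42/s at peak.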
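The state lifecycle graph (register, ban/unban, deactivate/activate) maps directly onto a small state machine. A minimal sketch; `AccountState`, `Account` and the method names are hypothetical, and only the edges drawn in the graph are allowed.

```java
// Account states from the user model
enum AccountState { ACTIVE, INACTIVE, BANNED }

// Hypothetical account holding only the lifecycle state.
class Account {
    private AccountState state;

    // Register: verified accounts are approved (active), rejected ones are inactive
    Account(boolean verified) {
        state = verified ? AccountState.ACTIVE : AccountState.INACTIVE;
    }

    AccountState state() { return state; }

    void ban()        { transition(AccountState.ACTIVE, AccountState.BANNED); }
    void unban()      { transition(AccountState.BANNED, AccountState.ACTIVE); }
    void deactivate() { transition(AccountState.ACTIVE, AccountState.INACTIVE); }
    void activate()   { transition(AccountState.INACTIVE, AccountState.ACTIVE); }

    // Reject any move that is not an edge of the lifecycle graph
    private void transition(AccountState from, AccountState to) {
        if (state != from) {
            throw new IllegalStateException("cannot go from " + state + " to " + to);
        }
        state = to;
    }
}
```

For example, a banned account cannot be deactivated directly; it must be unbanned first.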
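The UserTable/SessionTable split and the two lookup patterns (userId = 21, userId in [4, 20]) can be sketched with a generic `Table` that keeps a hash index for O(1) point lookups and a `TreeMap` (a balanced BST) for O(log N + k) range queries. All names are hypothetical, and holding both indexes doubles the memory; it is done here only to show both access paths.

```java
import java.util.*;

// Generic table keyed by a primary key, with a hash index and a BST index.
class Table<K extends Comparable<K>, V> {
    private final Map<K, V> hashIndex = new HashMap<>();
    private final TreeMap<K, V> treeIndex = new TreeMap<>();

    void insert(K key, V row) {
        hashIndex.put(key, row);
        treeIndex.put(key, row);
    }

    V get(K key) { return hashIndex.get(key); }                 // e.g. userId = 21

    Collection<V> range(K lo, K hi) {                           // e.g. userId in [4, 20]
        return treeIndex.subMap(lo, true, hi, true).values();
    }

    Collection<V> all() { return hashIndex.values(); }
}

class User {
    final int userId;
    final String username;
    String state;                 // active / inactive / banned
    User(int userId, String username, String state) {
        this.userId = userId; this.username = username; this.state = state;
    }
}

class Session {
    final String sessionId;       // primary key
    final int userId;             // foreign key into UserTable
    Session(String sessionId, int userId) { this.sessionId = sessionId; this.userId = userId; }
}

class UserTable extends Table<Integer, User> {}

class SessionTable extends Table<String, Session> {
    // All sessions of one user (e.g. several devices); a linear scan, for brevity
    List<Session> sessionsOf(int userId) {
        List<Session> result = new ArrayList<>();
        for (Session s : all()) if (s.userId == userId) result.add(s);
        return result;
    }
}
```

A real database would instead build a secondary index on `Session.userId` (typically a B+ tree) rather than scanning.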
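A version of the `Membership` class can illustrate the all-or-nothing purchase within a single process: a lock isolates concurrent calls, a balance check enforces consistency, and each change is appended to a log before it is applied, so the old and new values are available for recovery. This is a sketch under those assumptions only; durability and crash recovery would come from a real database, and the method signatures are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory membership account.
class Membership {
    final int userId;
    long endTime;                               // membership expiry, epoch seconds
    double money;                               // account balance
    final List<String> log = new ArrayList<>(); // append-only change log

    Membership(int userId, long endTime, double money) {
        this.userId = userId;
        this.endTime = endTime;
        this.money = money;
    }

    synchronized void addMoney(double amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        log.add("addMoney user=" + userId + " amount=" + amount);
        money += amount;
    }

    // Deduct the price and extend endTime together, or change nothing
    synchronized boolean buyMembership(double price, long durationSeconds, long now) {
        if (money < price) return false;        // consistency: balance never goes negative
        long newEnd = Math.max(endTime, now) + durationSeconds;
        log.add("buy user=" + userId + " money=" + money + "->" + (money - price)
                + " end=" + endTime + "->" + newEnd);  // log before applying
        money -= price;                         // both fields change under the same lock
        endTime = newEnd;
        return true;
    }
}
```

A rejected purchase leaves the balance, the end time and the log untouched.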

Design Rate Limiter


Basic constraint: QPS limit = 5


1. Isolation algorithm (minimum gap)

Make sure the gap between two requests >= 1s/5 = 0.2s

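1. Strictly satisfies the limit;
2. but wrongly rejects many legitimate requests: for requests at 0.2, 0.4, 0.6, 0.8 and 0.9 s, the one at 0.9 s is rejected although that second holds only 5 requests.
// lastReq: time (s) of the last accepted request, initially Double.NEGATIVE_INFINITY
double now = System.currentTimeMillis() / 1000.0;  // current time in fractional seconds
if (now - lastReq < 0.2) return false;             // gap below 1s/5: reject
lastReq = now;                                     // accept and remember this request
return true;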
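2. Time-bucket algorithm

// counter: Map<Long, Integer>, one bucket per second (grows without bound)
long now = System.currentTimeMillis() / 1000;        // current whole second
if (counter.getOrDefault(now, 0) >= 5) return false;
counter.merge(now, 1, Integer::sum);                 // count this request in its bucket
return true;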

Time-bucket with Database: O(1) space

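long now = System.currentTimeMillis() / 1000;  // current whole second, used as the key
Integer counter = DB.get(now);                 // DB: key-value store from the notes
if (counter != null && counter >= 5) return false;
DB.increase(now, 1);                           // bump this second's counter
DB.expire(now, 1);                             // key expires after 1 s, so only O(1) keys live
return true;
// note: get + increase is not atomic; concurrent callers can overshoot the limit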

Time-bucket: one bucket O(1) space

long curSecond = System.currentTimeMillis() / 1000;
if (curSecond != preSecond) {   // a new second started: reset the single bucket
    counter = 0;
    preSecond = curSecond;
}
if (counter >= 5) return false;
counter++;                      // count the accepted request
return true;

Bad case: suppose exactly 5 requests fall in 0 - 1 s, yet 6 requests fall in 0.5 - 1.5 s. Buckets aligned to whole seconds cannot enforce the limit over an arbitrary 1 s window.

3. Solution: queue algorithm (request list)

double now = System.currentTimeMillis() / 1000.0;
int n = requestList.size();                    // requestList: times of accepted requests
if (n >= 5 && now - requestList.get(n - 5) < 1) return false; // would be the 6th request within 1 s
requestList.add(now);                          // record the accepted request
return true;

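Still a problem: if a second has very many requests, say one million, must all of them be recorded in the request list?
Use a fixed-size (5) request list as a circular buffer: record only the 5 most recent accepted requests.

double now = System.currentTimeMillis() / 1000.0;
// requestList: fixed size 5, initially filled with Double.NEGATIVE_INFINITY;
// index points at the oldest of the last 5 accepted timestamps
if (now - requestList.get(index) < 1) return false;
requestList.set(index, now);
index = (index + 1) % 5;
return true;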

Follow up 1: how to save space with 10^9 queries per hour?

Batch queries, trading precision for space.
Pack queries that arrive close together into batches of 10^6 queries each; the request list then stores only 10^3 entries per hour, i.e. one every 3.6 seconds.

Follow up 2: how to support multiple threads?
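The notes leave this open. One straightforward option, assumed here, is to make the fixed-size request list thread-safe by synchronizing the check-and-record step, so two threads cannot both claim the last free slot. A sketch; the class name and the injectable millisecond clock (`LongSupplier`, used only to make it testable) are hypothetical.

```java
import java.util.function.LongSupplier;

// Ring-buffer limiter: at most `limit` accepted requests per `windowMillis`.
class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clock;   // current time in milliseconds
    private final long[] slots;         // timestamps of the last `limit` accepted requests
    private int next = 0;               // points at the oldest slot once the buffer is full
    private int filled = 0;

    SlidingWindowLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.slots = new long[limit];
    }

    // synchronized: the check and the write happen as one atomic step
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (filled == limit && now - slots[next] < windowMillis) {
            return false;               // would be the (limit+1)-th request in the window
        }
        slots[next] = now;
        next = (next + 1) % limit;
        if (filled < limit) filled++;
        return true;
    }
}
```

A single lock serializes every caller; under heavy contention, splitting the limiter per user (as in Follow up 3) would reduce waiting.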


Follow up 3: how to support limiter on users?

<userId, requestList>: each user has their own request list
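The `<userId, requestList>` idea can be sketched as a map from user to a fixed-size ring buffer, so each user is limited to 5 requests per second independently. The class name and the injectable millisecond clock are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Per-user limiter: one ring buffer of recent accepted timestamps per userId.
class PerUserRateLimiter {
    private static final int LIMIT = 5;              // 5 requests ...
    private static final long WINDOW_MILLIS = 1000;  // ... per second, per user

    private static final class RequestList {
        final long[] slots = new long[LIMIT];
        int next = 0;     // oldest slot once full
        int filled = 0;
    }

    private final Map<String, RequestList> lists = new HashMap<>();
    private final LongSupplier clock;                // current time in milliseconds

    PerUserRateLimiter(LongSupplier clock) { this.clock = clock; }

    synchronized boolean tryAcquire(String userId) {
        long now = clock.getAsLong();
        RequestList r = lists.computeIfAbsent(userId, id -> new RequestList());
        if (r.filled == LIMIT && now - r.slots[r.next] < WINDOW_MILLIS) return false;
        r.slots[r.next] = now;
        r.next = (r.next + 1) % LIMIT;
        if (r.filled < LIMIT) r.filled++;
        return true;
    }
}
```

The map grows with the number of distinct users; lists of idle users could be evicted once their newest timestamp is older than the window.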

Follow up 4: how to support queries with different quotas?
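The notes do not answer this. One possible approach, assumed here, generalizes the fixed-size request list: size each user's ring buffer to that user's own quota, with a default for users without one. The class name, the quota map and the clock parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Each user gets a ring buffer whose length equals their quota per window.
class QuotaRateLimiter {
    private static final class RequestList {
        final long[] slots;
        int next = 0, filled = 0;
        RequestList(int quota) { slots = new long[quota]; }
    }

    private final Map<String, Integer> quotas;       // userId -> requests per window
    private final int defaultQuota;
    private final long windowMillis;
    private final LongSupplier clock;                // current time in milliseconds
    private final Map<String, RequestList> lists = new HashMap<>();

    QuotaRateLimiter(Map<String, Integer> quotas, int defaultQuota,
                     long windowMillis, LongSupplier clock) {
        this.quotas = new HashMap<>(quotas);
        this.defaultQuota = defaultQuota;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    synchronized boolean tryAcquire(String userId) {
        long now = clock.getAsLong();
        RequestList r = lists.computeIfAbsent(userId,
                id -> new RequestList(quotas.getOrDefault(id, defaultQuota)));
        int quota = r.slots.length;
        if (quota == 0) return false;                // a zero quota blocks everything
        if (r.filled == quota && now - r.slots[r.next] < windowMillis) return false;
        r.slots[r.next] = now;
        r.next = (r.next + 1) % quota;
        if (r.filled < quota) r.filled++;
        return true;
    }
}
```

In this sketch a user's buffer is sized on first use, so changing a quota later would require rebuilding that user's list.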


4. Token bucket algorithm
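The notes only name the token bucket. In its usual form, assumed here, the bucket holds at most `capacity` tokens, refills continuously at a fixed rate, and each request spends one token. A sketch; the class name and the injectable millisecond clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Token bucket: up to `capacity` tokens, refilled at `ratePerSec` tokens per second.
class TokenBucket {
    private final long capacity;
    private final double ratePerMilli;
    private final LongSupplier clock;   // current time in milliseconds
    private double tokens;
    private long last;                  // time of the previous refill

    TokenBucket(long capacity, double ratePerSec, LongSupplier clock) {
        this.capacity = capacity;
        this.ratePerMilli = ratePerSec / 1000.0;
        this.clock = clock;
        this.tokens = capacity;         // start full
        this.last = clock.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        // Lazily add the tokens earned since the last call, capped at capacity
        tokens = Math.min(capacity, tokens + (now - last) * ratePerMilli);
        last = now;
        if (tokens < 1) return false;   // not enough tokens: reject
        tokens -= 1;                    // spend one token on this request
        return true;
    }
}
```

Unlike the request list, it stores O(1) state per limiter and allows short bursts of up to `capacity` requests after a quiet period.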


