
When the concurrent traffic hitting a system is too large, the system may be overwhelmed and the entire service become unavailable.

The general solution for this scenario is: once traffic exceeds a threshold, we refuse to serve the excess requests, so that the service itself stays up.

Of course, although rate limiting protects the system from being overwhelmed, it is frustrating for the users whose requests are rejected, so rate limiting is a lossy solution. But a lossy service is far better than no service at all.


The role of rate limiting

In addition to the usage scenario above, rate limiting can also defend against malicious request traffic and malicious attacks.

The basic principle of rate limiting, then, is to protect the system by limiting the rate of concurrent access/requests, or the number of requests within a time window. Once the limit is reached, the system can deny service (redirect to an error page or report that the resource is unavailable), queue or wait (flash sales, order placement), or degrade (return fallback or default data, e.g. a product detail page showing stock as available by default).

Common rate limiters at Internet companies include: limiting the total number of concurrent connections (e.g. database connection pools and thread pools), limiting instantaneous concurrency (nginx's limit_conn module, which caps concurrent connections), and limiting the average rate within a time window (e.g. Guava's RateLimiter and nginx's limit_req module, which limit the average rate per second). Others limit the call rate of remote interfaces or the consumption rate of MQ. You can also limit based on the number of network connections, network traffic, CPU or memory load, and so on.

Combined with caching, rate limiting lets you handle high concurrency without worrying that a traffic spike will hang the system or cause an avalanche: in the worst case the service is degraded, not gone. However, the limits must be evaluated carefully; applied indiscriminately, they will cause strange problems for normal traffic, hurting user experience and driving users away.

Common rate limiting algorithms

Sliding window

Both sender and receiver maintain a sequence of data frames called a window. The sender's window size is determined by the receiver; the purpose is to control the sending rate so that the receiver's buffer does not overflow, and controlling the flow also helps avoid network congestion. In the figure below, data frames 4, 5, and 6 have been sent but their ACKs have not yet been received, while frames 7, 8, and 9 are waiting to be sent. The sender's window size is therefore 6, as advertised by the receiver. When the sender receives the ACK for frame 4, the left edge of the window moves right and the right edge extends right: the window "slides" forward, and data frame 10 can now be sent.


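
The TCP window above is flow control between two endpoints. In rate limiting, the same idea appears as a sliding-window counter that caps the number of requests in any trailing time window. A minimal sketch (the `SlidingWindowLimiter` class is a hypothetical illustration, not from any library):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sliding-window rate limiter: allows at most `limit` requests
// within any trailing window of `windowMillis` milliseconds.
public class SlidingWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Evict timestamps that have slid out of the window
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() < limit) {
            timestamps.addLast(now);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // At most 3 requests per second: the first 3 pass, the rest are rejected
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(3, 1000);
        for (int i = 1; i <= 5; i++) {
            System.out.println("request " + i + " allowed: " + limiter.tryAcquire());
        }
    }
}
```

As requests age out of the window, the counter frees up, so the "window" slides forward in time just like the frame window slides forward on ACKs.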

Leaky bucket (controls the transmission rate)

The idea of the leaky bucket algorithm: water is continuously poured into a bucket, and no matter how fast or slow the inflow, water leaks out at a fixed rate; if the bucket is full, the incoming water overflows.

The bucket leaks water downward at a constant rate, while water enters it at varying speeds. While the bucket is not full, incoming water is accepted; once it is full, no more can be added. "Bucket full" is the key trigger condition of the algorithm (i.e. the condition for judging traffic as abnormal). When it holds, there are two ways to handle the water flowing in from above.

After the bucket is full, the two common treatments are:

  1. Temporarily block the incoming water, and wait until some water in the bucket has leaked out before letting more in.
  2. Discard the overflowing water directly.

Features

  1. The leak rate is fixed
  2. The leak rate stays fixed even under a burst of inflow (a sudden surge in the amount of water poured in)

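
The behavior above can be sketched in code: the water level is recalculated lazily on each request, leaking at the fixed rate, and new requests are rejected once the bucket is full (treatment 2 above, discarding the overflow). The `LeakyBucket` class is an illustrative sketch, not a production implementation:

```java
// Leaky bucket: water leaks out at a fixed rate `leakRatePerSec`;
// each request adds one unit of water; overflow is discarded.
public class LeakyBucket {
    private final long capacity;          // bucket size
    private final double leakRatePerSec;  // fixed outflow rate
    private double water = 0;             // current water level
    private long lastLeakMillis = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakRatePerSec) {
        this.capacity = capacity;
        this.leakRatePerSec = leakRatePerSec;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Leak water proportionally to the time elapsed since the last request
        water = Math.max(0, water - (now - lastLeakMillis) / 1000.0 * leakRatePerSec);
        lastLeakMillis = now;
        if (water + 1 <= capacity) {
            water += 1;   // accept: this request's water fits in the bucket
            return true;
        }
        return false;     // bucket full: overflow, request dropped
    }

    public static void main(String[] args) {
        // Bucket holds 5 units and leaks 1 unit per second:
        // in a fast loop, the first 5 requests are accepted, the rest rejected
        LeakyBucket bucket = new LeakyBucket(5, 1.0);
        for (int i = 1; i <= 7; i++) {
            System.out.println("request " + i + " accepted: " + bucket.tryAcquire());
        }
    }
}
```

No matter how bursty the inflow is, consumers downstream only ever see the fixed leak rate, which is exactly the "feature" listed above.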

Token bucket (handles burst traffic)

The token bucket algorithm is one of the most commonly used algorithms in network traffic shaping (Traffic Shaping) and rate limiting (Rate Limiting). Typically, it is used to control the amount of data sent to the network while still allowing bursts.

A token bucket is a bucket holding a fixed capacity of tokens, with tokens added to it at a fixed rate. The algorithm actually involves three parts: two flows and one bucket, namely the token flow, the data flow, and the token bucket.

Token flow and token bucket

The system generates tokens at a certain rate and places them in the token bucket. Think of the bucket as a buffer (it can be implemented with a data structure such as a queue); when the buffer is full, newly generated tokens are discarded. Two variables matter here:

The first is the rate at which tokens are generated, commonly called rate. For example, rate = 2 means 2 tokens are generated per second, i.e. one token every 1/2 second.

The second is the size of the token bucket, commonly called burst. For example, burst = 10 means the bucket can hold at most 10 tokens.

Data flow

The data flow is the real traffic entering the system. For an HTTP interface, if it is called twice per second on average, its rate is 2 requests/s.

There are three possible scenarios:

The data flow's rate equals the token flow's rate. In this case, each incoming packet or request gets a token and passes through the queue without delay.

The data flow's rate is less than the token flow's rate. Packets or requests passing through the queue consume only part of the tokens; the rest accumulate in the bucket until it is full, and the accumulated tokens can then be spent on burst requests.

The data flow's rate is greater than the token flow's rate. The tokens in the bucket are quickly exhausted, and until they are replenished, further packets or requests are dropped or refused.

Continuing the earlier example with rate = 2 and burst = 10, the system can withstand bursts of up to 10 requests/s while sustaining an average of 2 requests/s. The last of the three scenarios is the core of this algorithm: it is accurate, simple to implement, and puts negligible pressure on the server, so it is widely used and well worth learning.


Features

  1. Tokens can accumulate: the bucket holds at most b tokens, i.e. at most b tokens can be accumulated
  2. Bursts are allowed: if the bucket has accumulated n tokens (0 <= n <= b) and n burst requests arrive at the same time, all n requests can be processed at once
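
The features above can be sketched in code. The `TokenBucket` class is an illustrative sketch (using lazy refill on each call rather than a background thread), with the rate = 2, burst = 10 values from the example:

```java
// Token bucket: tokens are refilled at `rate` per second up to `burst`;
// each request consumes one token, so bursts up to `burst` are allowed.
public class TokenBucket {
    private final double rate;   // tokens generated per second
    private final long burst;    // maximum tokens the bucket can hold
    private double tokens;
    private long lastRefillMillis = System.currentTimeMillis();

    public TokenBucket(double rate, long burst) {
        this.rate = rate;
        this.burst = burst;
        this.tokens = burst;     // start full so an initial burst is allowed
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Refill tokens proportionally to elapsed time, capped at `burst`
        tokens = Math.min(burst, tokens + (now - lastRefillMillis) / 1000.0 * rate);
        lastRefillMillis = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;            // bucket empty: request refused
    }

    public static void main(String[] args) {
        // rate = 2, burst = 10 as in the example above: in a fast loop,
        // the 10 accumulated tokens allow a burst of 10 requests
        TokenBucket bucket = new TokenBucket(2, 10);
        int allowed = 0;
        for (int i = 0; i < 15; i++) {
            if (bucket.tryAcquire()) allowed++;
        }
        System.out.println("allowed: " + allowed);
    }
}
```

Unlike the leaky bucket, which smooths output to a constant rate, the token bucket lets accumulated capacity be spent all at once, which is why it handles burst traffic.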

Rate limiting algorithms in practice

Semaphore

Semaphore is commonly used for rate limiting. In the following scenario, we simulate 20 client requests and use a Semaphore to cap concurrent access at 5, reducing the pressure on the resource.

 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Semaphore;

 public class SemaphoreTest {

    public static void main(String[] args) {
        // Thread pool
        ExecutorService exec = Executors.newCachedThreadPool();
        // Only 5 threads may access at the same time
        final Semaphore semp = new Semaphore(5);
        // Simulate 20 client requests
        for (int index = 0; index < 20; index++) {
            final int no = index;
            exec.execute(() -> {
                try {
                    // Acquire a permit (blocks while none is available)
                    semp.acquire();
                    try {
                        System.out.println("Accessing: " + no);
                        Thread.sleep((long) (Math.random() * 10000));
                    } finally {
                        // Always release the permit when the access is done
                        semp.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Shut down the pool once all tasks are submitted
        exec.shutdown();
    }
 }

Guava's RateLimiter implementation

RateLimiter in Guava has two implementations: SmoothBursty and SmoothWarmingUp (commonly called Bursty and WarmUp).

SmoothBursty is an implementation based on the token bucket algorithm, for example:

    RateLimiter rateLimiter = RateLimiter.create(permitsPerSecond); // create a bursty instance

    rateLimiter.acquire(); // acquire 1 permit; blocks until a token is available when there are not enough

  1. Add the Guava dependency

     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
       <version>23.0</version>
    </dependency>
  2. Write the test code

     import com.google.common.util.concurrent.RateLimiter;

     import java.io.IOException;
     import java.util.Random;
     import java.util.concurrent.CountDownLatch;

     public class PayService {
    
        RateLimiter rateLimiter = RateLimiter.create(10); // qps = 10
    
        public void doRequest(String threadName) {
            if (rateLimiter.tryAcquire()) {
                System.out.println(threadName + ": payment succeeded");
            } else {
                System.out.println(threadName + ": too many payments in progress, please try again later");
            }
        }
    
        public static void main(String[] args) throws IOException {
            PayService payService = new PayService();
            CountDownLatch latch = new CountDownLatch(1); // released once, so all threads start together
            Random random = new Random(10);
            for (int i = 0; i < 20; i++) {
                int finalI = i;
                new Thread(() -> {
                    try {
                        latch.await();
                        int sleepTime = random.nextInt(1000);
                        Thread.sleep(sleepTime);
                        payService.doRequest("t-" + finalI);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }).start();
            }
            latch.countDown();
            System.in.read(); // keep the JVM alive until a key is pressed
        }
    }
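
The example above only exercises the Bursty implementation. The WarmUp (SmoothWarmingUp) variant mentioned earlier is created with the `RateLimiter.create(permitsPerSecond, warmupPeriod, unit)` overload; a minimal sketch (the 5 permits/s and 3-second warmup values are arbitrary choices for illustration):

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class WarmUpDemo {
    public static void main(String[] args) {
        // 5 permits/second, but the limiter starts "cold" and ramps up to the
        // stable rate over 3 seconds -- useful when a cache or pool needs warming.
        RateLimiter limiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);
        for (int i = 0; i < 5; i++) {
            double waited = limiter.acquire(); // blocks; returns seconds spent waiting
            System.out.printf("permit %d, waited %.2fs%n", i, waited);
        }
        // During warmup, the interval between permits is longer than the
        // stable 0.2s and shrinks as the limiter heats up.
    }
}
```
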
The next article will analyze Sentinel, Alibaba's open-source rate limiting framework!
Copyright notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated. Please indicate the source for Mic带你学架构 !
