In the previous article in this series, we learned about Resilience4j and how to use its Retry module. Now let's learn about RateLimiter: what it is, when and how to use it, and what to watch out for when implementing rate limiting (also known as "throttling").
Code example
This article is accompanied by a working code example on GitHub.
What is Resilience4j?
Please see the description in the previous article for a quick overview of how Resilience4j works in general.
What is rate limiting?
We can look at rate limiting from two perspectives-as a service provider and as a service consumer.
Server-side rate limiting
As a service provider, we implement rate limiting to protect our resources from overload and denial of service (DoS) attacks.
In order to meet our service level agreements (SLA) with all consumers, we want to ensure that a consumer that causes a surge in traffic will not affect the quality of our services to others.
We do this by setting limits on how many requests consumers are allowed to make in a given time unit. We reject any request that exceeds the limit with an appropriate response, such as HTTP status 429 (Too many requests). This is called server-side rate limiting.
Rate limits are specified in requests per second (rps), requests per minute (rpm), or similar. Some services have multiple rate limits at different durations (for example, 50 rpm and no more than 2500 rph) and at different times of the day (for example, 100 rps during the day and 150 rps at night). This restriction may apply to a single user (identified by a user ID, IP address, API access key, etc.) or a tenant in a multi-tenant application.
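To make the server-side idea concrete, here is a toy sketch of a fixed-window, per-client rate limiter. The class and method names are our own invention for illustration, not part of any library discussed in this article; a real implementation would also need eviction of stale clients and coordination across server instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: one counter window per client id
// (user id, API key, IP address, ...), reset every windowMillis.
class FixedWindowRateLimiter {
    private final int limitPerWindow;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed, false if the caller should get HTTP 429. */
    synchronized boolean tryAcquire(String clientId, long nowMillis) {
        Window w = windows.computeIfAbsent(clientId, id -> new Window(nowMillis));
        if (nowMillis - w.start >= windowMillis) { // window expired: start a fresh one
            w.start = nowMillis;
            w.count = 0;
        }
        if (w.count < limitPerWindow) {
            w.count++;
            return true;
        }
        return false;
    }

    private static final class Window {
        long start;
        int count;
        Window(long start) { this.start = start; }
    }
}
```

With a limit of 2 requests per 1000ms window, the third request from the same client within a window is rejected, while other clients are unaffected.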
Client-side rate limiting
As a service consumer, we want to ensure that we do not overload the service provider. In addition, we don't want to incur unexpected costs, either in money or in quality of service.
This is especially relevant if the service we consume is elastic. Instead of throttling our requests, the service provider may charge us an extra fee for the additional load. Some even ban misbehaving clients for short periods of time. Rate limiting implemented by a consumer to prevent such problems is called client-side rate limiting.
When to use RateLimiter?
resilience4j-ratelimiter is intended for client-side rate limiting.
Server-side rate limiting requires things like a cache and coordination between multiple server instances, which Resilience4j does not support. For server-side rate limiting, there are API gateways and API filters, such as Kong API Gateway and Repose API Filter. The RateLimiter module of Resilience4j is not intended to replace them.
Resilience4j RateLimiter concept
The thread that wants to call the remote service first requests permission from RateLimiter. If RateLimiter allows, the thread continues. Otherwise, RateLimiter will park the thread or put it in a waiting state.
RateLimiter periodically creates new permissions. When the permission is available, the thread will be notified and can continue.
The number of calls allowed in a given period is called the limitForPeriod. How often the RateLimiter refreshes the permissions is specified by the limitRefreshPeriod. The timeoutDuration specifies how long a thread can wait to acquire a permission. If no permission is available at the end of the wait, the RateLimiter throws a RequestNotPermitted runtime exception.
Use the Resilience4j RateLimiter module
RateLimiterRegistry, RateLimiterConfig, and RateLimiter are the main abstractions of resilience4j-ratelimiter.
RateLimiterRegistry is a factory for creating and managing RateLimiter objects.
RateLimiterConfig encapsulates the limitForPeriod, limitRefreshPeriod, and timeoutDuration configurations. Each RateLimiter object is associated with a RateLimiterConfig.
RateLimiter provides helper methods to create decorators for the functional interfaces or lambda expressions containing the remote call.
Let's see how to use the various features available in the RateLimiter module. Suppose we are building a website for an airline that lets its customers search for and book flights. Our service talks to a remote service encapsulated by the class FlightSearchService.
Basic example
The first step is to create a RateLimiterConfig:
RateLimiterConfig config = RateLimiterConfig.ofDefaults();
This creates a RateLimiterConfig with default values for limitForPeriod (50), limitRefreshPeriod (500ns), and timeoutDuration (5s).
Suppose our contract with the airline's service says that we may call their search API at 1 rps. Then we would create the RateLimiterConfig like this:
RateLimiterConfig config = RateLimiterConfig.custom()
.limitForPeriod(1)
.limitRefreshPeriod(Duration.ofSeconds(1))
.timeoutDuration(Duration.ofSeconds(1))
.build();
If a thread cannot acquire a permission within the 1s timeoutDuration specified, an error occurs.
Then we create a RateLimiter and decorate the searchFlights() call:
RateLimiterRegistry registry = RateLimiterRegistry.of(config);
RateLimiter limiter = registry.rateLimiter("flightSearchService");
// FlightSearchService and SearchRequest creation omitted
Supplier<List<Flight>> flightsSupplier =
RateLimiter.decorateSupplier(limiter,
() -> service.searchFlights(request));
Finally, we use the decorated Supplier<List<Flight>> a few times:
for (int i=0; i<3; i++) {
System.out.println(flightsSupplier.get());
}
The timestamp in the sample output shows that one request is made every second:
Searching for flights; current time = 15:29:40 786
...
[Flight{flightNumber='XY 765', ... }, ... ]
Searching for flights; current time = 15:29:41 791
...
[Flight{flightNumber='XY 765', ... }, ... ]
If the limit is exceeded, we get a RequestNotPermitted exception:
Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(RequestNotPermitted.java:43)
at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(RateLimiter.java:580)
... other lines omitted ...
Decorating a method that throws a checked exception
Suppose we are calling FlightSearchService.searchFlightsThrowingException(), which can throw a checked Exception. Then we can't use RateLimiter.decorateSupplier(). We would use RateLimiter.decorateCheckedSupplier() instead:
CheckedFunction0<List<Flight>> flights =
RateLimiter.decorateCheckedSupplier(limiter,
() -> service.searchFlightsThrowingException(request));
try {
System.out.println(flights.apply());
} catch (...) {
// exception handling
}
RateLimiter.decorateCheckedSupplier() returns a CheckedFunction0, which represents a function with no arguments. Notice the call to apply() on the CheckedFunction0 object to invoke the remote operation.
If we don't want to work with Suppliers, RateLimiter provides more helper decorator methods, such as decorateFunction(), decorateCheckedFunction(), decorateRunnable(), decorateCallable(), etc., for use with other language constructs. The decorateChecked* methods are used to decorate methods that throw checked exceptions.
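The decorator pattern behind all of these helper methods is the same: wrap the call in a lambda that first tries to acquire a permission. Here is a stripped-down sketch of the idea; it is our own simplification for illustration, not Resilience4j's actual implementation, and the PermissionSource interface is invented to stand in for the real permission check.

```java
import java.util.function.Supplier;

class RateLimiterSketch {
    // Hypothetical stand-in for the rate limiter's permission check.
    interface PermissionSource {
        boolean acquirePermission();
    }

    // Analogous to decorateSupplier(): returns a new Supplier that asks for a
    // permission before delegating to the original one, and fails otherwise.
    static <T> Supplier<T> decorateSupplier(PermissionSource limiter, Supplier<T> supplier) {
        return () -> {
            if (!limiter.acquirePermission()) {
                throw new IllegalStateException("does not permit further calls");
            }
            return supplier.get();
        };
    }
}
```

The decorateFunction(), decorateRunnable(), and decorateCallable() variants differ only in the functional interface being wrapped.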
Apply multiple rate limits
Suppose the airline's flight search has multiple rate limits: 2 rps and 40 rpm. We can apply multiple limits on the client side by creating multiple RateLimiters:
RateLimiterConfig rpsConfig = RateLimiterConfig.custom().
limitForPeriod(2).
limitRefreshPeriod(Duration.ofSeconds(1)).
timeoutDuration(Duration.ofMillis(2000)).build();
RateLimiterConfig rpmConfig = RateLimiterConfig.custom().
limitForPeriod(40).
limitRefreshPeriod(Duration.ofMinutes(1)).
timeoutDuration(Duration.ofMillis(2000)).build();
RateLimiterRegistry registry = RateLimiterRegistry.of(rpsConfig);
RateLimiter rpsLimiter =
registry.rateLimiter("flightSearchService_rps", rpsConfig);
RateLimiter rpmLimiter =
registry.rateLimiter("flightSearchService_rpm", rpmConfig);
Then we decorate the searchFlights() method with both RateLimiters:
Supplier<List<Flight>> rpsLimitedSupplier =
RateLimiter.decorateSupplier(rpsLimiter,
() -> service.searchFlights(request));
Supplier<List<Flight>> flightsSupplier
= RateLimiter.decorateSupplier(rpmLimiter, rpsLimitedSupplier);
The timestamps in the sample output show 2 requests being made per second, until the limit of 40 requests per minute is exhausted:
Searching for flights; current time = 15:13:21 246
...
Searching for flights; current time = 15:13:21 249
...
Searching for flights; current time = 15:13:22 212
...
Searching for flights; current time = 15:13:40 215
...
Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted:
RateLimiter 'flightSearchService_rpm' does not permit further calls
at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(RequestNotPermitted.java:43)
at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(RateLimiter.java:580)
Change limits at runtime
If necessary, we can change the values for limitForPeriod and timeoutDuration at runtime:
limiter.changeLimitForPeriod(2);
limiter.changeTimeoutDuration(Duration.ofSeconds(2));
This feature is useful if, for example, our rate limits change based on the time of day: we could have a scheduled thread change these values. The new values do not affect threads that are currently waiting for permissions.
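For example, to implement the day/night limits mentioned earlier (100 rps during the day, 150 rps at night), a scheduled thread could periodically compute the current limit and pass it to changeLimitForPeriod(). A minimal sketch of the computation follows; the day boundaries (8:00 to 20:00) are our own arbitrary choice for illustration.

```java
import java.time.LocalTime;

class TimeOfDayLimits {
    // Example values from earlier in the article: 100 rps during the day,
    // 150 rps at night. The 8:00-20:00 "day" window is an assumption.
    static int limitFor(LocalTime time) {
        boolean day = !time.isBefore(LocalTime.of(8, 0))
                && time.isBefore(LocalTime.of(20, 0));
        return day ? 100 : 150;
    }
}
```

A scheduled thread could then run something like limiter.changeLimitForPeriod(TimeOfDayLimits.limitFor(LocalTime.now())) every few minutes.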
Use RateLimiter and Retry together
Suppose we want to retry when a RequestNotPermitted exception occurs, since it is a transient error. We create RateLimiter and Retry objects as usual. Then we decorate a rate-limited Supplier and wrap it with a Retry:
Supplier<List<Flight>> rateLimitedFlightsSupplier =
RateLimiter.decorateSupplier(rateLimiter,
() -> service.searchFlights(request));
Supplier<List<Flight>> retryingFlightsSupplier =
Retry.decorateSupplier(retry, rateLimitedFlightsSupplier);
The sample output shows the request being retried after a RequestNotPermitted exception:
Searching for flights; current time = 17:10:09 218
...
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]
2020-07-27T17:10:09.484: Retry 'rateLimitedFlightSearch', waiting PT1S until attempt '1'. Last attempt failed with exception 'io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls'.
Searching for flights; current time = 17:10:10 492
...
2020-07-27T17:10:10.494: Retry 'rateLimitedFlightSearch' recorded a successful retry attempt...
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]
The order in which we create the decorators is important. It would not work if we wrapped the Retry with the RateLimiter instead.
RateLimiter events
A RateLimiter has an EventPublisher that generates events of the types RateLimiterOnSuccessEvent and RateLimiterOnFailureEvent when a remote operation is called, indicating whether acquiring a permission was successful. We can listen to these events and log them, for example:
RateLimiter limiter = registry.rateLimiter("flightSearchService");
limiter.getEventPublisher().onSuccess(e -> System.out.println(e.toString()));
limiter.getEventPublisher().onFailure(e -> System.out.println(e.toString()));
An example of log output is as follows:
RateLimiterEvent{type=SUCCESSFUL_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.127+05:30}
... other lines omitted ...
RateLimiterEvent{type=FAILED_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.186+05:30}
RateLimiter metrics
Suppose that after implementing client-side throttling we find that the response times of our API have increased. This is possible: as we have seen, if a permission is not available when a thread invokes a remote operation, the RateLimiter puts the thread in a waiting state.
If our request-handling threads are often waiting for permissions, it could mean that our limitForPeriod is too low. Perhaps we need to work with our service provider and negotiate additional quota first.
Monitoring RateLimiter metrics can help us identify such capacity problems and ensure that the values we set on the RateLimiterConfig are working well.
RateLimiter tracks two metrics: the number of available permissions (resilience4j.ratelimiter.available.permissions) and the number of threads waiting for a permission (resilience4j.ratelimiter.waiting_threads).
First, we create RateLimiterConfig, RateLimiterRegistry, and RateLimiter as usual. Then we create a MeterRegistry and bind the RateLimiterRegistry to it:
MeterRegistry meterRegistry = new SimpleMeterRegistry();
TaggedRateLimiterMetrics.ofRateLimiterRegistry(registry)
.bindTo(meterRegistry);
After running the rate-limited operation a few times, we display the captured metrics:
Consumer<Meter> meterConsumer = meter -> {
  String desc = meter.getId().getDescription();
  String metricName = meter.getId().getName();
  Double metricValue = StreamSupport.stream(meter.measure().spliterator(), false)
    .filter(m -> m.getStatistic().name().equals("VALUE"))
    .findFirst()
    .map(m -> m.getValue())
    .orElse(0.0);
  System.out.println(desc + " - " + metricName + ": " + metricValue);
};
meterRegistry.forEachMeter(meterConsumer);
This is some sample output:
The number of available permissions - resilience4j.ratelimiter.available.permissions: -6.0
The number of waiting threads - resilience4j.ratelimiter.waiting_threads: 7.0
The negative value for resilience4j.ratelimiter.available.permissions shows the number of permissions reserved by requesting threads. In a real application, we would export the data to a monitoring system periodically and analyze it on a dashboard.
Pitfalls and good practices when implementing client-side rate limiting
Make the rate limiter a singleton
All calls to a given remote service should go through the same RateLimiter instance. For a given remote service, the RateLimiter must be a singleton.
If we don't enforce this, some areas of our code base may call the remote service directly, bypassing the RateLimiter. To prevent this, the actual call to the remote service should live in a core, internal layer, and other areas should use the rate-limiting decorator exposed by that internal layer.
How do we ensure that new developers understand this intent in the future? Check out Tom's article, which shows one way of solving such problems: organizing the package structure to make the intent clear. In addition, it shows how to enforce the intent by codifying it in ArchUnit tests.
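One simple way to enforce the single-instance rule is to hide the RateLimiter and the decorated call behind one client class that the rest of the code base must go through. A minimal sketch of the idea follows; the class is hypothetical, and in a real application searchFlights() would delegate to the decorated Supplier rather than to the counter used here as stand-in shared state.

```java
// All flight searches in the code base go through this one class,
// so every call shares the same (single) rate limiter state.
final class FlightSearchClient {
    private static final FlightSearchClient INSTANCE = new FlightSearchClient();

    private int callCount = 0; // stands in for shared rate limiter state

    private FlightSearchClient() { }

    static FlightSearchClient getInstance() { return INSTANCE; }

    synchronized String searchFlights() {
        callCount++;            // in a real client: decoratedSupplier.get()
        return "flights";
    }

    synchronized int callCount() { return callCount; }
}
```

Because the constructor is private, no other part of the code base can create a second instance and bypass the shared limiter.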
Configure rate limiter for multiple server instances
Finding the right values for the configuration can be tricky. If we run multiple instances of our service in a cluster, limitForPeriod must take this into account.
For example, if the upstream service has a rate limit of 100 rps and our service has 4 instances, then we will configure 25 rps as the limit for each instance.
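This division is worth writing down explicitly, rounding down so that the cluster as a whole can never exceed the upstream limit. The tiny helper below is our own, for illustration only:

```java
class LimitPartitioner {
    // Split an upstream limit evenly across instances, rounding down so the
    // cluster as a whole can never exceed the upstream limit.
    static int perInstanceLimit(int upstreamLimit, int instanceCount) {
        if (instanceCount <= 0) {
            throw new IllegalArgumentException("instanceCount must be > 0");
        }
        return upstreamLimit / instanceCount;
    }
}
```

With 100 rps and 3 instances this yields 33 rps per instance, deliberately leaving 1 rps of upstream quota unused rather than risking rejections.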
However, this assumes that the load on each of our instances is roughly the same. If that is not the case, or if our service itself is elastic and the number of instances can vary, then Resilience4j's RateLimiter may not be a good fit.
In that case, we would need a rate limiter that keeps its data in a distributed cache instead of in memory like the Resilience4j RateLimiter. But that would impact the response times of our service. Another option is to implement some kind of adaptive rate limiting. Although Resilience4j may support it in the future, it is not clear when it will be available.
Choose the right timeout
When choosing the timeoutDuration configuration value, we should keep the expected response times of our API in mind.
If we set timeoutDuration too high, response time and throughput will be affected. If it is too low, our error rate may increase.
Since there can be some trial and error involved, a good practice is to maintain the values we use in the RateLimiterConfig (such as timeoutDuration, limitForPeriod, and limitRefreshPeriod) as configuration outside our service. Then we can change them without changing code.
Tuning the client-side and server-side rate limiters
Implementing client-side rate limiting does not guarantee that we will never be rate limited by our upstream service.
Suppose we have a limit of 2 rps from the upstream service, and we set limitForPeriod to 2 and limitRefreshPeriod to 1s. If we make two requests in the last few milliseconds of a second, with no other calls before that, the RateLimiter will allow them. If we make another two calls in the first few milliseconds of the next second, the RateLimiter will allow them too, since two new permissions are available. However, the upstream service may reject these two requests, because servers often implement rate limiting based on sliding windows.
To ensure that we never exceed the rate of the upstream service, we would need to configure the fixed window in the client to be shorter than the sliding window in the service. So if we set limitForPeriod to 1 and limitRefreshPeriod to 500ms in the previous example, we would not run into rate-limit errors. However, all three requests after the first one would wait, increasing response times and reducing throughput.
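We can make this boundary effect concrete with two toy limiters (our own sketches, not library code): a fixed-window limiter like the client's and a sliding-window limiter like the server's. Four requests straddling a window boundary all pass the fixed window, but only two pass the sliding window:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WindowComparison {
    /** Fixed window: the counter resets at every full window, like the client side. */
    static class FixedWindow {
        final int limit;
        final long windowMillis;
        long windowStart; // first window starts at t=0 in this sketch
        int count;
        FixedWindow(int limit, long windowMillis) {
            this.limit = limit;
            this.windowMillis = windowMillis;
        }
        boolean allow(long t) {
            if (t - windowStart >= windowMillis) { // align to the current window
                windowStart = t - (t % windowMillis);
                count = 0;
            }
            if (count < limit) { count++; return true; }
            return false;
        }
    }

    /** Sliding window: at most `limit` calls in ANY windowMillis-long interval. */
    static class SlidingWindow {
        final int limit;
        final long windowMillis;
        final Deque<Long> timestamps = new ArrayDeque<>();
        SlidingWindow(int limit, long windowMillis) {
            this.limit = limit;
            this.windowMillis = windowMillis;
        }
        boolean allow(long t) {
            while (!timestamps.isEmpty() && t - timestamps.peekFirst() >= windowMillis) {
                timestamps.removeFirst(); // drop calls that left the window
            }
            if (timestamps.size() < limit) { timestamps.addLast(t); return true; }
            return false;
        }
    }
}
```

With a limit of 2 per 1000ms and requests at t = 990, 995, 1001, and 1005ms, the fixed window allows all four (two in each window), while the sliding window allows only the first two, exactly the mismatch described above.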
Conclusion
In this article, we learned how to use Resilience4j's RateLimiter module to implement client rate limiting. We studied different ways to configure it through practical examples. We learned some good practices and caveats to keep in mind when implementing rate limiting.
You can play around with a complete application illustrating these ideas using the code on GitHub.
This article is translated from: Implementing Rate Limiting with Resilience4j - Reflectoring