In the previous article in this series, we learned about Resilience4j and how to use its Retry module. Now let's learn about RateLimiter: what it is, when and how to use it, and what to watch out for when implementing rate limiting (also known as "throttling").
Code example
This article is accompanied by a working code example on GitHub.
What is Resilience4j?
Please see the description in the previous article for a quick overview of how Resilience4j works in general.
What is rate limiting?
We can look at rate limiting from two perspectives-as a service provider and as a service consumer.
Server-side rate limiting
As a service provider, we implement rate limiting to protect our resources from overload and denial of service (DoS) attacks.
In order to meet our service level agreements (SLA) with all consumers, we want to ensure that a consumer that causes a surge in traffic will not affect the quality of our services to others.
We do this by setting limits on how many requests consumers are allowed to make in a given time unit. We reject any request that exceeds the limit with an appropriate response, such as HTTP status 429 (Too many requests). This is called server-side rate limiting.
Rate limits are specified in requests per second (rps), requests per minute (rpm), or similar. Some services have multiple rate limits at different durations (for example, 50 rpm and no more than 2500 rph) and at different times of the day (for example, 100 rps during the day and 150 rps at night). This restriction may apply to a single user (identified by a user ID, IP address, API access key, etc.) or a tenant in a multi-tenant application.
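To make the server-side idea concrete, here is a toy sketch of a fixed-window, per-client rate limiter. The class and method names are our own invention for illustration, not part of any library discussed in this article; a real implementation would also need eviction of stale clients and coordination across server instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: one counter window per client id
// (user id, API key, IP address, ...), reset every windowMillis.
class FixedWindowRateLimiter {
    private final int limitPerWindow;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed, false if the caller should get HTTP 429. */
    synchronized boolean tryAcquire(String clientId, long nowMillis) {
        Window w = windows.computeIfAbsent(clientId, id -> new Window(nowMillis));
        if (nowMillis - w.start >= windowMillis) { // window expired: start a fresh one
            w.start = nowMillis;
            w.count = 0;
        }
        if (w.count < limitPerWindow) {
            w.count++;
            return true;
        }
        return false;
    }

    private static final class Window {
        long start;
        int count;
        Window(long start) { this.start = start; }
    }
}
```

With a limit of 2 requests per 1000ms window, the third request from the same client within a window is rejected, while other clients are unaffected.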
Client-side rate limiting
As a service consumer, we want to ensure that we do not overload the service provider. In addition, we don't want to incur unexpected costs, either in money or in quality of service.
This is especially relevant if the service we consume is elastic. Instead of throttling our requests, the service provider may charge us an extra fee for the additional load. Some even ban misbehaving clients for short periods of time. Rate limiting implemented by a consumer to prevent such problems is called client-side rate limiting.
When to use RateLimiter?
resilience4j-ratelimiter is intended for client-side rate limiting.
Server-side rate limiting requires things like a cache and coordination between multiple server instances, which Resilience4j does not support. For server-side rate limiting, there are API gateways and API filters, such as Kong API Gateway and Repose API Filter. The RateLimiter module of Resilience4j is not intended to replace them.
Resilience4j RateLimiter concept
The thread that wants to call the remote service first requests permission from RateLimiter. If RateLimiter allows, the thread continues. Otherwise, RateLimiter will park the thread or put it in a waiting state.
RateLimiter periodically creates new permissions. When the permission is available, the thread will be notified and can continue.
The number of calls allowed in a given period is called the limitForPeriod. How often the RateLimiter refreshes the permissions is specified by the limitRefreshPeriod. The timeoutDuration specifies how long a thread can wait to acquire a permission. If no permission is available at the end of the wait, the RateLimiter throws a RequestNotPermitted runtime exception.
Use the Resilience4j RateLimiter module
RateLimiterRegistry, RateLimiterConfig, and RateLimiter are the main abstractions of resilience4j-ratelimiter.
RateLimiterRegistry is a factory for creating and managing RateLimiter objects.
RateLimiterConfig encapsulates the limitForPeriod, limitRefreshPeriod, and timeoutDuration configurations. Each RateLimiter object is associated with a RateLimiterConfig.
RateLimiter provides helper methods to create decorators for the functional interfaces or lambda expressions containing the remote call.
Let's see how to use the various features available in the RateLimiter module. Suppose we are building a website for an airline that lets its customers search for and book flights. Our service talks to a remote service encapsulated by the class FlightSearchService.
Basic example
The first step is to create a RateLimiterConfig:
RateLimiterConfig config = RateLimiterConfig.ofDefaults();
This creates a RateLimiterConfig with default values for limitForPeriod (50), limitRefreshPeriod (500ns), and timeoutDuration (5s).
Suppose our contract with the airline's service says that we may call their search API at 1 rps. Then we would create the RateLimiterConfig like this:
RateLimiterConfig config = RateLimiterConfig.custom()
.limitForPeriod(1)
.limitRefreshPeriod(Duration.ofSeconds(1))
.timeoutDuration(Duration.ofSeconds(1))
.build();
If a thread cannot acquire a permission within the 1s timeoutDuration specified, an error occurs.
Then we create a RateLimiter and decorate the searchFlights() call:
RateLimiterRegistry registry = RateLimiterRegistry.of(config);
RateLimiter limiter = registry.rateLimiter("flightSearchService");
// FlightSearchService and SearchRequest creation omitted
Supplier<List<Flight>> flightsSupplier =
RateLimiter.decorateSupplier(limiter,
() -> service.searchFlights(request));
Finally, we use the decorated Supplier<List<Flight>> a few times:
for (int i=0; i<3; i++) {
System.out.println(flightsSupplier.get());
}
The timestamp in the sample output shows that one request is made every second:
Searching for flights; current time = 15:29:40 786
...
[Flight{flightNumber='XY 765', ... }, ... ]
Searching for flights; current time = 15:29:41 791
...
[Flight{flightNumber='XY 765', ... }, ... ]
If the limit is exceeded, we get a RequestNotPermitted exception:
Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(RequestNotPermitted.java:43)
at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(RateLimiter.java:580)
... other lines omitted ...
Decorating a method that throws a checked exception
Suppose we are calling FlightSearchService.searchFlightsThrowingException(), which can throw a checked Exception. Then we can't use RateLimiter.decorateSupplier(). We would use RateLimiter.decorateCheckedSupplier() instead:
CheckedFunction0<List<Flight>> flights =
RateLimiter.decorateCheckedSupplier(limiter,
() -> service.searchFlightsThrowingException(request));
try {
System.out.println(flights.apply());
} catch (...) {
// exception handling
}
RateLimiter.decorateCheckedSupplier() returns a CheckedFunction0, which represents a function with no arguments. Notice the call to apply() on the CheckedFunction0 object to invoke the remote operation.
If we don't want to work with Suppliers, RateLimiter provides more helper decorator methods, such as decorateFunction(), decorateCheckedFunction(), decorateRunnable(), decorateCallable(), etc., for use with other language constructs. The decorateChecked* methods are used to decorate methods that throw checked exceptions.
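The decorator pattern behind all of these helper methods is the same: wrap the call in a lambda that first tries to acquire a permission. Here is a stripped-down sketch of the idea; it is our own simplification for illustration, not Resilience4j's actual implementation, and the PermissionSource interface is invented to stand in for the real permission check.

```java
import java.util.function.Supplier;

class RateLimiterSketch {
    // Hypothetical stand-in for the rate limiter's permission check.
    interface PermissionSource {
        boolean acquirePermission();
    }

    // Analogous to decorateSupplier(): returns a new Supplier that asks for a
    // permission before delegating to the original one, and fails otherwise.
    static <T> Supplier<T> decorateSupplier(PermissionSource limiter, Supplier<T> supplier) {
        return () -> {
            if (!limiter.acquirePermission()) {
                throw new IllegalStateException("does not permit further calls");
            }
            return supplier.get();
        };
    }
}
```

The decorateFunction(), decorateRunnable(), and decorateCallable() variants differ only in the functional interface being wrapped.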
Apply multiple rate limits
Suppose the airline's flight search has multiple rate limits: 2 rps and 40 rpm. We can apply multiple limits on the client side by creating multiple RateLimiters:
RateLimiterConfig rpsConfig = RateLimiterConfig.custom().
limitForPeriod(2).
limitRefreshPeriod(Duration.ofSeconds(1)).
timeoutDuration(Duration.ofMillis(2000)).build();
RateLimiterConfig rpmConfig = RateLimiterConfig.custom().
limitForPeriod(40).
limitRefreshPeriod(Duration.ofMinutes(1)).
timeoutDuration(Duration.ofMillis(2000)).build();
RateLimiterRegistry registry = RateLimiterRegistry.of(rpsConfig);
RateLimiter rpsLimiter =
registry.rateLimiter("flightSearchService_rps", rpsConfig);
RateLimiter rpmLimiter =
registry.rateLimiter("flightSearchService_rpm", rpmConfig);
Then we decorate the searchFlights() method with both RateLimiters:
Supplier<List<Flight>> rpsLimitedSupplier =
RateLimiter.decorateSupplier(rpsLimiter,
() -> service.searchFlights(request));
Supplier<List<Flight>> flightsSupplier
= RateLimiter.decorateSupplier(rpmLimiter, rpsLimitedSupplier);
The timestamps in the sample output show 2 requests being made per second, until the limit of 40 requests per minute is exhausted:
Searching for flights; current time = 15:13:21 246
...
Searching for flights; current time = 15:13:21 249
...
Searching for flights; current time = 15:13:22 212
...
Searching for flights; current time = 15:13:40 215
...
Exception in thread "main" io.github.resilience4j.ratelimiter.RequestNotPermitted:
RateLimiter 'flightSearchService_rpm' does not permit further calls
at io.github.resilience4j.ratelimiter.RequestNotPermitted.createRequestNotPermitted(RequestNotPermitted.java:43)
at io.github.resilience4j.ratelimiter.RateLimiter.waitForPermission(RateLimiter.java:580)
Change limits at runtime
If necessary, we can change the values for limitForPeriod and timeoutDuration at runtime:
limiter.changeLimitForPeriod(2);
limiter.changeTimeoutDuration(Duration.ofSeconds(2));
This feature is useful if, for example, our rate limits change based on the time of day: we could have a scheduled thread change these values. The new values do not affect threads that are currently waiting for permissions.
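For example, to implement the day/night limits mentioned earlier (100 rps during the day, 150 rps at night), a scheduled thread could periodically compute the current limit and pass it to changeLimitForPeriod(). A minimal sketch of the computation follows; the day boundaries (8:00 to 20:00) are our own arbitrary choice for illustration.

```java
import java.time.LocalTime;

class TimeOfDayLimits {
    // Example values from earlier in the article: 100 rps during the day,
    // 150 rps at night. The 8:00-20:00 "day" window is an assumption.
    static int limitFor(LocalTime time) {
        boolean day = !time.isBefore(LocalTime.of(8, 0))
                && time.isBefore(LocalTime.of(20, 0));
        return day ? 100 : 150;
    }
}
```

A scheduled thread could then run something like limiter.changeLimitForPeriod(TimeOfDayLimits.limitFor(LocalTime.now())) every few minutes.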
Use RateLimiter and Retry together
Suppose we want to retry when a RequestNotPermitted exception occurs, since it is a transient error. We create RateLimiter and Retry objects as usual. Then we decorate a rate-limited Supplier and wrap it with a Retry:
Supplier<List<Flight>> rateLimitedFlightsSupplier =
RateLimiter.decorateSupplier(rateLimiter,
() -> service.searchFlights(request));
Supplier<List<Flight>> retryingFlightsSupplier =
Retry.decorateSupplier(retry, rateLimitedFlightsSupplier);
The sample output shows the request being retried after a RequestNotPermitted exception:
Searching for flights; current time = 17:10:09 218
...
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]
2020-07-27T17:10:09.484: Retry 'rateLimitedFlightSearch', waiting PT1S until attempt '1'. Last attempt failed with exception 'io.github.resilience4j.ratelimiter.RequestNotPermitted: RateLimiter 'flightSearchService' does not permit further calls'.
Searching for flights; current time = 17:10:10 492
...
2020-07-27T17:10:10.494: Retry 'rateLimitedFlightSearch' recorded a successful retry attempt...
[Flight{flightNumber='XY 765', flightDate='07/31/2020', from='NYC', to='LAX'}, ...]
The order in which we create the decorators is important. It would not work if we wrapped the Retry with the RateLimiter instead.
RateLimiter events
A RateLimiter has an EventPublisher that generates events of the types RateLimiterOnSuccessEvent and RateLimiterOnFailureEvent when a remote operation is called, indicating whether acquiring a permission was successful. We can listen to these events and log them, for example:
RateLimiter limiter = registry.rateLimiter("flightSearchService");
limiter.getEventPublisher().onSuccess(e -> System.out.println(e.toString()));
limiter.getEventPublisher().onFailure(e -> System.out.println(e.toString()));
An example of log output is as follows:
RateLimiterEvent{type=SUCCESSFUL_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.127+05:30}
... other lines omitted ...
RateLimiterEvent{type=FAILED_ACQUIRE, rateLimiterName='flightSearchService', creationTime=2020-07-21T19:14:33.186+05:30}
RateLimiter metrics
Suppose that after implementing client-side throttling we find that the response times of our API have increased. This is possible: as we have seen, if a permission is not available when a thread invokes a remote operation, the RateLimiter puts the thread in a waiting state.
If our request-handling threads are often waiting for permissions, it could mean that our limitForPeriod is too low. Perhaps we need to work with our service provider and negotiate additional quota first.
Monitoring RateLimiter metrics can help us identify such capacity problems and ensure that the values we set on the RateLimiterConfig are working well.
RateLimiter tracks two metrics: the number of available permissions (resilience4j.ratelimiter.available.permissions) and the number of threads waiting for a permission (resilience4j.ratelimiter.waiting_threads).
First, we create RateLimiterConfig, RateLimiterRegistry, and RateLimiter as usual. Then we create a MeterRegistry and bind the RateLimiterRegistry to it:
MeterRegistry meterRegistry = new SimpleMeterRegistry();
TaggedRateLimiterMetrics.ofRateLimiterRegistry(registry)
.bindTo(meterRegistry);
After running the rate-limited operation a few times, we display the captured metrics:
Consumer<Meter> meterConsumer = meter -> {
  String desc = meter.getId().getDescription();
  String metricName = meter.getId().getName();
  Double metricValue = StreamSupport.stream(meter.measure().spliterator(), false)
    .filter(m -> m.getStatistic().name().equals("VALUE"))
    .findFirst()
    .map(m -> m.getValue())
    .orElse(0.0);
  System.out.println(desc + " - " + metricName + ": " + metricValue);
};
meterRegistry.forEachMeter(meterConsumer);
This is some sample output:
The number of available permissions - resilience4j.ratelimiter.available.permissions: -6.0
The number of waiting threads - resilience4j.ratelimiter.waiting_threads: 7.0
The negative value for resilience4j.ratelimiter.available.permissions shows the number of permissions reserved by requesting threads. In a real application, we would export the data to a monitoring system periodically and analyze it on a dashboard.
Pitfalls and good practices when implementing client-side rate limiting
Make the rate limiter a singleton
All calls to a given remote service should go through the same RateLimiter instance. For a given remote service, the RateLimiter must be a singleton.
If we don't enforce this, some areas of our code base may call the remote service directly, bypassing the RateLimiter. To prevent this, the actual call to the remote service should live in a core, internal layer, and other areas should use the rate-limiting decorator exposed by that internal layer.
How do we ensure that new developers understand this intent in the future? Check out Tom's article, which shows one way of solving such problems: organizing the package structure to make the intent clear. In addition, it shows how to enforce the intent by codifying it in ArchUnit tests.
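One simple way to enforce the single-instance rule is to hide the RateLimiter and the decorated call behind one client class that the rest of the code base must go through. A minimal sketch of the idea follows; the class is hypothetical, and in a real application searchFlights() would delegate to the decorated Supplier rather than to the counter used here as stand-in shared state.

```java
// All flight searches in the code base go through this one class,
// so every call shares the same (single) rate limiter state.
final class FlightSearchClient {
    private static final FlightSearchClient INSTANCE = new FlightSearchClient();

    private int callCount = 0; // stands in for shared rate limiter state

    private FlightSearchClient() { }

    static FlightSearchClient getInstance() { return INSTANCE; }

    synchronized String searchFlights() {
        callCount++;            // in a real client: decoratedSupplier.get()
        return "flights";
    }

    synchronized int callCount() { return callCount; }
}
```

Because the constructor is private, no other part of the code base can create a second instance and bypass the shared limiter.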
Configure rate limiter for multiple server instances
Finding the right values for the configuration can be tricky. If we run multiple instances of our service in a cluster, limitForPeriod must take this into account.
For example, if the upstream service has a rate limit of 100 rps and our service has 4 instances, then we will configure 25 rps as the limit for each instance.
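This division is worth writing down explicitly, rounding down so that the cluster as a whole can never exceed the upstream limit. The tiny helper below is our own, for illustration only:

```java
class LimitPartitioner {
    // Split an upstream limit evenly across instances, rounding down so the
    // cluster as a whole can never exceed the upstream limit.
    static int perInstanceLimit(int upstreamLimit, int instanceCount) {
        if (instanceCount <= 0) {
            throw new IllegalArgumentException("instanceCount must be > 0");
        }
        return upstreamLimit / instanceCount;
    }
}
```

With 100 rps and 3 instances this yields 33 rps per instance, deliberately leaving 1 rps of upstream quota unused rather than risking rejections.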
However, this assumes that the load on each of our instances is roughly the same. If that is not the case, or if our service itself is elastic and the number of instances can vary, then Resilience4j's RateLimiter may not be a good fit.
In that case, we would need a rate limiter that keeps its data in a distributed cache instead of in memory like the Resilience4j RateLimiter. But that would impact the response times of our service. Another option is to implement some kind of adaptive rate limiting. Although Resilience4j may support it in the future, it is not clear when it will be available.
Choose the right timeout
When choosing the timeoutDuration configuration value, we should keep the expected response times of our API in mind.
If we set timeoutDuration too high, response time and throughput will be affected. If it is too low, our error rate may increase.
Since there can be some trial and error involved, a good practice is to maintain the values we use in the RateLimiterConfig (such as timeoutDuration, limitForPeriod, and limitRefreshPeriod) as configuration outside our service. Then we can change them without changing code.
Tuning the client-side and server-side rate limiters
Implementing client-side rate limiting does not guarantee that we will never be rate limited by our upstream service.
Suppose we have a limit of 2 rps from the upstream service, and we set limitForPeriod to 2 and limitRefreshPeriod to 1s. If we make two requests in the last few milliseconds of a second, with no other calls before that, the RateLimiter will allow them. If we make another two calls in the first few milliseconds of the next second, the RateLimiter will allow them too, since two new permissions are available. However, the upstream service may reject these two requests, because servers often implement rate limiting based on sliding windows.
To ensure that we never exceed the rate of the upstream service, we would need to configure the fixed window in the client to be shorter than the sliding window in the service. So if we set limitForPeriod to 1 and limitRefreshPeriod to 500ms in the previous example, we would not run into rate-limit errors. However, all three requests after the first one would wait, increasing response times and reducing throughput.
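We can make this boundary effect concrete with two toy limiters (our own sketches, not library code): a fixed-window limiter like the client's and a sliding-window limiter like the server's. Four requests straddling a window boundary all pass the fixed window, but only two pass the sliding window:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WindowComparison {
    /** Fixed window: the counter resets at every full window, like the client side. */
    static class FixedWindow {
        final int limit;
        final long windowMillis;
        long windowStart; // first window starts at t=0 in this sketch
        int count;
        FixedWindow(int limit, long windowMillis) {
            this.limit = limit;
            this.windowMillis = windowMillis;
        }
        boolean allow(long t) {
            if (t - windowStart >= windowMillis) { // align to the current window
                windowStart = t - (t % windowMillis);
                count = 0;
            }
            if (count < limit) { count++; return true; }
            return false;
        }
    }

    /** Sliding window: at most `limit` calls in ANY windowMillis-long interval. */
    static class SlidingWindow {
        final int limit;
        final long windowMillis;
        final Deque<Long> timestamps = new ArrayDeque<>();
        SlidingWindow(int limit, long windowMillis) {
            this.limit = limit;
            this.windowMillis = windowMillis;
        }
        boolean allow(long t) {
            while (!timestamps.isEmpty() && t - timestamps.peekFirst() >= windowMillis) {
                timestamps.removeFirst(); // drop calls that left the window
            }
            if (timestamps.size() < limit) { timestamps.addLast(t); return true; }
            return false;
        }
    }
}
```

With a limit of 2 per 1000ms and requests at t = 990, 995, 1001, and 1005ms, the fixed window allows all four (two in each window), while the sliding window allows only the first two, exactly the mismatch described above.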
Conclusion
In this article, we learned how to use Resilience4j's RateLimiter module to implement client rate limiting. We studied different ways to configure it through practical examples. We learned some good practices and caveats to keep in mind when implementing rate limiting.
You can play around with a complete application illustrating these ideas using the code on GitHub.
This article is translated from: Implementing Rate Limiting with Resilience4j - Reflectoring