

So far in this series, we have learned about Resilience4j and its Retry, RateLimiter, TimeLimiter, and Bulkhead modules. In this article, we will explore the CircuitBreaker module. We will learn when and how to use it, and look at some examples.

Code example

This article is accompanied by a working code example on GitHub.

What is Resilience4j?

For a quick overview of how Resilience4j works in general, please refer to the description in the previous article.

What is a circuit breaker?

The idea behind a circuit breaker is to block calls to a remote service if we know that the calls are likely to fail or time out. We do this so that we don't waste critical resources unnecessarily, both in our service and in the remote service. Backing off like this also gives the remote service some time to recover.

How do we know that a call is likely to fail? By keeping track of the results of previous requests made to the remote service. For example, if 8 out of the previous 10 calls resulted in a failure or a timeout, the next call will likely also fail.

The circuit breaker keeps track of the responses by wrapping the call to the remote service. During normal operation, when the remote service responds successfully, we say that the circuit breaker is in the "closed" state. In the closed state, the circuit breaker passes requests through to the remote service as usual.

When the remote service returns an error or times out, the circuit breaker increments an internal counter. If the count of errors exceeds the configured threshold, the circuit breaker switches to the "open" state. In the open state, the circuit breaker immediately returns an error to the caller without even attempting the remote call.

After a configured period of time, the circuit breaker switches from the open state to the "half-open" state. In this state, it lets a few requests pass through to the remote service to check whether it is still unavailable or slow. If the failure rate or slow call rate is above the configured threshold, it switches back to the open state. If the failure rate or slow call rate is below the configured threshold, however, it switches to the closed state and resumes normal operation.

Types of circuit breakers

Circuit breakers can be count-based or time-based. A count-based circuit breaker switches from the closed state to the open state if the last N calls failed or were slow. A time-based circuit breaker switches to the open state if the calls in the last N seconds failed or were slow. For both kinds of circuit breakers, we can also specify the threshold for failures or slow calls.

For example, we can configure a count-based circuit breaker to "open the circuit" if 70% of the last 25 calls failed or took more than 2 seconds to complete. Similarly, we could tell a time-based circuit breaker to open the circuit if 80% of the calls in the last 30 seconds failed or took more than 5 seconds.
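Jumping ahead a little, the two examples above could be expressed with the Resilience4j configuration builder introduced in the next section. This is only an illustrative sketch with the threshold values from the example above, not configuration from the article's repository:

// Count-based: open if 70% of the last 25 calls failed or took longer than 2s
CircuitBreakerConfig countBased = CircuitBreakerConfig.custom()
  .slidingWindowType(SlidingWindowType.COUNT_BASED)
  .slidingWindowSize(25)
  .failureRateThreshold(70.0f)
  .slowCallRateThreshold(70.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(2))
  .build();

// Time-based: open if 80% of the calls in the last 30 seconds failed or took longer than 5s
CircuitBreakerConfig timeBased = CircuitBreakerConfig.custom()
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .slidingWindowSize(30) // window size is in seconds for TIME_BASED
  .failureRateThreshold(80.0f)
  .slowCallRateThreshold(80.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(5))
  .build();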

Resilience4j CircuitBreaker concepts

resilience4j-circuitbreaker works in a similar way to the other Resilience4j modules. We provide the code we want to execute as a function construct (a lambda expression for the remote call, or a Supplier of some value retrieved from the remote service, etc.) and the circuit breaker decorates it with code that keeps track of responses and switches state if required.

Resilience4j supports both count-based and time-based circuit breakers.

We use the slidingWindowType() configuration to specify the type of circuit breaker. This configuration can take one of two values:

  • SlidingWindowType.COUNT_BASED or
  • SlidingWindowType.TIME_BASED

failureRateThreshold() and slowCallRateThreshold() configure the failure rate threshold and the slow call rate threshold as a percentage.

slowCallDurationThreshold() configures the duration beyond which a call is considered slow.

We can specify a minimumNumberOfCalls() that are required before the circuit breaker can calculate the error rate or slow call rate.

As mentioned earlier, the circuit breaker switches from the open state to the half-open state after a certain period of time to check how the remote service is doing. waitDurationInOpenState() specifies how long the circuit breaker should wait before switching to the half-open state.

permittedNumberOfCallsInHalfOpenState() configures the number of calls that are allowed in the half-open state, and maxWaitDurationInHalfOpenState() determines how long the circuit breaker can remain in the half-open state before switching back to the open state.

The default value of 0 for this configuration means that the circuit breaker waits indefinitely until all the permittedNumberOfCallsInHalfOpenState() calls have completed.
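To make these half-open related settings concrete, a minimal configuration sketch might look like the following; the durations and counts here are illustrative, not values from the article's repository:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .minimumNumberOfCalls(10)                              // need at least 10 calls before rates are calculated
  .waitDurationInOpenState(Duration.ofSeconds(10))       // stay open for 10s before probing again
  .permittedNumberOfCallsInHalfOpenState(3)              // allow 3 trial calls in the half-open state
  .maxWaitDurationInHalfOpenState(Duration.ofSeconds(5)) // give up on half-open after 5s and switch back to open
  .build();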

By default, a circuit breaker considers any exception as a failure. But we can tune this by specifying a list of exceptions that should be treated as failures with the recordExceptions() configuration, and a list of exceptions that should be ignored with the ignoreExceptions() configuration.

If we want even finer-grained control over whether an exception should be treated as a failure or ignored, we can provide a Predicate<Throwable> as the recordException() or ignoreException() configuration.
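As a rough sketch of these settings, using the FlightServiceException and SeatsUnavailableException that appear later in this article purely as stand-in examples:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  // treat these exception types as failures...
  .recordExceptions(FlightServiceException.class)
  // ...but don't count this one towards the failure rate
  .ignoreExceptions(SeatsUnavailableException.class)
  // alternatively, a Predicate<Throwable> gives finer-grained control:
  // .recordException(e -> !(e instanceof SeatsUnavailableException))
  .build();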

When a circuit breaker rejects a call in the open state, it throws a CallNotPermittedException. We can control the amount of information in the stack trace of a CallNotPermittedException using the writablestackTraceEnabled() configuration.

Using the Resilience4j CircuitBreaker module

Let's see how to use the various features available in the resilience4j-circuitbreaker module.

We will use the same example as in the previous articles in this series. Suppose we are building a website for an airline to allow its customers to search for and book flights. Our service talks to a remote service encapsulated by the class FlightSearchService.

When using Resilience4j circuit breakers, CircuitBreakerRegistry , CircuitBreakerConfig and CircuitBreaker are the main abstractions we use.

CircuitBreakerRegistry is a factory for creating and managing CircuitBreaker objects.

CircuitBreakerConfig encapsulates all the configurations in the previous section. Each CircuitBreaker object is associated with a CircuitBreakerConfig .

The first step is to create a CircuitBreakerConfig :

CircuitBreakerConfig config = CircuitBreakerConfig.ofDefaults();

This will create a CircuitBreakerConfig with the following default values:

Configuration                              Default value
slidingWindowType                          COUNT_BASED
failureRateThreshold                       50%
slowCallRateThreshold                      100%
slowCallDurationThreshold                  60s
minimumNumberOfCalls                       100
permittedNumberOfCallsInHalfOpenState      10
maxWaitDurationInHalfOpenState             0s

Count-based circuit breaker

Suppose we want the circuit breaker to open when 70% of the last 10 calls fail:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.COUNT_BASED)
  .slidingWindowSize(10)
  .failureRateThreshold(70.0f)
  .build();

Then we create a CircuitBreaker with this configuration:

CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
CircuitBreaker circuitBreaker = registry.circuitBreaker("flightSearchService");

Now let's express our code to run a flight search as a Supplier and decorate it with the circuit breaker:

Supplier<List<Flight>> flightsSupplier =
  () -> service.searchFlights(request);
Supplier<List<Flight>> decoratedFlightsSupplier =
  circuitBreaker.decorateSupplier(flightsSupplier);

Finally, let's call the decorated operation a few times to understand how the circuit breaker works. To simulate concurrent flight search requests from users we could also use CompletableFuture, as shown in the sketch after the loop below:

for (int i = 0; i < 20; i++) {
  try {
    System.out.println(decoratedFlightsSupplier.get());
  }
  catch (Exception e) {
    // exception handling; once the circuit is open, this is a CallNotPermittedException
    System.out.println(e.getMessage());
  }
}
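The concurrent variant mentioned above could look roughly like this. It is only a sketch: the thread pool size and call count are arbitrary, decoratedFlightsSupplier is the decorated Supplier from before, and the usual java.util and java.util.concurrent imports are assumed:

// Simulate concurrent users calling the decorated supplier
ExecutorService executor = Executors.newFixedThreadPool(5);
List<CompletableFuture<Void>> futures = new ArrayList<>();
for (int i = 0; i < 20; i++) {
  futures.add(CompletableFuture
    .supplyAsync(decoratedFlightsSupplier, executor)
    .thenAccept(System.out::println)
    .exceptionally(e -> {
      // CallNotPermittedException surfaces here once the circuit is open
      System.out.println(e.getMessage());
      return null;
    }));
}
futures.forEach(CompletableFuture::join);
executor.shutdown();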

The output shows the first few flight searches succeeding, followed by 7 flight search failures. At that point, the circuit breaker opens and throws CallNotPermittedException for subsequent calls:

Searching for flights; current time = 12:01:12 884
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
Searching for flights; current time = 12:01:12 954
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
Searching for flights; current time = 12:01:12 957
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
Searching for flights; current time = 12:01:12 958
io.reflectoring.resilience4j.circuitbreaker.exceptions.FlightServiceException: Error occurred during flight search
... stack trace omitted ...
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
... other lines omitted ...
io.reflectoring.resilience4j.circuitbreaker.Examples.countBasedSlidingWindow_FailedCalls(Examples.java:56)
  at io.reflectoring.resilience4j.circuitbreaker.Examples.main(Examples.java:229)

Now, suppose we want the circuit breaker to open if 70% of the last 10 calls took 2 seconds or more to complete:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.COUNT_BASED)
  .slidingWindowSize(10)
  .slowCallRateThreshold(70.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(2))
  .build();

The timestamps in the sample output show requests consistently taking 2 seconds to complete. After 7 slow responses, the circuit breaker opens and does not permit further calls:

Searching for flights; current time = 12:26:27 901
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
Searching for flights; current time = 12:26:29 953
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
Searching for flights; current time = 12:26:31 957
Flight search successful
... other lines omitted ...
Searching for flights; current time = 12:26:43 966
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
... stack trace omitted ...
        at io.reflectoring.resilience4j.circuitbreaker.Examples.main(Examples.java:231)
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
... stack trace omitted ...
        at io.reflectoring.resilience4j.circuitbreaker.Examples.main(Examples.java:231)

Usually we would configure a single circuit breaker with both the failure rate and slow call rate thresholds:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.COUNT_BASED)
  .slidingWindowSize(10)
  .failureRateThreshold(70.0f)
  .slowCallRateThreshold(70.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(2))
  .build();

Time-based circuit breaker

Suppose we want the circuit breaker to open when 70% of requests fail in the past 10 seconds:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .minimumNumberOfCalls(10)
  .slidingWindowSize(10)
  .failureRateThreshold(70.0f)
  .build();

We create the CircuitBreaker, express the flight search call as a Supplier<List<Flight>> and decorate it with the CircuitBreaker just as we did in the previous section.

The following is sample output after calling the decorated operation a few times:

Start time: 18:51:01 552
Searching for flights; current time = 18:51:01 582
Flight search successful
[Flight{flightNumber='XY 765', ... }]
... other lines omitted ...
Searching for flights; current time = 18:51:01 631
io.reflectoring.resilience4j.circuitbreaker.exceptions.FlightServiceException: Error occurred during flight search
... stack trace omitted ...
Searching for flights; current time = 18:51:01 632
io.reflectoring.resilience4j.circuitbreaker.exceptions.FlightServiceException: Error occurred during flight search
... stack trace omitted ...
Searching for flights; current time = 18:51:01 633
... other lines omitted ...
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
... other lines omitted ...

The first 3 requests were successful and the next 7 requests failed. At this point the circuit breaker opens and subsequent requests fail with a CallNotPermittedException.

Now, suppose we want the circuit breaker to open if 70% of the calls in the last 10 seconds took 1 second or more to complete:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .minimumNumberOfCalls(10)
  .slidingWindowSize(10)
  .slowCallRateThreshold(70.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(1))
  .build();

The timestamps in the sample output show requests consistently taking 1 second to complete. After 10 calls ( minimumNumberOfCalls ), when the circuit breaker determines that 70% of the previous requests took 1 second or more, it opens the circuit:

Start time: 19:06:37 957
Searching for flights; current time = 19:06:37 979
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 19:06:39 066
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 19:06:40 070
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 19:06:41 070
... other lines omitted ...
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
... stack trace omitted ...

Usually we would configure a time-based circuit breaker with both the failure rate and slow call rate thresholds, as in the sketch below.
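A minimal sketch combining both thresholds; the values mirror the earlier examples and are only illustrative:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .slidingWindowSize(10)
  .minimumNumberOfCalls(10)
  .failureRateThreshold(70.0f)
  .slowCallRateThreshold(70.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(1))
  .build();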

Specifying the wait duration in the open state

Suppose we want the circuit breaker to wait 10 seconds in the open state and then transition to the half-open state, letting a few requests pass through to the remote service:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.TIME_BASED)
  .slidingWindowSize(10)
  .minimumNumberOfCalls(10)
  .failureRateThreshold(70.0f)
  .slowCallRateThreshold(70.0f)
  .slowCallDurationThreshold(Duration.ofSeconds(2))
  .waitDurationInOpenState(Duration.ofSeconds(10))
  .build();

The timestamps in the sample output show the circuit breaker transitioning to the open state initially, blocking a few calls for the next 10 seconds, and then changing to the half-open state. Later, consistently successful responses in the half-open state cause it to switch back to the closed state:

Searching for flights; current time = 20:55:58 735
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 20:55:59 812
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 20:56:00 816
... other lines omitted ...
io.reflectoring.resilience4j.circuitbreaker.exceptions.FlightServiceException: Flight search failed
    at
... stack trace omitted ...
2020-12-13T20:56:03.850115+05:30: CircuitBreaker 'flightSearchService' changed state from CLOSED to OPEN
2020-12-13T20:56:04.851700+05:30: CircuitBreaker 'flightSearchService' recorded a call which was not permitted.
2020-12-13T20:56:05.852220+05:30: CircuitBreaker 'flightSearchService' recorded a call which was not permitted.
2020-12-13T20:56:06.855338+05:30: CircuitBreaker 'flightSearchService' recorded a call which was not permitted.
... other similar lines omitted ...
2020-12-13T20:56:12.862362+05:30: CircuitBreaker 'flightSearchService' recorded a call which was not permitted.
2020-12-13T20:56:13.865436+05:30: CircuitBreaker 'flightSearchService' changed state from OPEN to HALF_OPEN
Searching for flights; current time = 20:56:13 865
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
... other similar lines omitted ...
2020-12-13T20:56:16.877230+05:30: CircuitBreaker 'flightSearchService' changed state from HALF_OPEN to CLOSED
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 20:56:17 879
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
... other similar lines omitted ...

Specifying a fallback method

A common pattern when using a circuit breaker is to specify a fallback method to be called when the circuit is open. The fallback method can provide some default value or behavior for the remote call that was not permitted.

We can use the Decorators utility class to set this up. Decorators is a builder from the resilience4j-all module with methods like withCircuitBreaker(), withRetry(), withRateLimiter(), etc. that help apply multiple Resilience4j decorators to a Supplier, Function, etc.

We will use its withFallback() method to return flight search results from a local cache when the circuit breaker is open and throws a CallNotPermittedException:

Supplier<List<Flight>> flightsSupplier = () -> service.searchFlights(request);
Supplier<List<Flight>> decorated = Decorators
  .ofSupplier(flightsSupplier)
  .withCircuitBreaker(circuitBreaker)
  .withFallback(Arrays.asList(CallNotPermittedException.class),
                e -> this.getFlightSearchResultsFromCache(request))
  .decorate();

The following sample output shows the search results being returned from the cache after the circuit breaker opened:

Searching for flights; current time = 22:08:29 735
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 22:08:29 854
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 22:08:29 855
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Searching for flights; current time = 22:08:29 855
2020-12-13T22:08:29.856277+05:30: CircuitBreaker 'flightSearchService' recorded an error: 'io.reflectoring.resilience4j.circuitbreaker.exceptions.FlightServiceException: Error occurred during flight search'. Elapsed time: 0 ms
Searching for flights; current time = 22:08:29 912
... other lines omitted ...
2020-12-13T22:08:29.926691+05:30: CircuitBreaker 'flightSearchService' changed state from CLOSED to OPEN
Returning flight search results from cache
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
Returning flight search results from cache
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... }]
... other lines omitted ...

Reducing the information in the stack trace

Whenever a circuit breaker is open and rejects a call, it throws a CallNotPermittedException:

io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
    at io.github.resilience4j.circuitbreaker.CallNotPermittedException.createCallNotPermittedException(CallNotPermittedException.java:48)
... other lines in stack trace omitted ...
at io.reflectoring.resilience4j.circuitbreaker.Examples.timeBasedSlidingWindow_SlowCalls(Examples.java:169)
    at io.reflectoring.resilience4j.circuitbreaker.Examples.main(Examples.java:263)

Apart from the first line, the other lines in the stack trace don't add much value. If the CallNotPermittedException occurs multiple times, these stack trace lines will repeat in our log files.

We can reduce the amount of information generated in the stack trace by setting the writablestackTraceEnabled() configuration to false:

CircuitBreakerConfig config = CircuitBreakerConfig
  .custom()
  .slidingWindowType(SlidingWindowType.COUNT_BASED)
  .slidingWindowSize(10)
  .failureRateThreshold(70.0f)
  .writablestackTraceEnabled(false)
  .build();

Now, when CallNotPermittedException occurs, there is only one line in the stack trace:

Searching for flights; current time = 20:29:24 476
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
Searching for flights; current time = 20:29:24 540
Flight search successful
[Flight{flightNumber='XY 765', flightDate='12/31/2020', from='NYC', to='LAX'}, ... ]
... other lines omitted ...
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
io.github.resilience4j.circuitbreaker.CallNotPermittedException: CircuitBreaker 'flightSearchService' is OPEN and does not permit further calls
...

Other useful methods

As with the Retry module, CircuitBreaker also has methods like ignoreExceptions(), recordExceptions() etc. which let us specify which exceptions the CircuitBreaker should ignore and which ones it should record as failures.

For example, we might want to ignore a SeatsUnavailableException from the remote flight service - we don't really want to open the circuit in this case.

Similar to the other Resilience4j modules we have seen, the CircuitBreaker also provides additional methods like decorateCheckedSupplier(), decorateCompletionStage(), decorateRunnable(), decorateConsumer() etc. so we can provide our code in constructs other than a Supplier.
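For instance, an asynchronous search could be decorated via decorateCompletionStage(). This is only a sketch: searchFlightsAsync() is a hypothetical asynchronous variant of the search method, not part of the article's example code:

// Hypothetical async variant of the flight search, decorated with the circuit breaker
Supplier<CompletionStage<List<Flight>>> asyncSupplier =
  () -> service.searchFlightsAsync(request);

Supplier<CompletionStage<List<Flight>>> decoratedAsync =
  CircuitBreaker.decorateCompletionStage(circuitBreaker, asyncSupplier);

decoratedAsync.get().whenComplete((flights, ex) -> {
  if (ex != null) {
    // CallNotPermittedException shows up here when the circuit is open
    System.out.println("Flight search failed: " + ex.getMessage());
  } else {
    System.out.println(flights);
  }
});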

Circuit breaker events

CircuitBreaker has an EventPublisher which generates events of the following types:

  • CircuitBreakerOnSuccessEvent,
  • CircuitBreakerOnErrorEvent,
  • CircuitBreakerOnStateTransitionEvent,
  • CircuitBreakerOnResetEvent,
  • CircuitBreakerOnIgnoredErrorEvent,
  • CircuitBreakerOnCallNotPermittedEvent,
  • CircuitBreakerOnFailureRateExceededEvent and
  • CircuitBreakerOnSlowCallRateExceededEvent.

We can listen to these events and record them, for example:

circuitBreaker.getEventPublisher()
  .onCallNotPermitted(e -> System.out.println(e.toString()));
circuitBreaker.getEventPublisher()
  .onError(e -> System.out.println(e.toString()));
circuitBreaker.getEventPublisher()
  .onFailureRateExceeded(e -> System.out.println(e.toString()));
circuitBreaker.getEventPublisher()
  .onStateTransition(e -> System.out.println(e.toString()));

The following is a sample log output:

2020-12-13T22:25:52.972943+05:30: CircuitBreaker 'flightSearchService' recorded an error: 'io.reflectoring.resilience4j.circuitbreaker.exceptions.FlightServiceException: Error occurred during flight search'. Elapsed time: 0 ms
Searching for flights; current time = 22:25:52 973
... other lines omitted ...
2020-12-13T22:25:52.974448+05:30: CircuitBreaker 'flightSearchService' exceeded failure rate threshold. Current failure rate: 70.0
2020-12-13T22:25:52.984300+05:30: CircuitBreaker 'flightSearchService' changed state from CLOSED to OPEN
2020-12-13T22:25:52.985057+05:30: CircuitBreaker 'flightSearchService' recorded a call which was not permitted.
... other lines omitted ...

CircuitBreaker metrics

CircuitBreaker exposes many metrics. These are some of the important ones:

  • Total number of successful, failed, or ignored calls ( resilience4j.circuitbreaker.calls )
  • State of the circuit breaker ( resilience4j.circuitbreaker.state )
  • Failure rate of the circuit breaker ( resilience4j.circuitbreaker.failure.rate )
  • Total number of calls that have not been permitted ( resilience4j.circuitbreaker.not.permitted.calls )
  • Slow call rate of the circuit breaker ( resilience4j.circuitbreaker.slow.call.rate )

First, we create CircuitBreakerConfig, CircuitBreakerRegistry, and CircuitBreaker as usual. Then, we create a MeterRegistry and bind the CircuitBreakerRegistry to it:

MeterRegistry meterRegistry = new SimpleMeterRegistry();
TaggedCircuitBreakerMetrics.ofCircuitBreakerRegistry(registry)
  .bindTo(meterRegistry);
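To print the captured metrics, one minimal approach is to iterate over Micrometer's MeterRegistry as sketched below; this is not necessarily the exact code used to produce the output that follows, and it assumes the usual Micrometer imports (Meter, Measurement) plus java.util.function.Consumer and java.util.stream.StreamSupport:

// Iterate over all meters and print description, name and value
Consumer<Meter> meterConsumer = meter -> {
  String desc = meter.getId().getDescription();
  String metricName = meter.getId().getName();
  Double metricValue = StreamSupport.stream(meter.measure().spliterator(), false)
    .filter(m -> m.getStatistic().name().equals("VALUE"))
    .findFirst()
    .map(Measurement::getValue)
    .orElse(0.0);
  System.out.println(desc + " - " + metricName + ": " + metricValue);
};
meterRegistry.forEachMeter(meterConsumer);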

After running a few decorated operations, we display the captured metrics. Here is some sample output:

The number of slow failed calls which were slower than a certain threshold - resilience4j.circuitbreaker.slow.calls: 0.0
The states of the circuit breaker - resilience4j.circuitbreaker.state: 0.0, state: metrics_only
Total number of not permitted calls - resilience4j.circuitbreaker.not.permitted.calls: 0.0
The slow call of the circuit breaker - resilience4j.circuitbreaker.slow.call.rate: -1.0
The states of the circuit breaker - resilience4j.circuitbreaker.state: 0.0, state: half_open
Total number of successful calls - resilience4j.circuitbreaker.calls: 0.0, kind: successful
The failure rate of the circuit breaker - resilience4j.circuitbreaker.failure.rate: -1.0

In a real application, we would export the data to a monitoring system periodically and analyze it on a dashboard.

Conclusion

In this article, we learned how we can use Resilience4j's Circuitbreaker module to pause making requests to a remote service when it returns errors. We learned why this is important and saw some practical examples of how to configure it.

You can play around with a complete application demonstrating these ideas using the code on GitHub.


This article is translated from: Implementing a Circuit Breaker with Resilience4j - Reflectoring

