ESA Stack (Elastic Service Architecture) is a technology brand incubated by OPPO Cloud Computing Center, dedicated to microservice-related technology stacks to help users quickly build high-performance, high-availability cloud-native microservices. Products include high-performance web service framework, RPC framework, service governance framework, registration center, configuration center, call chain tracking system, Service Mesh, Serverless and other products and research directions.
At present, some of these products have been open sourced:
Open source home page: https://www.esastack.io
Github: https://github.com/esastack
RestClient project address: https://github.com/esastack/esa-restclient
RestClient document address: https://www.esastack.io/esa-restclient
ESA RestClient
ESA RestClient (hereinafter referred to as RestClient) is a lightweight, high-performance HTTP client based on Netty. It is asynchronous and event-driven across the full request chain.
1 Quick Start
Step 1: Add the dependency
<dependency>
<groupId>io.esastack</groupId>
<artifactId>restclient</artifactId>
<version>1.0.0</version>
</dependency>
Step 2: Build a RestClient, send a request, and process the response
final RestClient client = RestClient.ofDefault(); // Quickly create a RestClient; every setting uses its default value.
// To customize settings, use RestClient.create() instead.
client.post("http://127.0.0.1:8081/")
.entity("Hello Server") // Set the request body
.execute() // Execute the request
.thenAccept((response)-> { // Process the response asynchronously
try {
System.out.println(response.bodyToEntity(String.class)); // Call response.bodyToEntity(Class targetClass) to decode the response;
// targetClass is the expected response type
} catch (Exception e) {
e.printStackTrace();
}
});
2 Features
- Http1/H2/H2cUpgrade/Https
- Encode and EncodeAdvice
- Decode and DecodeAdvice
- RestInterceptor
- Sending large files
- Request-level read timeout
- Request-level retry
- Request-level redirection
- 100-expect-continue
- Multipart
- Metrics
- more …
2.1 Encode and EncodeAdvice
2.1.1 Encode
RestClient will automatically select the appropriate Encoder for Encoding according to the user's Headers and Entity. It has the following Encoders built in:
- Json
  - jackson (default)
  - fastjson
  - gson
- ProtoBuf
- File
- String
- byte[]
In addition, RestClient also supports user-defined Encoder.
2.1.1.1 Using Json Encoder
Specifying the contentType as MediaType.APPLICATION_JSON will automatically use the Json Encoder to encode the Entity. An example is as follows:
final RestClient client = RestClient.ofDefault();
client.post("localhost:8080/path")
.contentType(MediaType.APPLICATION_JSON)
.entity(new Person("Bob","male"))
.execute();
2.1.1.2 Using ProtoBuf Encoder
When the specified contentType is ProtoBufCodec.PROTO_BUF and the Entity type is a subclass of com.google.protobuf.Message, the ProtoBuf Encoder will be automatically used to encode the Entity. An example is as follows:
final RestClient client = RestClient.ofDefault();
client.post("localhost:8080/path")
.contentType(ProtoBufCodec.PROTO_BUF)
.entity(message)
.execute();
2.1.1.3 Using File Encoder
When the Entity type is File, the File Encoder will be automatically used to encode the Entity. An example is as follows:
final RestClient client = RestClient.ofDefault();
client.post("localhost:8080/path")
.entity(new File("tem"))
.execute();
2.1.1.4 Custom Encoder
When the built-in Encoder of RestClient cannot meet the user's needs, the user can customize the Encoder. Examples are as follows:
public class StringEncoder implements ByteEncoder {
@Override
public RequestContent<byte[]> doEncode(EncodeContext<byte[]> ctx) {
if (MediaType.TEXT_PLAIN.equals(ctx.contentType())) {
if (ctx.entity() != null) {
return RequestContent.of(((String) ctx.entity()).getBytes(StandardCharsets.UTF_8));
} else {
return RequestContent.of("null");
}
}
// This Encoder cannot encode this type, so hand the work to the next Encoder
return ctx.next();
}
}
Users can bind a custom Encoder directly to a request or to the Client, or have it loaded through SPI. For details, see the document "Configure Encoder".
2.1.1.5 Encode execution timing
See Encoder in "Complete Request Processing Flow".
2.1.2 EncodeAdvice
Through EncodeAdvice, users can insert business logic before and after Encode to modify or replace the Entity to be encoded or the RequestContent produced by Encode.
2.1.2.1 Example
public class EncodeAdviceImpl implements EncodeAdvice {
@Override
public RequestContent<?> aroundEncode(EncodeAdviceContext ctx) throws Exception {
//...before encode
RequestContent<?> requestContent = ctx.next();
//...after encode
return requestContent;
}
}
Users can bind a custom EncodeAdvice directly to the Client, or have it loaded through SPI. For details, see the document "Configure EncodeAdvice".
2.1.2.2 Execution timing
See "Complete Request Processing Flow".
2.2 Decode and DecodeAdvice
2.2.1 Decode
RestClient will automatically select the appropriate Decoder for Decoding according to the user's Headers and expected Entity type. RestClient has the following built-in Decoders:
- Json
  - jackson (default)
  - fastjson
  - gson
- ProtoBuf
- String
- byte[]
In addition, RestClient also supports user-defined decoders.
2.2.1.1 Using Json Decoder
When the contentType of the Response is MediaType.APPLICATION_JSON, the Json Decoder is automatically used to decode it. An example is as follows:
final RestClient client = RestClient.ofDefault();
client.get("localhost:8080/path")
.execute()
.thenAccept((response)-> {
try {
// When MediaType.APPLICATION_JSON.equals(response.contentType()), the Json Decoder is used automatically
System.out.println(response.bodyToEntity(Person.class));
} catch (Exception e) {
e.printStackTrace();
}
});
2.2.1.2 Using ProtoBuf Decoder
When the contentType of Response is ProtoBufCodec.PROTO_BUF, and the type passed in by response.bodyToEntity() is a subclass of com.google.protobuf.Message, ProtoBuf Decoder will be automatically used for Decoding.
final RestClient client = RestClient.ofDefault();
client.get("localhost:8080/path")
.execute()
.thenAccept((response)-> {
try {
// When ProtoBufCodec.PROTO_BUF.equals(response.contentType()) and Person is a subclass of Message, the ProtoBuf Decoder is used automatically
System.out.println(response.bodyToEntity(Person.class));
} catch (Exception e) {
e.printStackTrace();
}
});
2.2.1.3 Custom Decoder
When the built-in Decoder of RestClient cannot meet the user's needs, the user can customize the Decoder. Examples are as follows:
public class StringDecoder implements ByteDecoder {
@Override
public Object doDecode(DecodeContext<byte[]> ctx) {
if (String.class.isAssignableFrom(ctx.targetType())) {
return new String(ctx.content().value());
}
return ctx.next();
}
}
Users can bind a custom Decoder directly to a request or to the Client, or have it loaded through SPI. For details, see the document "Configure Decoder".
2.2.1.4 Execution timing
See Decoder in "Complete Request Processing Flow".
2.2.2 DecodeAdvice
Through DecodeAdvice, users can insert business logic before and after Decode to modify or replace the ResponseContent to be decoded or the object produced by Decode.
2.2.2.1 Example
public class DecodeAdviceImpl implements DecodeAdvice {
@Override
public Object aroundDecode(DecodeAdviceContext ctx) throws Exception {
//...before decode
Object decoded = ctx.next();
//...after decode
return decoded;
}
}
Users can bind a custom DecodeAdvice directly to the Client, or have it loaded through SPI. For details, see the document "Configure DecodeAdvice".
2.2.2.2 Execution timing
See "Complete Request Processing Flow".
2.3 RestInterceptor
Users can use RestInterceptor to insert business logic before the request is sent and after the response is received. RestClient supports configuring RestInterceptor through builder configuration and SPI loading.
2.3.1 Builder configuration
Pass in a custom RestInterceptor instance when constructing RestClient, such as:
final RestClient client = RestClient.create()
.addInterceptor((request, next) -> {
System.out.println("Interceptor");
return next.proceed(request);
}).build();
2.3.2 SPI
2.3.2.1 Common SPI
RestClient supports loading the implementation class of the RestInterceptor interface through SPI. When using it, you only need to put the custom RestInterceptor into the specified directory according to the loading rules of the SPI.
2.3.2.2 RestInterceptorFactory
If the user-defined RestInterceptor has different implementations for different RestClient configurations, the user can implement the RestInterceptorFactory interface, and put the custom RestInterceptorFactory into the specified directory according to the SPI loading rules.
public interface RestInterceptorFactory {
Collection<RestInterceptor> interceptors(RestClientOptions clientOptions);
}
When RestClient is constructed, RestInterceptorFactory.interceptors(RestClientOptions clientOptions) will be called, and all RestInterceptors returned by this method will be added to the constructed RestClient.
2.3.2.3 Execution timing
See "Complete Request Processing Flow".
2.4 Sending large files
When a file is small, it can be sent by writing its content directly into the request body. When the file is large, however, writing it directly carries a risk of OOM.
To solve this problem, RestClient relies on the underlying Netty to send files through NIO in a zero-copy manner, which avoids OOM and reduces the number of times the data is copied.
Using this feature takes only a simple interface call:
final RestClient client = RestClient.ofDefault();
client.post("http://127.0.0.1:8081/")
.entity(new File("bigFile"))
.execute();
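The zero-copy idea described above can be illustrated with the JDK alone. The following is a minimal sketch, not RestClient's or Netty's implementation: it uses the standard FileChannel.transferTo call, and the class and method names (FileTransferSketch, transfer, roundTrip) are made up for illustration. The in-memory target used in roundTrip only keeps the example self-contained; the kernel-level zero-copy path generally applies only when the target is something like a socket channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileTransferSketch {

    /** Copies the whole file to the target channel without reading it into one heap array. */
    static long transfer(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            // transferTo may move fewer bytes than requested, so loop until everything is sent.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    /** Writes the text to a temporary file, transfers it, and returns what the target received. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("transfer-sketch", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (WritableByteChannel target = Channels.newChannel(sink)) {
                    transfer(file, target);
                }
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello Server"));
    }
}
```

Because the file is streamed from a channel rather than loaded as a byte array, memory use does not grow with the file size.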
2.5 Read Timeout
RestClient supports request-level read timeouts as well as client-level read timeouts. The default read timeout is 6000 ms (6000L).
2.5.1 Client-level read timeout
The client-level read timeout will take effect for all requests under the client. The specific configuration methods are as follows:
final RestClient client = RestClient.create()
.readTimeout(3000L)
.build();
2.5.2 Request-level read timeout
When a Request sets a read timeout, that value overrides the read timeout set on the Client. The specific configuration is as follows:
final RestClient client = RestClient.ofDefault();
client.get("http://127.0.0.1:8081/")
.readTimeout(3000L)
.execute()
.thenAccept((response)-> {
try {
System.out.println(response.bodyToEntity(String.class));
} catch (Exception e) {
e.printStackTrace();
}
});
2.6 Retry
RestClient supports request-level retries as well as client-level retries.
By default, RestClient retries only requests that throw connection exceptions, because the server-side operation may not be idempotent. The default maximum number of retries is 3 (excluding the original request), and the default retry interval is 0. You can change the number of retries, the retry condition, the retry interval, and so on by customizing the RetryOptions parameters.
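The default behaviour described above (retry only on connection exceptions, at most 3 retries on top of the original request, no delay) can be sketched with the JDK's CompletableFuture. This is an illustration under stated assumptions, not RestClient's internal retry code: RetrySketch, executeWithRetry and attemptsMade are hypothetical names, and the flaky call is simulated in memory.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RetrySketch {

    /** Runs the call and retries it up to maxRetries more times while shouldRetry accepts the failure. */
    static <T> CompletionStage<T> executeWithRetry(Supplier<CompletionStage<T>> call,
                                                   int maxRetries,
                                                   Predicate<Throwable> shouldRetry) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, maxRetries, shouldRetry, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletionStage<T>> call, int retriesLeft,
                                    Predicate<Throwable> shouldRetry, CompletableFuture<T> result) {
        call.get().whenComplete((value, cause) -> {
            if (cause == null) {
                result.complete(value);
            } else if (retriesLeft > 0 && shouldRetry.test(cause)) {
                // Retry interval of 0: the next attempt starts immediately.
                attempt(call, retriesLeft - 1, shouldRetry, result);
            } else {
                result.completeExceptionally(cause);
            }
        });
    }

    /** Simulates a call that fails with a connection error a given number of times, then succeeds. */
    static int attemptsMade(int failuresBeforeSuccess, int maxRetries) {
        AtomicInteger attempts = new AtomicInteger();
        Supplier<CompletionStage<String>> flakyCall = () -> {
            CompletableFuture<String> response = new CompletableFuture<>();
            if (attempts.incrementAndGet() <= failuresBeforeSuccess) {
                response.completeExceptionally(new ConnectException("connection refused"));
            } else {
                response.complete("OK");
            }
            return response;
        };
        try {
            executeWithRetry(flakyCall, maxRetries, cause -> cause instanceof ConnectException)
                    .toCompletableFuture().join();
        } catch (CompletionException exhausted) {
            // Every attempt failed; a real caller would see the last connection error.
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println(attemptsMade(2, 3));
    }
}
```

With maxRetries set to 3, a call that never succeeds is attempted four times in total: the original request plus three retries.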
2.6.1 Client level retry
Client-level retry takes effect for all requests under the client. You can change the number of retries, the retry condition, and the retry interval by customizing the RetryOptions parameter. The specific configuration is as follows:
final RestClient client = RestClient.create()
.retryOptions(RetryOptions.options()
.maxRetries(3)
// Set the interval before each retry
.intervalMs(retryCount-> (retryCount+1) * 3000L)
// Decide whether to retry
.predicate((request, response, ctx, cause) -> cause != null)
.build())
.connectionPoolSize(2048)
.build();
2.6.2 Request-level retry
When a Request sets the number of retries, that value overrides the number of retries set on the Client. The specific configuration is as follows:
final RestClient client = RestClient.ofDefault();
client.get("http://127.0.0.1:8081/")
.maxRetries(3)
.execute()
.thenAccept((response)-> {
try {
System.out.println(response.bodyToEntity(String.class));
} catch (Exception e) {
e.printStackTrace();
}
});
2.7 Redirect
By default, RestClient follows redirects for responses with status codes 301, 302, 303, 307, and 308. The default maximum number of redirects is 5 (excluding the original request). You can change the maximum number of redirects through maxRedirects, or disable redirection by setting maxRedirects to 0.
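The redirect rule above can be sketched as a small loop. This is a simplified illustration and not RestClient's code: the Response type, the in-memory "server" map and the method names are hypothetical, and the sketch assumes that the last redirect response is simply returned once the limit is reached (the behaviour of RestClient in that case is not described here).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RedirectSketch {

    /** Status codes that trigger a redirect, as listed in the text above. */
    static final Set<Integer> REDIRECT_CODES = new HashSet<>(Arrays.asList(301, 302, 303, 307, 308));

    /** Hypothetical response: a status code plus an optional redirect target. */
    static final class Response {
        final int status;
        final String location;

        Response(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    /** Follows redirects on a simulated server until a non-redirect status or the limit is reached. */
    static Response follow(Map<String, Response> server, String url, int maxRedirects) {
        Response response = server.get(url);
        int redirects = 0;
        while (REDIRECT_CODES.contains(response.status) && redirects < maxRedirects) {
            response = server.get(response.location);
            redirects++;
        }
        return response;
    }

    /** A simulated server: /a redirects to /b, /b redirects to /c, /c answers 200. */
    static Map<String, Response> demoServer() {
        Map<String, Response> server = new HashMap<>();
        server.put("/a", new Response(302, "/b"));
        server.put("/b", new Response(301, "/c"));
        server.put("/c", new Response(200, null));
        return server;
    }

    public static void main(String[] args) {
        System.out.println(follow(demoServer(), "/a", 5).status);
    }
}
```

Setting maxRedirects to 0 in this sketch returns the first response unchanged, which mirrors disabling the redirect function.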
2.7.1 Client-level redirection
Client-level redirection will take effect for all requests under the client. The specific configuration is as follows:
final RestClient client = RestClient.create()
.maxRedirects(3)
.build();
2.7.2 Request-level redirection
When a Request sets the number of redirects, that value overrides the number of redirects set on the Client. The specific configuration is as follows:
final RestClient client = RestClient.ofDefault();
client.get("http://127.0.0.1:8081/")
.maxRedirects(3)
.execute()
.thenAccept((response)-> {
try {
System.out.println(response.bodyToEntity(String.class));
} catch (Exception e) {
e.printStackTrace();
}
});
2.8 Other functions
To learn more about the functions of RestClient, see the "Function Document".
3 Performance
3.1 Test Scenario
The server is an Echo server. The clients under test are RestClient, Apache HttpAsyncClient and OK Httpclient, each sending POST requests. The request body is the fixed string OK, and the response body is also the fixed string OK.
3.2 Machine Configuration
3.3 JVM parameters
-Xms1024m -Xmx1024m -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
3.4 Client Version
3.5 Test method
How should the performance of an asynchronous client be tested? This is the first question to answer before running a performance test. Here are some thoughts on it:
Can we issue synchronous requests in a for loop from multiple threads until the framework reaches its request processing limit, and treat that limit as the client's best TPS?
Users who choose an asynchronous client will generally issue requests asynchronously most of the time. Results measured in synchronous mode do not represent the client's performance in asynchronous mode, so they have little reference value for those users.
Therefore, this method is not suitable for performance testing of asynchronous clients.
Can we issue requests asynchronously in a single-threaded for loop, and treat the resulting TPS as the client's best TPS?
When an asynchronous client issues a request, the issuing method returns very quickly, because the request is mainly executed in the IO thread pool. Even with a single thread, a for loop that keeps issuing asynchronous requests will issue them much faster than the IO thread pool can process them. Requests then pile up somewhere in the program (for example, while acquiring a connection), which causes errors or poor performance.
Therefore, this method is not suitable for performance testing of asynchronous clients either.
So how should the performance of an asynchronous client be tested?
Asynchronous clients are designed for asynchronous use, and their users issue requests asynchronously most of the time, so it is more appropriate to test their performance asynchronously. The crux of the problem is how to avoid issuing asynchronous requests faster than the framework can process them.
Once the main problem is identified, the answer largely follows: we need a way to adjust the rate at which requests are issued. We can try the following two methods:
- In the for loop, sleep for a while after every fixed number of asynchronous requests, and then continue issuing asynchronous requests. The rate is controlled by two variables: the sleep time and the number of requests between sleeps.
- In the for loop, send one synchronous request after every fixed number of asynchronous requests, and then continue issuing asynchronous requests. The synchronous request replaces the sleep: when it returns, the requests queued ahead of it have been processed. The principle is the same, but there is only one variable to control: the number of asynchronous requests issued before each synchronous request (that is, the ratio of asynchronous to synchronous requests in a cycle).
Both methods can control the rate at which asynchronous requests are issued. We chose the second one because it has fewer variables to control, which makes the testing process simpler.
So our final test method is:
Issue requests by alternating asynchronous and synchronous calls, and keep adjusting the ratio of asynchronous to synchronous requests in a cycle. At each ratio, tune the client's configuration to reach the best TPS and record it. Keep increasing the ratio until the TPS of the framework no longer rises, or even falls. That inflection point is the performance limit of the framework.
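The alternating pattern described above can be sketched as follows. This is only an illustration of the loop structure, not the benchmark code used for the results below: the "request" is a counter increment, a single-threaded executor stands in for the client's IO threads, and the names AsyncSyncRatioSketch and run are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

final class AsyncSyncRatioSketch {

    /**
     * Runs the given number of cycles. Each cycle issues `ratio` asynchronous calls
     * followed by one synchronous call. Returns the number of completed calls.
     */
    static long run(int ratio, int cycles) {
        // A single thread with a FIFO queue stands in for the client's IO threads.
        ExecutorService io = Executors.newSingleThreadExecutor();
        AtomicLong completed = new AtomicLong();
        try {
            for (int cycle = 0; cycle < cycles; cycle++) {
                for (int i = 0; i < ratio; i++) {
                    // Asynchronous request: fire and continue without waiting.
                    CompletableFuture.runAsync(completed::incrementAndGet, io);
                }
                // Synchronous request: blocks until it completes, which in this FIFO
                // simulation also means the asynchronous calls queued before it are done.
                CompletableFuture.runAsync(completed::incrementAndGet, io).join();
            }
            return completed.get();
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long completed = run(800, 100);
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.println("completed=" + completed + ", TPS=" + completed / seconds);
    }
}
```

The only tuning variable is ratio, which matches the reason given above for preferring this method over the sleep-based one.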
3.6 Test Results
In the above figure, the abscissa is the ratio of asynchronous requests to synchronous requests, and the ordinate is TPS. From the figure, we can see that:
- RestClient: As the ratio of asynchronous to synchronous requests increases, TPS first increases and then decreases. The best TPS, 111217.98, is reached at a ratio of 800.
- Apache HttpAsyncClient: As the ratio of asynchronous to synchronous requests increases, TPS first increases and then decreases. The best TPS, 83962.54, is reached at a ratio of 800.
- OK Httpclient: As the ratio of asynchronous to synchronous requests increases, TPS first increases and then decreases. The best TPS, 70501.59, is reached at a ratio of 300.
3.7 Conclusion
The best TPS of RestClient in the above scenario is 32% higher than the best TPS of Apache HttpAsyncClient, and 57% higher than the best TPS of OK Httpclient.
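The percentages above follow directly from the best TPS values reported in the test results. As a quick check, the sketch below recomputes them; the class and method names are only illustrative.

```java
final class TpsGainSketch {

    /** Relative improvement of `best` over `other`, in percent. */
    static double gainPercent(double best, double other) {
        return (best / other - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        double restClient = 111217.98;
        double apacheAsync = 83962.54;
        double okHttp = 70501.59;
        System.out.printf("vs Apache HttpAsyncClient: %.2f%%%n", gainPercent(restClient, apacheAsync));
        System.out.printf("vs OK Httpclient: %.2f%%%n", gainPercent(restClient, okHttp));
    }
}
```

Both computed values, truncated to whole percentages, agree with the 32% and 57% quoted in the conclusion.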
4 Architecture Design
4.1 Design principles
- High performance: a goal we continuously pursue and our core competitive strength.
- High scalability: open extension points to meet diverse business needs.
- Full-link asynchronous: Provides complete asynchronous processing capabilities based on CompletionStage.
4.2 Structural Design
The above picture shows the structure of RestClient. We will introduce the meaning of each part from top to bottom:
4.2.1 RestInterceptorChain
RestInterceptorChain is a collection of RestInterceptors. When a user calls a request, it will go through all RestInterceptors in the RestInterceptorChain in turn. Users can specify their order in RestInterceptorChain by implementing the getOrder() method in RestInterceptor.
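The way a request passes through an ordered interceptor chain can be sketched as follows. This is a simplified, synchronous illustration rather than RestClient's actual classes: the Interceptor and Next interfaces here are hypothetical stand-ins that work on plain strings, and the sketch assumes that a smaller getOrder() value runs earlier, which the text above does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class InterceptorChainSketch {

    /** Hypothetical continuation: passes the request on to the rest of the chain. */
    interface Next {
        String proceed(String request);
    }

    /** Hypothetical interceptor working on plain strings instead of real requests. */
    interface Interceptor {
        String intercept(String request, Next next);

        default int getOrder() {
            return 0;
        }
    }

    /** Sorts interceptors by order and wraps them around the final transport step. */
    static String execute(List<Interceptor> interceptors, String request, Next transport) {
        List<Interceptor> sorted = new ArrayList<>(interceptors);
        sorted.sort(Comparator.comparingInt(Interceptor::getOrder)); // assumption: lower order runs first
        Next chain = transport;
        for (int i = sorted.size() - 1; i >= 0; i--) {
            Interceptor current = sorted.get(i);
            Next downstream = chain;
            chain = req -> current.intercept(req, downstream);
        }
        return chain.proceed(request);
    }

    /** An interceptor that tags the request on the way in and the response on the way out. */
    static Interceptor tagging(String tag, int order) {
        return new Interceptor() {
            @Override
            public String intercept(String request, Next next) {
                return next.proceed(request + ">" + tag) + "<" + tag;
            }

            @Override
            public int getOrder() {
                return order;
            }
        };
    }

    static String demo() {
        return execute(Arrays.asList(tagging("b", 2), tagging("a", 1)), "req", req -> "[" + req + "]");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each interceptor sees the request before the ones after it and sees the response after them, which is what allows logic both before sending and after receiving.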
4.2.2 EncodeAdviceChain
EncodeAdviceChain is a collection of EncodeAdvices. Before Encode, it will pass through all EncodeAdvices in EncodeAdviceChain in turn. Users can specify their order in the EncodeAdviceChain by implementing the getOrder() method in EncodeAdvice.
4.2.3 EncoderChain
EncoderChain is a collection of Encoders. When Encoding, it will go through all Encoders in EncoderChain in turn, until an Encoder directly returns the result of Encode (that is, it can Encode the request). Users can specify their order in the EncoderChain by implementing the getOrder() method in the Encoder.
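The "try each Encoder in turn until one returns a result" behaviour is a chain of responsibility. The sketch below illustrates it with JDK types only. The Context and Encoder interfaces are hypothetical simplifications and are not RestClient's EncodeContext or ByteEncoder; only the ctx.next() delegation style mirrors the custom Encoder example shown earlier.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class EncoderChainSketch {

    /** Hypothetical context: exposes the entity and lets an encoder delegate to the next one. */
    interface Context {
        Object entity();

        byte[] next();
    }

    /** Hypothetical encoder: either encodes the entity or calls ctx.next(). */
    interface Encoder {
        byte[] encode(Context ctx);
    }

    static final Encoder STRING_ENCODER = ctx -> ctx.entity() instanceof String
            ? ((String) ctx.entity()).getBytes(StandardCharsets.UTF_8)
            : ctx.next();

    static final Encoder BYTES_ENCODER = ctx -> ctx.entity() instanceof byte[]
            ? (byte[]) ctx.entity()
            : ctx.next();

    /** Walks the chain from the first encoder until one of them returns a result. */
    static byte[] encode(List<Encoder> encoders, Object entity) {
        return invoke(encoders, 0, entity);
    }

    private static byte[] invoke(List<Encoder> encoders, int index, Object entity) {
        if (index == encoders.size()) {
            throw new IllegalStateException("No encoder can encode: " + entity);
        }
        return encoders.get(index).encode(new Context() {
            @Override
            public Object entity() {
                return entity;
            }

            @Override
            public byte[] next() {
                return invoke(encoders, index + 1, entity);
            }
        });
    }

    static String encodeToText(Object entity) {
        byte[] encoded = encode(Arrays.asList(STRING_ENCODER, BYTES_ENCODER), entity);
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encodeToText("Hello Server"));
    }
}
```

The position of an encoder in the list plays the role that getOrder() plays in the description above; a DecoderChain can be sketched the same way.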
4.2.4 DecodeAdviceChain
DecodeAdviceChain is a collection of DecodeAdvice. Before Decode, it will pass through all DecodeAdvice in DecodeAdviceChain in turn. Users can specify their order in DecodeAdviceChain by implementing the getOrder() method in DecodeAdvice.
4.2.5 DecoderChain
DecoderChain is a collection of Decoders. When Decoding, it will go through all Decoders in DecoderChain in turn, until a Decoder directly returns the result of Decode (that is, it can Decode the response). Users can specify their order in DecoderChain by implementing the getOrder() method in Decoder.
4.2.6 NettyTransceiver
NettyTransceiver is the bridge between RestClient and its underlying framework, Netty. Understanding it requires some background knowledge, which we briefly introduce first:
4.2.6.1 Channel & ChannelPool & ChannelPools
Channel: Channel is Netty's abstraction for network operations. It aggregates a set of functions, including but not limited to network reads and writes, initiating connections from the client, actively closing connections, and obtaining the network addresses of both parties. It also provides functions related to the Netty framework, such as obtaining the Channel's EventLoop, its buffer allocator (ByteBufAllocator), and its pipeline.
ChannelPool: ChannelPool is used to cache Channels, it allows acquiring and releasing Channels, and acts as a pool of these Channels, so as to achieve the purpose of multiplexing Channels. In RestClient, each Server host corresponds to a ChannelPool.
ChannelPools: ChannelPools are used to cache ChannelPools. In RestClient, when a Server host has not been accessed for a long time, its corresponding ChannelPool will be regarded as expired cache and resources will be recycled.
4.2.6.2 EventLoop & EventLoopGroup
EventLoop: EventLoop is used in Netty to run tasks to handle events that occur during the lifetime of a Channel. In RestClient, an EventLoop corresponds to a thread.
EventLoopGroup: EventLoopGroup is a group of EventLoops, which ensures that multiple tasks are distributed as evenly as possible on multiple EventLoops.
4.2.6.3 Epoll
Epoll is an extensible I/O event notification mechanism of the Linux kernel, including the following three system calls.
int epoll_create(int size);
Create an epoll instance in the kernel and return an epoll file descriptor (corresponding to epollFD in EpollEventLoop in the figure above). In the original implementation, the caller told the kernel how many file descriptors to listen on via the size parameter. If the number of monitored file descriptors exceeds size, the kernel will automatically expand the capacity. Now size has no such semantics, but size must still be greater than 0 when the caller calls it to ensure backward compatibility.
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Adds, modifies, or removes the monitoring of event on fd in the kernel epoll instance identified by epfd. op can be EPOLL_CTL_ADD, EPOLL_CTL_MOD, or EPOLL_CTL_DEL, which respectively add a new event, modify the event types monitored on the file descriptor, and remove the file descriptor from the instance.
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
When timeout is 0, epoll_wait will always return immediately. When timeout is -1, epoll_wait will block until any registered event becomes ready. When timeout is a positive integer, epoll blocks until timeout milliseconds expire or the registered event becomes ready. Because of kernel scheduling delays, blocking time may slightly exceed timeout milliseconds.
Epoll operation process:
1. The process first creates an epoll file descriptor (corresponding to epollFD in EpollEventLoop in the above figure) by calling epoll_create. epoll opens up a shared space through mmap, which contains a red-black tree and a linked list (corresponding to the shared space of epollFD in the above figure).
2. The process calls epoll_ctl with EPOLL_CTL_ADD to put the file descriptor of the new connection into the red-black tree.
3. When an fd in the red-black tree has data, it is put into the linked list, together with whether it is readable or writable.
4. The upper user space (via epoll_wait) takes all fds from the linked list, and then reads and writes data on them.
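The register-then-wait flow above has a close analogue in the JDK's NIO Selector, which on Linux is commonly backed by epoll. The sketch below uses a Pipe instead of a socket so that it is self-contained. It illustrates the flow only and is not how Netty's native epoll transport is implemented; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

final class SelectorSketch {

    /** Registers a channel for read events, waits until it is ready, then reads from it. */
    static String readWhenReady(String message) {
        try (Selector selector = Selector.open()) {                  // comparable to epoll_create
            Pipe pipe = Pipe.open();
            try (Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);     // comparable to epoll_ctl with EPOLL_CTL_ADD
                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

                selector.select(1000);                               // comparable to epoll_wait with a 1000 ms timeout
                StringBuilder received = new StringBuilder();
                for (SelectionKey key : selector.selectedKeys()) {   // the ready list
                    if (key.isReadable()) {
                        ByteBuffer buffer = ByteBuffer.allocate(256);
                        ((Pipe.SourceChannel) key.channel()).read(buffer);
                        buffer.flip();
                        received.append(StandardCharsets.UTF_8.decode(buffer));
                    }
                }
                return received.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWhenReady("OK"));
    }
}
```

The event loop of a real client repeats the select and read steps indefinitely for many registered channels instead of running them once.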
4.2.6.4 NettyTransceiver initialization
By the time RestClient finishes initializing, NettyTransceiver has also been initialized. Its initialization mainly includes the following two parts:
- Initialize ChannelPools. The newly initialized ChannelPools is empty and does not contain any ChannelPool.
- Initialize EpollEventLoopGroup. EpollEventLoopGroup contains multiple EpollEventLoops, and each EpollEventLoop contains the following three parts:
  - executor: The thread that actually executes tasks.
  - taskQueue: The task queue. Tasks submitted by the user are added to this queue and then executed by the executor.
  - epollFD: epoll's file descriptor. When an EpollEventLoop is created, epoll_create is called to create an epoll shared space, and its corresponding file descriptor is epollFD.
4.2.6.5 NettyTransceiver sends a request
When a request is sent for the first time: NettyTransceiver will create a ChannelPool for the Server host (such as ChannelPool1 in the above figure), and cache it in channelPools (by default, if the Server host has no request within 10 minutes, the cache is considered expired, and its corresponding ChannelPool will be removed from channelPools). When ChannelPool is initialized, it mainly includes the following two parts:
- Initialize the channelDeque, which caches channels. Obtaining a channel means taking one from the channelDeque.
- Select an EventLoop in the EventLoopGroup as the executor, which is used to perform operations such as acquiring connections. The reason why ChannelPool needs a fixed executor to perform operations such as acquiring connections is to avoid the situation where multiple threads acquire connections at the same time, so there is no need to lock the operation of acquiring connections.
After the ChannelPool initialization is completed, the executor will obtain the channel from the ChannelPool. When the channel is first obtained, since there is no channel in the ChannelPool, the first channel will be initialized. The initialization steps of the channel mainly include the following steps:
- Create a connection, encapsulate the connection as a channel.
- The connection corresponding to the channel is added to the red-black tree of the shared space corresponding to the epollFD of an EpollEventLoop in the EpollEventLoopGroup through the epoll_ctl add method.
- Put the channel into the channelDeque corresponding to the ChannelPool.
After the initialization of the channel is completed, the executor returns the initialized channel, and the EpollEventLoop bound to the channel (that is, the EpollEventLoop selected in the second step of initializing the channel) continues to perform the task of sending the requested data.
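The reason a ChannelPool can avoid locking, as described above, is that every acquire and release runs on one fixed executor thread. The following sketch illustrates that idea with JDK types only. It is a simplification and not Netty's or RestClient's pool: channels are represented by strings, a channel is created on demand when the deque is empty, and the names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SingleExecutorPoolSketch {

    /** A pool whose deque is only ever touched by one executor thread, so it needs no lock. */
    static final class Pool implements AutoCloseable {
        private final Deque<String> channelDeque = new ArrayDeque<>(); // accessed on the executor thread only
        private final ExecutorService executor = Executors.newSingleThreadExecutor();
        private int created;                                           // accessed on the executor thread only

        /** Takes a cached channel, or creates a new one if the deque is empty. */
        CompletableFuture<String> acquire() {
            return CompletableFuture.supplyAsync(() -> {
                String channel = channelDeque.pollFirst();
                return channel != null ? channel : "channel-" + (++created);
            }, executor);
        }

        /** Returns a channel to the deque so that later requests can reuse it. */
        CompletableFuture<Void> release(String channel) {
            return CompletableFuture.runAsync(() -> channelDeque.addFirst(channel), executor);
        }

        @Override
        public void close() {
            executor.shutdown();
        }
    }

    /** Issues requests one after another and counts how many distinct channels were needed. */
    static int distinctChannels(int sequentialRequests) {
        try (Pool pool = new Pool()) {
            Set<String> seen = new HashSet<>();
            for (int i = 0; i < sequentialRequests; i++) {
                String channel = pool.acquire().join();
                seen.add(channel);
                pool.release(channel).join();
            }
            return seen.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctChannels(5));
    }
}
```

Callers on any thread only submit tasks; the deque itself is never accessed concurrently, which is the property the fixed executor is meant to provide.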
4.2.6.6 NettyTransceiver receives response
4.2.7.1 Further optimization of threading model
The above threading model is our current version's threading model, and it is also the threading model of Netty's own connection pool. But is the performance of this threading model necessarily the highest?
The answer to this question should be no, because although specifying an EventLoop as an executor in the ChannelPool to perform the operation of acquiring the Channel can make the process of acquiring the Channel free from multi-threaded contention, it introduces the following two problems:
- An EventLoop switch is likely between acquiring the Channel and calling Channel.write(), and this switch has a certain performance cost. The switch is likely rather than certain because the two operations may happen to be assigned to the same EventLoop, in which case no switch is needed.
- EventLoop tasks in EventLoopGroup are unevenly distributed. Because the EventLoop that obtains the connection in the channelPool also processes the sending and receiving of data while obtaining the connection, it does more work than other EventLoops, and the EventLoop has also become a performance bottleneck. In our actual test, we did find that the thread CPU utilization of one EventLoop is higher than that of other EventLoops.
So what does a better threading model look like? Through the above analysis, we think it should meet the following two points:
- No thread switching between getting Channel and Channel.write().
- The tasks of each EventLoop are distributed evenly.
Based on our needs, we can conclude that the best structural model and threading model should be the following:
Optimized structural model:
As shown in the figure above: a ChannelPool consists of multiple ChildChannelPools (number = number of IO threads), a ChildChannelPool is bound to an EventLoopGroup, and the EventLoopGroup contains only one EventLoop (that is, a ChildChannelPool corresponds to an EventLoop).
Optimized threading model:
As shown in the figure above: the business thread first performs some operations, obtains the ChannelPool, and selects a ChildChannelPool (the selection is implemented similarly to EventLoopGroup.next(), which ensures that ChildChannelPools are chosen evenly). It then obtains the Channel through the ChildChannelPool (this step runs in the EventLoop corresponding to the ChildChannelPool), and finally calls Channel.write() (this step also runs in the EventLoop corresponding to the ChildChannelPool).
The above process neatly achieves two points of the high-performance threading model we need at the beginning:
- There is no thread switching between channel acquisition and Channel.write(): since the EventLoopGroup in a ChildChannelPool has only one EventLoop, every Channel it creates can only be bound to that EventLoop, so both channel acquisition and Channel.write() can only run in that EventLoop.
- The tasks of each EventLoop are evenly distributed: ChildChannelPools are selected evenly from the ChannelPool (similar to EventLoopGroup.next()), and each ChildChannelPool corresponds to exactly one EventLoop, so request tasks are spread evenly.
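The even selection of a ChildChannelPool can be sketched as a round-robin chooser, in the spirit of EventLoopGroup.next(). This is a minimal illustration of the idea and not the planned RestClient implementation: child pools are represented by their index only, and the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ChildPoolChooserSketch {

    private final int childPoolCount;              // one child pool per IO thread
    private final AtomicInteger counter = new AtomicInteger();

    ChildPoolChooserSketch(int ioThreads) {
        this.childPoolCount = ioThreads;
    }

    /** Returns the index of the next child pool in round-robin order. */
    int next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        return Math.floorMod(counter.getAndIncrement(), childPoolCount);
    }

    /** Counts how many of the given requests land on each child pool. */
    static int[] distribute(int ioThreads, int requests) {
        ChildPoolChooserSketch chooser = new ChildPoolChooserSketch(ioThreads);
        int[] counts = new int[ioThreads];
        for (int i = 0; i < requests; i++) {
            counts[chooser.next()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(distribute(4, 1000)));
    }
}
```

Because each child pool maps to exactly one EventLoop, distributing requests evenly over child pools also distributes them evenly over EventLoops.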
In practice, we also tested through a demo: we found that using the above thread model and structure model, the performance of RestClient was improved by about 20% on the basis of the current version. It is expected that RestClient will provide the above threading model and structure model in the next version.
Some designs of other performance optimizations
5 Netty
RestClient is built on Netty, and the high-performance features that come with Netty are naturally the cornerstone of RestClient's high performance. RestClient uses the following common Netty features:
- Epoll
- Channel & ChannelPool
- EventLoop & EventLoopGroup
- ByteBuf & PooledByteBufAllocator
- Future & Promise
- FastThreadLocal & InternalThreadLocalMap
- ...
Among them, Epoll, Channel & ChannelPool, and EventLoop & EventLoopGroup have already been explained in the structural design part of this document, so we will not repeat them here. Let's mainly take a look at the other parts:
5.1 ByteBuf & PooledByteBufAllocator
Netty replaces ByteBuffer with ByteBuf, which is easier to use and performs better. We will not introduce ByteBuf in detail here, but briefly describe how RestClient uses ByteBuf to improve performance:
- When sending a request, a ByteBuf is allocated using the PooledByteBufAllocator, which pools instances of ByteBuf to improve performance and minimize memory fragmentation.
- When a response is received, CompositeByteBuf is used, which provides a virtual representation of multiple buffers as a single merged buffer, reducing unnecessary data copying of aggregated responses when responses arrive in batches.
5.2 Future & Promise
Future & Promise are the cornerstone of Netty's asynchronous model. We will not introduce them in detail here, but focus on some technical trade-offs related to Future & Promise in RestClient.
RestClient uses Future & Promise to send and receive data packets asynchronously, and converts them into a CompletionStage at the user-facing API. In this way, the entire request chain, from sending and receiving data packets to the user's encoding and decoding, is asynchronous.
5.3 Why CompletionStage, Not Future & Promise?
CompletionStage is an interface added in Java 8 that represents one stage of an asynchronous computation. Its stages are typically chained together with lambda expressions. Its main implementation in the JDK is CompletableFuture.
Compared with Netty's Future & Promise, Java developers are more familiar with CompletionStage, and the interface functions of CompletionStage are more powerful, allowing users to implement business logic more flexibly.
5.4 Why CompletionStage, Not CompletableFuture?
Why use CompletionStage instead of CompletableFuture? Because CompletionStage is an interface and CompletableFuture is an implementation of it, exposing CompletionStage is more in line with the principle of programming to an interface. Users can still call CompletionStage.toCompletableFuture() to convert a CompletionStage to a CompletableFuture.
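As a small illustration of this trade-off, the sketch below exposes only CompletionStage from a hypothetical execute() method (a stand-in, not RestClient's API) and lets the caller convert to CompletableFuture when blocking methods are needed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class StageApiSketch {
    // Hypothetical client method: the signature exposes only the interface,
    // so the implementation behind it can change freely.
    static CompletionStage<String> execute(String body) {
        return CompletableFuture.supplyAsync(() -> "echo:" + body);
    }

    // Callers compose stages as usual and convert only when they need
    // CompletableFuture-specific methods such as join().
    static int blockingLength(String body) {
        CompletionStage<Integer> length = execute(body).thenApply(String::length);
        return length.toCompletableFuture().join();
    }

    public static void main(String[] args) {
        System.out.println(blockingLength("hello")); // prints 10
    }
}
```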
5.5 How To Combine Future & Promise With CompletionStage?
When the user sends a request, we create a CompletionStage and add a Listener to the Future returned by Netty's request and response processing logic; the Listener completes the CompletionStage. This is how Future & Promise are combined with CompletionStage, making the entire request chain asynchronous.
Interested users can check the handle() method in io.esastack.httpclient.core.netty.HttpTransceiverImpl, which completes the conversion of Future to CompletionStage.
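The bridging idea can be sketched without Netty as follows. SimplePromise is a hypothetical, much-simplified stand-in for a Netty Promise (it is not Netty's API and not the code in HttpTransceiverImpl). The part that matters is toStage(), where a listener completes a CompletableFuture that is handed out as a CompletionStage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.BiConsumer;

final class ListenerBridgeSketch {
    // Hypothetical stand-in for a listener-based Promise.
    static final class SimplePromise<V> {
        private final List<BiConsumer<V, Throwable>> listeners = new ArrayList<>();
        private boolean done;
        private V value;
        private Throwable cause;

        // Listeners added after completion are notified immediately.
        synchronized void addListener(BiConsumer<V, Throwable> listener) {
            if (done) {
                listener.accept(value, cause);
            } else {
                listeners.add(listener);
            }
        }

        void setSuccess(V v) { complete(v, null); }

        void setFailure(Throwable t) { complete(null, t); }

        private synchronized void complete(V v, Throwable t) {
            if (done) {
                return;
            }
            done = true;
            value = v;
            cause = t;
            listeners.forEach(l -> l.accept(v, t));
        }
    }

    // The bridge: the listener finishes the stage with the same outcome as the promise.
    static <V> CompletionStage<V> toStage(SimplePromise<V> promise) {
        CompletableFuture<V> stage = new CompletableFuture<>();
        promise.addListener((v, t) -> {
            if (t != null) {
                stage.completeExceptionally(t);
            } else {
                stage.complete(v);
            }
        });
        return stage;
    }

    public static void main(String[] args) {
        SimplePromise<String> promise = new SimplePromise<>();
        toStage(promise).thenAccept(body -> System.out.println("got " + body));
        promise.setSuccess("200 OK"); // prints "got 200 OK"
    }
}
```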
5.6 FastThreadLocal & InternalThreadLocalMap
FastThreadLocal changes the ThreadLocalMap that uses the hash structure in ThreadLocal to the InternalThreadLocalMap that directly uses the array structure. The structure diagram of ThreadLocal and FastThreadLocal is roughly as follows:
ThreadLocal structure diagram
FastThreadLocal structure diagram
As the figures above show, InternalThreadLocalMap gets and sets values directly by index, which is simpler and cheaper than ThreadLocalMap. (ThreadLocalMap is also backed by an array, but each access involves hash calculation and hash-collision handling on top of the array access.) As a result, FastThreadLocal achieves higher performance.
FastThreadLocal is used instead of ThreadLocal in RestClient to obtain higher performance.
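The index-based lookup can be sketched in plain Java as below. This is a simplified illustration, not Netty's code: SlotThread and IndexedLocal are hypothetical names, and the sketch assumes every thread is a SlotThread, whereas the real FastThreadLocal also handles ordinary threads.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class IndexedLocalSketch {
    // Hypothetical thread type that carries its own slot array.
    static final class SlotThread extends Thread {
        Object[] slots = new Object[8];

        SlotThread(Runnable task) {
            super(task);
        }
    }

    // Each variable gets a unique index once; get/set are plain array accesses
    // with no hashing and no collision handling.
    static final class IndexedLocal<T> {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
        private final int index = NEXT_INDEX.getAndIncrement();

        void set(T value) {
            SlotThread thread = (SlotThread) Thread.currentThread();
            if (index >= thread.slots.length) {
                thread.slots = Arrays.copyOf(thread.slots, (index + 1) * 2);
            }
            thread.slots[index] = value;
        }

        @SuppressWarnings("unchecked")
        T get() {
            SlotThread thread = (SlotThread) Thread.currentThread();
            return index < thread.slots.length ? (T) thread.slots[index] : null;
        }
    }

    // Sets two variables on one thread, then reads one of them from another thread.
    static String demo() {
        IndexedLocal<String> name = new IndexedLocal<>();
        IndexedLocal<Integer> count = new IndexedLocal<>();
        String[] seen = new String[2];
        SlotThread first = new SlotThread(() -> {
            name.set("worker");
            count.set(3);
            seen[0] = name.get() + ":" + count.get();
        });
        SlotThread second = new SlotThread(() -> seen[1] = String.valueOf(name.get()));
        try {
            first.start();
            first.join();
            second.start();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] + " / " + seen[1];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "worker:3 / null"
    }
}
```

The second thread sees null because the slot array belongs to the thread, not to the variable, which is the same isolation ThreadLocal provides.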
5.7 Encode & Decode
Unlike most HTTP client frameworks, RestClient can encode Java objects not only into byte[] but also into other objects supported by the underlying Netty layer, such as File and MultipartBody. In the future, ChunkInput will also be supported so that requests with large bodies can be sent in chunks.
The reason for this design is that if Java objects could only be encoded into byte[], an OutOfMemoryError would occur whenever the encoded byte[] is too large. This happens when a user sends a large file or a request with a large body.
To solve this, RestClient allows users to encode Java objects into a File or ChunkInput. When the user encodes a Java object into a File, RestClient relies on Netty to send the file through NIO in a zero-copy manner, which avoids OOM and reduces data copying.
Similarly, when the user encodes a Java object into ChunkInput, RestClient sends the data in chunks rather than loading it all into memory at once, thereby avoiding OOM. (PS: ChunkInput is not supported in the current version, but an extension point has been reserved and it will be supported in the next version.)
The same optimization is also done during Decode. Since the principle is the same, it will not be explained here.
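The file-based path described above rests on the JDK primitive FileChannel.transferTo(), sketched below. FileTransferSketch and its methods are hypothetical names for illustration, not RestClient code. The in-memory sink only demonstrates the API; the copy can be avoided only when the target is a channel the operating system can write to directly, such as a socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileTransferSketch {
    // Streams a file to the target channel without first loading it into a byte[].
    static long send(Path file, WritableByteChannel target) {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes content to a temporary file, sends it, and returns what the sink received.
    static String roundTrip(String content) {
        try {
            Path file = Files.createTempFile("restclient-sketch", ".txt");
            try {
                Files.write(file, content.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                send(file, Channels.newChannel(sink));
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello Server")); // prints "Hello Server"
    }
}
```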
6. Conclusion
Although RestClient mainly covers the simple function of sending requests, it is, as the saying goes, "small as a sparrow, yet complete with all its organs": it considers performance optimization in every aspect while also striving not to compromise on interface design, clean code, or completeness of functionality.
It is still a young project, and all technology enthusiasts are welcome to join, discuss, learn, and make progress together.
About the Author
Without OPPO Cloud Computing Center Cloud Native Group Backend Engineer
In-depth participation in ESA Restlight, ESA RestClient, ServiceKeeper and other high-performance open source framework projects