Why is Dubbo not suitable for file transfer?

空无
Like it first, then read it: make it a good habit.

Background

The company previously had a Dubbo service that wrapped Tencent Cloud's object storage SDK. The goal was to manage this third-party SDK in one place: other systems would call the object storage Dubbo service directly instead of depending on the SDK themselves. That way, an incompatible major-version update of the platform's SDK would not force every system in the company to change and upgrade along with it.

The idea was sound, but the approach was not, because Dubbo is not suitable for transferring files. Fortunately, nobody used the system, and it was retired soon after going live...

Although the system is gone, uploading files through Dubbo is still worth analyzing in detail, to understand why Dubbo is not suitable for it.

How does Dubbo transfer files?

Can you simply pass a File, like this?

void sendPhoto(File photo);

Of course not! Dubbo simply serializes the arguments and transmits them, and serializing a File object does not carry the file's data (a java.io.File only holds a path). So the only option is to send the file content itself:

void sendPhoto(byte[] photo);

But this requires the consumer to read the entire file into memory at once, and for large enough files no amount of memory is sufficient. Likewise, when the provider receives and decodes the message, it must also hold the whole byte[] in memory at once, which again leads to high memory usage.
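To make both points concrete, here is a small self-contained sketch. It uses standard Java serialization as a stand-in for whatever serializer the RPC framework is configured with (an assumption: Dubbo's own serialization formats differ, but the conclusion is the same), and the class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileVsBytesArgument {

    // Size of an object after standard Java serialization (a stand-in for an RPC serializer).
    static int serializedSize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Path photo = Files.createTempFile("photo", ".bin");
        Files.write(photo, new byte[1_000_000]); // a 1 MB "photo"

        // java.io.File serializes only its path, so the 1 MB of content never travels.
        System.out.println("File argument:   " + serializedSize(photo.toFile()) + " bytes");

        // The byte[] alternative carries the content, but the whole file must be on the heap first.
        byte[] content = Files.readAllBytes(photo);
        System.out.println("byte[] argument: " + serializedSize(content) + " bytes");

        Files.delete(photo);
    }
}
```

The File argument serializes to well under a kilobyte no matter how large the file is, while the byte[] argument is slightly larger than the file itself, and that array had to be fully loaded by Files.readAllBytes before the call could even start.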

Single connection model problem

Besides memory usage, Dubbo's single-connection model (this refers to the Dubbo protocol) also makes it unsuitable for file transfer.

By default, the Dubbo protocol uses a single-connection model: all requests from a consumer to a given provider share one TCP connection, and Netty handles the transport. To keep a Channel thread-safe, Netty queues write events and executes them on the Channel's event loop. So under a single connection, many requests write through the same connection, that is, the same Channel. When several requests write at the same time and one packet is very large, the Channel stays busy sending that packet, while the write events of the other requests wait in the queue and are not sent for a long time. The consumers that issued those requests are then blocked waiting for responses and cannot return.

Therefore, on a single connection, an oversized message blocks the processing of Netty's write events, other data cannot reach the server in time, and those requests end up blocked for nothing.
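The blocking effect can be illustrated with a deliberately simplified model (not Netty's real implementation): one connection writes queued messages strictly in FIFO order at a fixed bandwidth. The 10 MB/s bandwidth and the message sizes are assumed values chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleConnectionQueueModel {

    // Simplified model, not Netty's real code: one connection drains its write queue
    // strictly in FIFO order at a fixed bandwidth, one message after another.
    // Returns, for each message, the time (in seconds) at which it has been fully written.
    static List<Double> completionTimesSeconds(long[] messageSizesBytes, double bytesPerSecond) {
        List<Double> done = new ArrayList<>();
        double clock = 0.0;
        for (long size : messageSizesBytes) {
            clock += size / bytesPerSecond; // must wait for every message queued ahead of it
            done.add(clock);
        }
        return done;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] sizes = {100 * mb, 1024, 1024}; // a 100 MB file, then two 1 KB business requests
        double bandwidth = 10.0 * mb;          // assumed 10 MB/s
        List<Double> times = completionTimesSeconds(sizes, bandwidth);
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("message %d (%d bytes) written after %.4f s%n", i, sizes[i], times.get(i));
        }
    }
}
```

With these numbers, the two 1 KB requests cannot finish until about 10 seconds have passed, purely because they sit behind the 100 MB upload on the same connection.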

Since the single-connection model has such a big drawback, why does Dubbo still use it?

Because it saves resources: TCP connections are precious. If a single connection is enough for most scenarios, there is no need to set up a connection for every request.

The Dubbo documentation also explains the reasoning behind the single-connection design:

In most deployments, there are few service providers, usually only a handful of machines, but many consumers; the entire website may be calling the service. For example, Morgan has only 6 providers but hundreds of consumers, with 150 million calls per day. With a conventional Hessian service, the providers would easily be overwhelmed. A single connection ensures that no single consumer can overwhelm a provider; long-lived connections reduce connection handshakes and verification; and asynchronous IO with a reused thread pool prevents the C10K problem.

Although the Dubbo protocol uses a single connection by default, you can still configure multiple connections:

<!-- connections="2": two dedicated long-lived connections to each provider -->
<dubbo:service connections="2"/>
<dubbo:reference connections="2"/>

However, with multiple connections, requests are not mapped one-to-one to connections; connections are chosen by round-robin instead. As shown in the figure below, when N connections are configured, the consumer maintains N connections to each provider instance, and each request is assigned to one of them in turn.

[Figure: N connections per provider instance, assigned to requests by round-robin]
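Round-robin selection over a fixed set of connections can be sketched as follows. This is an illustrative class, not Dubbo's code, and the connection objects are just strings here.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinConnections<C> {

    private final List<C> connections;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinConnections(List<C> connections) {
        if (connections.isEmpty()) {
            throw new IllegalArgumentException("need at least one connection");
        }
        this.connections = List.copyOf(connections);
    }

    // Picks the next connection in turn; floorMod keeps the index valid after the counter overflows.
    public C next() {
        if (connections.size() == 1) {
            return connections.get(0);
        }
        return connections.get(Math.floorMod(counter.getAndIncrement(), connections.size()));
    }

    public static void main(String[] args) {
        RoundRobinConnections<String> pool = new RoundRobinConnections<>(List.of("conn-0", "conn-1"));
        for (int request = 0; request < 4; request++) {
            System.out.println("request " + request + " -> " + pool.next());
        }
    }
}
```

Round-robin only spreads requests; it does not isolate large ones. Any request that lands on the connection currently busy sending a large file still waits behind it.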

Why is the HTTP protocol "suitable" for file transfer?

Strictly speaking, it is not accurate to say that the HTTP protocol is suitable for transferring files. Dubbo also supports the HTTP protocol (although that support is half-baked), and it is equally unsuitable for transferring files.

RPC frameworks such as Dubbo must serialize data into language-level objects in order to deliver on "calling a remote method as if it were local", and this makes it impossible to handle a file as a File.

If we step outside the constraints of an RPC framework like Dubbo and look at the HTTP protocol alone, it is well suited to file transfer, because the client only has to get the message bytes to the server. For example, if the file to transfer is on local disk, the client can read just one buffer's worth of the file at a time and send that buffer through the socket. This way, only one buffer of data is held in memory at any moment, and there is none of Dubbo's problem of reading all the data into memory.

As shown in the figure below, the client reads only 4 KB of a 1 GB file at a time and sends it through the socket, repeating until the whole file has been read and sent. Throughout the transfer, memory usage stays at the size of the 4 KB buffer, instead of reading the whole file into a byte[] before sending, as Dubbo does.

[Figure: the client repeatedly reads a 4 KB buffer from a 1 GB file and sends it through the socket]
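A minimal sketch of that client-side loop, assuming the caller supplies the file's InputStream and the socket's OutputStream. The demo uses in-memory streams so it runs anywhere (those stand-ins hold their data in memory themselves; the point is the loop's fixed 4 KB footprint). The class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedStreamingSend {

    static final int BUFFER_SIZE = 4 * 1024; // 4 KB, as in the example above

    // Copies `in` to `out` one buffer at a time, so at most BUFFER_SIZE bytes of the
    // payload are held by this method at once, regardless of the total size.
    // Returns the total number of bytes sent.
    static long send(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read); // in a real client, `out` would be the socket's OutputStream
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeFile = new byte[1_000_000];
        ByteArrayOutputStream fakeSocket = new ByteArrayOutputStream();
        long sent = send(new ByteArrayInputStream(fakeFile), fakeSocket);
        System.out.println("sent " + sent + " bytes using a " + BUFFER_SIZE + "-byte buffer");
    }
}
```

Since Java 9, InputStream.transferTo(OutputStream) performs an equivalent buffered copy.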

The same holds on the server side: the server does not need to read the whole message into memory at once. After parsing the Content-Length from the headers, it wraps the body in an InputStream that reads from the socket buffer on demand, so there is no memory-usage problem (for a more detailed look at how file messages are processed, see my other article "How to handle file upload in Tomcat?").
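The "parse Content-Length, then wrap the socket in an InputStream" step can be sketched with a stream that stops after a given number of bytes. ContentLengthInputStream is a hypothetical name; real servers such as Tomcat use their own implementations, and Apache Commons IO ships a similar BoundedInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: exposes at most contentLength bytes of the underlying stream,
// so one request body can be read incrementally without consuming the bytes that follow it.
public class ContentLengthInputStream extends FilterInputStream {

    private long remaining;

    public ContentLengthInputStream(InputStream in, long contentLength) {
        super(in);
        this.remaining = contentLength;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the remaining-byte count
    }

    public static void main(String[] args) throws IOException {
        // A 5-byte body followed by bytes that belong to the next message on the connection.
        InputStream connection = new ByteArrayInputStream("helloNEXT".getBytes(StandardCharsets.US_ASCII));
        InputStream body = new ContentLengthInputStream(connection, 5);
        System.out.println(new String(body.readAllBytes(), StandardCharsets.US_ASCII));
    }
}
```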

Given that HTTP is "suitable" for file transfer, what problems does Feign, the standard RPC client in Spring Cloud, run into when transferring files?

Is Feign suitable for transferring files?

Feign is not really an RPC framework; it is just an HTTP client. With Feign, the server can be any HTTP server, such as a Servlet container like Tomcat, Jetty, or Undertow, or a server written in another language, such as Apache.

Feign is usually used within the Spring Cloud stack, where the server is typically the default Tomcat. When Tomcat reads a file message (multipart/form-data), it temporarily saves the content to disk and then reads it back through a FileItem. So on the server side, the complete message is never read into memory at once, and there is no excessive memory usage.
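The "small parts in memory, large parts on disk" behavior can be sketched as an OutputStream that switches to a temporary file once a size threshold is crossed. This is a hypothetical illustration of the idea, not Tomcat's code. In the Servlet API the threshold corresponds to the fileSizeThreshold setting of the multipart configuration, and commons-fileupload's DiskFileItem builds on a similar stream from Commons IO (DeferredFileOutputStream).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: buffer in memory until `threshold` bytes, then spill everything to a temp file.
public class ThresholdSpillingOutputStream extends OutputStream {

    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk; // null until the threshold is exceeded
    private Path tempFile;
    private long written;

    public ThresholdSpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (disk == null && written + len > threshold) {
            // Switch to disk: copy what was buffered so far, then drop the in-memory copy.
            tempFile = Files.createTempFile("upload-", ".tmp");
            disk = Files.newOutputStream(tempFile);
            memory.writeTo(disk);
            memory = null;
        }
        if (disk != null) {
            disk.write(b, off, len);
        } else {
            memory.write(b, off, len);
        }
        written += len;
    }

    public boolean isInMemory() {
        return disk == null;
    }

    public Path getTempFile() {
        return tempFile;
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ThresholdSpillingOutputStream part = new ThresholdSpillingOutputStream(1024);
        part.write(new byte[100]);
        System.out.println("after 100 bytes, in memory: " + part.isInMemory());
        part.write(new byte[10_000]);
        part.close();
        System.out.println("after 10,100 bytes, in memory: " + part.isInMemory()
                + ", temp file size: " + Files.size(part.getTempFile()));
        Files.delete(part.getTempFile());
    }
}
```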

There are several ways to upload files in Feign:

import feign.Headers;
import feign.Param;
import feign.RequestLine;
import feign.form.FormData;
import feign.form.FormProperty;
import java.io.File;
import org.springframework.web.bind.annotation.RequestPart;
import org.springframework.web.multipart.MultipartFile;

interface SomeApi {

  // File parameter
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@Param("is_public") Boolean isPublic, @Param("photo") File photo);

  // byte[] parameter
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@Param("is_public") Boolean isPublic, @Param("photo") byte[] photo);

  // FormData parameter (feign-form's FormData holder)
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@Param("is_public") Boolean isPublic, @Param("photo") FormData photo);

  // MultipartFile parameter (a Spring type; encoding it requires feign-form-spring's SpringFormEncoder)
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (@RequestPart(value = "photo") MultipartFile photo);

  // Group all parameters within a POJO
  @RequestLine("POST /send_photo")
  @Headers("Content-Type: multipart/form-data")
  void sendPhoto (MyPojo pojo);

  class MyPojo {

    @FormProperty("is_public")
    Boolean isPublic;

    File photo;
  }
}

Feign abstracts the encoding/serialization of parameters as an Encoder, and the feign-form module provides several FormEncoders for HTTP file uploads. But whichever FormEncoder is used, its output ultimately goes through feign-form's Output object, and this Output is not a relay that wraps the socket's OutputStream. It is the data carrier itself, storing the encoded data in a ByteArrayOutputStream.

So however the FormEncoder is defined, the final data is written into this Output's ByteArrayOutputStream. The whole payload is still read into memory, and memory usage is still high.

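import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import lombok.RequiredArgsConstructor;
import lombok.SneakyThrows;
import lombok.experimental.FieldDefaults;

import static lombok.AccessLevel.PRIVATE;

// Simplified excerpt of feign-form's Output class (other members omitted)
@RequiredArgsConstructor
@FieldDefaults(level = PRIVATE, makeFinal = true)
public class Output implements Closeable {

  ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

  // After "encoding", all data is still written into outputStream, an in-memory ByteArrayOutputStream
  @SneakyThrows
  public Output write (byte[] bytes) {
    outputStream.write(bytes);
    return this;
  }

  public Output write (byte[] bytes, int offset, int length) {
    outputStream.write(bytes, offset, length);
    return this;
  }

  public byte[] toByteArray () {
    return outputStream.toByteArray();
  }

  @Override
  public void close () throws IOException {
    outputStream.close();
  }
}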
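For contrast with the Output class, here is a hypothetical multipart writer that emits a single file part to whatever OutputStream it is given. Writing into a ByteArrayOutputStream reproduces the Output behavior (the whole encoded body on the heap), while writing into a socket-like stream lets each chunk pass through without accumulating. The boundary, field, and file names are made up for the example, and the demo input is an in-memory array only so that it runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal multipart/form-data writer with a single file part.
// The caller decides, via the OutputStream it passes, whether the body is buffered or streamed.
public class MultipartBodyWriter {

    static void writeFilePart(OutputStream out, String boundary, String fieldName,
                              String fileName, InputStream content) throws IOException {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        content.transferTo(out); // copies in fixed-size chunks instead of loading the whole file
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[1_000_000];

        // Buffered, like Output: the whole encoded body ends up in one heap byte array.
        ByteArrayOutputStream buffered = new ByteArrayOutputStream();
        writeFilePart(buffered, "XYZ", "photo", "photo.jpg", new ByteArrayInputStream(file));
        System.out.println("bytes held in memory: " + buffered.size());

        // Streaming: a counting sink stands in for a socket; nothing accumulates.
        long[] counted = {0};
        OutputStream sink = new OutputStream() {
            @Override public void write(int b) { counted[0]++; }
            @Override public void write(byte[] b, int off, int len) { counted[0] += len; }
        };
        writeFilePart(sink, "XYZ", "photo", "photo.jpg", new ByteArrayInputStream(file));
        System.out.println("bytes streamed through: " + counted[0]);
    }
}
```

Both runs produce the same number of bytes; the only difference is whether they pile up in a ByteArrayOutputStream first.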

Fortunately, Feign is only an HTTP client, and the server side still reads the body "incrementally", so the server has no memory problem.

Summary

In fact, Dubbo is not only unsuitable for transferring files; it is unsuitable for large messages in general. Its design targets small business messages (the default maximum message size is only 8 MB).

So whenever you need to upload files, let the client upload directly wherever possible; it is friendlier and saves resources!

This is original work; please do not repost without permission. If this article helped you, please like, bookmark, or follow to show your support ❤

Committed to original writing, sharing practical articles on Java, networking, IO, JVM, GC, and more.
