At the end of August, "IM Advanced Practical Master Class · Lecture 2" offered a detailed breakdown of "Technical Selection for Instant Messaging Products on the Web & Electron Platforms".
The Rongyun lecturer used metaphors and other techniques to compare the front-end technical options for IM scenarios vividly and logically, and shared Rongyun's best practices. The next session, on September 20, will focus on the full range of IM capabilities. Follow [Rongyun Global Internet Communication Cloud] to learn more.
Common business forms and core functions of IM
The common business forms of instant messaging products include chat rooms, one-to-one and group chat, super groups, real-time notifications, and live broadcast. The underlying functions are like basic building blocks: the different upper-layer business forms can be assembled from them in different ways.
The basic functional unit modules are roughly divided into three categories:
The most basic is connection management, which is the foundation of any instant messaging business. Next is data transmission over that connection between the two ends; here the focus is the data transmission protocol used in front-end/back-end communication, that is, managing the serialization and deserialization of data. Finally, for querying existing data, we mainly share front-end persistent storage technology.
The instant messaging scenario places distinct technical requirements on each of these three points.
- Connection management: a continuous, stable, and timely two-way network connection
- Data transmission: a secure, efficient, and easily extensible front-end/back-end data transmission protocol
- Record query: a front-end data persistence storage solution
Network connection scheme comparison
We compare connection schemes horizontally through five indicators: connection speed, transmission efficiency, immediacy, security, and compatibility.
WebSocket is the first choice on the front end, and it is also the native technology for building long-connection services on the Web platform. Because of the browser security sandbox, we cannot access transport-layer protocols directly in a web page; the Electron main process has no such restriction, however, so in the Electron scenario we can choose TCP.
The HTTP-based simulated duplex solutions are not simply the HTTP protocol itself, because in long-connection business the short-connection nature of HTTP does not match the scenario.
Connection speed
Connection speed is the elapsed time from when a connection is initiated to when the connection is established.
A TCP connection requires a three-way handshake: the initiator sends twice and receives once, while the responder sends once and receives twice.
Why three handshakes? This is a very common interview question. Think of it this way: for two people to hold an effective conversation, two things must be confirmed: that my own ears and mouth work, and that the other party's ears and mouth work. Only on that basis can the dialogue be effective. The three-way handshake completes exactly this capability confirmation, where mouth = sending ability and ears = receiving ability.
Of course, this metaphor is not rigorous; the two sides still cannot communicate if they do not speak the same language. That is the problem solved by the data encapsulation protocol in the second technical direction: understanding the intent of the opposite end.
A WebSocket connection is established on top of a TCP connection, because WebSocket is an application-layer protocol (the layer 7 that operators often speak of), while TCP is a transport-layer protocol (layer 4).
After the TCP handshake completes, the client sends an HTTP message requesting a protocol upgrade, and the server accepts the upgrade with an HTTP response. At this point the WebSocket negotiation is complete. These are two more steps than TCP alone, so WebSocket connects more slowly than TCP. HTTP is a short-connection protocol: the connection is not kept alive and must be rebuilt for each exchange, so connection speed is not meaningful for it.
Transmission efficiency
We define transmission efficiency in terms of the traffic, time, and computing power consumed by the same piece of data in transit, to help compare network protocols horizontally. The greater the consumption, the lower the efficiency.
Here we mainly look at traffic and time; computing power consumption is almost negligible on Web and Electron. First, consider the OSI reference model: TCP is a layer-4 protocol and WebSocket is a layer-7 protocol; these layers are concepts from the OSI reference model.
In the OSI reference model, data transferred between two network nodes follows a U-shaped path: the sender passes data from top to bottom, with each layer adding its own protocol header so that the same layer on the receiving side can parse it; the receiver passes data from bottom to top, stripping the header at each layer and handing the payload upward.
So as a packet travels top-down, its size keeps growing. Sending the same piece of data directly over TCP therefore consumes less traffic than over the WebSocket protocol. Let's look at what the extra WebSocket traffic pays for.
First, the smallest unit of WebSocket data transfer is the data frame. A piece of data is split and assembled into at least one frame before being written to the TCP buffer; if the data is relatively large, it is split into multiple frames, which the peer reassembles on receipt.
The binary sequence structure in the figure below is the data composition in the data frame.
HTTP is a text protocol with no minimum sending unit of its own; or rather, its minimum sending unit is that of the TCP protocol. Unlike WebSocket, which splits data into N frames and prefixes each with a WebSocket header before handing it to TCP, HTTP does not actively divide the data: it only prepends the request/status line and headers, then writes the complete message into the TCP buffer and lets TCP manage transmission.
So is HTTP's transmission efficiency better than WebSocket's? Not necessarily.
First, a WebSocket data frame adds at most 2 to 14 bytes of header, while the first line and headers of an HTTP message consume far more than 14 bytes. This follows from HTTP being a textual protocol: each character needs at least one byte of storage.
Second, HTTP extensions add further headers to the message, also textual and in key-value form. Moreover, because HTTP is short-connection, the server must confirm the sender's identity on every request, so each message inevitably carries authentication information; a long-connection protocol has no such extra cost.
Therefore, WebSocket's overall transmission efficiency is better than HTTP's, unless the data is so large that the combined header space of all the WebSocket frames exceeds the HTTP headers. That generally happens only in low-frequency scenarios such as file uploads; in most business data exchanges, a single send is not very large.
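The 2-to-14-byte header range quoted above follows directly from the frame layout. As a minimal sketch (plain Node.js, no real socket), this computes a frame's header size from its first two bytes:

```javascript
// WebSocket frame header size, per RFC 6455:
//   2 base bytes, +2 or +8 for the extended payload length
//   (when the 7-bit length field is 126 or 127), and +4 for the
//   masking key when the MASK bit is set (client-to-server frames).
function frameHeaderSize(byte0, byte1) {
  const masked = (byte1 & 0x80) !== 0;
  const len7 = byte1 & 0x7f;
  let size = 2;
  if (len7 === 126) size += 2;      // 16-bit extended length follows
  else if (len7 === 127) size += 8; // 64-bit extended length follows
  if (masked) size += 4;            // 32-bit masking key follows
  return size;
}

console.log(frameHeaderSize(0x81, 0x05));       // short unmasked text frame → 2
console.log(frameHeaderSize(0x81, 0x80 | 126)); // masked, 16-bit length → 8
console.log(frameHeaderSize(0x82, 0x80 | 127)); // masked, 64-bit length → 14
```

The minimum (2) and maximum (14) are exactly the range the comparison above relies on.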
Immediacy
Immediacy is the amount of time data may have to wait before it can be written to the TCP buffer.
In instant messaging, upstream and downstream traffic occurs simultaneously, which makes HTTP's short-connection nature a disadvantage: downstream data is blocked, because a server cannot actively push data over short-connection HTTP.
Let's first cover some basic networking concepts, because they are closely related to the immediacy comparison.
The first category of concepts is the description of connection persistence.
A long connection, in plain terms, persists after it is established: both ends can send data to each other over the existing connection, and the connection closes only when one end actively terminates it. TCP and WebSocket are both long-connection protocols. (For more sharing about long connections, click here.)
A short connection is established when I need to talk to the peer and closed as soon as the exchange completes. HTTP is a short-connection protocol: a connection is opened for the request and closed after the response. Keep-Alive can keep the underlying TCP connection open for reuse, but it does not guarantee the connection stays open. Because its connection remains valid, a long connection has better immediacy than a short one: it avoids waiting to establish a connection each time data is sent.
The second category of concepts concerns control of the byte stream's flow direction.
A full-duplex protocol lets the byte stream flow freely in both directions over the connection. Because of this free flow, full-duplex protocols have the best immediacy, and they are usually long-connection protocols; as long as the buffer is large enough, there is essentially no waiting. TCP and WebSocket are both full-duplex.
A half-duplex protocol allows the byte stream to flow in both directions, but only one direction at a time. It is like a single-lane road: when a vehicle is oncoming, traffic in the other direction cannot enter, or the road jams. HTTP is a typical half-duplex protocol: data does flow both ways, but the response can only be sent after the request data has been fully received; the byte stream never flows both ways simultaneously. Half-duplex immediacy is lower than full-duplex, because there is a wait for the connection to become usable, and data is delayed whenever there is an opposing stream.
A simplex protocol lets data flow in one direction only, such as the SSE feature within HTTP. Because a simplex protocol cannot carry two-way traffic on its own, it does not meet instant messaging needs by itself, so we set it aside. By this classification, WebSocket and TCP are at roughly the same level of immediacy, with HTTP weaker than both.
Of course, a single HTTP request cannot be compared to a long-connection protocol; we also need to ask whether concurrent requests over multiple HTTP connections can make up for the shortfall. Here we analyze the HTTP-based long-connection simulation schemes. First, think about it: what core problems must an HTTP-based solution solve?
The first is the sending delay when the client transmits data: because of HTTP's short connections, an uplink must wait for a TCP connection to be established before the HTTP packet can be sent. As far as the HTTP protocol is concerned, there is currently no real fix; HTTP Keep-Alive alleviates this but cannot eliminate it.
The second is downlink delay caused by the server's inability to push data to the client. This, too, stems from HTTP's short connections: when the server has downstream data, there is no persistent valid connection to push it over, so it must wait for the client to connect and fetch the data along the way. The solutions below target this second problem. There are three popular HTTP-based front-end schemes on the market.
Comet, which holds a pending request open so the server can respond whenever data arrives, partially alleviates the downlink latency issue.
The HTTP + SSE scheme is similar to Comet, except that the downstream channel is implemented with SSE instead of a held HTTP request.
SSE is a long-connection, simplex protocol, and its overall effect is better than Comet because there is no extra connection wait. As a channel for downlink data only, a simplex protocol fully meets the requirement. The downstream immediacy of HTTP + SSE is roughly equivalent to WebSocket's, essentially solving the downstream delay problem.
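To make the downlink channel concrete: SSE is just a text stream of `data:` lines separated by blank lines, which a browser's EventSource parses for you. A simplified sketch of that parsing (plain JS; a real SSE parser also handles `event:`, `id:`, `retry:` and data arriving in chunks):

```javascript
// Minimal parser for a complete SSE stream body: events are separated by
// a blank line, and multiple data: lines within one event join with "\n".
function parseSse(streamText) {
  return streamText
    .split("\n\n")
    .filter((block) => block.trim() !== "")
    .map((block) =>
      block
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n")
    );
}

console.log(parseSse("data: hello\n\ndata: line1\ndata: line2\n\n"));
// → [ 'hello', 'line1\nline2' ]
```

In the browser, `new EventSource(url)` does all of this internally and emits one `message` event per parsed block; the sketch only shows why the wire format is cheap for a pure downlink channel.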
Long-Polling, which periodically sends requests to the server to bring back downstream data, is the baseline approach; it solves neither of the two core problems. In summary: HTTP + SSE > Comet > Long-Polling.
This conclusion has a premise: compatibility is set aside.
Although SSE is a good solution, it is limited to browsers. If we want to reuse the JS code in other environments such as Mini Programs, it cannot be implemented: support for SSE in the various Mini Program runtimes is essentially zero.
Security
Let's look at the OSI reference model again, focusing on the region between the application layer and the transport layer, where SSL/TLS sits. This is the S in the HTTPS we often speak of.
In the OSI model, the session layer handles session maintenance and mutual authentication, and the presentation layer handles encryption and decryption. Above these, the application-layer protocols are equivalent in security: HTTPS and WSS are the secure versions of HTTP and WebSocket.
TCP sits below SSL/TLS, so data sent directly over TCP has more security options, of which TLS is only one. With HTTPS or WSS, TLS/SSL support is provided by the Runtime and developers need not worry about transport security. With raw TCP, developers must secure the transmission themselves (with TLS/SSL or another custom security scheme).
Data transfer protocol solution & front-end persistent storage
Data Transfer Protocol Scheme
We compare data transmission protocols on five indicators: information density, extensibility, security, multi-end consistency, and compatibility. Setting aside self-developed solutions, which we do not recommend, there are two common choices: Protocol Buffers (PB), binary data in TLV format, and JSON, plain-text key-value data.
Among the five indicators, information density is the traffic A must spend to convey a piece of information to B: the higher the density, the lower the consumption. PB's information density is higher than JSON's.
In transmission structure, PB is a binary data sequence in TLV format, while JSON is a pure string of key-value pairs; for the same data, the JSON description is much larger than the PB description.
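The size gap can be sketched with a toy TLV encoding, compared against JSON for the same record. (This one-byte-tag, one-byte-length layout is a simplification for illustration, not the real protobuf wire format.)

```javascript
// Toy TLV encoding: [tag: 1 byte][length: 1 byte][value bytes].
// A simplification for size comparison only, not real protobuf framing.
function tlvField(tag, valueBytes) {
  return Buffer.concat([Buffer.from([tag, valueBytes.length]), valueBytes]);
}

const message = { id: 42, text: "hi" };

const asTlv = Buffer.concat([
  tlvField(1, Buffer.from([42])),         // field 1: id
  tlvField(2, Buffer.from("hi", "utf8")), // field 2: text
]);
const asJson = Buffer.from(JSON.stringify(message), "utf8");

console.log(asTlv.length, asJson.length); // → 7 21
```

Even in this tiny example the JSON form is three times larger, because every key name and every structural character (`{`, `"`, `:`) travels on the wire; the TLV form replaces key names with one-byte tags.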
Security is the flip side of readability. To read binary PB data, you must know the structure used during serialization, i.e. the .proto definition file agreed between front end and back end; only with that file can the binary data be serialized and deserialized. JSON, by contrast, is a plain string with good readability, so the information in the data can be understood more intuitively.
In compatibility, JSON has native support and many libraries to choose from in every language; PB is also workable on the front end, where JS handles the binary data via ArrayBuffer.
In extensibility, PB has the advantage: the data it transmits carries no key information, so the keys on the two ends may differ. JSON transmissions carry the keys themselves, which means the keys cannot be changed at will.
Multi-end consistency overlaps with compatibility: the instant messaging scenario spans many platforms, and data serialization and deserialization must behave identically across all of them. Both PB and JSON do well here: JSON is natively supported, and PB can use the corresponding third-party libraries provided by Google. In conclusion, PB is better than JSON.
Front-end persistent storage comparison
Finally, the front-end persistent storage technologies: LocalStorage, IndexedDB, and SQLite. There are not many options for persistent storage, and capacity, compatibility, and data consistency all need to be considered.
- LocalStorage: the most commonly used solution, with relatively low capacity
- IndexedDB: the only persistent mass-storage solution available in the browser
- SQLite (Electron only): a commonly used front-end database
- SQLite (WebAssembly): relatively high research and development cost
Except for LocalStorage, the other three are comparable in capacity.
For data consistency, apart from using SQLite in the Electron main process, the other three solutions handle data races poorly and struggle to guarantee consistency. On the Electron platform, database operations run in the main process: when a renderer process needs to touch the local database, it communicates with the main process via IPC, and the main process executes the actual database transactions. The main process is thus the single access point to the database, so data races and consistency can be handled properly.
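The single-access-point idea can be sketched as a serializing queue in the main process: every IPC-triggered database task is chained behind the previous one, so two renderer requests can never interleave on the SQLite handle. (A plain-JS sketch; the actual SQLite calls and `ipcMain` wiring are hypothetical and omitted.)

```javascript
// Each task is chained behind the previous one, so database work from
// different renderer processes runs strictly one at a time.
class DbQueue {
  constructor() {
    this.tail = Promise.resolve();
  }
  run(task) {
    const result = this.tail.then(task);
    this.tail = result.catch(() => {}); // keep the chain alive on errors
    return result;
  }
}

// Hypothetical usage in the Electron main process:
// const queue = new DbQueue();
// ipcMain.handle("db-query", (event, sql) => queue.run(() => db.run(sql)));
```

Even if an earlier task is slow, later tasks wait their turn, which is exactly the ordering guarantee that naive concurrent access to IndexedDB or LocalStorage from multiple contexts lacks.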
Rongyun's implementation practice
After comparing the three major areas, Rongyun chooses the best option available for each platform according to the principles shared above. On the Electron platform, for example, the optimal combination is TCP + SQLite, with PB for data encapsulation.
TCP + SQLite is not available on the Web platform, so WebSocket is used as the solution, with HTTP as the sub-optimal fallback.
In Mini Programs, where neither PB nor WebSocket may be usable, the HTTP scheme is the fallback, with Comet as the chosen implementation.
This downgrade process is transparent to integrating developers, but it does need attention when implementing business features: for message query, for example, having a local database or not produces two different application experiences.
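The downgrade chain above can be restated as a small selection table (a sketch summarizing the article's choices; the key names are illustrative):

```javascript
// Per-platform selection as described above: the best option where
// available, with HTTP (Comet) as the fallback when WebSocket and PB
// are unavailable (e.g. in some Mini Program runtimes).
const selection = {
  "electron-main": { transport: "TCP", codec: "PB" },
  "web": { transport: "WebSocket", codec: "PB" },
  "mini-program": { transport: "HTTP (Comet)", codec: "JSON" },
};

function pickSolution(platform) {
  return selection[platform] || selection["web"]; // default to the Web scheme
}

console.log(pickSolution("electron-main").transport); // → TCP
console.log(pickSolution("mini-program").transport);  // → HTTP (Comet)
```

Encoding the choice in one place like this is what lets the downgrade stay invisible to integrating developers: the SDK picks the transport, and the business code above it is unchanged.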