
This article is shared by Zhang Kaihong, a senior technical expert on Tantan's server team. The original title was "Go language practice of Tantan's long-connection project"; because the original contained a number of errors, it has been revised and edited.

1 Introduction

The instant messaging long-connection service sits at the network access layer, a field well suited to Go, which can play to the strengths of its lightweight goroutine concurrency and asynchronous I/O.

Since the long-connection project went online, Tantan has optimized the service several times: GC pauses have been reduced from 5ms to 100 microseconds (on Go 1.9 and above), and the p999 latency of the main gRPC interface has dropped from 300ms to 5ms. While most of the industry focuses on single-machine connection counts, we focus more on the service SLA (service availability).

This article shares the full technical journey of the IM long-connection module of the stranger social app Tantan, from technology selection to architecture design to performance optimization, along with a summary of lessons learned.

Study and exchange:

  • Instant messaging/push technology development and exchange group 5: 215477170 [recommended]
  • Mobile IM development introductory article: "One entry is enough for novices: Develop mobile IM from scratch"
  • Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK

(This article has been simultaneously published at: http://www.52im.net/thread-3780-1-1.html)

2. About the author

Zhang Kaihong: senior technical expert on Tantan's server team.

He has 6 years of Go development experience and has used Go to build a number of large-scale web projects involving network libraries, storage services, and long-connection services. He focuses on Go language practice, storage service development, and deep optimization of Go in big-data scenarios.

3. The origin of the project

Our project started in the second half of 2018; counting to today, it has run for about a year and a half.

At that time, Tantan faced some technical pain points, the most serious of which was a heavy dependence on third-party push services: any failure on the third party's side had a significant impact on real-time IM chat.

At that time, messages were delivered through third-party push, and in-app push latency was relatively high, averaging five to six hundred milliseconds. We could not accept that.

There was also no ping/pong (heartbeat) mechanism, so there was no way to know whether a user was online.

At that time, product and engineering colleagues both felt it was the right opportunity to build our own long connection.

4. An episode

The project lasted about a quarter. The IM business was implemented first, since we felt that the long connection is most tightly coupled to IM.

After IM landed and the long-connection service subsequently went online, each business line came to depend more and more on it.

There was an episode along the way, mainly about the project's name.

At the beginning of the project I named it Socket: a socket felt familiar, and a long connection seemed to be exactly that. However, operations raised an objection: UDP is also a socket, and UDP can in fact also be used for long connections.

Operations proposed Keepcom instead, derived from keep-alive. The proposal was quite good, and it is the name we ultimately used.

The clients' suggestions were Longlink and Longconn, one from an iOS colleague and the other from an Android colleague.

In the end the rest of us lost and operations won; they felt that if the name could not be settled, we should not go live. We compromised.

5. Why make long connections?

Why make long connections at all?

As shown in the figure above, the comparison is obvious: the left side is the long connection, the right side the short connection.

With a long connection there is no need to repeatedly establish and release the connection; a data packet needs only one RTT to complete. With the short connection on the right, sending a single packet requires a three-way handshake first and a four-way teardown at the end.

Conclusion: to send N messages, the long connection costs 2+N RTTs, while short connections (without keep-alive) cost 3N RTTs, where N is the number of messages.
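The arithmetic above can be sketched in a few lines of Go. This is purely illustrative; it just encodes the article's 2+N vs. 3N estimate:

```go
package main

import "fmt"

// rttLongConn models the article's estimate for a persistent connection:
// one handshake round trip plus one setup round trip, then one RTT per
// message over the already-open connection: 2 + N total.
func rttLongConn(n int) int { return 2 + n }

// rttShortConn models a fresh connection per message without keep-alive:
// handshake, data, teardown cost roughly 3 RTTs each, so 3 * N total.
func rttShortConn(n int) int { return 3 * n }

func main() {
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("N=%d  long=%d RTTs  short=%d RTTs\n", n, rttLongConn(n), rttShortConn(n))
	}
}
```

The gap widens linearly with message count, which is why the saving matters for chatty IM traffic.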

6. Advantages of long connection technology

We concluded that long connections have the following four advantages:

1) Real-time: the long connection is a two-way channel, so message pushes arrive in real time;
2) Stateful: the long connection itself maintains user state, and the keep-alive mechanism determines whether a user is online;
3) Traffic-saving: a long connection saves traffic; you can apply custom data compression and also avoid a great deal of repeated handshake and header overhead;
4) Power-saving: with less network traffic, client power consumption drops further.

7. Is TCP competent on the mobile terminal?

Before the project started, we gave this a lot of consideration.

First of all, for a long connection on the mobile side: can the TCP protocol do the job?

For traditional long connections, TCP works well on the web side, but can it also cope on mobile? That depends on several characteristics of TCP.

First, TCP has slow start and sliding-window mechanisms, through which it controls the amount of in-flight data to avoid network congestion.

After a TCP connection is established, a slow-start process follows: the congestion window grows exponentially (by powers of 2) from its initial size until it reaches a threshold, say 16 packets, and then grows linearly, 16, 17, 18 packets and so on, until the maximum window is reached.

On packet loss there are two situations. One is fast retransmit, where the window is simply halved, say to 12 packets. If an RTO fires instead, the connection behaves like a fresh one and the window drops back to the initial size.

Once RTO retransmission kicks in, the blocking of subsequent packets is quite serious: one lost packet blocks the sending of all the others.
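The window behavior described above can be captured in a toy model. This is a deliberately simplified sketch in Go, not real TCP (real stacks add SACK, pacing, and much more); the initial window of 1 and threshold of 16 mirror the example in the text:

```go
package main

import "fmt"

// cwndModel is a simplified congestion-window model (units: packets).
type cwndModel struct {
	cwnd, ssthresh int
}

// onAck grows the window: doubling per RTT during slow start,
// +1 per RTT once past ssthresh (congestion avoidance).
func (m *cwndModel) onAck() {
	if m.cwnd < m.ssthresh {
		m.cwnd *= 2
	} else {
		m.cwnd++
	}
}

// onFastRetransmit: a triple-duplicate-ACK loss roughly halves the window.
func (m *cwndModel) onFastRetransmit() {
	m.ssthresh = m.cwnd / 2
	m.cwnd = m.ssthresh
}

// onRTO: a retransmission timeout collapses the window back to the start,
// which is why RTO recovery hurts so much more than fast retransmit.
func (m *cwndModel) onRTO() {
	m.ssthresh = m.cwnd / 2
	m.cwnd = 1
}

func main() {
	m := &cwndModel{cwnd: 1, ssthresh: 16}
	for i := 0; i < 6; i++ {
		m.onAck()
	}
	// window sequence: 1 -> 2 -> 4 -> 8 -> 16 -> 17 -> 18
	fmt.Println("window after 6 RTTs:", m.cwnd)
}
```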


(▲ The above picture is quoted from "Towards a higher level: the network foundation that good Android programmers must know and know")

For the basic knowledge of TCP protocol, you can read the following materials:

"TCP/IP Detailed Explanation-Chapter 17 TCP: Transmission Control Protocol"
"TCP/IP Detailed Explanation-Chapter 18 · TCP Connection Establishment and Termination"
"TCP/IP Detailed Explanation-Chapter 21 TCP Timeout and Retransmission"
"Easy to Understand-Deep Understanding of TCP Protocol (Part 1): Theoretical Foundations"
"Easy to Understand-Deep Understanding of TCP Protocol (Part 2): RTT, Sliding Window, Congestion Handling"
"Introduction to Network Programming for Lazy People (1): Quickly Understand Network Communication Protocol (Part 1)"
"Lazy Introduction to Network Programming (2): Quickly Understand Network Communication Protocol (Part 2)"
"Lazy Introduction to Network Programming (3): A quick understanding of TCP protocol is enough"
"Introduction to Brain Disabled Network Programming (1): Follow the animation to learn TCP three-way handshake and four waved hands"
"Introduction to network programming has never been easier (2): If you were to design the TCP protocol, what would you do?"

8. TCP or UDP?

(▲ The above picture is quoted from "The Protocol Selection of Mobile IM/Push System: UDP or TCP?")

Four problems with long connections over TCP:

1) Message volume on mobile is relatively sparse: each time a user picks up the phone, the number of messages sent is small and the intervals between them are long. In this scenario, the cost of establishing and maintaining a TCP long connection is comparatively more noticeable;
2) The packet loss rate is relatively high under weak network conditions, and after a loss the subsequent data transfer is easily blocked (head-of-line blocking);
3) The TCP connect timeout is too long; it defaults to on the order of 1 second because TCP was born in an era when networks were far worse than today's, and it can now be set shorter;
4) Without fast retransmit, the RTO retransmission wait is long, backing off exponentially each retry, with the total by default reaching on the order of 15 minutes.

Why did we choose TCP in the end? Because we judged UDP's problems to be more serious.

First of all, UDP has no sliding window, no flow control, and no slow start, so it easily causes packet loss, as well as loss and timeouts at intermediate hops in the network.

And once a UDP packet is lost there is no retransmission mechanism, so we would have to implement retransmission at the application layer. The development effort is not that large, but we considered it low-level and failure-prone, so we finally chose TCP.

TCP or UDP? This has always been a more controversial topic:

"Lazy Introduction to Network Programming (4): Quickly understand the difference between TCP and UDP"
"Lazy Introduction to Network Programming (5): Quickly understand why UDP is sometimes more advantageous than TCP"
"The 5G era has arrived, and TCP/IP is so old, can you still eat?"
"A network communication transport layer protocol that Android programmers must know and know-UDP and TCP"
"The Unknown Network Programming (6): Deeply understand the UDP protocol and use it well"
"The Unknown Network Programming (7): How to make the unreliable UDP reliable?"

If you are not familiar with the UDP protocol, you can read this article: "TCP/IP Detailed Explanation-Chapter 11 · UDP: User Datagram Protocol".

9. More reasons to choose TCP

Let’s make a list of three main points:

1) For today's mobile platforms, Android and iOS, the initial congestion window is relatively large, defaulting to 10 segments, which mitigates the drawback of TCP slow start;
2) Ordinary text transmission is not very sensitive to packet loss (we are not streaming multimedia, only transmitting text data, so the side effects of TCP's loss handling are not particularly severe);
3) We felt that TCP is more widely used at the application layer.

Regarding point 3), there were three considerations behind it.

The first consideration:

Basically all applications today that use the HTTP protocol or do push are built on TCP, and we believed TCP generally would not cause major problems.

Whereas once you abandon TCP for UDP or the QUIC protocol, you run into large gaps in network support that cannot be closed in a short time, so TCP it was.

The second consideration:

The second consideration was how our service should do LB (load balancing) at the base layer. There were two choices at the time: one was traditional LVS, the other HttpDNS (for HttpDNS, see "Comprehensive Understanding of Mobile DNS Domain Name Hijacking and Other Miscellaneous Diseases: Principles, Root Causes, HttpDNS Solutions, etc.").

In the end we chose HttpDNS. First of all, we needed LB support across data centers, where HttpDNS wins outright. Second, for traffic that crosses carrier networks, LVS cannot do it and other deployment methods would be needed. In terms of capacity expansion, LVS is slightly better. Finally, for the more general LB algorithms, LVS support is poor: we needed LB keyed on user ID, consistent-hashing LB, and LB based on geographic location. In all of these, HttpDNS wins handily, and LVS cannot compete.
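The consistent-hashing LB mentioned above can be sketched as a small hash ring in Go. This is a minimal illustration, not Tantan's implementation; the connector addresses and virtual-node count are made up:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// ring is a minimal consistent-hash ring for picking a Connector by user ID.
type ring struct {
	keys  []uint32          // sorted virtual-node hashes
	nodes map[uint32]string // virtual-node hash -> connector address
}

// newRing places vnodes virtual nodes per connector on the ring so that
// load spreads evenly and removing one connector only remaps its slice.
func newRing(connectors []string, vnodes int) *ring {
	r := &ring{nodes: make(map[uint32]string)}
	for _, c := range connectors {
		for i := 0; i < vnodes; i++ {
			h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", c, i)))
			r.keys = append(r.keys, h)
			r.nodes[h] = c
		}
	}
	sort.Slice(r.keys, func(i, j int) bool { return r.keys[i] < r.keys[j] })
	return r
}

// pick returns the connector owning the first virtual node at or after
// the user's hash, wrapping around the ring.
func (r *ring) pick(userID string) string {
	h := crc32.ChecksumIEEE([]byte(userID))
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0
	}
	return r.nodes[r.keys[i]]
}

func main() {
	r := newRing([]string{"conn-1:9000", "conn-2:9000", "conn-3:9000"}, 100)
	fmt.Println("user-42 ->", r.pick("user-42"))
}
```

The same user ID always lands on the same Connector, which is what a stateful access layer needs from its LB.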

The third consideration:

The third consideration: what mechanism should we adopt for TCP keep-alive? How should the ping method, the ping interval, and the rest of the ping timing details be determined?

At the time I wrestled with whether the client or the server should initiate the ping.

Client-initiated keep-alive is the better choice, because the client can be woken up, yet after entering the background it may be unable to send packets.

Second: the app should use different ping intervals in the foreground and background, because in the background the user is in a weakly-online state anyway, and there is no need to ping frequently to confirm online status.

Therefore: the ping interval in the background can be longer, and the foreground interval shorter.

Moreover: exponentially increasing ping intervals must be supported; this is a lifesaver during failures.

For example: once the server fails, if clients keep pinging desperately, the server may be completely overwhelmed. With exponentially increasing ping intervals, the server can at least recover slowly, which matters greatly during an outage.

Finally: ping retries need backoff. After sending a ping, if no pong is received, wait out the backoff before sending the next ping.

10. Dynamic Ping packet time interval algorithm

PS: In IM this actually has a more professional name: the "smart heartbeat algorithm".

We also designed a dynamic Ping packet interval algorithm.

Domestic network operators have keep-alive timeouts on their NAT equipment, currently mostly 5 minutes or more: if you send no packet for 5 minutes, your NAT mapping entry is deleted. Basically all operators sit above 5 minutes, with mobile 4G being more restrictive. As long as a ping goes out every 4 to 10 minutes, the mapping entry in the operator's equipment is maintained, there is no problem, and the long connection stays alive.

Increasing the ping interval reduces network traffic and further cuts client power consumption; this benefit is still fairly large.

On low-end Android devices there are also some DHCP lease issues. The problem is concentrated on older Android versions, which will not renew an expired IP lease.

Solving it is also relatively simple: when the DHCP lease reaches its halfway point, renew the lease with the DHCP server promptly.

Due to space limitations, I will not expand here. If you are interested, you can read these materials:

"Why does the mobile terminal IM based on the TCP protocol still need a heartbeat keep-alive mechanism?"
"An article to understand the network heartbeat packet mechanism in instant messaging applications: functions, principles, realization ideas, etc."
"WeChat team original sharing: Android version of WeChat background keep-alive actual sharing (network keep-alive)"
"Mobile IM Practice: Realizing the Smart Heartbeat Mechanism of Android WeChat"
"Mobile IM Practice: Analysis of the Heartbeat Strategy of WhatsApp, Line, and WeChat"
"A Discussion on the Design and Implementation of an IM Intelligent Heartbeat Algorithm on Android (with sample code)"
"Teach you how to use Netty to realize the heartbeat mechanism and disconnection reconnection mechanism of network communication programs"

11. Service architecture

11.1 Basic introduction
The service architecture is relatively simple, with roughly four modules:

1) First, HttpDNS;
2) Second, the Connector access layer, which provides the connection IPs;
3) Then the Router, which is similar to a proxy forwarding messages: it selects the right access-layer server by IP and finally pushes the message to the user;
4) Finally, the authentication module, Account; since we currently serve only the Tantan app, it is implemented in the user center.

11.2 Deployment
The deployment consists of three modules:

1) Dispatcher;
2) Redis;
3) Cluster.

As shown in the figure below, when the client connects:

1) It first needs to obtain the protocol configuration;
2) Second, it obtains the Connector IP through HttpDNS;
3) After connecting via that IP, the next step is to send an Auth message for authentication;
4) After the connection succeeds, ping packets are sent to keep it alive;
5) Afterward the connection is closed.

11.3 Message forwarding process
The message forwarding process is divided into two parts.

First, the message uplink: the client initiates a message packet, which reaches the access service through the Connector; the Connector then passes the message on to the microservices, or, if no microservice processing is needed, forwards it directly. In this role the Connector acts much like a gateway.

For the downlink: the business side needs to ask the Router to find the specific Connector, and then delivers the message to the user through that Connector.

Every company has a microservice architecture, and the interaction between the long connection and the microservices is basically these two pieces: on the uplink the Connector is more like a gateway, and on the downlink the path goes through the Router, with the message sent out via the Connector.

11.4 Some implementation details
Here are some implementation details: we use Go 1.13.4, gRPC for internal message transmission (with HTTP/2 as the transport protocol), and etcd for internal LB, providing service registration and discovery.

As shown in the figure below, the Connector is stateful: it holds the state mapping from user ID to connection.

Look at the right side of the figure below: there is one fairly large MAP. To keep lock contention on that map from becoming too severe, it is split into 256 sub-maps, building a map with high read/write throughput this way. Each sub-map maps an ID to a connection state, and each connection is handled by its own goroutine, each costing roughly 4KB for its read and write buffers.
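A minimal sketch of such a sharded connection map in Go; shard selection by simple modulo here, while the real service may hash differently:

```go
package main

import (
	"fmt"
	"sync"
)

const shardCount = 256

type conn struct{ addr string } // placeholder for real per-connection state

// shard holds one slice of the userID -> connection table behind its own lock.
type shard struct {
	sync.RWMutex
	m map[int64]*conn
}

// connMap splits the big map into 256 shards so lock contention stays low
// even with hundreds of thousands of concurrent connections.
type connMap struct {
	shards [shardCount]*shard
}

func newConnMap() *connMap {
	c := &connMap{}
	for i := range c.shards {
		c.shards[i] = &shard{m: make(map[int64]*conn)}
	}
	return c
}

func (c *connMap) shardFor(userID int64) *shard {
	return c.shards[uint64(userID)%shardCount]
}

func (c *connMap) Put(userID int64, cn *conn) {
	s := c.shardFor(userID)
	s.Lock()
	defer s.Unlock()
	s.m[userID] = cn
}

func (c *connMap) Get(userID int64) (*conn, bool) {
	s := c.shardFor(userID)
	s.RLock()
	defer s.RUnlock()
	cn, ok := s.m[userID]
	return cn, ok
}

func main() {
	cm := newConnMap()
	cm.Put(42, &conn{addr: "10.0.0.1:54321"})
	if cn, ok := cm.Get(42); ok {
		fmt.Println(cn.addr)
	}
}
```

Only the shard owning a given user ID is locked per operation, so goroutines touching different users rarely contend.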

Now look at the Router: it is a stateless common gRPC service that is relatively easy to scale out. State information is stored in Redis, organized in groups; the current peak is around 3,000.

So we have state in two places: the Connector and the Router.

The Connector's state is primary, and the Router's state serves as the consistency guarantee over it.

There are two situations here. If two updates go through the same Connector, the Connector must ensure the order in which they are replicated to the Router is correct; if the order were inconsistent, the Router's state and the Connector's state would diverge. Ordering is achieved through a unified per-Connector window. If updates cross Connectors, a Redis Lua script implements a compare-and-update scheme, which ensures that only the connector that wrote a piece of state can update it; one connector cannot overwrite another's information. This guarantees that, both across Connectors and within the same Connector, the connection state in the Router is updated consistently and in order.
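The compare-and-update rule can be modeled in pure Go to show the idea. In production the check runs atomically as a Redis Lua script; the field names and sequence-number scheme below are assumptions for illustration:

```go
package main

import "fmt"

// routeEntry is the Router's view of where a user is connected.
type routeEntry struct {
	connector string // which Connector owns this connection
	seq       int64  // monotonically increasing update sequence
}

// routerState models the compare-and-update rule: an update is applied
// only if it carries a strictly newer sequence than the stored entry, so
// a stale write from another connector can never clobber fresher state.
type routerState struct {
	entries map[int64]routeEntry
}

func (r *routerState) compareAndUpdate(userID int64, from string, seq int64) bool {
	cur, ok := r.entries[userID]
	if ok && seq <= cur.seq {
		return false // stale: neither an in-order owner update nor a newer takeover
	}
	r.entries[userID] = routeEntry{connector: from, seq: seq}
	return true
}

func main() {
	r := &routerState{entries: map[int64]routeEntry{}}
	fmt.Println(r.compareAndUpdate(1, "conn-a", 1)) // true: first write
	fmt.Println(r.compareAndUpdate(1, "conn-b", 1)) // false: stale takeover rejected
	fmt.Println(r.compareAndUpdate(1, "conn-b", 2)) // true: newer seq wins
}
```

Running the equivalent logic inside Redis via Lua makes the read-compare-write step atomic, which a plain GET/SET pair would not be.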

The Dispatcher is relatively simple: a pure common HTTP API service. It provides an HTTP API with quite low latency, about 20 microseconds, and 4 CPUs can support 100,000 concurrent requests.

High availability is currently achieved with a no-single-point structure. First, HttpDNS and the Router are stateless services and only need LB in front of them. For the Connector, failover at the connection layer is achieved through active client drift driven by HttpDNS: once a Connector has a problem, the client immediately drifts to the next Connector, achieving automatic failover. There is no single point.

12. Performance optimization

12.1 Basic information
The follow-up optimization mainly has the following aspects:

1) Network optimization: done together with the client. When the client needs a packet retransmitted, it sends three probe packets, thereby triggering TCP's fast retransmit mechanism and raising the proportion of fast retransmits;
2) Heartbeat optimization: reduce the number of ping packets through dynamic ping intervals (still under development);
3) Anti-hijacking: the client connects directly by IP, avoiding domain-name hijacking;
4) DNS optimization: HttpDNS returns multiple IPs for each client request.

12.2 Network optimization
At the access layer, the Connector's connection count is quite large, so its load is also quite high.

We made substantial optimizations to the Connector. At the very beginning, the Connector's GC pause time reached 4 or 5 milliseconds, which was horrifying.

Look at the picture below after optimization: the first chart shows about 100 microseconds on average, which is fairly good. The second chart shows the result of the second round of optimization, about 29 microseconds, and the third chart about 20 microseconds.

12.3 Message delay
Looking at message latency: Tantan has high requirements for IM message latency and pays special attention to user experience.

This figure was about 200ms at the beginning; for a single operation, 200ms is quite serious.

After the first optimization (below, top) it stood at about 1 millisecond to a few milliseconds, and after the second optimization (below, bottom) it has now dropped to its lowest point, almost 100 microseconds, close to the time scale of an ordinary network operation.

12.4 Connector optimization process
The optimization process is like this:

1) First, remove Info logs from the critical path, implementing the access log through sampling instead; Info logging is a relatively heavy operation at the access layer;
2) Second, cache objects through sync.Pool;
3) Third, reduce heap allocations as much as possible, guided by escape analysis.
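A minimal sync.Pool sketch for point 2); the buffer type and handler function here are illustrative, not the Connector's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses read/write buffers across packets instead of allocating
// a fresh one each time, which lowers GC pressure on the hot path.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// handlePacket borrows a buffer from the pool, uses it, and returns it.
func handlePacket(payload []byte) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // must reset before returning to the pool
		bufPool.Put(buf)
	}()
	buf.Write(payload)
	return buf.String()
}

func main() {
	fmt.Println(handlePacket([]byte("hello")))
}
```

The essential discipline is resetting the object before Put; a pooled object that leaks state between borrowers is a classic source of subtle bugs.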
Later, a lossless (zero-downtime) release process for the Connector was implemented; this one is particularly valuable. A newly launched long-connection service is released frequently, and every release used to be felt by users; the goal is to make releases as imperceptible to users as possible.

We implemented a graceful shutdown for the Connector and optimized connection handling this way.

First: take the machine offline in HttpDNS, then slowly disconnect users until the connection count falls below a certain threshold; after that, restart the service and release the new binary version.

Finally: bring the machine back up in HttpDNS. Releasing a version this way takes quite a long time, and it took careful measurement to decide how many connections to disconnect per second and where to set the final threshold.
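The drain phase of that graceful shutdown can be sketched as a rate-bounded loop. Illustrative only; the real service paces disconnects with timers against live connection handles, and the numbers below are hypothetical:

```go
package main

import "fmt"

// drain disconnects at most ratePerTick connections per tick until the
// count falls below threshold, modeling the Connector's shutdown pacing.
// It returns how many ticks the drain takes, so release duration can be
// estimated up front from the chosen rate and threshold.
func drain(total, ratePerTick, threshold int) (ticks int) {
	for total > threshold {
		n := ratePerTick
		if total-threshold < n {
			n = total - threshold // last partial batch
		}
		total -= n
		ticks++
	}
	return ticks
}

func main() {
	// e.g. 150,000 connections, dropping 1,000 per second, down to 5,000:
	fmt.Println(drain(150000, 1000, 5000), "seconds")
}
```

Each disconnected client drifts to another Connector via HttpDNS, so the drain rate is effectively the load-shift rate onto the rest of the fleet.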

Some data follows. GC was just covered; the connection count is the other fairly critical metric. Looking at the connection count first: the single-machine connection count is kept deliberately modest, as we dare not push it too far. The maximum is 150,000 connections per machine, with GC pauses at about 100 microseconds.

The number of goroutines tracks the number of connections, at almost 150,000:

Take a look at memory usage. The figure below (top) shows Go's total memory, about 7.3GB in total, of which roughly one-fifth is unoccupied.

The figure below shows the GC status. GC is fairly healthy; the red line is the live memory at each GC, and the red line sits well above the green line.

The current GC pause is about 20 microseconds; we feel this is on par with Go's official figures, and that GC is well optimized at this point.

12.5 Follow-up optimization
Finally, the plan for follow-up optimization.

First of all, at the system level, we still need to optimize the Connector layer further, reducing memory allocations and trying to keep allocations on the stack rather than the heap, thereby reducing GC pressure. Go's collector is a non-generational GC: the more live memory in the heap, the more it has to scan, so GC cost does not stay linear.

Second: use sync.Pool internally for short-lived allocations, such as Contexts or temporary buffers.

The protocol also needs to be optimized. We currently use the WebSocket protocol, and will later add some functional flags to carry important information to the server. For example, a retransmission flag: if the client marks a packet as a retransmission, the server can first check whether the packet is a duplicate, i.e. whether it was already received before. If it was, there is no need to unpack it again, which saves a lot of server-side work.

In addition, the current WebSocket mask mechanism can be removed. The mask exists to protect browser-originated traffic from intermediaries mishandling the frames, but since our traffic is basically packets sent by native clients, the mask mechanism is not needed.

On the business side: there is a lot more planned. We think the long connection, being at the access layer, is a very good place to gather statistics on client distribution, for example the split between Android and iOS clients.

Further, we can build user-portrait statistics: gender, age, geographic location. That is roughly it. Thank you!

13. Reply to hot questions

  • Question: You just mentioned restarting at the connection layer; during that process, will the disconnected users drift to other machines? Has that been done?

Zhang Kaihong: Yes, that is the current behavior: the client drifts automatically.

  • Question: With 10 million daily active users, if the server pushes to 1 million clients, how is that scenario handled?

Zhang Kaihong: At present we do not have message pushes of such a large volume. We sometimes send business-related pushes, and we have implemented rate limiting for them, achieved on the client side, at roughly three to four thousand per second.

  • Question: If connections can be established before authentication, doesn't that mean a security risk? Attackers can keep establishing connections, making them hard to defend against. Will this be a problem? For a malicious attack, merely establishing connections is enough; no authentication is even needed.

Zhang Kaihong: I understand what you mean. This is not unique to long connections; short connections have the same problem. If clients keep forging requests and the traffic is large, it is handled by firewalls, at the IP-layer firewall.

  • Question: The persistent connection server is hung on the outermost side. Is there a layer in the middle?

Zhang Kaihong: At present the access layer is directly exposed to the external network, with an IP-layer anti-DDoS firewall in front. There is no other network equipment besides that.

  • Question: Based on what considerations is there no additional layer in the middle, given that adding a layer in front is common?

Zhang Kaihong: There is no such plan at present. Adding an LB layer in front of the WebSocket access layer would make capacity expansion easier, but the benefit is not particularly large, so there is no plan for now.

  • Question: What do the three probe packets for disconnection and retransmission you just mentioned mean?

Zhang Kaihong: We want to trigger fast retransmit more often so that TCP's effective retransmission interval is shorter. The server decides on a fast retransmit after seeing three duplicate ACKs, so we send three probe packets to provoke fast retransmit and keep an RTO retransmission from kicking in.

  • Question: Did Tantan use a third-party push service at first?

Zhang Kaihong: Yes, we used Aurora push at first.

  • Question: And then moved from the third-party push service to self-developed.

Zhang Kaihong: When Aurora had failures, the impact on us was still large, and Aurora's failure frequency was quite high at the time, so we wondered whether we could run the service ourselves. The second point is that Aurora can report whether a user is online, but its judgment goes through an extra channel and its latency is relatively high, whereas judging from our own connection reduces that delay.

  • Question: Say a user comes online and other users had sent him messages while he was offline. How does he get those earlier messages?

Zhang Kaihong: We guarantee this through the business side: an ID is stored for each unsent message, and when the user reconnects, the business side pulls them again.

14. Reference materials

[1] Protocol selection of mobile IM/push system: UDP or TCP?
[2] The 5G era has arrived, and TCP/IP is so old, can you still eat?
[3] Why does the mobile terminal IM based on the TCP protocol still need a heartbeat keep-alive mechanism?
[4] An article to understand the network heartbeat packet mechanism in instant messaging applications: functions, principles, implementation ideas, etc.
[5] WeChat team original sharing: Android version of WeChat background keep-alive actual sharing (network keep-alive)
[6] Mobile IM Practice: Realize the smart heartbeat mechanism of Android version of WeChat
[7] Towards a higher level: the network foundation that good Android programmers must know and understand
[8] Comprehensive understanding of miscellaneous problems such as mobile DNS domain name hijacking: principles, root causes, HttpDNS solutions, etc.
[9] Technical literacy: a new generation of UDP-based low-latency network transport layer protocol-QUIC detailed explanation
[10] One entry is enough for novices: develop mobile IM from scratch
[11] Long-connection gateway technology topic (2): Knowing the practice of high-performance long-connection gateway technology with tens of millions of concurrent connections
[12] Long-connection gateway technology topic (3): The technological evolution of mobile access layer gateways
[13] Long connection gateway technology topic (5): Himalaya self-developed billion-level API gateway technology practice
[14] A set of IM architecture technical dry goods for hundreds of millions of users (Part 1): overall architecture, service split, etc.
[15] A set of IM architecture technical dry goods for hundreds of millions of users (Part 2): reliability, orderliness, weak network optimization, etc.
[16] From novice to expert: How to design a distributed IM system with billions of messages

This article has been simultaneously published on the official account of "Instant Messaging Technology Circle".
The synchronous publishing link is: http://www.52im.net/thread-3780-1-1.html

