Introduction: The Getty maintenance team does not chase meaningless benchmark numbers or make meaningless technical optimizations; it only makes improvements driven by the needs of production environments. As long as the maintenance team is around, Getty's stability and performance will keep getting better.

I have been working on Internet infrastructure systems for more than ten years, and many of my friends, myself included, are habitual wheel reinventors.

When I was working at a large company in 2011, many of my colleagues who developed in C had their own private SDK libraries, especially network communication libraries. When I first joined this environment, I felt that if I could not write an asynchronous network communication library on top of the epoll/iocp/kqueue interfaces, I would look small in front of my colleagues. Looking back, many colleagues were bold enough to put their own wrapped communication libraries directly into the test and production environments. It is said that at that time there were as many as 197 RPC communication libraries in the production environment.

I spent my weekends over two years building such a private C SDK library: C implementations of most C++ STL containers, timers, a TCP/UDP communication library, a log library with an output speed of up to 150 MiB/s, various locks implemented on top of CAS, a multi-producer, multi-consumer lock-free queue that avoids the ABA problem, and so on. I did not understand PHP at the time, but with a little more encapsulation I could have built a framework similar to Swoole. Had I kept at it, it might have become comparable to the ACL library of my old friend Zheng Shuxin.

I first came into contact with the Go language in 2014. After studying it for a while, I found that it shared a trait with C: too few basic libraries, so I could build wheels again. I clearly remember the first wheel I made: xorlist, a doubly linked list that uses only one pointer per element [see reference 1].

In June 2016, I was working on an instant messaging project whose original gateway was a Java project based on netty. When it was later refactored in Go, the interfaces of its TCP network library were borrowed directly from netty. When WebSocket support was added in August of the same year, I found the onopen/onclose/onmessage network interface provided by WebSocket extremely convenient, so I changed the library's network interface to OnOpen/OnClose/OnMessage, put all the code on GitHub, and promoted it within a small circle [see reference 2].

Getty layered design

1.png

Getty strictly follows the principle of layered design. It is mainly divided into a data interaction layer, a business control layer, and a network layer. It also provides an easily extensible monitoring interface, which is in fact the network library interface exposed to users.

2.png

1. Data interaction layer

Many network frameworks define the network protocol format themselves, or at least the packet header format, and only allow upper-layer users to extend the protocol beneath that header, which limits their scope of use. Getty makes no assumptions about the upper-layer protocol format and leaves it to the user to define, so it exposes a data interaction layer to the layers above.

Taken on its own, the data interaction layer has a single responsibility: it handles the data exchange between client and server and is the carrier of the serialization protocol. It is also simple to use: you only need to implement the ReadWriter interface.

Getty defines the ReadWriter interface and leaves the actual serialization/deserialization logic to the user. When one end of a network connection reads a byte stream sent by the peer through net.Conn, it calls the Read method to deserialize it. The Writer interface is called in the network send function: before a network packet is sent, Getty first calls the Write method to serialize the outgoing data into a byte stream, and then writes it to net.Conn.

3.png

The definition of the ReadWriter interface is shown above. The Read method has three return values in order to handle the TCP stream "sticky packet" situation, in which package boundaries do not line up with read boundaries:

- If an error occurs in the network stream, such as a protocol format error, return (nil, 0, error).
- If the stream read so far is too short to parse even the header, return (nil, 0, nil).
- If the stream read so far is long enough to parse the header but not the entire package, return (nil, pkgLen, nil).
- If a complete package can be parsed, return (pkg, pkgLen, nil).
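The cases above can be sketched with a hypothetical length-prefixed protocol. This is not Getty's Reader: the function names, the 4-byte big-endian header, and the 1 MiB limit are assumptions made for illustration, and the sketch assumes that a complete package is returned together with its total length and a nil error.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerLen  = 4       // assumed: 4-byte big-endian body length
	maxBodyLen = 1 << 20 // assumed upper bound used to reject corrupt headers
)

var errBodyTooLarge = errors.New("body length exceeds limit")

// decode tries to parse one package from the front of buf.
func decode(buf []byte) (pkg []byte, pkgLen int, err error) {
	if len(buf) < headerLen {
		return nil, 0, nil // header incomplete: wait for more bytes
	}
	n := binary.BigEndian.Uint32(buf[:headerLen])
	if n > maxBodyLen {
		return nil, 0, errBodyTooLarge // protocol error
	}
	total := headerLen + int(n)
	if len(buf) < total {
		return nil, total, nil // header parsed, body incomplete
	}
	return buf[headerLen:total], total, nil // one complete package
}

// encode prepends the length header to body.
func encode(body []byte) []byte {
	out := make([]byte, headerLen+len(body))
	binary.BigEndian.PutUint32(out, uint32(len(body)))
	copy(out[headerLen:], body)
	return out
}

func main() {
	// Two packages glued together in one stream segment.
	stream := append(encode([]byte("hello")), encode([]byte("getty"))...)
	for len(stream) > 0 {
		pkg, n, err := decode(stream)
		if err != nil || pkg == nil {
			break
		}
		fmt.Printf("%s\n", pkg) // prints "hello", then "getty"
		stream = stream[n:]
	}
}
```

Returning the consumed length lets the caller advance its buffer and call the decoder again, which is how several packages arriving in one read can be split apart.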

2. Business control layer

The business control layer is the essence of Getty's design and consists of Connection and Session.

- Connection

Responsible for managing an established socket connection, including connection state management, connection timeout control, reconnection control, and packet-related processing such as packet compression and packet splicing and reassembly.

- Session

Responsible for managing one connection established by a client: recording the state data of the connection, managing the creation and closing of the Connection, and controlling data sending and interface processing.

2.1 Session 

Session can be said to be the core interface in Getty. Each Session represents a session connection.

- Downward

Session fully wraps Go's built-in network library, including reading and writing data streams on net.Conn, the timeout mechanism, and so on.

- Upward

Session provides the entry point for business logic. Users only need to implement EventListener to plug Getty into their own business logic.

At present the only implementation of the Session interface is the session struct. As an interface, Session merely provides external visibility and follows the practice of programming against interfaces. From here on, when we talk about Session, we are really talking about the session struct.

2.2 Connection

Connection abstractly encapsulates Go's built-in network library according to different communication modes. Connection has three implementations:

- gettyTCPConn: built on *net.TCPConn

- gettyUDPConn: built on *net.UDPConn

- gettyWSConn: built on a third-party library

2.3 Network API interface EventListener

As mentioned at the beginning of this article, the names of Getty's network API were borrowed from the WebSocket network API. Hao Hongfan, one of Getty's maintainers, likes to call it the "monitoring interface". The reason is that the most troublesome part of network programming is not knowing how to troubleshoot when a problem occurs; through these interfaces, you can learn the state of each network connection at every stage.

4.png

- OnOpen: called when a connection is established. If the total number of current connections exceeds the limit set by the user, the user can return a non-nil error, and Getty will close the connection at this initial stage.
- OnError: used to monitor connection errors. Getty closes the connection after executing this callback.
- OnClose: used to monitor connection closing. Getty closes the connection after executing this callback.
- OnMessage: when Getty calls the Reader interface and successfully parses a package from the TCP stream, UDP, or WebSocket data, it hands the package to the user for processing through this callback.
- OnCron: a timed callback in which the user can run periodic logic such as heartbeat detection.

The core of these five interfaces is OnMessage. This method has an interface{} type parameter for receiving data from the peer.

You may wonder: the bottom layer of a network connection is binary, and the protocol layer generally reads and writes the connection as a byte stream, so why use interface{} here?

Getty does this so that we can focus on writing business logic: it moves serialization and deserialization out of EventListener and into the Reader/Writer interfaces mentioned earlier. While a session is running, it first reads the byte stream from net.Conn, deserializes it through the Reader interface, and then passes the deserialized result to the OnMessage method.

If you want to export metrics to Prometheus, it is easy to add metric collection inside these EventListener callbacks.
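As a rough illustration of how a listener can both enforce a connection limit and collect counters, here is a minimal sketch. The Session placeholder, the exact callback signatures, and the countingListener type are assumptions for this example and are not copied from Getty; the counters are plain integers that a metrics exporter could read.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Session is a stand-in for Getty's session type (assumption for this sketch).
type Session interface{ ID() int }

// EventListener mirrors the five callbacks described above; the signatures
// are assumed, not taken from Getty's source.
type EventListener interface {
	OnOpen(Session) error
	OnError(Session, error)
	OnClose(Session)
	OnMessage(Session, interface{})
	OnCron(Session)
}

var errTooManySessions = errors.New("too many sessions")

// countingListener rejects sessions above a limit and keeps simple counters.
type countingListener struct {
	maxSessions int64
	active      int64
	messages    int64
	errs        int64
}

// OnOpen assumes OnClose is not called for a session that it rejects.
func (l *countingListener) OnOpen(Session) error {
	if atomic.AddInt64(&l.active, 1) > l.maxSessions {
		atomic.AddInt64(&l.active, -1)
		return errTooManySessions
	}
	return nil
}

func (l *countingListener) OnError(Session, error) { atomic.AddInt64(&l.errs, 1) }
func (l *countingListener) OnClose(Session)        { atomic.AddInt64(&l.active, -1) }
func (l *countingListener) OnCron(Session)         {}

// OnMessage receives the already deserialized package.
func (l *countingListener) OnMessage(_ Session, pkg interface{}) {
	atomic.AddInt64(&l.messages, 1)
}

type fakeSession int

func (s fakeSession) ID() int { return int(s) }

func main() {
	var l EventListener = &countingListener{maxSessions: 1}
	fmt.Println(l.OnOpen(fakeSession(1))) // <nil>
	fmt.Println(l.OnOpen(fakeSession(2))) // too many sessions
	l.OnMessage(fakeSession(1), "ping")
	l.OnClose(fakeSession(1))
}
```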

Getty network data flow

The following figure is a class diagram of Getty's core structure, including the design of the entire Getty framework.

5.png

Note: the gray parts of the figure are Go built-in libraries.

Let's take TCP as an example to show how to use Getty and what each interface or object in this class diagram does. The server and client are wrapper structs provided to users. The client's logic is largely the same as the server's, so this section only covers the server.

6.png

The flow chart of the Getty server startup code is shown above. In Getty, starting a server takes only two lines of code:

7.png

The first line obviously creates a server. Each option is a func(*ServerOptions) function used to add extra settings to the server, such as enabling SSL or using a task queue to submit and execute tasks.

The second line, server.RunEventLoop(NewHelloServerSession), starts the server and is the entry point of the whole service. It listens on a port (which port can be specified through the options) and processes the data sent by clients. The RunEventLoop method takes one parameter, a NewSessionCallback, whose type is defined as follows:

8.png

This callback is invoked after a connection with a client has been successfully established. It is generally used to set network parameters, such as the connection's keepAlive parameter, buffer sizes, maximum message length, and read/write timeouts. Most importantly, users use this function to set the Reader, Writer, and EventListener for the session.

So far, the processing flow of the server in Getty is roughly as follows:

9.png

For the use of these interfaces, in addition to the code examples provided by getty itself, another excellent example is seata-golang. If you are interested, please refer to the article "Distributed Transaction Framework seata-golang Communication Model" [Reference 6].

Optimization

A rule of thumb in software development is "Make it work, make it right, make it fast"; premature optimization is the root of all evil.

An early example is Joe Armstrong, the inventor of Erlang, who in his early years put a great deal of energy into staying up late and working overtime to improve Erlang's performance. One consequence was that he later found some of that early optimization work to be useless; another was that the premature optimization damaged Joe's health, and he died in 2019 at the age of 68.

If you stretch the time scale to five or even ten years, you may find that some optimization work done early on becomes a maintenance burden later. In 2006, many experts still recommended using Java only for ERP development and not for Internet back-end programming, because on the single-core CPU machines of the time its performance really was poor compared with C/C++. The cause could of course be attributed to its interpreted nature and to JVM GC. After 2010, however, complaints about its performance were rarely heard.

At a dinner in 2014, I met Mr. Zhou Aimin, a former architect at Alipay. Mr. Zhou joked that if Alipay switched its main business programming language from Java to C++, it could save about 2/3 of its servers.

By analogy, Go, a language much younger than Java, defines a programming paradigm in which programming efficiency is the primary consideration. As for program performance, especially network I/O performance, this kind of problem can be left to time: five years from now, many of today's complaints may no longer be a problem. If a program really does hit a network I/O bottleneck and the machine budget is tight, you can consider switching to a lower-level language such as C, C++, or Rust.

In 2019, the underlying network layer of MOSN used Go's native network library, and each TCP connection used two goroutines to handle receiving and sending respectively. After optimization, a single TCP connection needed only one goroutine, and this was achieved without resorting to the epoll system call.

Here is another example.

Since 2020, ByteDance has been publishing articles on Zhihu promoting the excellent performance of its Go network framework kitex [see reference 3], saying that it is based on native epoll and that "the performance has far exceeded the official net library". At that time the code was not open source, so everyone had no choice but to take its word for it. At the beginning of 2021, Toutiao began promoting it again [see reference 4], claiming that "test data shows that the current version (2020.12), compared with the previous one (2020.05), has 30% higher throughput, 25% lower average latency, and 67% lower TP99; the performance has far surpassed the official net library". Then the code was finally open-sourced. In early August, the blogger Bird's Nest benchmarked it and published the conclusions in the article "2021 Go Ecosphere rpc Framework Benchmark" [see reference 5].

After all this, back to the topic, summed up in one sentence: Getty only considers using Go's native network interfaces; if it runs into a network performance bottleneck, it looks for optimization opportunities only at its own level.

Getty receives a major upgrade every year. This article covers several of the major upgrades of recent years.

1. Goroutine Pool

The initial version of Getty started two goroutines per network connection: one goroutine received the network byte stream, called the Reader interface to split out network packages, and called EventListener.OnMessage() for logic processing; the other goroutine was responsible for sending the network byte stream and calling EventListener.OnCron() to run timed logic.

Later, to improve network throughput, Getty made a major optimization: it split the logic processing step out of the first goroutine and added a goroutine pool (hereinafter "Gr pool") to handle it.

That is, receiving the network byte stream, logic processing, and sending the network byte stream are each handled by separate goroutines.

10.png

A Gr pool consists of task queues [M of them], an array of Gr [N of them], and tasks [or messages]. Depending on whether N changes, Gr pools are divided into scalable Gr pools and fixed-size Gr pools. The benefit of a scalable Gr pool is that it can increase or decrease N as the number of tasks changes, saving CPU and memory resources.

1.1 Fixed size Gr Pool

According to the ratio of M to N, fixed-size Gr pools come in three types: 1:1, 1:N, and M:N.

The 1:N Gr pool is the easiest to implement. I implemented this type of Gr pool in the kafka-connect-elasticsearch project in 2017: a consumer reads data from Kafka and puts it into a message queue, and each worker Gr takes tasks from this queue and processes them.

This model creates only one chan for the entire pool, and all Gr read from it. Its drawback is that the queue is written by one goroutine and read by many; because a Go channel is inefficient under contention [it is protected by a single mutex], competition for it is fierce. It also cannot guarantee the order in which network packets are processed.

The Gr pool in the initial version of Getty used the 1:1 model: each Gr has its own chan, written by one goroutine and read by one. Its advantage is that it can preserve the processing order of network packets. For example, when reading Kafka messages, each message is delivered to a task queue chosen by taking the hash of the message key modulo N [hash(message key) % N], so messages with the same key are guaranteed to be processed in order. The flaw of this model is that every task takes time to process, so tasks can pile up in one Gr's chan, and even if other Gr are idle they cannot help [task processing "starvation"].

A further improvement on the 1:1 model: each Gr has its own chan, but a Gr that finds its own chan empty takes tasks from another chan, and the sender also tries to send to the fastest-consuming goroutine. This is similar to the goroutine queues used by the MPG scheduling algorithm inside the Go runtime, but the algorithm and implementation would be too complicated.

Getty later implemented a Gr pool based on the M:N model, in which each task queue is consumed by N/M Gr. This model balances processing efficiency against lock pressure and achieves balanced task processing overall. Tasks are distributed round-robin.

11.png

The overall implementation is shown in the figure above. For the code, see the TaskPool implementation at the gr pool link [reference 7].

1.2 Unlimited Gr Pool

A Gr pool that uses a fixed amount of resources cannot guarantee throughput and RT when the request volume increases. In some scenarios, users want to use up all resources as much as possible to ensure throughput and RT.

Later, we learned from the "Goroutine pool" in the article "A Million WebSockets and Go" [Reference 8] to implement a gr pool with unlimited capacity.

For the code, see the taskPoolSimple implementation at the gr pool link [reference 7].

1.3 Network packet processing sequence

The advantage of a fixed-size gr pool is that it limits the use of resources such as the machine's CPU/MEMORY by the logical processing flow, while the unlimited Gr Pool maintains flexibility but may exhaust the machine's resources and cause the container to be killed by the kernel. But no matter what form of gr pool is used, getty cannot guarantee the processing order of network packets.

For example, if the Getty server receives two network packets A and B from the same client, the Gr pool model may cause the server to process the B package first and then process the A package. Similarly, the client may first receive the response of the B package from the server, and then receive the response of the A package.

If the client's requests are independent of one another and have no ordering relationship, then Getty with the Gr pool enabled has no ordering problem to worry about. If the upper-layer user cares about the order in which requests A and B are processed, they can merge A and B into a single request, or turn off the Gr pool feature.

2. Lazy Reconnect

A session in Getty represents a network connection, and a client is in fact a connection pool that maintains a certain number of sessions; this number is set by the user. In the initial versions of the Getty client [before 2018], each client started its own goroutine that polled the number of sessions in its connection pool and, if the number fell short of what the user had set, initiated new connections to the server.

When a client is disconnected from the server, the server may have been taken offline, may have exited unexpectedly, or may be hung. If the upper-layer user determines that the peer server no longer exists [for example, after receiving an offline notification from the registry], it calls client.Close() to close the connection pool. If the upper-layer user does not call this interface, the client assumes that the peer address is still valid and keeps trying to reconnect in order to maintain the connection pool.

In summary, from closing an old session to creating a new session, the reconnection processing flow of the initial version of getty client is:

  • The old session's network receiving goroutine exits;
  • The old session's network sending goroutine detects that the receiving goroutine has exited and stops sending; after reclaiming resources, it marks the current session as invalid;
  • The client's polling goroutine detects the invalid session and removes it from the session pool;
  • When the client's polling goroutine detects that the number of valid sessions is lower than the number set by the upper-layer user, and the user has not called client.Close(), it calls the connect interface to initiate a new connection.

This approach of periodically polling the validity of every session in the client's session pool can be called active reconnect. Its obvious drawback is that every client needs its own goroutine. One further optimization would be to start a single global goroutine that periodically polls the session pools of all clients, so that no per-client goroutine is needed. But since 2016 I had been thinking about a different question: could the way the session pool is maintained be changed so that the timed polling is removed and no goroutine at all is needed to maintain each client's session pool?

One day in May 2018, while taking a walk after lunch, I went over the reconnection logic of the getty client again and suddenly thought of another approach: in step 2, the network sending goroutine can be put to use one last time. After the step in which that goroutine marks the current session as invalid, add the following logic:

  • If the owner of the current session is a client [a session may also belong to a server];
  • And if the number of sessions in its session pool is lower than the number set by the upper-layer user;
  • And if the upper-layer user has not called client.Close() [that is, the session pool is still valid, or the peer server is still valid];
  • Then, with all three conditions met, the network sending goroutine performs the reconnection;
  • Once the new session has been established and added to the client's session pool, the network sending goroutine has completed its mission and exits.

I call this reconnection approach lazy reconnect. In the final stage of its life cycle, the network sending goroutine is really a network reconnection goroutine. With lazy reconnect, steps 3 and 4 of the reconnection flow above are merged into step 2, and the client no longer needs an extra goroutine that maintains its session pool by periodic polling.
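The conditions listed above can be illustrated with a toy model in which real dialing and I/O are replaced by channels. All names here (client, sendLoop, breakSession, the onConnect hook) are invented for this sketch and do not correspond to Getty's code; the point is only that the exiting goroutine itself decides whether to reconnect, so no polling goroutine exists.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a toy session pool.
type client struct {
	mu        sync.Mutex
	want      int                   // session count requested by the user
	sessions  map[int]chan struct{} // session id -> "connection broken" signal
	nextID    int
	closed    bool
	onConnect func(id int) // hook used here only to observe new sessions
	wg        sync.WaitGroup
}

func newClient(want int, onConnect func(int)) *client {
	c := &client{want: want, sessions: map[int]chan struct{}{}, onConnect: onConnect}
	for i := 0; i < want; i++ {
		c.connect()
	}
	return c
}

// connect simulates dialing the server and starts the session's send goroutine.
func (c *client) connect() {
	c.mu.Lock()
	if c.closed {
		c.mu.Unlock()
		return
	}
	id := c.nextID
	c.nextID++
	broken := make(chan struct{})
	c.sessions[id] = broken
	c.mu.Unlock()
	c.wg.Add(1)
	go c.sendLoop(broken)
	c.onConnect(id)
}

// sendLoop stands in for the network sending goroutine. When the connection
// breaks, it checks the pool and, if needed, reconnects before exiting.
func (c *client) sendLoop(broken <-chan struct{}) {
	defer c.wg.Done()
	<-broken
	c.mu.Lock()
	need := !c.closed && len(c.sessions) < c.want
	c.mu.Unlock()
	if need {
		c.connect() // lazy reconnect by the goroutine that is about to exit
	}
}

// breakSession simulates a dropped connection: the session becomes invalid.
func (c *client) breakSession(id int) {
	c.mu.Lock()
	broken, ok := c.sessions[id]
	delete(c.sessions, id)
	c.mu.Unlock()
	if ok {
		close(broken)
	}
}

// close shuts the pool down; exiting goroutines do not reconnect afterwards.
func (c *client) close() {
	c.mu.Lock()
	c.closed = true
	for id, broken := range c.sessions {
		close(broken)
		delete(c.sessions, id)
	}
	c.mu.Unlock()
	c.wg.Wait()
}

func (c *client) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.sessions)
}

func main() {
	connected := make(chan int, 8)
	c := newClient(2, func(id int) { connected <- id })
	<-connected // session 0
	<-connected // session 1
	c.breakSession(0)
	fmt.Println("reconnected as session", <-connected) // reconnected as session 2
	c.close()
	fmt.Println("sessions after close:", c.size()) // sessions after close: 0
}
```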

12.png

The figure above shows the overall flow of lazy reconnect. If you are interested in the code, follow the link given in "Reference 13"; it is easy to work through on your own.

3. Timer

After the introduction of Gr pool, a network connection uses at least three goroutines:

  • The first goroutine receives the network byte stream and calls the Reader interface to split out network packages
  • The second goroutine calls EventListener.OnMessage() for logic processing
  • The third goroutine sends the network byte stream, calls EventListener.OnCron() to run timed logic, and performs lazy reconnect

This model runs stably when there are few connections. But once the cluster reaches a certain scale, for example when a single server holds more than 1k connections, the connections alone use at least 3k goroutines, which is a considerable waste of CPU and memory. Of the three goroutines above, the first cannot be split further and the second is really part of the Gr pool, so the one that can be optimized is the third.

At the end of 2020, the Getty maintenance team first moved the task of sending the network byte stream into the second goroutine: after finishing a logic task, it sends the result over the network synchronously. After this change, the third goroutine was left with only the EventListener.OnCron() timed task. This timed logic could have been pushed up to Getty's callers, but for user convenience and backward compatibility we took another approach: introducing a time wheel to manage timed heartbeat detection.

13.png

In September 2017, I implemented a Go timer wheel library, timer wheel (see reference 10 for the link), which is compatible with all the native interfaces of Go's timer. Its advantage is that all timed tasks are executed in a single goroutine. After it was introduced into getty in December 2020, all of getty's EventListener.OnCron() timed tasks were handed to the timer wheel, and the third goroutine could disappear completely [follow-up: two months later, we discovered that the timer library had been "learned from" by another RPC project that is good at attracting stars ^+^].
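To show why a time wheel lets one goroutine serve every timed task, here is a minimal single-level wheel. It is a teaching sketch rather than the library referenced above: the timingWheel type and its methods are invented for this example, there is no locking or cancellation, and tick is called by hand so the behavior is deterministic. A real implementation would call tick from a single goroutine driven by a time.Ticker.

```go
package main

import "fmt"

// entry is one periodic task stored in the wheel.
type entry struct {
	rounds int // full turns of the wheel left before firing
	every  int // period in ticks; the task is rescheduled after firing
	fn     func()
}

// timingWheel is a minimal single-level wheel: slot i holds the tasks that
// become due when the cursor reaches i.
type timingWheel struct {
	slots  [][]*entry
	cursor int
}

func newTimingWheel(size int) *timingWheel {
	return &timingWheel{slots: make([][]*entry, size)}
}

// addPeriodic schedules fn to run every `every` ticks (every >= 1).
func (w *timingWheel) addPeriodic(every int, fn func()) {
	w.place(&entry{every: every, fn: fn})
}

// place puts e into the slot that the cursor reaches after e.every ticks.
func (w *timingWheel) place(e *entry) {
	n := len(w.slots)
	e.rounds = (e.every - 1) / n
	idx := (w.cursor + e.every) % n
	w.slots[idx] = append(w.slots[idx], e)
}

// tick advances the cursor by one slot and runs the tasks that are due.
func (w *timingWheel) tick() {
	w.cursor = (w.cursor + 1) % len(w.slots)
	due := w.slots[w.cursor]
	w.slots[w.cursor] = nil
	for _, e := range due {
		if e.rounds > 0 {
			e.rounds-- // not yet due: wait for another turn of the wheel
			w.slots[w.cursor] = append(w.slots[w.cursor], e)
			continue
		}
		e.fn()
		w.place(e) // periodic task: schedule the next run
	}
}

func main() {
	w := newTimingWheel(8)
	heartbeats, reports := 0, 0
	w.addPeriodic(3, func() { heartbeats++ }) // e.g. a heartbeat check
	w.addPeriodic(10, func() { reports++ })   // longer than one wheel turn
	for i := 0; i < 10; i++ {
		w.tick()
	}
	fmt.Println(heartbeats, reports) // 3 1
}
```

Each tick only touches the tasks in one slot, so the cost per tick does not grow with the total number of connections whose timers are registered.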

At this point, the third goroutine had one last task left: lazy reconnect. Since the third goroutine no longer exists, this task can be moved into the first goroutine, which performs lazy reconnect as its last step when it stops receiving the network byte stream.

After these optimizations, a network connection uses at most two goroutines:

  • The first goroutine receives the network byte stream, calls the Reader interface to split out network packages, and performs lazy reconnect
  • The second goroutine calls EventListener.OnMessage() for logic processing and sends the network byte stream

The second goroutine comes from the Gr pool. Since the goroutines in the Gr pool are shared, reusable resources, a single connection really only occupies the first goroutine exclusively.

4. Getty pressure test

Hao Hongfan of the Getty maintenance team implemented a getty benchmark [Reference 11] modeled on the rpcx benchmark program, and stress-tested the optimized v1.4.3 release.

"Pressure Test Environment" :

14.png

"pressure test result":

15.png

The stress test results are shown above. The server reached 12556 TPS, and network throughput reached 12556 * 915 B/s ≈ 11219 KiB/s ≈ 11 MiB/s.

16.png

17.png

The figures above show how the server's CPU and memory usage changed before and after the stress test. The getty server used only 6% of the CPU and just over 100 MiB of memory, and the resources were returned to the system soon after the load was removed.

The goroutine usage of the getty server during the test is shown in the figure below: a single TCP connection uses only one goroutine.

18.png

For the complete test results and related code, see the benchmark result (link in [Reference 12]). This stress test did not push Getty to its performance limit, but the results already meet the needs of Alibaba's main scenarios.

Development timeline

From the day I wrote Getty on my own in 2016 to today, when a dedicated open source team maintains it, Getty's journey has not been easy.

Sorted into a timeline, its main milestones are as follows:

- June 2016: developed the first production-ready version, supporting the TCP and WebSocket communication protocols; promoted it on gocn in October (https://gocn.vip/topics/8229);

- September 2017: implemented a Go timer wheel library, timer wheel (https://github.com/AlexStocks/goext/blob/master/time/time.go);

- March 2018: added UDP communication support;

- May 2018: added RPC support based on protobuf and JSON;

- August 2018: added service registration and discovery based on zookeeper and etcd, named micro;

- May 2019: getty's underlying TCP communication implementation was split out and moved to github.com/dubbogo, and later to github.com/apache/dubbo-getty;

- 2019: two developers from Ctrip brought the Getty RPC package into dubbo-go (https://github.com/apache/dubbo-go/tree/master/protocol/dubbo) and built dubbogo's RPC layer on the Hessian2 protocol;

- May 2019: added the fixed-size goroutine pool;

informed him that he implemented seata-golang based on Getty;

- November 2020: merged network sending into the Gr pool that handles logic processing;

- May 2021: completed the timer optimization;

Finally, as stated at the beginning of the third section, on optimization, the Getty maintenance team does not chase meaningless benchmark numbers or make meaningless technical optimizations; it only makes improvements driven by the needs of production environments. As long as the maintenance team is around, Getty's stability and performance will keep getting better.

If you are interested in getty, you are welcome to search for and join DingTalk group 23331795 to talk with the community.

Author introduction

Yuyu: GitHub account AlexStocks, dubbogo community leader. A programmer with 11 years of front-line experience in server-side infrastructure and middleware development. He has contributed to and improved well-known projects such as Redis/Pika/Pika-Port/etcd/Muduo/Dubbo/dubbo-go/Sentinel-golang/Seata-golang, and currently works on cloud native technology in the Trusted Native Technology Department of Ant Group.

Hao Hongfan: GitHub account georgehao, Apache Dubbo committer and member of the getty maintenance team. He currently works on the big data platform of JD Retail and is well versed in the internals of the Go runtime.

Dong Jianhui: GitHub account Mulavar. He graduated from the well-known VLIS laboratory of Zhejiang University in May 2021. He previously interned in the payment business fund service technical department at Ant Group's Hangzhou headquarters, and currently works in the computing engine group of Meituan Big Data at its Beijing headquarters on Flink-related development.

References

  • 1

https://github.com/alexstocks/goext/blob/master/container/xorlist/xorlist.go

  • 2 A simple network framework getty compatible with tcp and websocket

https://gocn.vip/topics/8229

  • 3 ByteDance's practice on the Go network library

https://juejin.cn/post/6844904153173458958

  • 4 ByteDance Go RPC framework KiteX performance optimization practice

https://mp.weixin.qq.com/s/Xoaoiotl7ZQoG2iXo9_DWg

  • 5 2021 Go Ecosphere rpc framework benchmark

https://colobu.com/2021/08/01/benchmark-of-rpc-frameworks/

  • 6 Distributed transaction framework seata-golang communication model

https://mp.weixin.qq.com/s/7xoshM4lzqHX9WqANbHAJQ

  • 7 gr pool

https://github.com/dubbogo/gost/blob/master/sync/task_pool.go

  • 8 A Million WebSockets and Go

https://www.freecodecamp.org/news/million-websockets-and-go-cc58418460bb/

  • 9 task pool

https://github.com/dubbogo/gost/blob/master/sync/base_worker_pool.go

  • 10 timer wheel

https://github.com/dubbogo/gost/blob/master/time/timer.go

