Introduction to TCP kernel parameter optimization for Linux systems

Author: 雀翎

In day-to-day operations work we run into many TCP-related problems. Plenty of articles on the Internet list TCP kernel parameters that "need" to be optimized, but they rarely explain the reasoning behind each change or the scenarios in which it applies. If we copy such configurations into a production environment without understanding what the parameters actually do, the result is likely to be counterproductive. This article explains the TCP kernel parameters involved in the three phases of a connection (establishment, data transmission, and disconnection) and gives optimization suggestions for each.

Figure 1: TCP connection state transition diagram

1. Connection establishment phase

  • net.ipv4.tcp_syn_retries
    Controls how many times the client retransmits the SYN when, in the first step of the three-way handshake, it receives no response from the server. In an internal network, where there are few intermediate hops and the network is stable, a server that does not respond most likely has an application problem. Retransmitting many times is of little use and only adds pressure on the server. Reducing the number of retransmissions lets the client try another server sooner.
  • net.ipv4.tcp_syncookies
    Enables SYN cookies. It is enabled by default, and keeping the default is recommended because it improves protection against SYN flood attacks.
  • net.ipv4.tcp_synack_retries
    Controls how many times the server retransmits the SYN+ACK when, in the second step of the three-way handshake, it receives no response from the client. In an internal network with few intermediate hops and a stable network, a client that does not respond most likely has a problem of its own, so retransmitting many times makes little sense. You can reduce the number of retransmissions.
  • net.ipv4.tcp_max_syn_backlog
    Controls the size of the half-open connection queue (SYN queue). A half-open connection is one that has not yet completed the TCP three-way handshake: after the server receives the client's SYN, it places the connection in this queue and then sends a SYN+ACK to the client. To cope with surges in new connections, increasing this value is recommended. To check whether the half-open queue is overflowing: netstat -s | grep "SYNs to LISTEN"
  • net.core.somaxconn
    Full connection queue (accept queue) size = min(somaxconn, backlog), where backlog is the value passed to listen(). A full connection is one for which the server has received the client's ACK in the third step of the three-way handshake. The server then places the connection in the full connection queue, where it waits until the accept() system call takes it out and the server application can start processing the client's requests. Increasing this value moderately is recommended. To check whether the full connection queue is overflowing: netstat -s | grep "listen queue"
  • net.ipv4.tcp_abort_on_overflow
    When the full connection queue is full, new connections are discarded. By default the server drops them silently, without notifying the client. Sometimes it is preferable to send a reset (RST) so that the client does not keep retrying. Whether a reset is sent is controlled by tcp_abort_on_overflow. The default is 0, meaning no reset is sent to the client. Unless you have a special requirement, keeping the default is recommended.
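
The queue-size formula and the two netstat checks above can be sketched in Python. This is a minimal illustration, not a monitoring tool: the helper names (sysctl_path, read_sysctl_int, effective_accept_queue, overflow_counters), the backlog value 511, and the sample counter text with its made-up numbers are all assumptions introduced here. It also assumes that sysctl values are readable under /proc/sys, as on a typical Linux system.

```python
import re
from pathlib import Path


def sysctl_path(name: str) -> Path:
    """Map a sysctl name such as net.core.somaxconn to its /proc/sys file."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl_int(name: str):
    """Return the first integer of a sysctl, or None if it cannot be read."""
    try:
        return int(sysctl_path(name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def effective_accept_queue(listen_backlog: int, somaxconn: int) -> int:
    """Full connection queue size = min(somaxconn, backlog)."""
    return min(listen_backlog, somaxconn)


def overflow_counters(netstat_output: str) -> dict:
    """Pick the two queue-overflow counters out of `netstat -s` style text."""
    counters = {}
    for line in netstat_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue
        count, text = int(match.group(1)), match.group(2)
        if "SYNs to LISTEN" in text:
            counters["syn_queue_drops"] = count
        elif "listen queue" in text:
            counters["accept_queue_overflows"] = count
    return counters


# Illustrative sample only; the numbers are made up.
SAMPLE = """\
    1234 times the listen queue of a socket overflowed
    5678 SYNs to LISTEN sockets dropped
"""

if __name__ == "__main__":
    somaxconn = read_sysctl_int("net.core.somaxconn")
    if somaxconn is not None:
        # 511 is a hypothetical backlog passed to listen() by the application.
        print("effective queue:", effective_accept_queue(511, somaxconn))
    print(overflow_counters(SAMPLE))
```

Sampling these counters twice and comparing the values shows whether a queue is still overflowing, because the counters are cumulative since boot.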

2. Data transmission phase

  • net.ipv4.tcp_wmem
    The size of the TCP send buffer, given as three values: min, default, and max. The kernel adjusts the send buffer dynamically between min and max. Tune it according to the actual business scenario and server configuration. If SO_SNDBUF is set on a socket, dynamic adjustment is disabled for that socket, so setting it is generally not recommended.
  • net.core.wmem_max
    The maximum size of a socket send buffer. Set net.core.wmem_max to a value greater than or equal to the max value of net.ipv4.tcp_wmem.
  • net.ipv4.tcp_mem
    The total memory that all TCP connections in the system may consume, given as three values. While total TCP memory is below the first value, the kernel does not need to adjust anything. Once it exceeds the second value, the kernel enters memory-pressure mode and starts to limit buffer sizes, staying in that mode until usage falls back below the first value. When it exceeds the third value, the kernel no longer allocates new memory for TCP, and new connections cannot be established. Note that the three values are in units of memory pages, typically 4 KB each.
  • net.ipv4.tcp_rmem
    The size of the TCP receive buffer, given as three values: min, default, and max. The kernel adjusts the receive buffer dynamically between min and max. Tune it according to the actual business scenario and server configuration. If SO_RCVBUF is set on a socket, or net.ipv4.tcp_moderate_rcvbuf is turned off, dynamic adjustment is disabled.
  • net.core.rmem_max
    The maximum size of a socket receive buffer. Set net.core.rmem_max to a value greater than or equal to the max value of net.ipv4.tcp_rmem.
  • net.ipv4.tcp_moderate_rcvbuf
    Enables dynamic adjustment of the receive buffer. It is enabled by default, and keeping the default is recommended.
  • net.ipv4.tcp_window_scaling
    Enables window scaling. The window field in the TCP header is only 2 bytes wide, so without scaling the window can be at most 2^16 - 1 = 65535 bytes. Enabling this option allows larger windows. It is enabled by default, and keeping the default is recommended.
  • net.ipv4.tcp_keepalive_probes
    The number of unanswered keepalive probes sent before the connection is considered dead and the application is notified. Adjusting it to suit your scenario is recommended.
  • net.ipv4.tcp_keepalive_intvl
    The interval between successive keepalive probes. Adjusting it to suit your scenario is recommended.
  • net.ipv4.tcp_keepalive_time
    The idle time between the last data packet and the first keepalive probe. Adjusting it to suit your scenario is recommended.
  • net.ipv4.tcp_available_congestion_control
    Shows the congestion control algorithms supported by the kernel.
  • net.ipv4.tcp_congestion_control
    Sets the congestion control algorithm. The default is cubic. Kernels from version 4.9 onward support BBR, which is recommended under weak network conditions.
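
The arithmetic behind several of the parameters above can be sketched in Python. The page-to-byte conversion, the 65535-byte window limit, and the keepalive timing follow from the descriptions in this section. Sizing buffers from the bandwidth-delay product is a common rule of thumb added here as an assumption and is not a recommendation made by this article. All function names are hypothetical, and integer milliseconds are used to keep the arithmetic exact.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bandwidth_bps * rtt_ms // 8000  # bits -> bytes, ms -> s


def window_limited_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


def pages_to_bytes(pages: int, page_size: int = 4096) -> int:
    """Convert a tcp_mem value (in pages) to bytes, assuming 4 KB pages."""
    return pages * page_size


def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Idle time before the first probe plus the time for all probes to fail."""
    return time_s + intvl_s * probes


if __name__ == "__main__":
    # Example link: 1 Gbit/s with a 50 ms round-trip time (illustrative values).
    print("BDP bytes:", bdp_bytes(1_000_000_000, 50))
    # Throughput ceiling on that link with an unscaled 65535-byte window.
    print("unscaled window limit, bit/s:", window_limited_bps(65535, 50))
    # Linux defaults: keepalive_time=7200, keepalive_intvl=75, keepalive_probes=9.
    print("dead peer detected after, s:", keepalive_detection_seconds(7200, 75, 9))
```

On the example link, an unscaled 65535-byte window caps throughput at roughly 10 Mbit/s, which is why window scaling and a sufficiently large max buffer matter on high-latency paths. With the Linux default keepalive settings, a dead peer is detected only after more than two hours.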

3. Disconnection phase

  • net.ipv4.tcp_fin_timeout
    The timeout for a connection waiting in FIN_WAIT_2 for the peer's FIN, that is, for the transition from FIN_WAIT_2 to TIME_WAIT. If the peer's FIN is not received for a long time, the peer machine most likely has a problem and cannot call close() to close the connection in time. Lowering this value is recommended so that connections do not wait too long and consume too many resources.
  • net.ipv4.tcp_max_tw_buckets
    The maximum number of TIME_WAIT connections in the system. Adjust it according to actual business needs. When the limit is exceeded, dmesg reports the error "TCP: time wait bucket table overflow".
  • net.ipv4.tcp_tw_reuse
    Allows a port held by a connection in the TIME_WAIT state to be reused for a new connection. It can be enabled on the client side.
  • net.ipv4.tcp_tw_recycle
    When enabled, a connection in the TIME_WAIT state can be reused to create a new connection without waiting for 2MSL. In NAT environments, enabling tcp_tw_recycle triggers the PAWS mechanism and causes packet loss, so enabling it is not recommended. In fact, this parameter was removed from the kernel in version 4.12.
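
To judge whether tcp_fin_timeout or tcp_max_tw_buckets needs attention, it helps to count connections by state. The sketch below parses text in the /proc/net/tcp format, whose "st" column holds a hexadecimal state code (05 is FIN_WAIT2, 06 is TIME_WAIT, 0A is LISTEN). The function names and the sample rows are assumptions introduced for illustration; on a real host the input would come from /proc/net/tcp and /proc/net/tcp6.

```python
from collections import Counter
from pathlib import Path

# State codes used in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp: str) -> Counter:
    """Count connections per TCP state in /proc/net/tcp style text."""
    counts = Counter()
    for line in proc_net_tcp.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "0:"; skip the header.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def tw_bucket_usage(time_wait_count: int, max_tw_buckets: int) -> float:
    """Fraction of tcp_max_tw_buckets currently used by TIME_WAIT sockets."""
    return time_wait_count / max_tw_buckets


# Illustrative rows only (truncated after the timer column); not real output.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000
   1: 0100007F:1F90 0100007F:C350 06 00000000:00000000 03:00000A00
   2: 0100007F:1F90 0100007F:C351 06 00000000:00000000 03:00000A00
   3: 0100007F:C352 0100007F:1F90 05 00000000:00000000 00:00000000
"""

if __name__ == "__main__":
    totals = Counter()
    for path in (Path("/proc/net/tcp"), Path("/proc/net/tcp6")):
        try:
            totals += count_states(path.read_text())
        except OSError:
            pass  # Not Linux, or the file is not readable.
    print(dict(totals) or dict(count_states(SAMPLE)))
```

A TIME_WAIT count that stays close to net.ipv4.tcp_max_tw_buckets suggests the "time wait bucket table overflow" message is likely to appear in dmesg.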

We are the SRE team of Alibaba Cloud Intelligent Global Technical Service. We aim to be a technology-based, service-oriented team of engineers focused on the high availability of business systems. We provide professional, systematic SRE services that help customers make better use of the cloud and build more stable, reliable business systems on it. We hope to share more techniques that help enterprise customers move to the cloud, use it well, and run their business on it more stably and reliably. You can join the Alibaba Cloud SRE Technical Institute DingTalk group to discuss cloud platform topics with more cloud experts.
