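Motivation

While writing an RPC server/client in Elixir, I had to decide which options to pass through to gen_tcp. Some options, such as sndbuf and recbuf, should stay user-configurable so they can be tuned for the workload; others should be hidden to keep the library easy to understand and use.
So I took a closer look at the gen_tcp options.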

Code version

The file names and line numbers quoted in this article refer to the following code version:

  • Erlang: OTP-21.0.9

options

Available options for tcp:connect

inet.erl:723

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Available options for tcp:connect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
connect_options() ->
    [tos, tclass, priority, reuseaddr, keepalive, linger, sndbuf, recbuf, nodelay,
     header, active, packet, packet_size, buffer, mode, deliver, line_delimiter,
     exit_on_close, high_watermark, low_watermark, high_msgq_watermark,
     low_msgq_watermark, send_timeout, send_timeout_close, delay_send, raw,
     show_econnreset, bind_to_device].
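As a minimal sketch of the idea from the motivation (expose a few options, hide the rest), a wrapper could filter user-supplied options through a whitelist before calling gen_tcp:connect. The module name rpc_opts, the whitelist and the fixed options below are hypothetical choices for illustration, not a recommendation.

```erlang
-module(rpc_opts).
-export([connect_opts/1]).

%% Options the caller may tune (hypothetical choice for this sketch).
-define(USER_TUNABLE, [sndbuf, recbuf, nodelay, keepalive, send_timeout]).

%% Options the library always sets itself (hypothetical choice as well).
-define(FIXED, [binary, {packet, 4}, {active, false}]).

%% Keep only whitelisted {Key, Value} pairs, then append the fixed options.
%% Bare atoms such as 'list' and non-whitelisted keys are silently dropped.
connect_opts(UserOpts) ->
    Allowed = [Opt || {Key, _} = Opt <- UserOpts,
                      lists:member(Key, ?USER_TUNABLE)],
    Allowed ++ ?FIXED.
```

The result can be passed as the option list to gen_tcp:connect/3, for example gen_tcp:connect(Host, Port, rpc_opts:connect_opts(UserOpts)).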

tos

type of service
[Figure from TCP/IP Illustrated, Volume 1: clipboard.png]

tclass

IPV6_TCLASS
{tclass, Integer}
Sets IPV6_TCLASS IP level options on platforms where this is implemented.
The behavior and allowed range varies between different systems.
The option is ignored on platforms where it is not implemented. Use with caution.
I have not looked into what this means in practice, so I skip it here.

priority

   SO_PRIORITY
          Set the protocol-defined priority for all packets to be sent
          on this socket.  Linux uses this value to order the networking
          queues: packets with a higher priority may be processed first
          depending on the selected device queueing discipline.  Setting
          a priority outside the range 0 to 6 requires the CAP_NET_ADMIN
          capability.

reuseaddr

   SO_REUSEADDR
          Indicates that the rules used in validating addresses supplied
          in a bind(2) call should allow reuse of local addresses.  For
          AF_INET sockets this means that a socket may bind, except when
          there is an active listening socket bound to the address.
          When the listening socket is bound to INADDR_ANY with a spe‐
          cific port, then it is not possible to bind to this port for
          any local address.  Argument is an integer boolean flag.

keepalive

   SO_KEEPALIVE
          Enable sending of keep-alive messages on connection-oriented
          sockets.  Expects an integer boolean flag.
Tunable keepalive parameters and their meaning (Linux):
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_time
1800
the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
the interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime
root@1ba6f31f7bc3:/# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
the number of unacknowledged probes to send before considering the connection dead and notifying the application layer
Main problems:
  1. Probes do not pass through a load balancer, so they only check the hop to the balancer, not the real peer.
  2. Detection is too slow (tens of minutes with the values above).
  3. It says nothing about the state of the application.
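To put a number on "too slow": a rough estimate of the worst-case detection time is the idle time before the first probe plus one interval for every unanswered probe. The module and function names below are hypothetical; the formula is an approximation that ignores retransmission details.

```erlang
-module(keepalive_math).
-export([detect_seconds/3]).

%% Time    = tcp_keepalive_time   (idle seconds before the first probe)
%% Intvl   = tcp_keepalive_intvl  (seconds between probes)
%% Probes  = tcp_keepalive_probes (unanswered probes before giving up)
detect_seconds(Time, Intvl, Probes) ->
    Time + Intvl * Probes.
```

With the values read from /proc above, keepalive_math:detect_seconds(1800, 75, 9) gives 2475 seconds, a little over 41 minutes. This is why an application-level heartbeat is usually needed on top of keepalive.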

linger

   SO_LINGER
          Sets or gets the SO_LINGER option.  The argument is a linger
          structure.

              struct linger {
                  int l_onoff;    /* linger active */
                  int l_linger;   /* how many seconds to linger for */
              };

          When enabled, a close(2) or shutdown(2) will not return until
          all queued messages for the socket have been successfully sent
          or the linger timeout has been reached.  Otherwise, the call
          returns immediately and the closing is done in the background.
          When the socket is closed as part of exit(2), it always
          lingers in the background.

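In short: whether close/shutdown waits until all queued data has been sent (or the linger timeout expires) before returning. In gen_tcp the option is written {linger, {Boolean, Seconds}}.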
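A small loopback sketch of setting linger and reading it back. The module name linger_demo and the 5-second value are arbitrary choices for illustration; the value reported back may differ between operating systems.

```erlang
-module(linger_demo).
-export([run/0]).

run() ->
    %% Listen on an ephemeral port so the example is self-contained.
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    %% Ask close/1 to wait up to 5 seconds for queued data to be sent.
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {active, false}, {linger, {true, 5}}]),
    {ok, A} = gen_tcp:accept(L),
    {ok, Opts} = inet:getopts(C, [linger]),
    lists:foreach(fun gen_tcp:close/1, [C, A, L]),
    Opts.
```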

sndbuf recbuf buffer

   SO_SNDBUF
          Sets or gets the maximum socket send buffer in bytes.  The
          kernel doubles this value (to allow space for bookkeeping
          overhead) when it is set using setsockopt(2), and this doubled
          value is returned by getsockopt(2).  The default value is set
          by the /proc/sys/net/core/wmem_default file and the maximum
          allowed value is set by the /proc/sys/net/core/wmem_max file.
          The minimum (doubled) value for this option is 2048.
   SO_RCVBUF
          Sets or gets the maximum socket receive buffer in bytes.  The
          kernel doubles this value (to allow space for bookkeeping
          overhead) when it is set using setsockopt(2), and this doubled
          value is returned by getsockopt(2).  The default value is set
          by the /proc/sys/net/core/rmem_default file, and the maximum
          allowed value is set by the /proc/sys/net/core/rmem_max file.
          The minimum (doubled) value for this option is 256.

inet_drv.c:6708

case INET_OPT_SNDBUF:
    {
        arg.ival= get_int32 (curr);      curr += 4;
        proto   = SOL_SOCKET;
        type    = SO_SNDBUF;
        arg_ptr = (char*) (&arg.ival);
        arg_sz  = sizeof  ( arg.ival);

        /* Adjust the size of the user-level recv buffer, so it's not
           smaller than the kernel one: */
        if (desc->bufsz <= arg.ival)
            desc->bufsz  = arg.ival;
        break;
    }

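As the code shows, buffer is the user-level buffer, and it is raised so that it is never smaller than the kernel buffer being set. However, the buffer value I actually read back from a socket was smaller than recbuf and sndbuf.
My guess: buffer is only adjusted when recbuf or sndbuf is set explicitly.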
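One way to check this guess is to compare the three values on a socket opened with default options and on one opened with an explicit recbuf. This is a sketch only: the module name buf_demo and the 262144 value are arbitrary, and the numbers printed depend on the OTP version and kernel settings, so no expected output is given.

```erlang
-module(buf_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    %% No sndbuf/recbuf/buffer given: read back the defaults.
    {ok, C1} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, Default} = inet:getopts(C1, [buffer, sndbuf, recbuf]),
    %% Explicit recbuf: see whether buffer follows it.
    {ok, C2} = gen_tcp:connect({127,0,0,1}, Port,
                               [binary, {active, false}, {recbuf, 262144}]),
    {ok, Tuned} = inet:getopts(C2, [buffer, sndbuf, recbuf]),
    lists:foreach(fun gen_tcp:close/1, [C1, C2, L]),
    {Default, Tuned}.
```

Comparing the buffer entry in the two result lists shows whether setting recbuf moved it.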

nodelay

TCP_NODELAY

        DISCUSSION:
             The Nagle algorithm is generally as follows:

                  If there is unacknowledged data (i.e., SND.NXT >
                  SND.UNA), then the sending TCP buffers all user
                  data (regardless of the PSH bit), until the
                  outstanding data has been acknowledged or until
                  the TCP can send a full-sized segment (Eff.snd.MSS
                  bytes; see Section 4.2.2.6).

             Some applications (e.g., real-time display window
             updates) require that the Nagle algorithm be turned
             off, so small data segments can be streamed out at the
             maximum rate.

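As can be seen, when the Nagle algorithm is combined with delayed ACKs it can add significant latency. {nodelay, true} sets TCP_NODELAY, which turns the Nagle algorithm off.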
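For a request/response style RPC client, disabling Nagle is a common choice. A minimal loopback sketch that sets the option and reads it back (the module name nodelay_demo is hypothetical):

```erlang
-module(nodelay_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    %% {nodelay, true} asks the kernel to send small segments immediately.
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {active, false}, {nodelay, true}]),
    {ok, Opts} = inet:getopts(C, [nodelay]),
    lists:foreach(fun gen_tcp:close/1, [C, L]),
    Opts.
```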

header

http://erlang.org/doc/man/ine...
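Fixed-length header: with {header, Size} on a binary socket, the first Size bytes of received data are delivered as list elements and the rest as a binary tail. Handy when a protocol has a fixed-length header.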

active

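Controls whether received data is pushed to the owning process as messages (active) or pulled with gen_tcp:recv (passive). I use passive mode, with sending and receiving handled asynchronously.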
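The difference between the two modes can be sketched on a loopback connection: first a passive read with gen_tcp:recv, then a single message delivered with {active, once}. The module name active_demo is hypothetical, and the sketch assumes each 4-byte send arrives in one piece, which is typical on loopback but not guaranteed by TCP.

```erlang
-module(active_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port, [binary, {active, false}]),
    {ok, A} = gen_tcp:accept(L),
    %% Passive mode: the process pulls exactly 4 bytes itself.
    ok = gen_tcp:send(C, <<"ping">>),
    {ok, Passive} = gen_tcp:recv(A, 4, 1000),
    %% Active once: the next chunk arrives as a {tcp, Socket, Data} message,
    %% after which the socket falls back to passive mode.
    ok = gen_tcp:send(C, <<"pong">>),
    ok = inet:setopts(A, [{active, once}]),
    Active = receive
                 {tcp, A, Data} -> Data
             after 1000 ->
                 timeout
             end,
    lists:foreach(fun gen_tcp:close/1, [C, A, L]),
    {Passive, Active}.
```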

packet, raw

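Length of the packet header, i.e. how many bytes are used to encode the packet length. {packet, raw} is equivalent to {packet, 0}. Note that the raw entry in connect_options() is a different option: {raw, Protocol, OptionNum, ValueBin} passes an arbitrary option straight to setsockopt.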
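To see what {packet, 4} puts on the wire, the sketch below sends through a {packet, 4} socket and reads on a raw socket, so the length header stays visible. The module name packet_demo is hypothetical.

```erlang
-module(packet_demo).
-export([run/0]).

run() ->
    %% The receiving side uses raw mode; accepted sockets inherit it.
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}, {packet, raw}]),
    {ok, Port} = inet:port(L),
    %% The sending side uses {packet, 4}: a 4-byte big-endian length
    %% is prepended to every message.
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {active, false}, {packet, 4}]),
    {ok, A} = gen_tcp:accept(L),
    ok = gen_tcp:send(C, <<"hello">>),
    %% 4 header bytes + 5 payload bytes.
    {ok, Wire} = gen_tcp:recv(A, 9, 1000),
    lists:foreach(fun gen_tcp:close/1, [C, A, L]),
    Wire.
```

packet_demo:run() returns <<0,0,0,5,"hello">>. If both sides used {packet, 4}, the receiver would get just <<"hello">> with the header stripped.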

packet_size

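Maximum allowed length of the packet body. If the header announces a longer packet, the packet is considered invalid.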

mode

{mode, Mode :: binary | list}
Received Packet is delivered as defined by Mode.

deliver

{deliver, port | term}
When {active, true}, data is delivered on the form port : {S, {data, [H1,..Hsz | Data]}} or term : {tcp, S, [H1..Hsz | Data]}.

line_delimiter

{line_delimiter, Char}(TCP/IP sockets)
Sets the line delimiting character for line-oriented protocols (line). Defaults to $\n.

exit_on_close

{exit_on_close, Boolean}
This option is set to true by default.
The only reason to set it to false is if you want to continue sending data to the socket after a close is detected, for example, if the peer uses gen_tcp:shutdown/2 to shut down the write side.

high_watermark, low_watermark, high_msgq_watermark, low_msgq_watermark

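These affect when the socket switches into and out of the busy state.
A few questions still need to be answered:
What exactly is the socket busy state? For example, what do send/receive calls return while the socket is busy?
How do the msgq data size and the socket data size relate, and is the socket data size the same thing as buffer?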

send_timeout

Send timeout in milliseconds. Defaults to infinity (wait forever).

send_timeout_close

Whether the socket is closed automatically when a send times out.
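A minimal sketch of setting both options at connect time and reading them back. The module name send_timeout_demo and the 5000 ms value are arbitrary. With these options, a gen_tcp:send/2 that cannot complete in time returns {error, timeout} and the socket is closed.

```erlang
-module(send_timeout_demo).
-export([run/0]).

run() ->
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(L),
    {ok, C} = gen_tcp:connect({127,0,0,1}, Port,
                              [binary, {active, false},
                               {send_timeout, 5000},
                               {send_timeout_close, true}]),
    {ok, Opts} = inet:getopts(C, [send_timeout, send_timeout_close]),
    lists:foreach(fun gen_tcp:close/1, [C, L]),
    Opts.
```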

delay_send

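Batches sends at the application (driver) level: messages are queued and written when the socket is ready, so fewer but larger packets go out. Off by default. Worth considering enabling.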

show_econnreset

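Whether an RST from the peer is reported as a separate econnreset error. When false (the default), an RST is treated like a normal close.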

bind_to_device

Binds the socket to a specific device (network interface).


enjolras1205