Confused about TCP keepalive, can anyone explain?

yuyi

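While doing server-side network programming with keepalive enabled, tcpdump clearly shows heartbeat packets with length=0. However, long after the client has disconnected, the corresponding kqueue/kevent event still has not fired, so the server cannot detect that the peer has gone away, and that breaks a whole series of later operations!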

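Any help appreciated!
My understanding is that keepalive only keeps the TCP connection alive and does not necessarily report an error to the upper layer. I am not sure whether that is correct!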

The keepalive probe is set to fire every 10 seconds, and tcpdump shows it working as expected.
[image: tcpdump output]

The close events from the most recent test:

2017-10-27 15:22:56 add fd 5 events read 0 write 2
2017-10-27 15:22:56 mod fd 5 events read 1 write 0
2017-10-27 15:23:21 receive error fd 6 closed
2017-10-27 15:28:22 receive error fd 5 closed

The last connection took 5 minutes to be reported as closed. The first took 25 seconds!

Some of the key source code:

/**
 * Event polling loop
 * @param efd kqueue descriptor
 * @param server server node
 * @param waitms timeout in milliseconds
 */
void loop(int efd, client_node* server, int waitms)
{
    struct timespec timeout;
    const int max_events = 20;

    timeout.tv_sec  = waitms / 1000;
    timeout.tv_nsec = (waitms % 1000) * 1000 * 1000;

    struct kevent active_events[max_events];
    // Fetch the ready fds, similar to epoll_wait
    int n = kevent(efd, NULL, 0, active_events, max_events, &timeout);
    printf("kevent return %d\n", n);

    for (int i = 0; i < n; i ++) {
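        // User data attached to the event
        client_node* client = (client_node*)active_events[i].udata;
        // The meaning of data depends on the event: on the listening socket it is the number of clients waiting to be accepted,
        // on a read event it is the number of bytes available to read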
        long data      = active_events[i].data;
        uint16_t flags = active_events[i].flags;
        int events     = active_events[i].filter;
        
       /* printf("events = %d \n", events);
        printf("flags = %d \n", flags);
        printf("data = %ld \n", data);*/

        if (events == EVFILT_READ) {
            // If the triggering socket is the listening socket, there is a new connection
            if (client->fd == server->fd) {
                handle_accept(efd, server->fd, data);
            } else {
                handle_read(efd, client, data);
            }
        }
        
        else if (events == EVFILT_WRITE) {
            //keepalive(client, "", 0);
            handle_write(efd, client, data);
            /*if (flags & EV_EOF) {
                printf("ev eof close %d\n", client->fd);
                free_client(client);
                continue;
            }*/
        }
        
        else {
            printf("unknown event %d\r\n", events);
            goto error;
        }
        
        // Handle sockets that reported an error
        if (flags & EV_ERROR) {
            printf("ev error close %d\n", client->fd);
            free_client(client);
            //close((int)activeEvs[i].ident);
            continue;
        }
    }
    
    return;
    
error:
    printf("server error close %d\n", server->fd);
    free_client(server);
    exit_if(1, "unknown event");
}


int main()
{
    //queue_test();
    //return 0;
    
    //time_test();
    //return 0;
    
    short port  = 9998;
    int epollfd = kqueue();
    exit_if(epollfd < 0, "kqueue failed");
    
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    
    //struct linger linger;
   // memset(&linger, 0, sizeof(struct linger));
   // setsockopt(listenfd, SOL_SOCKET, SO_LINGER, (const void*)&linger, sizeof(struct linger));
    
    exit_if(listenfd < 0, "socket failed");
    
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(port);
    addr.sin_addr.s_addr = INADDR_ANY;
    
    int r = ::bind(listenfd,(struct sockaddr *)&addr, sizeof(struct sockaddr));
    exit_if(r, "bind to 0.0.0.0:%d failed %d %s", port, errno, strerror(errno));
    
    r = listen(listenfd, 20);
    exit_if(r, "listen failed %d %s", errno, strerror(errno));
    
    printf("fd %d listening at %d\n", listenfd, port);
    
    client_node* server = init_client(listenfd, (char*)"0.0.0.0", port);
    
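    // Suppress SIGPIPE on the socket
    // Options are set on the listening socket: a kqueue descriptor is not a socket,
    // so setsockopt on it fails with ENOTSOCK
    const int value = 1;
    setsockopt(listenfd, SOL_SOCKET, SO_NOSIGPIPE, &value, sizeof(int));
    // Note: SO_REUSEADDR only influences bind() when it is set before bind() is called
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &value, sizeof(int));
    int on = 1, secs = 10, count = 3, interval = 5;
    setsockopt(listenfd, SOL_SOCKET,  SO_KEEPALIVE, &on, sizeof(int));
    setsockopt(listenfd, IPPROTO_TCP, TCP_KEEPALIVE, &secs, sizeof(int));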

    
    set_non_block(listenfd);
    update_events(epollfd, server, kReadEvent, false);
    
    /*struct sigaction act;
    act.sa_handler = SIG_IGN;
    sigaction(SIGPIPE, &act, NULL);*/
    signal(SIGPIPE, SIG_IGN);
    
    
    for (;;) { // A real application should register signal handlers and clean up resources on exit
        loop(epollfd, server, 1000);
    }
    
    return 0;
}
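
One detail in the code above: `count` and `interval` are declared but never handed to the kernel. Below is a minimal sketch of how all three keepalive timers might be applied to each socket returned by accept. The helper names `enable_keepalive` and `worst_case_detection_secs` are hypothetical, and the sketch assumes a platform whose `<netinet/tcp.h>` provides `TCP_KEEPINTVL` and `TCP_KEEPCNT` (the idle option is called `TCP_KEEPALIVE` on macOS and `TCP_KEEPIDLE` on Linux).

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// The idle-time option has a different name on macOS and on Linux.
#if defined(TCP_KEEPIDLE)
static const int kIdleOption = TCP_KEEPIDLE;   // Linux, FreeBSD
#else
static const int kIdleOption = TCP_KEEPALIVE;  // macOS
#endif

// Hypothetical helper: turn on keepalive for one connected socket.
//   idle     - seconds of silence before the first probe
//   interval - seconds between unanswered probes
//   count    - unanswered probes before the kernel drops the connection
// Returns 0 on success, -1 if any setsockopt call fails (errno is set).
int enable_keepalive(int fd, int idle, int interval, int count)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0) return -1;
    if (setsockopt(fd, IPPROTO_TCP, kIdleOption, &idle, sizeof idle) < 0) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0) return -1;
    return 0;
}

// Rough upper bound on how long a silently dead peer goes unnoticed.
int worst_case_detection_secs(int idle, int interval, int count)
{
    return idle + interval * count;
}
```

With the values from the question (10, 5, 3) the bound works out to roughly 25 seconds. Checking the return values matters here, because `setsockopt` fails with ENOTSOCK when the descriptor is not a socket.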

1 answer

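TCP keepalive is typically enabled on the server side, with the client answering the probes passively. The default timeout is 120 minutes, as laid down in the RFC.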
https://tools.ietf.org/html/r...

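Note that the timeout refers to how long the TCP connection has gone without any data or control segments. Any traffic resets the timer, and the countdown starts over.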
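
The timer behavior described here can be modeled in a few lines. This is only an illustrative sketch, not kernel code: `KeepaliveTimer` and its members are hypothetical names, and timestamps are plain seconds.

```cpp
#include <cstdint>

// Illustrative model of the keepalive idle timer (not kernel code).
struct KeepaliveTimer {
    int64_t idle_secs;      // silence allowed before the first probe
    int64_t last_activity;  // timestamp of the most recent segment seen

    // Any data or control segment on the connection restarts the timer.
    void on_traffic(int64_t now) { last_activity = now; }

    // True once the connection has been silent for at least idle_secs.
    bool probe_due(int64_t now) const { return now - last_activity >= idle_secs; }
};
```

With the 120-minute default (7200 s), a single segment at t = 7000 pushes the first probe from t = 7200 out to t = 14200.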

In other words, with the default setting the keepalive mechanism only notices a dead peer after 120 minutes of silence.

TCP keepalive is a controversial feature. It is only an option, not a mandatory part of the standard.

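In short, because the timeout is so long, keepalive cannot give the application layer fast feedback, so the application layer needs to implement its own heartbeat.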
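
As a rough sketch of what an application-level heartbeat might look like on the server side: every message from a client (including a dedicated ping) updates a last-seen timestamp, and a periodic sweep collects the connections that have been silent for too long. `HeartbeatTable`, `touch`, and `sweep` are hypothetical names. The ping message format and the actual close() calls are left out.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-server table: fd -> time the peer was last heard from.
class HeartbeatTable {
public:
    explicit HeartbeatTable(int64_t timeout_secs) : timeout_(timeout_secs) {}

    // Call whenever any bytes (data or ping) arrive from fd.
    void touch(int fd, int64_t now) { last_seen_[fd] = now; }

    // Remove and return every fd that has been silent for longer than the timeout.
    std::vector<int> sweep(int64_t now) {
        std::vector<int> dead;
        for (auto it = last_seen_.begin(); it != last_seen_.end();) {
            if (now - it->second > timeout_) {
                dead.push_back(it->first);
                it = last_seen_.erase(it);
            } else {
                ++it;
            }
        }
        return dead;
    }

private:
    int64_t timeout_;                             // allowed silence in seconds
    std::unordered_map<int, int64_t> last_seen_;  // fd -> last activity time
};
```

In the event loop from the question, `touch` could be called from the read handler and `sweep` once per `loop` iteration (the loop already wakes up every 1000 ms), closing whatever fds it returns.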
