
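Background: I wrote a coroutine library modelled on Swoole's code, using the boost.asm code for context switching. The problem occurs during a context switch, inside jump_fcontext in boost.asm, and the error is a segmentation fault: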

[Image: screenshot of the segmentation fault]

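Logging suggests that the problem occurs when resuming a particular coroutine. The faulting assembly instruction uses the sp register, so my guess is that I accidentally freed a coroutine stack somewhere. I don't know where to start, though, so I'd like to ask how I should go about solving this. Are there any good tools or approaches?
If you need the source for analysis, the code is at: https://github.com/huanghanta...
I found the problem with a test file I wrote: https://github.com/huanghanta...
At line 45 I added a co::sleep call, which switches out of the current context.
Then I load-tested this server with ab: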

ab -c 100 -n 10000 127.0.0.1:80/

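It takes roughly 10 to 20 runs of the load test to trigger the segfault.
(Without the co::sleep call there is no segfault, and 1,000 connections with 1,000,000 requests run without problems, so the context switch itself can probably be ruled out.)
I checked, and the code has no obvious memory leak: once the load test settles, memory usage stays at about 4.6M.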

Checking memory with valgrind gives the following:

~/codeDir/cppCode/fsw/examples # valgrind --track-origins=yes ./a.out 
==85446== Memcheck, a memory error detector
==85446== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==85446== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==85446== Command: ./a.out
==85446== 
==85446== Warning: client switching stacks?  SP change: 0x1fff0008e0 --> 0x4abb708
==85446==          to suppress, use: --max-stackframe=137343816152 or greater
==85446== Warning: client switching stacks?  SP change: 0x4abb590 --> 0x1fff0008e0
==85446==          to suppress, use: --max-stackframe=137343816528 or greater
==85446== Warning: client switching stacks?  SP change: 0x1fff0008c0 --> 0x4abb590
==85446==          to suppress, use: --max-stackframe=137343816496 or greater
==85446==          further instances of this message will not be shown.
==85446== Invalid write of size 8
==85446==    at 0x4E31A66: uv_timer_init (timer.c:64)
==85446==    by 0x4CB406D: fsw::Coroutine::sleep(double) (coroutine.cc:82)
==85446==    by 0x10933F: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446==    by 0x109517: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446==    by 0x4CB3DE4: fsw::Context::context_func(void*) (context.cc:58)
==85446==    by 0x4CB6830: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==85446==  Address 0x50540a0 is 32 bytes inside a block of size 152 free'd
==85446==    at 0x489DBDF: operator delete(void*) (vg_replace_malloc.c:576)
==85446==    by 0x4CB40DD: fsw::Coroutine::sleep(double) (coroutine.cc:86)
==85446==    by 0x10933F: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446==    by 0x109517: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446==    by 0x4CB3DE4: fsw::Context::context_func(void*) (context.cc:58)
==85446==    by 0x4CB6830: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==85446==  Block was alloc'd at
==85446==    at 0x489CCF5: operator new(unsigned long) (vg_replace_malloc.c:334)
==85446==    by 0x4CB402E: fsw::Coroutine::sleep(double) (coroutine.cc:74)
==85446==    by 0x10933F: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446==    by 0x109517: main::{lambda(void*)#1}::operator()(void*) const::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==85446==    by 0x4CB3DE4: fsw::Context::context_func(void*) (context.cc:58)
==85446==    by 0x4CB6830: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==85446==

However, I still can't tell from this output how to fix it. It looks like the timer code is at fault. The problem code is probably:

static void sleep_timeout(uv_timer_t *timer)
{
    fswTrace("coroutine[%ld] sleep timeout", ((Coroutine *) timer->data)->get_cid());
    ((Coroutine *) timer->data)->resume();
}

int Coroutine::sleep(double seconds)
{
    uv_timer_t *timer;
    Coroutine *co = Coroutine::get_current();
    fswTrace("coroutine[%ld] sleep", co->cid);

    try
    {
        timer = new uv_timer_t();
    }
    catch(const std::bad_alloc& e)
    {
        fswError("%s", e.what());
    }
    
    timer->data = co;
    uv_timer_init(uv_default_loop(), timer);
    uv_timer_start(timer, sleep_timeout, seconds * 1000, 0);
   
    co->yield();
    delete timer;
    timer = nullptr;
    return 0;
}

I have just written a new test program:

#include <iostream>
#include "fsw/coroutine.h"
#include "fsw/fsw.h"

using namespace fsw;
using namespace std;

int main(int argc, char const *argv[])
{
    fsw_event_init();

    while (true)
    {
        Coroutine::create([](void *arg)
        {
            Coroutine *co = Coroutine::get_current();
            int  cid = co->get_cid();
            cout << cid << endl;
            Coroutine::sleep(0.5);
            cout << cid << endl;
        });

        fsw_event_wait();
    }
    
    return 0;
}

It also reports an invalid memory write:

~/codeDir/cppCode/fsw/examples # valgrind --track-origins=yes ./a.out 
==87209== Memcheck, a memory error detector
==87209== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==87209== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==87209== Command: ./a.out
==87209== 
==87209== Warning: client switching stacks?  SP change: 0x1fff0008e0 --> 0x4abb708
==87209==          to suppress, use: --max-stackframe=137343816152 or greater
1
[2019-09-10 07:40:24]    TRACE    sleep: coroutine[1] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:40:24]    TRACE    sleep: coroutine[1] new timer[0x4abb850] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:40:24]    TRACE    yield: coroutine[1] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
==87209== Warning: client switching stacks?  SP change: 0x4abb600 --> 0x1fff0008e0
==87209==          to suppress, use: --max-stackframe=137343816416 or greater
[2019-09-10 07:40:24]    TRACE    sleep_timeout: coroutine[1] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:40:24]    TRACE    resume: coroutine[1] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
==87209== Warning: client switching stacks?  SP change: 0x1fff000880 --> 0x4abb600
==87209==          to suppress, use: --max-stackframe=137343816320 or greater
==87209==          further instances of this message will not be shown.
1
[2019-09-10 07:40:24]    TRACE    resume: coroutine[1] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
2
[2019-09-10 07:40:24]    TRACE    sleep: coroutine[2] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:40:24]    TRACE    sleep: coroutine[2] new timer[0x5054080] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
==87209== Invalid write of size 8
==87209==    at 0x4E1DA66: uv_timer_init (timer.c:64)
==87209==    by 0x4CB4330: fsw::Coroutine::sleep(double) (coroutine.cc:83)
==87209==    by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209==    by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209==    by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87209==    by 0x4CB6C00: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87209==  Address 0x4abb870 is 32 bytes inside a block of size 152 free'd
==87209==    at 0x489DBDF: operator delete(void*) (vg_replace_malloc.c:576)
==87209==    by 0x4CB43A0: fsw::Coroutine::sleep(double) (coroutine.cc:87)
==87209==    by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209==    by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209==    by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87209==    by 0x4CB6C00: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87209==  Block was alloc'd at
==87209==    at 0x489CCF5: operator new(unsigned long) (vg_replace_malloc.c:334)
==87209==    by 0x4CB428F: fsw::Coroutine::sleep(double) (coroutine.cc:74)
==87209==    by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209==    by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87209==    by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87209==    by 0x4CB6C00: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87209==

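It looks as if the coroutine used a timer that had already been deleted. But I added logging and confirmed that the timer is freed only after the coroutine's sleep has timed out: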

~/codeDir/cppCode/fsw/examples # valgrind --track-origins=yes ./a.out
==87271== Memcheck, a memory error detector
==87271== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==87271== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==87271== Command: ./a.out
==87271== 
==87271== Warning: client switching stacks?  SP change: 0x1fff0008e0 --> 0x4abb708
==87271==          to suppress, use: --max-stackframe=137343816152 or greater
1
[2019-09-10 07:50:29]    TRACE    sleep: coroutine[1] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:29]    TRACE    sleep: coroutine[1] new timer[0x4abb850] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:50:29]    TRACE    yield: coroutine[1] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
==87271== Warning: client switching stacks?  SP change: 0x4abb600 --> 0x1fff0008e0
==87271==          to suppress, use: --max-stackframe=137343816416 or greater
[2019-09-10 07:50:29]    TRACE    sleep_timeout: coroutine[1] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:50:29]    TRACE    resume: coroutine[1] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
==87271== Warning: client switching stacks?  SP change: 0x1fff000880 --> 0x4abb600
==87271==          to suppress, use: --max-stackframe=137343816320 or greater
==87271==          further instances of this message will not be shown.
[2019-09-10 07:50:30]    TRACE    sleep: coroutine[1] free timer[0x4abb850] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 87.
1
[2019-09-10 07:50:30]    TRACE    resume: coroutine[1] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
2
[2019-09-10 07:50:30]    TRACE    sleep: coroutine[2] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:30]    TRACE    sleep: coroutine[2] new timer[0x5054080] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
==87271== Invalid write of size 8
==87271==    at 0x4E1DA66: uv_timer_init (timer.c:64)
==87271==    by 0x4CB4330: fsw::Coroutine::sleep(double) (coroutine.cc:83)
==87271==    by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==    by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==    by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87271==    by 0x4CB6C60: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87271==  Address 0x4abb870 is 32 bytes inside a block of size 152 free'd
==87271==    at 0x489DBDF: operator delete(void*) (vg_replace_malloc.c:576)
==87271==    by 0x4CB4402: fsw::Coroutine::sleep(double) (coroutine.cc:88)
==87271==    by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==    by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==    by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87271==    by 0x4CB6C60: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87271==  Block was alloc'd at
==87271==    at 0x489CCF5: operator new(unsigned long) (vg_replace_malloc.c:334)
==87271==    by 0x4CB428F: fsw::Coroutine::sleep(double) (coroutine.cc:74)
==87271==    by 0x109274: main::{lambda(void*)#1}::operator()(void*) const (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==    by 0x1092BA: main::{lambda(void*)#1}::_FUN(void*) (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271==    by 0x4CB3E88: fsw::Context::context_func(void*) (context.cc:49)
==87271==    by 0x4CB6C60: make_fcontext (make_x86_64_sysv_elf_gas.S:64)
==87271== 
[2019-09-10 07:50:30]    TRACE    yield: coroutine[2] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
[2019-09-10 07:50:30]    TRACE    sleep_timeout: coroutine[2] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:50:30]    TRACE    resume: coroutine[2] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
[2019-09-10 07:50:30]    TRACE    sleep: coroutine[2] free timer[0x5054080] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 87.
2
[2019-09-10 07:50:30]    TRACE    resume: coroutine[2] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
3
[2019-09-10 07:50:30]    TRACE    sleep: coroutine[3] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:30]    TRACE    sleep: coroutine[3] new timer[0x5054260] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:50:30]    TRACE    yield: coroutine[3] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
[2019-09-10 07:50:31]    TRACE    sleep_timeout: coroutine[3] sleep timeout in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 62.
[2019-09-10 07:50:31]    TRACE    resume: coroutine[3] resume in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 47.
[2019-09-10 07:50:31]    TRACE    sleep: coroutine[3] free timer[0x5054260] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 87.
3
[2019-09-10 07:50:31]    TRACE    resume: coroutine[3] end in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 53.
4
[2019-09-10 07:50:31]    TRACE    sleep: coroutine[4] sleep in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 70.
[2019-09-10 07:50:31]    TRACE    sleep: coroutine[4] new timer[0x5054440] in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 75.
[2019-09-10 07:50:31]    TRACE    yield: coroutine[4] yield in /root/codeDir/cppCode/fsw/src/coroutine/coroutine.cc on line 39.
^C==87271== 
==87271== Process terminating with default action of signal 2 (SIGINT)
==87271==    at 0x40213D0: epoll_pwait (in /lib/ld-musl-x86_64.so.1)
==87271==    by 0x10930F: main (in /root/codeDir/cppCode/fsw/examples/a.out)
==87271== 
==87271== HEAP SUMMARY:
==87271==     in use at exit: 2,172,436 bytes in 15 blocks
==87271==   total heap usage: 33 allocs, 18 frees, 8,465,308 bytes allocated
==87271== 
==87271== LEAK SUMMARY:
==87271==    definitely lost: 0 bytes in 0 blocks
==87271==    indirectly lost: 0 bytes in 0 blocks
==87271==      possibly lost: 0 bytes in 0 blocks
==87271==    still reachable: 2,172,436 bytes in 15 blocks
==87271==         suppressed: 0 bytes in 0 blocks
==87271== Rerun with --leak-check=full to see details of leaked memory
==87271== 
==87271== For counts of detected and suppressed errors, rerun with: -v
==87271== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)

~/codeDir/cppCode/fsw/examples # 
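
One possible reading of the traces above, offered as an assumption rather than a confirmed diagnosis: the invalid write happens while coroutine 2's timer (0x5054080) is being initialised, yet the address written, 0x4abb870, is 32 bytes into coroutine 1's timer (0x4abb850), which has already been deleted. That pattern is what one would expect if the event loop keeps every initialised handle in an intrusive doubly-linked queue and the old timer was deleted while still linked: linking a new handle at the tail then writes into the previous tail. The sketch below models this with hypothetical `Loop`, `Handle` and `Link` types (not libuv's own), and uses a `released` flag in place of a real `delete` so that the model itself stays well-defined.

```cpp
#include <cassert>

// Hypothetical stand-ins for an event loop and its handles; not libuv types.
struct Handle;

struct Link {
    Link* prev = nullptr;
    Link* next = nullptr;
    Handle* owner = nullptr;   // nullptr for the loop's own list head
};

struct Handle {
    Link queue;                // intrusive link stored inside the handle
    bool released = false;     // stands in for `delete`, so the model has no real UB
    Handle() { queue.owner = this; }
};

struct Loop {
    Link head;                 // circular list: an empty queue points at itself
    Loop() { head.prev = &head; head.next = &head; }
};

// Models handle initialisation: link `h` at the tail of the loop's queue.
// Returns the handle whose link field had to be written, or nullptr if the
// previous tail was the list head itself.
Handle* init_handle(Loop& loop, Handle& h) {
    Link* tail = loop.head.prev;
    h.queue.prev = tail;
    h.queue.next = &loop.head;
    tail->next = &h.queue;     // this write lands inside the previous handle
    loop.head.prev = &h.queue;
    return tail->owner;
}

// Models closing: the handle is unlinked first; only then may it be released.
void close_handle(Handle& h) {
    assert(!h.released && "close must happen before release");
    h.queue.prev->next = h.queue.next;
    h.queue.next->prev = h.queue.prev;
    h.queue.prev = &h.queue;
    h.queue.next = &h.queue;
}
```

In this model, calling `init_handle` for a second handle returns the first handle if that one was released without `close_handle`, and returns `nullptr` if it was unlinked first.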
保密 103
Asked 2019-09-10
1 answer

Accepted

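After debugging, I found that the way I was using the libuv timer was wrong: the timer was not being released correctly. I am currently looking for the correct way to release a timer.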
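
For context, libuv's handle documentation states that `uv_close` must be called on a handle before its memory is released, and that the memory may only be freed in the close callback or after it has returned. The sketch below illustrates that ordering only; it does not use libuv. `MockLoop`, `MockTimer`, `release_timer` and `finish_sleep` are hypothetical names invented for this example, and the mock's "closing phase" is a simplification of what a real loop iteration does.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not libuv types.
struct MockTimer {
    bool closing = false;
};

using CloseCb = void (*)(MockTimer*);

class MockLoop {
public:
    // Models a close request: it is only recorded here, the callback runs later.
    void close(MockTimer* t, CloseCb cb) {
        assert(!t->closing);   // a handle must not be closed twice
        t->closing = true;
        pending_.push_back({t, cb});
    }

    // Models the part of a loop iteration that runs pending close callbacks.
    // Returns how many callbacks were invoked.
    std::size_t run_closing_phase() {
        std::vector<std::pair<MockTimer*, CloseCb>> batch;
        batch.swap(pending_);
        for (auto& entry : batch) {
            entry.second(entry.first);
        }
        return batch.size();
    }

private:
    std::vector<std::pair<MockTimer*, CloseCb>> pending_;
};

static int g_released = 0;     // counts releases, for demonstration only

// Close callback: the only place where the timer's memory is released.
void release_timer(MockTimer* t) {
    delete t;
    ++g_released;
}

// What the tail of a sleep() could do once the coroutine has been resumed:
// request a close instead of deleting the timer immediately.
void finish_sleep(MockLoop& loop, MockTimer* timer) {
    loop.close(timer, release_timer);
}
```

With real libuv the analogous change in `Coroutine::sleep` would be to replace `delete timer;` with a `uv_close` call whose callback performs the `delete`; whether that fits the rest of the library is left as an assumption here.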
