During a load test last week, a function that calls the hiredis library crashed with a core dump. The call stack was:
Program terminated with signal 11, Segmentation fault.
#0 0x000000000052c497 in wh::common::redis::RedisConn::HashMultiGet(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) ()
(gdb) bt
#0 0x000000000052c497 in wh::common::redis::RedisConn::HashMultiGet(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) ()
#1 0x00000000004cc418 in wh::server::user_activityHandler::getUserRating(wh::server::GetUserRatingResult&, std::vector<int, std::allocator<int> > const&) ()
#2 0x00000000004e54cf in wh::server::user_activityProcessor::process_getUserRating(int, apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, void*) ()
#3 0x00000000004e3ad3 in wh::server::user_activityProcessor::dispatchCall(apache::thrift::protocol::TProtocol*, apache::thrift::protocol::TProtocol*, std::string const&, int, void*) ()
The code of HashMultiGet in RedisConn is as follows:
int RedisConn::HashMultiGet(
    const string& key,
    const vector<string>& fields,
    map<string, string>& fvs)
{
    if (key.empty() || fields.empty())
        return 0;
    if (!conn_)
    {
        LOG(LOG_ERR, "ERROR!!! conn is NULL!!!");
        return kErrConnBroken;
    }
    size_t argc = fields.size() + 2;
    const char* argv[argc]; // allocated directly on the stack
    size_t argvlen[argc];
    std::string cmd = "HMGET";
    argv[0] = cmd.data();
    argvlen[0] = cmd.length();
    argv[1] = key.data();
    argvlen[1] = key.length();
    size_t i = 2;
    for (vector<string>::const_iterator cit = fields.begin();
         cit != fields.end(); ++cit)
    {
        // put value into arg list
        argv[i] = cit->data();
        argvlen[i] = cit->length();
        ++i;
    }
    redisReply* reply = static_cast<redisReply*>(redisCommandArgv(conn_, argc, argv, argvlen));
    if (!reply)
    {
        this->Release();
        LOG(LOG_ERR, "ERROR!!! Redis connection broken!!!");
        return kErrConnBroken;
    }
    int32_t ret = kErrOk;
    if (reply->type != REDIS_REPLY_ARRAY)
    {
        this->CheckReply(reply);
        LOG(LOG_ERR, "RedisReply ERROR: %d %s", reply->type, reply->str);
        ret = kErrUnknown;
    }
    ...
The problem is in how the request for hiredis's redisCommandArgv is built: both argument arrays are allocated directly on the stack.
const char* argv[argc]; // allocated directly on the stack
size_t argvlen[argc];
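During the load test, fields in HashMultiGet(key, fields, fvs) held more than 100,000 elements, so on a 64-bit system the two arrays took about 100,000 * (8 + 8) = 1,600,000 bytes = 1.6 MB of stack. Added to the stack space already in use, this overflowed the stack and caused the core dump.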
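The arithmetic above can be checked with a small sketch. ArgvStackBytes is a hypothetical helper, not part of the original code; it only mirrors the sizes of the two arrays declared in HashMultiGet, and the result depends on the platform's pointer and size_t widths.

```cpp
#include <cstddef>

// Hypothetical helper: bytes consumed on the stack by the two arrays
//   const char* argv[argc];
//   size_t      argvlen[argc];
// where argc = field_count + 2 ("HMGET" + key + one entry per field).
constexpr std::size_t ArgvStackBytes(std::size_t field_count) {
    return (field_count + 2) * (sizeof(const char*) + sizeof(std::size_t));
}
```

With 8-byte pointers and an 8-byte size_t, ArgvStackBytes(100000) comes to 100,002 * 16 bytes, which matches the roughly 1.6 MB estimate above.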
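Why were the arguments allocated on the stack in the first place? One possible reason: with heap allocation, you have to take care of freeing the memory.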
Solution:
Allocate argv and argvlen on the heap; after all, the heap is far larger than the stack.
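One way to do this, as a minimal sketch rather than the actual fix that was applied, is to hold both arrays in std::vector, which stores its elements on the heap and frees them automatically, so the concern about calling free goes away. HmgetArgs and BuildHmgetArgs are hypothetical names introduced here for illustration. The sketch assumes C++11 and leaves the hiredis call itself in a comment so that it compiles without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder (not in the original code): the argument arrays live
// in heap-backed vectors instead of variable-length arrays on the stack.
struct HmgetArgs {
    std::vector<const char*> argv;
    std::vector<std::size_t> argvlen;
};

// Builds the arguments for "HMGET key field1 field2 ...".
// The stored pointers refer to `key` and `fields`, so both must outlive the
// returned object. The command name is a static literal.
inline HmgetArgs BuildHmgetArgs(const std::string& key,
                                const std::vector<std::string>& fields) {
    static const char kCmd[] = "HMGET";
    HmgetArgs args;
    args.argv.reserve(fields.size() + 2);
    args.argvlen.reserve(fields.size() + 2);

    args.argv.push_back(kCmd);
    args.argvlen.push_back(sizeof(kCmd) - 1);  // exclude the trailing '\0'
    args.argv.push_back(key.data());
    args.argvlen.push_back(key.length());
    for (const std::string& field : fields) {
        args.argv.push_back(field.data());
        args.argvlen.push_back(field.length());
    }
    assert(args.argv.size() == args.argvlen.size());
    return args;
}

// Inside HashMultiGet, the call would then look roughly like this:
//   HmgetArgs args = BuildHmgetArgs(key, fields);
//   redisReply* reply = static_cast<redisReply*>(redisCommandArgv(
//       conn_, static_cast<int>(args.argv.size()),
//       args.argv.data(), args.argvlen.data()));
```

Besides moving the memory off the stack, this also avoids variable-length arrays, which are a compiler extension rather than standard C++.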