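Preface
Memory leaks are a recurring problem for server-side programs. With Erlang, programmers do not manage memory by hand, and as long as you are not writing C drivers, most memory problems are fairly easy to pin down. This post summarizes the common types of memory leaks and the techniques for tracking them down.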
observer_cli
A top-like tool for the Erlang VM.
It can quickly sort processes by CPU, memory, or message queue length.
It also has dashboards for network, system memory, and ETS.
https://github.com/zhongwencool/observer_cli
Memory fragmentation
https://ferd.github.io/recon/recon_alloc.html#fragmentation/1
Types of memory leaks
Process leaks
If etop is not available
iex(xxxx@xxxx.)1> :erlang.system_info(:process_count)
5369
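process_count returns the number of processes that currently exist in the Erlang VM. If that number does not match what the business logic actually needs, it is worth investigating.
Message queue buildup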
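When the process count looks too high, a natural next step is to see which kind of process is multiplying. The sketch below groups live processes by their initial call and returns the most frequent ones. The module name `ProcCensus` is made up for illustration. Note that processes started through proc_lib (GenServer, Supervisor, and so on) all report `{:proc_lib, :init_p, 5}` here, so this is only a first rough cut.

```elixir
defmodule ProcCensus do
  # Count live processes per initial call, most frequent first.
  def by_initial_call(limit \\ 10) do
    Process.list()
    |> Enum.map(&Process.info(&1, :initial_call))
    # Process.info/2 returns nil for a process that died in the meantime.
    |> Enum.reject(&is_nil/1)
    |> Enum.frequencies_by(fn {:initial_call, mfa} -> mfa end)
    |> Enum.sort_by(fn {_mfa, count} -> count end, :desc)
    |> Enum.take(limit)
  end
end

IO.inspect(ProcCensus.by_initial_call(5))
```

Running it twice a few minutes apart and comparing the counts usually shows which entry keeps growing.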
iex(xxxx@xxxx.)5> spawn fn -> :etop.start([sort: :msg_q]) end
#PID<0.6255.1>
========================================================================================
'xxxx@xxxx.' 03:04:46
Load: cpu 0 Memory: total 147234 binary 2839
procs 5371 processes 59008 code 42641
runq 0 atom 1722 ets 8239
Pid Name or Initial Func Time Reds Memory MsgQ Current Function
----------------------------------------------------------------------------------------
<7796.0.0> init '-' 339058 29540 0 init:loop/1
<7796.1.0> erts_code_purger '-' 479850 285160 0 erts_code_purger:wai
<7796.2.0> erts_literal_area_co '-' 337591 2688 0 erts_literal_area_co
<7796.3.0> erts_dirty_process_s '-' 37924 2688 0 erts_dirty_process_s
Message queue buildup is usually accompanied by memory growth, so sorting by either msg_q or memory makes the problem easy to spot.
If etop is not available
iex(xxxx@xxxx.)8> Enum.map(:erlang.processes(), fn proc -> {:erlang.process_info(proc, :message_queue_len), proc} end) |> Enum.sort(fn({{_, a}, _}, {{_, b}, _}) -> a > b end) |> List.first
{{:message_queue_len, 0}, #PID<0.32638.0>}
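The one-liner above can crash if a process exits between the call to `:erlang.processes()` and the call to `process_info`, because `process_info` then returns `:undefined` and the sort function no longer matches. A slightly more defensive sketch, with a hypothetical module name `MsgQueueTop`, could look like this:

```elixir
defmodule MsgQueueTop do
  # Return the n processes with the longest message queues as {len, pid}.
  def top(n \\ 10) do
    pairs =
      for pid <- Process.list(),
          # The pattern silently skips processes that have already exited (nil).
          {:message_queue_len, len} <- [Process.info(pid, :message_queue_len)] do
        {len, pid}
      end

    pairs
    |> Enum.sort(:desc)
    |> Enum.take(n)
  end
end

IO.inspect(MsgQueueTop.top(5))
```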
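ETS table leaks
Find the ETS tables that use the most memory (note that :ets.info(tab, :memory) is reported in words, not bytes)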
iex(7)> :ets.all() |> Enum.map(fn ets_name -> {:ets.info(ets_name, :memory), ets_name} end) |> Enum.sort(fn a, b -> a > b end)
[
{18002942, :test},
{41940, #Reference<0.3983585142.1897791489.87703>},
...
]
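Since `:ets.info/2` reports memory in words, it can help to convert to bytes before comparing against `:erlang.memory`. The following is a minimal sketch under that assumption; `EtsTop` is an illustrative name, and tables deleted while the scan runs are skipped.

```elixir
defmodule EtsTop do
  # Return the n largest ETS tables as {bytes, table}.
  def by_memory(n \\ 10) do
    wordsize = :erlang.system_info(:wordsize)

    :ets.all()
    |> Enum.map(fn tab -> {:ets.info(tab, :memory), tab} end)
    # :ets.info/2 returns :undefined if the table no longer exists.
    |> Enum.filter(fn {words, _tab} -> is_integer(words) end)
    |> Enum.map(fn {words, tab} -> {words * wordsize, tab} end)
    |> Enum.sort(:desc)
    |> Enum.take(n)
  end
end

IO.inspect(EtsTop.by_memory(5))
```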
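Overall memory analysis
:erlang.memory shows at a glance whether ETS tables are leaking.
Note that binaries larger than 64 bytes are reference-counted, off-heap binaries: their payload shows up under the binary entry of :erlang.memory and is not counted under ets.
65 bytes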
iex(1)> :ets.new(:test, [:public, :named_table])
:test
iex(2)> :erlang.memory
[
total: 23688632,
processes: 4940400,
processes_used: 4939456,
system: 18748232,
atom: 463465,
atom_used: 442288,
binary: 27872,
code: 8462310,
ets: 589664
]
iex(3)> for num <- 1..1000000 do
...(3)> :ets.insert(:test, {num, :crypto.strong_rand_bytes(65)})
...(3)> end
[true, true, true, true, true, true, true, true, true, true, true, true, true,
true, true, true, true, true, true, true, true, true, true, true, true, true,
true, true, true, true, true, true, true, true, true, true, true, true, true,
true, true, true, true, true, true, true, true, true, true, true, ...]
iex(4)> :erlang.memory
[
total: 284511736,
processes: 33626760,
processes_used: 33625816,
system: 250884976,
atom: 463465,
atom_used: 446381,
binary: 112090520,
code: 8553627,
ets: 120619384
]
64 bytes
iex(1)> :ets.new(:test, [:public, :named_table])
:test
iex(2)> :erlang.memory
[
total: 23569856,
processes: 4778728,
processes_used: 4777784,
system: 18791128,
atom: 463465,
atom_used: 442288,
binary: 70736,
code: 8462310,
ets: 589680
]
iex(3)> for num <- 1..1000000 do
...(3)> :ets.insert(:test, {num, :crypto.strong_rand_bytes(64)})
...(3)> end
[true, true, true, true, true, true, true, true, true, true, true, true, true,
true, true, true, true, true, true, true, true, true, true, true, true, true,
true, true, true, true, true, true, true, true, true, true, true, true, true,
true, true, true, true, true, true, true, true, true, true, true, ...]
iex(4)> :erlang.memory
[
total: 204325192,
processes: 33373520,
processes_used: 33372576,
system: 170951672,
atom: 463465,
atom_used: 447944,
binary: 39168,
code: 8653586,
ets: 152623976
]
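The two experiments above compare `:erlang.memory` snapshots by eye. A small helper can compute the per-category difference directly. This is only a sketch: `MemDelta` is a hypothetical name, and `:binary.copy/2` is used in place of `:crypto.strong_rand_bytes/1` so the example does not depend on the crypto application.

```elixir
defmodule MemDelta do
  # Run fun and return how much each :erlang.memory/0 category changed, in bytes.
  def measure(fun) when is_function(fun, 0) do
    before = :erlang.memory()
    fun.()
    after_run = :erlang.memory()

    for {category, bytes} <- after_run do
      {category, bytes - Keyword.fetch!(before, category)}
    end
  end
end

table = :ets.new(:mem_delta_demo, [:public])

delta =
  MemDelta.measure(fn ->
    # 65-byte values are above the 64-byte limit, so they are off-heap binaries.
    Enum.each(1..10_000, fn n ->
      :ets.insert(table, {n, :binary.copy(<<0>>, 65)})
    end)
  end)

IO.inspect(Keyword.take(delta, [:binary, :ets]))
```

Changing 65 to 64 in the snippet should move most of the growth from the binary entry to the ets entry, mirroring the two sessions above.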
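Oversized data
First, you should be able to estimate roughly how much memory the business logic needs. process_info then helps find suspicious processes.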
iex(11)> :erlang.process_info(:ets.info(:test, :owner), :memory)
{:memory, 28693220}
:sys.get_state can reveal logic bugs that make a list or map grow without bound.
iex(xxxxx@xxxxx.)13> :sys.get_state(:erlang.list_to_pid('<0.2362.0>'))
{:state, {:local, :prometheus_sup}, :one_for_one, {[], %{}}, :undefined, 5, 1,
[], 0, :prometheus_sup, []}
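Printing a very large state is often impractical, so it can be easier to measure it instead. The sketch below fetches the state with `:sys.get_state/2` and sizes it with `:erts_debug.flat_size/1`. The module name `StateSize` is an assumption for illustration. Keep in mind that flat_size counts heap words only, so the payload of off-heap binaries is not included, and that fetching the state copies it into the calling process.

```elixir
defmodule StateSize do
  # Approximate heap size, in bytes, of the state held by an OTP process.
  def bytes(server, timeout \\ 5_000) do
    state = :sys.get_state(server, timeout)
    :erts_debug.flat_size(state) * :erlang.system_info(:wordsize)
  end
end

# Usage with an Agent whose state is a 10_000-element list.
{:ok, agent} = Agent.start_link(fn -> Enum.to_list(1..10_000) end)
IO.inspect(StateSize.bytes(agent))
```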
Sort processes by memory usage
:recon.proc_count(:memory, 10)
If recon is not bundled in the release
iex(xxxx@xxxx.)2> Enum.map(:erlang.processes(), fn proc -> {:erlang.process_info(proc, :memory), proc} end) |> Enum.sort(fn({{_, a}, _}, {{_, b}, _}) -> a > b end) |> Enum.take(100)
[ {{:memory, 140005528}, #PID<0.4831.0>}, {{:memory, 34070956}, #PID<0.25119.8>}, {{:memory, 34051004}, #PID<0.25113.8>}, {{:memory, 33999180}, #PID<0.25100.8>}, {{:memory, 33958124}, #PID<0.25104.8>},
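Tracking down monitor and link leaks
When a process monitors another in Erlang, both the monitoring process and the monitored process record data. The scripts below quickly locate the problem. This memory is accounted under system and is allocated by the std_alloc allocator.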
Enum.map(:erlang.processes(), fn proc -> {:erlang.process_info(proc, :monitors), proc} end) |> Enum.filter(fn v -> elem(v, 0) != :undefined end) |> Enum.map(fn v -> {length(elem(elem(v, 0), 1)), elem(v, 1)} end) |> Enum.sort(fn({a, _}, {b, _}) -> a > b end) |> Enum.take(100)
Enum.map(:erlang.processes(), fn proc -> {:erlang.process_info(proc, :monitored_by), proc} end) |> Enum.filter(fn v -> elem(v, 0) != :undefined end) |> Enum.map(fn v -> {length(elem(elem(v, 0), 1)), elem(v, 1)} end) |> Enum.sort(fn({a, _}, {b, _}) -> a > b end) |> Enum.take(100)
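The two scripts above differ only in the `process_info` item they read. A parameterized sketch covering `:monitors`, `:monitored_by` and `:links` might look like the following; `RefTop` is a hypothetical name, not part of any library.

```elixir
defmodule RefTop do
  # Return the n processes holding the most entries for attr as {count, pid}.
  def top(attr, n \\ 100) when attr in [:monitors, :monitored_by, :links] do
    Process.list()
    |> Enum.flat_map(fn pid ->
      case Process.info(pid, attr) do
        {^attr, refs} -> [{length(refs), pid}]
        # The process exited before it could be inspected.
        nil -> []
      end
    end)
    |> Enum.sort(:desc)
    |> Enum.take(n)
  end
end

IO.inspect(RefTop.top(:monitors, 5))
IO.inspect(RefTop.top(:monitored_by, 5))
```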
Inspect the memory used by each allocator
:recon_alloc.memory(:allocated_types)
Estimating memory
https://github.com/okeuday/erlang_term
http://erlang.org/doc/efficiency_guide/advanced.html#id68923
Notes from reading the source
Elixir data types
- integer
- float
- boolean
- atom
- string
- list
- tuple
- Map, MapSet, func, nil, ets?

#if ET_DEBUG
ERTS_GLB_INLINE unsigned tag_val_def(Wterm x, const char *file, unsigned line)
#else
ERTS_GLB_INLINE unsigned tag_val_def(Wterm x)
#define file __FILE__
#define line __LINE__
#endif
{
    static char *msg = "tag_val_def error";

    switch (x & _TAG_PRIMARY_MASK) {
    case TAG_PRIMARY_LIST:
        ET_ASSERT(_list_precond(x),file,line);
        return LIST_DEF;
    case TAG_PRIMARY_BOXED: {
        Eterm hdr = *boxed_val(x);
        ET_ASSERT(is_header(hdr),file,line);
        switch ((hdr & _TAG_HEADER_MASK) >> _TAG_PRIMARY_SIZE) {
        case (_TAG_HEADER_ARITYVAL >> _TAG_PRIMARY_SIZE): return TUPLE_DEF;
        case (_TAG_HEADER_POS_BIG >> _TAG_PRIMARY_SIZE): return BIG_DEF;
        case (_TAG_HEADER_NEG_BIG >> _TAG_PRIMARY_SIZE): return BIG_DEF;
        case (_TAG_HEADER_REF >> _TAG_PRIMARY_SIZE): return REF_DEF;
        case (_TAG_HEADER_FLOAT >> _TAG_PRIMARY_SIZE): return FLOAT_DEF;
        case (_TAG_HEADER_EXPORT >> _TAG_PRIMARY_SIZE): return EXPORT_DEF;
        case (_TAG_HEADER_FUN >> _TAG_PRIMARY_SIZE): return FUN_DEF;
        case (_TAG_HEADER_EXTERNAL_PID >> _TAG_PRIMARY_SIZE): return EXTERNAL_PID_DEF;
        case (_TAG_HEADER_EXTERNAL_PORT >> _TAG_PRIMARY_SIZE): return EXTERNAL_PORT_DEF;
        case (_TAG_HEADER_EXTERNAL_REF >> _TAG_PRIMARY_SIZE): return EXTERNAL_REF_DEF;
        case (_TAG_HEADER_MAP >> _TAG_PRIMARY_SIZE): return MAP_DEF;
        case (_TAG_HEADER_REFC_BIN >> _TAG_PRIMARY_SIZE): return BINARY_DEF;
        case (_TAG_HEADER_HEAP_BIN >> _TAG_PRIMARY_SIZE): return BINARY_DEF;
        case (_TAG_HEADER_SUB_BIN >> _TAG_PRIMARY_SIZE): return BINARY_DEF;
        case (_TAG_HEADER_BIN_MATCHSTATE >> _TAG_PRIMARY_SIZE): return MATCHSTATE_DEF;
        }
        break;
    }
    case TAG_PRIMARY_IMMED1: {
        switch ((x & _TAG_IMMED1_MASK) >> _TAG_PRIMARY_SIZE) {
        case (_TAG_IMMED1_PID >> _TAG_PRIMARY_SIZE): return PID_DEF;
        case (_TAG_IMMED1_PORT >> _TAG_PRIMARY_SIZE): return PORT_DEF;
        case (_TAG_IMMED1_IMMED2 >> _TAG_PRIMARY_SIZE): {
            switch ((x & _TAG_IMMED2_MASK) >> _TAG_IMMED1_SIZE) {
            case (_TAG_IMMED2_ATOM >> _TAG_IMMED1_SIZE): return ATOM_DEF;
            case (_TAG_IMMED2_NIL >> _TAG_IMMED1_SIZE): return NIL_DEF;
            }
            break;
        }
        case (_TAG_IMMED1_SMALL >> _TAG_PRIMARY_SIZE): return SMALL_DEF;
        }
        break;
    }
    }
    erl_assert_error(msg, __FUNCTION__, file, line);
#undef file
#undef line
}
#endif
integer
small integer
As the code shows, Erlang distinguishes small integers from big integers. A small integer has N-4 bits of payload, where N is 64 or 32 depending on the system. The low 4 bits hold the tag 0xF, i.e. 0b1111.

#define is_integer(x) (is_small(x) || is_big(x))

/* fixnum ("small") access methods */
#if defined(ARCH_64)
#define SMALL_BITS (64-4)
#define SMALL_DIGITS (17)
#else
#define SMALL_BITS (28)
#define SMALL_DIGITS (8)
#endif
#define MAX_SMALL ((SWORD_CONSTANT(1) << (SMALL_BITS-1))-1)
#define MIN_SMALL (-(SWORD_CONSTANT(1) << (SMALL_BITS-1)))
#define _TAG_IMMED1_SMALL ((0x3 << _TAG_PRIMARY_SIZE) | TAG_PRIMARY_IMMED1)
#define make_small(x) (((Uint)(x) << _TAG_IMMED1_SIZE) + _TAG_IMMED1_SMALL)
#define is_small(x) (((x) & _TAG_IMMED1_MASK) == _TAG_IMMED1_SMALL)

Note the make_small macro in particular.
#define make_small(x) (((Uint)(x) << _TAG_IMMED1_SIZE) + _TAG_IMMED1_SMALL)
So a small integer occupies exactly one machine word: 64 or 32 bits.
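The tagging arithmetic can be replayed in Elixir to make the limits concrete. The sketch below mirrors make_small, MAX_SMALL and MIN_SMALL from the macros above; the module name `SmallInt` is illustrative, the tag width of 4 bits is read off the 0xF tag, and make_small is only modeled for non-negative inputs because the C macro works on an unsigned word. `:erts_debug.flat_size/1` is used to observe where the VM switches from an immediate to a boxed big integer.

```elixir
defmodule SmallInt do
  import Bitwise

  # Width of the immediate tag and its value, as in the C macros.
  @tag_immed1_size 4
  @tag_immed1_small 0xF

  # 60 payload bits on a 64-bit VM, 28 on a 32-bit VM.
  def small_bits, do: :erlang.system_info(:wordsize) * 8 - @tag_immed1_size

  def max_small, do: (1 <<< (small_bits() - 1)) - 1
  def min_small, do: -(1 <<< (small_bits() - 1))

  # Tagged word for a non-negative small integer.
  def make_small(x) when x >= 0, do: (x <<< @tag_immed1_size) + @tag_immed1_small
end

IO.inspect(SmallInt.make_small(1), base: :hex)
# A small integer needs no heap words; the next integer up is boxed.
IO.inspect(:erts_debug.flat_size(SmallInt.max_small()))
IO.inspect(:erts_debug.flat_size(SmallInt.max_small() + 1))
```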
big integer
A big integer is a boxed term: its primary tag in the low two bits is TAG_PRIMARY_BOXED (0x2, i.e. 0b10), so the lowest bit is 0.

#define make_big(x) make_boxed((x))
#define make_boxed(x) _ET_APPLY(make_boxed,(x))
#define TAG_PRIMARY_BOXED 0x2
#define _unchecked_make_boxed(x) ((Uint)(x) + TAG_PRIMARY_BOXED)
#define _TAG_PRIMARY_MASK 0x3
#define _is_not_boxed(x) ((x) & (_TAG_PRIMARY_MASK-TAG_PRIMARY_BOXED))
atom
#define make_atom(x) ((Eterm)(((x) << _TAG_IMMED2_SIZE) + _TAG_IMMED2_ATOM))
#define is_atom(x) (((x) & _TAG_IMMED2_MASK) == _TAG_IMMED2_ATOM)
nil
A fixed Uint value.
#define NIL ((~((Uint) 0) << _TAG_IMMED2_SIZE) | _TAG_IMMED2_NIL)
ets
map
flat_map
If the size is within MAP_SMALL_MAP_LIMIT (32), the map is stored as a flat_map, which covers most maps in practice.

erts_produce_heap(factory, 3 + 1 + (2 * n), 0);

ERTS_GLB_INLINE Eterm *
erts_produce_heap(ErtsHeapFactory* factory, Uint need, Uint xtra)
{
    Eterm* res;

    ASSERT((unsigned int)factory->mode > (unsigned int)FACTORY_CLOSED);
    if (factory->hp + need > factory->hp_end) {
        erts_reserve_heap__(factory, need, xtra);
    }
    res = factory->hp;
    factory->hp += need;
    return res;
}

That is, a flat_map with n keys is allocated (4 + 2*n) words, i.e. (4 + 2*n) * wordsize bytes.
iex(1)> :erlang.system_info(:wordsize)
8
iex(2)> :erts_debug.flat_size(%{})
4
iex(3)> :erlang_term.byte_size(%{}) # the Erlang term pointer itself takes 8 bytes; adding the 32 bytes on the heap gives 40 bytes
40
iex(4)> :erts_debug.flat_size(%{1 => 1})
6
iex(5)> :erlang_term.byte_size(%{1 => 1})
56
iex(6)> :erts_debug.flat_size(%{1 => 1, 2 => 2})
8
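The 4 + 2*n formula can be checked against the VM for small maps. The sketch below is an illustration only: `FlatMapSize` is a made-up name, the keys and values are small integers so they add no heap words of their own, and other OTP releases may lay out maps differently from the one used for the session above.

```elixir
defmodule FlatMapSize do
  # Heap words for a flat map with n keys: 3 header words + (1 + n) for the
  # keys tuple + n values, per the allocation shown above.
  def words(n) when is_integer(n) and n >= 0, do: 3 + 1 + 2 * n

  def bytes(n), do: words(n) * :erlang.system_info(:wordsize)
end

for n <- [1, 2] do
  # Build the map at runtime so it lives on the process heap.
  map = Map.new(1..n, fn k -> {k, k} end)
  IO.inspect({n, FlatMapSize.words(n), :erts_debug.flat_size(map)})
end
```

Adding one more word for the term pointer itself reproduces the figures reported by `:erlang_term.byte_size/1` in the session above, for example (6 + 1) * 8 = 56 bytes for a one-key map on a 64-bit VM.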
hash_map