erlang的vm层profiling工具

简介

profiling工具是语言生态的重要组成部分。这篇blog对erlang的vm层profiling做一个总结。

profiling tools overview

名字	功能	剩余性能）
cprof	统计模块的调用次数，可找出调用次数热点	0.67
eprof	统计模块的调用次数和时间，可找出调用时间、次数热点	0.17
fprof	统计模块调用次数和关系，可生成call-graph	0.008
eflame	通过erlang trace采集调用栈，进程状态	0.003
looking_glass	也是基于erlang trace	0.004

详细数据

压测代码如下：

  defp do_sth(:cpu_intensive) do
    start_time = :erlang.system_time(:microsecond)
    end_time = start_time + 10_000_000
    do_until_time(0, start_time, end_time)
  end

  def do_until_time(count, start_time, end_time) do
    now = :erlang.system_time(:microsecond)

    if now >= end_time do
      Logger.warn(
        "\ntotal_time: #{div(end_time - start_time, 1000_000)} second\ncount: #{count}, avg_μs: #{div(end_time - start_time, count)}"
      )
    else
      for v <- 1..100 do
        a = %{v => Base.encode64(:crypto.strong_rand_bytes(20))}
        b = Jason.encode!(a)
        Jason.decode!(b)
      end

      do_until_time(count + 1, start_time, end_time)
    end
  end

无压测

性能基准

total_time: 10 second
count: 24287, avg_μs: 411
cpu: %100

cprof

示例代码

    :cprof.start()
    do_sth(:cpu_intensive)
    :cprof.pause()
    data = :cprof.analyse()
    Logger.warn("cprof_result:\n#{inspect(data, pretty: true)}")

输出结果

仅能获得模块的调用次数。

{243741614,
 [
   {Jason.Encode, 75685344,
    [
      {{Jason.Encode, :escape_json_chunk, 5}, 49314144},
      {{Jason.Encode, :value, 3}, 3296400},
      {{Jason.Encode, :escape_json, 4}, 3296400},
      {{Jason.Encode, :escape_json, 3}, 3296400},
      {{Jason.Encode, :"-fun.escape_json/3-", 3}, 3296400},
      {{Jason.Encode, :map_naive_loop, 3}, 1648200},
      {{Jason.Encode, :map_naive, 3}, 1648200},
      {{Jason.Encode, :key, 2}, 1648200},
      {{Jason.Encode, :escape_function, 1}, 1648200},
      {{Jason.Encode, :encode_string, 2}, 1648200},
      {{Jason.Encode, :encode_map_function, 1}, 1648200},
      {{Jason.Encode, :encode, 2}, 1648200},
      {{Jason.Encode, :"-fun.map_naive/3-", 3}, 1648200}
    ]},
   {Jason.Decoder, 75685344,
    [
      {{Jason.Decoder, :string, 6}, 52610544},
      {{Jason.Decoder, :value, 5}, 3296400},
      {{Jason.Decoder, :"-string_decode_function/1-fun-0-", 1}, 3296400},
      {{Jason.Decoder, :terminate, 6}, 1648200},
      {{Jason.Decoder, :string_decode_function, 1}, 1648200},
      {{Jason.Decoder, :parse, 2}, 1648200},
      {{Jason.Decoder, :object_decode_function, 1}, 1648200},
      {{Jason.Decoder, :object, 6}, 1648200},
      {{Jason.Decoder, :key_decode_function, 1}, 1648200},
      {{Jason.Decoder, :key, 6}, 1648200},
      {{Jason.Decoder, :key, 5}, 1648200},
      {{Jason.Decoder, :float_decode_function, 1}, 1648200},
      {{Jason.Decoder, :"-key_decode_function/1-fun-0-", 1}, 1648200}
    ]}

性能

total_time: 10 second
count: 16482, avg_μs: 606
cpu: %100

eprof

示例代码

    :eprof.start()
    :eprof.start_profiling(:erlang.processes())
    do_sth(:cpu_intensive)
    :eprof.stop_profiling()
    data = :eprof.analyze(:total, sort: :time)
    Logger.warn("eprof_result:\n#{inspect(data, pretty: true)}")

输出结果

统计了调用时间。按时间占比排序是个较实用的选项。

FUNCTION                                                         CALLS        %     TIME  [uS / CALLS]
--------                                                         -----  -------     ----  [----------]
gen:call/4                                                           1     0.00        0  [      0.00]
gen:do_for_proc/2                                                    1     0.00        0  [      0.00]
...
maps:from_list/1                                               1249500     2.04   203667  [      0.16]
'Elixir.Enum':into_map/2                                        833000     2.38   237244  [      0.28]
'Elixir.ProfilingTest':'-do_until_time/3-fun-0-'/2              416500     2.39   238687  [      0.57]
'Elixir.Jason.Decoder':parse/2                                  416500     2.41   240746  [      0.58]
'Elixir.Base':do_encode64/2                                     416500     2.65   264394  [      0.63]
'Elixir.Base':'-do_encode64/2-lbc$^0/2-0-'/2                   1666000     7.30   728723  [      0.44]
'Elixir.Base':enc64_pair/1                                     5831000     7.39   737817  [      0.13]
crypto:strong_rand_bytes_nif/1                                  416500     7.56   754244  [      1.81]
'Elixir.Jason.Encode':escape_json_chunk/5                     12461680    12.19  1216586  [      0.10]
'Elixir.Jason.Decoder':string/6                               13294680    13.93  1390338  [      0.10]
------------------------------------------------------------  --------  -------  -------  [----------]
Total:                                                        61597835  100.00%  9981650  [      0.16]

性能

total_time: 10 second
count: 4165, avg_μs: 2400
cpu: %100

fprof

示例代码

    :fprof.trace([:start, {:procs, :erlang.processes()}])
    do_sth(:cpu_intensive)
    # 默认写入fprof.trace文件
    :fprof.trace(:stop)
    # 读取文件并分析
    :fprof.profile()
    # 输出结果，按函数自身调用时间排序，不包含调用其他函数的时间
    :fprof.analyse(sort: :own)
    # 按累计时间排序，包括调用其他函数的时间
    :fprof.analyse(sort: :acc)

输出结果

kcallgrind

call graph生成见：https://segmentfault.com/a/11...
fprof call graph

sort own

Processing data...
Creating output...
%% Analysis results:
{  analysis_options,
 [{callers, true},
  {sort, own},
  {totals, false},
  {details, true}]}.

%                                               CNT       ACC       OWN        
[{ totals,                                     5040222,10053.041, 9982.294}].  %%%


%                                               CNT       ACC       OWN        
[{ "<0.212.0>",                                5034485,undefined, 9948.017}].   %%

{[{{'Elixir.Jason.Decoder',string,6},          1011296,    0.000, 1539.664},      
  {{'Elixir.Jason.Decoder',key,5},             33800, 2135.166,   47.418},      
  {{'Elixir.Jason.Decoder',value,5},           33800,    0.000,   46.437}],     
 { {'Elixir.Jason.Decoder',string,6},          1078896, 2135.166, 1633.519},     %
 [{{'Elixir.Jason.Decoder',string,6},          1011296,    0.000, 1539.664},      
  {{'Elixir.Jason.Decoder',object,6},          33800,  303.359,  140.051},      
  {{'Elixir.Jason.Decoder','-string_decode_function/1-fun-0-',1},67600,   85.381,   85.381},      
  {{'Elixir.Jason.Decoder',key,6},             33800, 1899.845,   46.730},      
  {garbage_collect,                            4858,   18.554,   18.554}]}.    

{[{{'Elixir.Jason.Encode',escape_json_chunk,5},943696,    0.000, 1351.146},      
  {{'Elixir.Jason.Encode',escape_json,4},      67600, 1487.645,   94.579}],     
 { {'Elixir.Jason.Encode',escape_json_chunk,5},1011296, 1487.645, 1445.725},     %
 [{{'Elixir.Jason.Encode',escape_json_chunk,5},943696,    0.000, 1351.146},      
  {garbage_collect,                            8224,   41.920,   41.920}]}.

sort acc

Processing data...
Creating output...
%% Analysis results:
{  analysis_options,
 [{callers, true},
  {sort, acc},
  {totals, false},
  {details, true}]}.

%                                               CNT       ACC       OWN        
[{ totals,                                     5040222,10053.041, 9982.294}].  %%%


%                                               CNT       ACC       OWN        
[{ "<0.212.0>",                                5034485,undefined, 9948.017}].   %%

{[{{'Elixir.ProfilingTest','test fprof',1},       0,10051.624,    0.002},      
  {{fprof,call,1},                                1,    1.377,    0.004},      
  {undefined,                                     0,    0.033,    0.028}],     
 { {fprof,just_call,2},                           1,10053.034,    0.034},     %
 [{{'Elixir.ProfilingTest',do_sth,1},             1,10051.622,    0.003},      
  {suspend,                                       1,    1.368,    0.000},      
  {{erlang,monitor,2},                            1,    0.005,    0.005},      
  {{erlang,demonitor,1},                          1,    0.005,    0.005}]}.    

{[{undefined,                                     0,10053.008,    0.001}],     
 { {'Elixir.ProfilingTest','test fprof',1},       0,10053.008,    0.001},     %
 [{{fprof,just_call,2},                           0,10051.624,    0.002},      
  {{fprof,trace,1},                               1,    1.383,    0.002}]}.    

{[{{fprof,just_call,2},                           1,10051.622,    0.003}],     
 { {'Elixir.ProfilingTest',do_sth,1},             1,10051.622,    0.003},     %
 [{{'Elixir.ProfilingTest',do_until_time,3},      1,10051.617,    0.028},      
  {{erlang,system_time,1},                        1,    0.002,    0.002}]}.

性能

11:47:51.914 [warn]  
total_time: 10 second
count: 341, avg_μs: 29325
cpu: %160

eflame

示例代码

    spawn(fn -> do_sth(:cpu_intensive) end)
    :eflame2.write_trace(:global_and_local_calls, './eflame.test', :all, 15 * 1000)
    # 下面的语句需要等待一段时间
    :eflame2.format_trace('./eflame.test', './eflame.test.out')

完成输出：

Hello, world, I'm <0.821.0> and I'm running....
<0.230.0> <0.228.0> iex(2)> 
nil
<0.50.0> <0.10.0> <0.194.0> <0.64.0>         

Writing to ./eflame.test.out for 8 processes... finished!

输出

生成火焰图：

~/git/eflame/flamegraph.riak-color.pl eflame.test.out > eflame.svg

性能

total_time: 10 second
count: 91, avg_μs: 109890
cpu: %120

looking_grass

代码

    # 不限制采集范围会直接爆内存
    :lg.trace(
      [{:scope, [self()]}],
      :lg_file_tracer,
      'traces.lz4',
      %{process_dump: true}
    )

    do_sth(:cpu_intensive)
    :lg.stop()
    :lg_flame.profile_many('traces.lz4.*', 'output')

输出

显然丢失了一些调用栈。暂未查明原因。

~/git/FlameGraph/flamegraph.pl output > looking_grass.svg

性能

total_time: 10 second
count: 127, avg_μs: 78740

总结

erlang 产品中可用的profiling工具非常少，仅 cprof 消耗较少，eprof也需要谨慎开启。通过 trace pattern 减少对性能的影响.
erlang 多进程，火焰图不适合作为分析 cpu 消耗的工具，可用于分析单个进程场景。
没有堪用的火焰图工具，实现太粗糙（没有采样，没有对中间结果做处理），内存，CPU消耗都过大。

参考

https://www.erlang.org/doc/ef...
https://interscity.org/assets...
https://github.com/slfritchie...
https://github.com/rabbitmq/l...

erlang的vm层profiling工具

简介

profiling tools overview

详细数据

无压测

性能基准

cprof

示例代码

输出结果

性能

eprof

示例代码

输出结果

性能

fprof

示例代码

输出结果

kcallgrind

sort own

sort acc

性能

eflame

示例代码

输出

性能

looking_grass

代码

输出

性能

总结

参考

enjolras1205

引用和评论

从dama跳棋ai比赛说起