K6 Stress Testing Practice on Nebula Graph

Background

Performance testing is a routine task for databases. Whether you are optimizing the query engine's rules or tuning the storage engine's parameters, you need performance tests to see how the system behaves in different scenarios.

Even with the same code and the same parameter configuration, results differ greatly across machine resource configurations and business scenarios. This article records our internal stress testing practice for reference.

The operating system used in this article is CentOS 7.8 on x86 architecture.

The machines deploying nebula are configured with 4 cores, 16 GB of memory, SSD disks, and a 10-Gbit network.

Tools

Overview

The data set is LDBC data generated automatically with ldbc_snb_datagen; the overall process is shown in the figure below.

(figure: overall workflow)

For the deployment topology, one machine serves as the stress test load generator, and three machines form the nebula cluster.

(figure: deployment topology)

For ease of monitoring, the following are also deployed on the stress test load machine:

  • Prometheus
  • InfluxDB
  • Grafana
  • node-exporter

Also deployed on the nebula machines:

  • node-exporter
  • process-exporter
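For reference, a minimal prometheus.yml sketch for scraping these exporters might look like the following. The ports (9100 for node-exporter, 9256 for process-exporter) are the exporters' defaults and are assumptions here; the actual file shipped in nebula-bench's third/ directory may differ.

```yaml
# Minimal Prometheus scrape sketch (assumed default exporter ports).
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node-exporter
    static_configs:
      - targets:
          - 192.168.8.60:9100
          - 192.168.8.61:9100
          - 192.168.8.62:9100
          - 192.168.8.63:9100
  - job_name: process-exporter
    static_configs:
      - targets:
          - 192.168.8.61:9256
          - 192.168.8.62:9256
          - 192.168.8.63:9256
```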

Specific steps

Use nebula-ansible to deploy nebula

  1. Initialize the user and set up passwordless SSH

    1. Log in to 192.168.8.60, 192.168.8.61, 192.168.8.62, and 192.168.8.63 respectively, create a vesoft user, add it to sudoers, and set NOPASSWD.
    2. Log in to 192.168.8.60 and set up passwordless SSH:
    ssh-keygen
    
    ssh-copy-id vesoft@192.168.8.61
    ssh-copy-id vesoft@192.168.8.62
    ssh-copy-id vesoft@192.168.8.63
  2. Download nebula-ansible, install ansible, and modify the ansible configuration

    sudo yum install ansible -y
    git clone https://github.com/vesoft-inc/nebula-ansible
    cd nebula-ansible/
    
    # the default download source is the international CDN; switch to the China CDN
    sed -i 's/oss-cdn.nebula-graph.io/oss-cdn.nebula-graph.com.cn/g' group_vars/all.yml

inventory.ini example

[all:vars]
# GA or nightly
install_source_type = GA
nebula_version = 2.0.1
os_version = el7
arc = x86_64
pkg = rpm

packages_dir = {{ playbook_dir }}/packages
deploy_dir = /home/vesoft/nebula
data_dir = {{ deploy_dir }}/data

# ssh user
ansible_ssh_user = vesoft

force_download = False

[metad]
192.168.8.[61:63]

[graphd]
192.168.8.[61:63]

[storaged]
192.168.8.[61:63]

  3. Install and start nebula

    ansible-playbook install.yml
    ansible-playbook start.yml

Deployment monitoring

To simplify deployment, the monitoring stack runs with Docker Compose, so Docker and Docker Compose need to be installed on the machines first.

Log in to the 192.168.8.60 stress testing machine:

git clone https://github.com/vesoft-inc/nebula-bench.git

cd nebula-bench
cp -r third/promethues ~/.
cp -r third/exporter ~/.



cd ~/exporter/ && docker-compose up -d

cd ~/promethues
# modify the exporter addresses of the monitored nodes
# vi prometheus.yml
docker-compose up -d

# copy the exporter directory to 192.168.8.61, 192.168.8.62, 192.168.8.63, then start docker-compose

Configure the Grafana data source and dashboards; see https://github.com/vesoft-inc/nebula-bench/tree/master/third .

Generate LDBC data set

cd nebula-bench

sudo yum install -y git \
                    make \
                    file \
                    libev \
                    libev-devel \
                    gcc \
                    wget \
                    python3 \
                    python3-devel \
                    java-1.8.0-openjdk \
                    maven

pip3 install --user -r requirements.txt

# generates sf1 by default: 1 GB of data, 3M+ vertices and 17M+ edges
python3 run.py data

# move the generated data
mv target/data/test_data/ ./sf1

Import Data

cd nebula-bench
# modify .env
cp env .env
vi .env

The following is an example of .env

DATA_FOLDER=sf1
NEBULA_SPACE=sf1
NEBULA_USER=root
NEBULA_PASSWORD=nebula
NEBULA_ADDRESS=192.168.8.61:9669,192.168.8.62:9669,192.168.8.63:9669
#NEBULA_MAX_CONNECTION=100
INFLUXDB_URL=http://192.168.8.60:8086/k6

# build nebula-importer and k6
./scripts/setup.sh

# import data
python3 run.py nebula importer

During the import, you can keep an eye on the network bandwidth and disk I/O writes.

(figure: network bandwidth during import)

(figure: disk I/O writes during import)

Perform stress test

python3 run.py stress run

Based on the code in scenarios, the js files are rendered automatically, and then k6 is used to stress test all scenarios.

After execution, the js files and the stress test results are in the output folder.

Here, latency is the latency returned by the server, and responseTime is the time measured by the client from issuing execute to receiving the response; both are in µs.
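Since both fields are recorded per request in the CSV, the client-side overhead (responseTime − latency) is easy to compute. A minimal sketch, using two sample rows copied from the CSV output shown in this article as inline data:

```python
import csv
import io

# Two sample rows from output_Go1Step.csv (header + data),
# standing in for the real file on disk.
sample = """timestamp,nGQL,latency,responseTime,isSucceed,rows,errorMsg
1628147822,GO 1 STEP FROM 4398046516514 OVER KNOWS,1217,1536,true,1,
1628147822,GO 1 STEP FROM 2199023262994 OVER KNOWS,1388,1829,true,94,
"""

overheads = []
for row in csv.DictReader(io.StringIO(sample)):
    # Client round trip minus server-side latency, in µs.
    overheads.append(int(row["responseTime"]) - int(row["latency"]))

print(overheads)                        # → [319, 441]
print(sum(overheads) / len(overheads))  # → 380.0
```

To run this over a real result, replace `io.StringIO(sample)` with `open("output/output_Go1Step.csv")`.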

[vesoft@qa-60 nebula-bench]$ more output/result_Go1Step.json
{
    "metrics": {
        "data_sent": {
            "count": 0,
            "rate": 0
        },
        "checks": {
            "passes": 1667632,
            "fails": 0,
            "value": 1
        },
        "data_received": {
            "count": 0,
            "rate": 0
        },
        "iteration_duration": {
            "min": 0.610039,
            "avg": 3.589942336582023,
            "med": 2.9560145,
            "max": 1004.232905,
            "p(90)": 6.351617299999998,
            "p(95)": 7.997563949999995,
            "p(99)": 12.121579809999997
        },
        "latency": {
            "min": 308,
            "avg": 2266.528722763775,
            "med": 1867,
            "p(90)": 3980,
            "p(95)": 5060,
            "p(99)": 7999
        },
        "responseTime": {
            "max": 94030,
            "p(90)": 6177,
            "p(95)": 7778,
            "p(99)": 11616,
            "min": 502,
            "avg": 3437.376111156418,
            "med": 2831
        },
        "iterations": {
            "count": 1667632,
            "rate": 27331.94978169588
        },
        "vus": {
            "max": 100,
            "value": 100,
            "min": 0
[vesoft@qa-60 nebula-bench]$ head -300 output/output_Go1Step.csv | grep -v USE
timestamp,nGQL,latency,responseTime,isSucceed,rows,errorMsg
1628147822,GO 1 STEP FROM 4398046516514 OVER KNOWS,1217,1536,true,1,
1628147822,GO 1 STEP FROM 2199023262994 OVER KNOWS,1388,1829,true,94,
1628147822,GO 1 STEP FROM 1129 OVER KNOWS,1488,2875,true,14,
1628147822,GO 1 STEP FROM 6597069771578 OVER KNOWS,1139,1647,true,30,
1628147822,GO 1 STEP FROM 2199023261211 OVER KNOWS,1399,2096,true,6,
1628147822,GO 1 STEP FROM 2199023256684 OVER KNOWS,1377,2202,true,4,
1628147822,GO 1 STEP FROM 4398046515995 OVER KNOWS,1487,2017,true,39,
1628147822,GO 1 STEP FROM 10995116278700 OVER KNOWS,837,1381,true,3,
1628147822,GO 1 STEP FROM 933 OVER KNOWS,1130,3422,true,5,
1628147822,GO 1 STEP FROM 6597069771971 OVER KNOWS,1022,2292,true,60,
1628147822,GO 1 STEP FROM 10995116279952 OVER KNOWS,1221,1758,true,3,
1628147822,GO 1 STEP FROM 8796093031179 OVER KNOWS,1252,1811,true,13,
1628147822,GO 1 STEP FROM 10995116279792 OVER KNOWS,1115,1858,true,6,
1628147822,GO 1 STEP FROM 6597069777326 OVER KNOWS,1223,2016,true,4,
1628147822,GO 1 STEP FROM 8796093028089 OVER KNOWS,1361,2054,true,13,
1628147822,GO 1 STEP FROM 6597069777454 OVER KNOWS,1219,2116,true,2,
1628147822,GO 1 STEP FROM 13194139536109 OVER KNOWS,1027,1604,true,2,
1628147822,GO 1 STEP FROM 10027 OVER KNOWS,2212,3016,true,83,
1628147822,GO 1 STEP FROM 13194139544176 OVER KNOWS,855,1478,true,29,
1628147822,GO 1 STEP FROM 10995116280047 OVER KNOWS,1874,2211,true,12,
1628147822,GO 1 STEP FROM 15393162797860 OVER KNOWS,714,1684,true,5,
1628147822,GO 1 STEP FROM 6597069770517 OVER KNOWS,2295,3056,true,7,
1628147822,GO 1 STEP FROM 17592186050570 OVER KNOWS,768,1630,true,26,
1628147822,GO 1 STEP FROM 8853 OVER KNOWS,2773,3509,true,14,
1628147822,GO 1 STEP FROM 19791209307908 OVER KNOWS,1022,1556,true,6,
1628147822,GO 1 STEP FROM 13194139544258 OVER KNOWS,1542,2309,true,91,
1628147822,GO 1 STEP FROM 10995116285325 OVER KNOWS,1901,2556,true,0,
1628147822,GO 1 STEP FROM 6597069774931 OVER KNOWS,2040,3291,true,152,
1628147822,GO 1 STEP FROM 8796093025056 OVER KNOWS,2007,2728,true,29,
1628147822,GO 1 STEP FROM 21990232560726 OVER KNOWS,1639,2364,true,9,
1628147822,GO 1 STEP FROM 8796093030318 OVER KNOWS,2145,2851,true,6,
1628147822,GO 1 STEP FROM 21990232556027 OVER KNOWS,1784,2554,true,5,
1628147822,GO 1 STEP FROM 15393162796879 OVER KNOWS,2621,3184,true,71,
1628147822,GO 1 STEP FROM 17592186051113 OVER KNOWS,2052,2990,true,5,

It is also possible to stress test a single scenario, adjusting configuration parameters between runs for comparison.
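One way to compare such runs is to diff the summary JSON files that k6 writes via --summary-export. A minimal sketch; the two summaries use the p99 latency figures from the 50-vu and 200-vu runs in this article as inline sample data (in practice you would `json.load` the result files from output/):

```python
def p99_latency(summary):
    """Extract the p99 server latency (µs) from a k6 --summary-export dict."""
    return summary["metrics"]["latency"]["p(99)"]

# Inline sample summaries; in practice:
#   summary = json.load(open("output/result_Go2Step.json"))
run_50vu = {"metrics": {"latency": {"p(99)": 23448}}}
run_200vu = {"metrics": {"latency": {"p(99)": 97993.53}}}

ratio = p99_latency(run_200vu) / p99_latency(run_50vu)
print(f"p99 latency grew {ratio:.1f}x from 50 to 200 virtual users")  # → 4.2x
```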

Concurrent read

# run go 2-step: 50 concurrent users for 300 seconds
python3 run.py stress run -scenario go.Go2Step -vu 50 -d 300

INFO[0302] 2021/08/06 03:55:27 [INFO] finish init the pool

     ✓ IsSucceed

     █ setup

     █ teardown

     checks...............: 100.00% ✓ 1559930     ✗ 0
     data_received........: 0 B     0 B/s
     data_sent............: 0 B     0 B/s
     iteration_duration...: min=687.47µs avg=9.6ms       med=8.04ms max=1.03s  p(90)=18.41ms p(95)=22.58ms p(99)=31.87ms
     iterations...........: 1559930 5181.432199/s
     latency..............: min=398      avg=6847.850345 med=5736   max=222542 p(90)=13046   p(95)=16217   p(99)=23448
     responseTime.........: min=603      avg=9460.857877 med=7904   max=226992 p(90)=18262   p(95)=22429   p(99)=31726.71
     vus..................: 50      min=0         max=50
     vus_max..............: 50      min=50        max=50

At the same time, you can observe the various monitored metrics.

(figure: server monitoring dashboards)

checks verifies whether each request executed successfully. If a request fails, its error message is saved in the csv.
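The same tally that the awk one-liner below produces can also be sketched in Python with the csv module. The two inline rows (one success, one hypothetical failure) stand in for the real file:

```python
import csv
import io
from collections import Counter

# Inline sample standing in for output/output_Go2Step.csv;
# the second row's error message is illustrative.
sample = """timestamp,nGQL,latency,responseTime,isSucceed,rows,errorMsg
1628147822,GO 2 STEP FROM 933 OVER KNOWS,1130,3422,true,5,
1628147822,GO 2 STEP FROM 1129 OVER KNOWS,1488,2875,false,0,some error
"""

# Count occurrences of each errorMsg; an empty message means success.
counts = Counter(row["errorMsg"] for row in csv.DictReader(io.StringIO(sample)))
print(counts)
```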

awk -F ',' '{print $NF}' output/output_Go2Step.csv|sort |uniq -c

# run go 2-step: 200 concurrent users for 300 seconds
python3 run.py stress run -scenario go.Go2Step -vu 200 -d 300

INFO[0302] 2021/08/06 04:02:34 [INFO] finish init the pool

     ✓ IsSucceed

     █ setup

     █ teardown

     checks...............: 100.00% ✓ 1866850    ✗ 0
     data_received........: 0 B     0 B/s
     data_sent............: 0 B     0 B/s
     iteration_duration...: min=724.77µs avg=32.12ms      med=25.56ms max=1.03s  p(90)=63.07ms p(95)=84.52ms  p(99)=123.92ms
     iterations...........: 1866850 6200.23481/s
     latency..............: min=395      avg=25280.893558 med=20411   max=312781 p(90)=48673   p(95)=64758    p(99)=97993.53
     responseTime.........: min=627      avg=31970.234329 med=25400   max=340299 p(90)=62907   p(95)=84361.55 p(99)=123750
     vus..................: 200     min=0        max=200
     vus_max..............: 200     min=200      max=200

K6 monitoring data in Grafana

(figure: K6 dashboard in Grafana)

Concurrent write

# run insert: 200 concurrent users for 300 seconds, default batchSize 100
python3 run.py stress run -scenario insert.InsertPerson -vu 200 -d 300

You can manually modify the js file to adjust the batchSize

sed -i 's/batchSize = 100/batchSize = 300/g' output/InsertPersonScenario.js

# run k6 manually
scripts/k6 run output/InsertPersonScenario.js -u 400 -d 30s --summary-trend-stats "min,avg,med,max,p(90),p(95),p(99)" --summary-export output/result_InsertPersonScenario.json --out influxdb=http://192.168.8.60:8086/k6

When batchSize is 300 and concurrency is 400, an error occurs.

INFO[0032] 2021/08/06 04:03:49 [INFO] finish init the pool

     ✗ IsSucceed
      ↳  96% — ✓ 31257 / ✗ 1103

     █ setup

     █ teardown

     checks...............: 96.59% ✓ 31257       ✗ 1103
     data_received........: 0 B    0 B/s
     data_sent............: 0 B    0 B/s
     iteration_duration...: min=12.56ms avg=360.11ms      med=319.12ms max=2.07s   p(90)=590.31ms p(95)=696.69ms p(99)=958.32ms
     iterations...........: 32360  1028.339207/s
     latency..............: min=4642    avg=206931.543016 med=206162   max=915671  p(90)=320397.4 p(95)=355798.7 p(99)=459521.39
     responseTime.........: min=6272    avg=250383.122188 med=239297.5 max=1497159 p(90)=384190.5 p(95)=443439.6 p(99)=631460.92
     vus..................: 400    min=0         max=400
     vus_max..............: 400    min=400       max=400

awk -F ',' '{print $NF}' output/output_InsertPersonScenario.csv|sort |uniq -c

  31660
   1103  error: E_CONSENSUS_ERROR(-16)."
      1 errorMsg 

It turns out to be E_CONSENSUS_ERROR: when the concurrency is high, raft's appendlog buffer overflows. You can adjust the related parameters to mitigate this.
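Besides tuning server parameters, a common client-side mitigation when writes overrun the raft appendlog buffer is to shrink the batch or retry with backoff. A hedged sketch of the retry idea; the `execute` callable and its return shape are illustrative stand-ins, not the real nebula client API:

```python
import time

def execute_with_retry(execute, stmt, max_retries=3, base_delay=0.1):
    """Retry a write that fails with a consensus error, backing off exponentially.

    `execute` is any callable returning (ok: bool, error_msg: str);
    it stands in for a real nebula client call.
    """
    for attempt in range(max_retries + 1):
        ok, err = execute(stmt)
        if ok or "E_CONSENSUS_ERROR" not in err:
            return ok, err
        time.sleep(base_delay * (2 ** attempt))  # give raft time to drain
    return ok, err

# Stub that fails twice with a consensus error, then succeeds.
attempts = []
def fake_execute(stmt):
    attempts.append(stmt)
    if len(attempts) < 3:
        return False, 'error: E_CONSENSUS_ERROR(-16)'
    return True, ""

ok, err = execute_with_retry(fake_execute, "INSERT VERTEX ...", base_delay=0.01)
print(ok, len(attempts))  # → True 3
```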

Summary

  • Using LDBC as the standard data set gives the data standard characteristics, and larger data sets, such as 1 billion vertices, can be generated with the same data structure.
  • Using k6 as the stress test load tool: its single binary is more convenient than Jmeter, and because k6 is built on Golang goroutines, it uses fewer resources than Jmeter.
  • With these tools, by simulating various scenarios or tuning nebula parameters, server resources can be used more effectively.

"The Complete Guide to Open Source Distributed Graph Database Nebula Graph", also known as: Nebula small book, which records in detail the knowledge points and specific usage of the graph database and the graph database Nebula Graph. Read the portal: https://docs.nebula -graph.com.cn/site/pdf/NebulaGraph-book.pdf
