Nebula integration test framework reconstruction based on BDD theory (part 2)

This article was first published on the Nebula Graph official account NebulaGraphCommunity. Follow it to read about the technical practice of large-scale graph databases.


In the previous article, we introduced the evolution of integration testing in Nebula Graph. This article walks through adding a use case to the test set and successfully running all test cases.

Environment preparation

When we first built the 2.0 test framework, we wrote some custom utility classes to help the framework quickly start and stop a single-node nebula service, including functions such as checking for port conflicts and modifying some configuration options. The original execution process was as follows:

  1. Start the nebula service through a python script;
  2. Call pytest.main to execute all test cases concurrently;
  3. Stop the nebula service.

The inconvenience was that whenever certain options needed to be specified for pytest, they had to be passed through to the pytest.main function, and running a single test case required going through a script generated by cmake, which was not very convenient. We wanted to be able to "execute the test wherever the test case is".

Service start

In reworking the test framework, apart from changing the program entry point, we reused most of the originally encapsulated logic. Since nebula has accumulated a lot of use cases, running them in a single process can no longer meet the needs of rapid iteration. After trying other parallel plug-ins and weighing compatibility, we finally chose the pytest-xdist plug-in to speed up the entire test process.

However, pytest fixture scopes (function, class, module, package and session) go no higher than session, whereas we wanted a global-level fixture to start and initialize the nebula service. With pytest-xdist, a session-scoped fixture is still executed once per runner, so with 8 runners, 8 nebula services would be started, which is not what we expect.

According to the pytest-xdist documentation, file locks are needed to coordinate different runners. To keep the control logic simple enough, we separated the start-stop and preparation logic from the test execution itself and use a separate step to start nebula. When some tests have problems, you can also connect to the test services separately through nebula-console for further verification and debugging.
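For reference, the file-lock approach that we decided not to build on can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in the Nebula repo: `file_lock`, `get_or_start_service`, `state_file` and `start_service` are hypothetical names, and `fcntl` is only available on POSIX systems. The first runner to take the lock starts the service and records its details in a shared file; every later runner just reads that file.

```python
import fcntl
import json
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path (POSIX only)."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def get_or_start_service(state_file, start_service):
    """Return shared service info, calling start_service() at most once.

    state_file and start_service are hypothetical names for this sketch:
    the first caller starts the service and records its details; later
    callers (other runners) only read the recorded details.
    """
    state_file = Path(state_file)
    with file_lock(str(state_file) + ".lock"):
        if state_file.is_file():
            return json.loads(state_file.read_text())
        info = start_service()
        state_file.write_text(json.dumps(info))
        return info
```

Even in this small form, every runner has to agree on the location of the state file and on who cleans it up afterwards, which is part of why a separate start step is simpler to reason about.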

Data import

Previously, Nebula's test data was imported by directly executing a concatenated nGQL INSERT statement. This approach has the following problems:

  1. With a large test data set, the INSERT statement becomes very long and the client execution times out;
  2. It is not easy to add a new test data set, because a ready-made csv data file has to be converted into a corresponding nGQL statement file;
  3. The same data set cannot be reused. For example, to import the same csv into spaces with different VID types for testing, you need to construct different INSERT statements.

To solve the above problems, we referred to the implementation of nebula-importer, completely separated the import logic from the data sets, and re-implemented the import module in python. Currently, however, it only supports importing csv data files, and each csv file can only store one tag/edge type.
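The idea of separating import logic from data can be illustrated with a small sketch. It is not the import module from the repo: the function names, the assumption that the first csv column is the VID, the `(name, type)` property description, and the use of `hash()` for int-VID spaces are all assumptions made for this example. The same csv text yields batched INSERT statements for either VID type, which addresses the timeout and reuse problems listed above.

```python
import csv
import io


def quote(text):
    """Wrap text in double quotes, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def format_vid(raw, vid_type):
    # Assumption: int-VID spaces derive the id with hash(); string-VID
    # spaces use the csv value as-is.
    return f"hash({quote(raw)})" if vid_type == "int" else quote(raw)


def format_value(raw, prop_type):
    # Only string properties are quoted; other values are passed through.
    return quote(raw) if prop_type == "string" else raw


def batches(rows, size):
    """Yield lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def vertex_statements(csv_text, tag, props, vid_type="string", batch_size=2):
    """Yield one INSERT VERTEX statement per batch of csv rows.

    props is a list of (name, type) pairs describing the columns that
    follow the VID column (a hypothetical layout for this sketch).
    """
    names = ", ".join(name for name, _ in props)
    reader = csv.reader(io.StringIO(csv_text))
    for batch in batches(reader, batch_size):
        values = ", ".join(
            "{}:({})".format(
                format_vid(row[0], vid_type),
                ", ".join(format_value(v, t) for v, (_, t) in zip(row[1:], props)),
            )
            for row in batch
        )
        yield f"INSERT VERTEX {tag}({names}) VALUES {values};"
```

Calling `vertex_statements(text, "player", [("name", "string"), ("age", "int")])` and then again with `vid_type="int"` reuses the same data for both kinds of space; only the VID formatting changes.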

After refactoring the import logic, the layout of nebula's current test data sets is clear:

├── basketballplayer
│   ├── bachelor.csv
│   ├── config.yaml
│   ├── like.csv
│   ├── player.csv
│   ├── serve.csv
│   ├── team.csv
│   └── teammate.csv
├── basketballplayer_int_vid
│   └── config.yaml
└── student
    ├── config.yaml
    ├── is_colleagues.csv
    ├── is_friend.csv
    ├── is_schoolmate.csv
    ├── is_teacher.csv
    ├── person.csv
    ├── student.csv
    └── teacher.csv
3 directories, 16 files

Each directory contains all the csv data files of one space. The description of each file and the details of the space are configured in config.yaml. Through this configuration, the two spaces basketballplayer and basketballplayer_int_vid also share the same data. To add a new test data set in the future, just add a data directory similar to basketballplayer. For the specific content of config.yaml, see the repo.

Installing dependencies

In addition to the commonly used pytest and nebula-python libraries, the current testing framework also uses plug-ins such as pytest-bdd and pytest-xdist. To keep the format of the test case feature files uniform, we also introduced the community's reformat-gherkin tool and adjusted parts of its formatting so that the output stays consistent with the openCypher TCK feature files.

Currently, nebula-python and reformat-gherkin are installed directly from source. We provide a Makefile under nebula-graph/tests to simplify the user's workflow. All the environment preparation for running the tests only requires one command:

$ cd nebula-graph/tests && make init-all

We have also integrated the above format check into the CI process of GitHub Actions. If a test file you modified does not match the expected format, format it locally with make fmt.

Writing use cases

As mentioned in the previous article, Nebula's integration tests have now become "black box" tests. Users no longer need to care about how the statements they write are called, or which functions to call to check the expected results. They just follow the agreed conventions and describe their use cases in a feature file in a manner close to "natural language". The following is an example of a test case:

Feature: Variable length pattern match (m to n)
  Scenario: both direction expand with properties
    Given a graph with space named "basketballplayer"
    When executing query:
      """
      MATCH (:player{name:"Tim Duncan"})-[e:like*2..3{likeness: 90}]-(v)
      RETURN e, v
      """
    Then the result should be, in any order, with relax comparison:
      | e                                                                                  | v                  |
      | [[:like "Tim Duncan"<-"Manu Ginobili"], [:like "Manu Ginobili"<-"Tiago Splitter"]] | ("Tiago Splitter") |

Given provides the initial conditions of the test; here, a space named "basketballplayer" is initialized. When describes the input of the test, that is, the nGQL statement. Then gives the expected result and how it should be compared; here, the rows of the table are compared in any order and with relaxed (loose) matching.

Feature file structure

A feature file is written in the Gherkin language and mainly consists of the following parts:

  • Feature: Adds a "title" for the current file, or describes the content of the file in detail;
  • Background: Steps shared by the subsequent scenarios;
  • Scenario: Describes each test case step by step;
  • Examples: Further separates the test scenario from the test data, simplifying the writing of Scenarios in the current Feature file.

Each scenario is divided into different steps, and each step has a special meaning:

  • Given: Sets the initial conditions of the current test scenario. The Background above can only contain steps of the Given type;
  • When: Provides the input of the current test scenario;
  • Then: Describes the expected result after the When step completes;
  • And: Can follow any of the Given/When/Then steps to further supplement that step;
  • Examples: Similar to the Examples described above, but its scope is limited to a single scenario and does not affect other scenarios in the same Feature file.


As the description above shows, a Scenario is composed of steps. In addition to the steps compatible with openCypher TCK, Nebula defines some custom steps to make test cases easier to write:

  1. Given a graph with space named "basketballplayer" : Use the space into which the "basketballplayer" data was imported in advance;
  2. creating a new space with following options : Create a new space with the given parameters; you can specify name, partition_num, replica_factor, vid_type, charset, and collate;
  3. load "basketballplayer" csv data to a new space : Import the "basketballplayer" data set into a new space;
  4. profiling query : Run PROFILE on the query statement; the returned result will contain the execution plan;
  5. wait 3 seconds : Wait for 3 seconds. Schema-related operations often need some time for data synchronization, and this step can be used in that case;
  6. define some list variables : Define variables that stand for List values with many elements, which makes it easier to write the corresponding List in the expected result;
  7. the result should be, in any order, with relax comparison : Compare the execution result in any order and loosely: only what the user writes in the expected result is compared, and anything left unwritten is not compared even if it appears in the returned result;
  8. the result should contain : The returned result must contain the expected result;
  9. the execution plan should be : Compare the execution plan in the returned result.
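To make the comparison steps more concrete, here is a minimal sketch of what an unordered, relaxed comparison could look like. It is an illustration only, not the framework's implementation: the function names and the representation of rows as lists of nested dicts are assumptions. A value matches loosely when everything written in the expected value is present in the actual value; anything the expected value leaves out is ignored.

```python
def loosely_matches(expected, actual):
    """True if everything written in `expected` is found in `actual`."""
    if isinstance(expected, dict):
        # Keys missing from `expected` are simply not compared.
        return isinstance(actual, dict) and all(
            key in actual and loosely_matches(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(expected) == len(actual)
            and all(loosely_matches(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


def rows_contain(expected_rows, actual_rows):
    """Each expected row loosely matches a distinct actual row, in any order."""
    if not expected_rows:
        return True
    first, rest = expected_rows[0], expected_rows[1:]
    # Try every actual row for the first expected row and backtrack if the
    # remaining rows cannot be matched.
    return any(
        loosely_matches(first, row)
        and rows_contain(rest, actual_rows[:i] + actual_rows[i + 1:])
        for i, row in enumerate(actual_rows)
    )


def rows_match_any_order(expected_rows, actual_rows):
    """Like rows_contain, but the number of rows must also be equal."""
    return len(expected_rows) == len(actual_rows) and rows_contain(
        expected_rows, actual_rows
    )
```

In this sketch, `rows_match_any_order` corresponds roughly to step 7 and `rows_contain` to step 8. The backtracking search is fine for the small result tables typical of test cases, but it would be slow on very large results.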

In addition to the above steps, more steps can be defined as needed to accelerate the development of test cases.


As described in the TCK, openCypher defines a set of graph-semantic notations to express the expected return results. These notations borrow from the pattern syntax of MATCH queries, so if you are familiar with openCypher queries, you can easily understand the results in the TCK test scenarios. For example, part of the notation looks like this:

  1. Vertex description: (:L {p:1, q:"string"}) ;
  2. Edge description: [:T {p:0, q:"string"}] ;
  3. Path description: <(:L)-[:T]->(:L2)> .

However, there are still some differences between the graph models of Nebula Graph and Neo4j. For example, each Tag in Nebula Graph can have its own properties, so the existing notation cannot describe a vertex that has multiple tags with properties. Edges are represented differently as well. The Edge Key of Nebula Graph is a quadruple <src, type, rank, dst> , and the existing notation cannot describe the src, dst, and rank of an edge. Taking these differences into account, we extended the existing TCK notation for expected results:

  1. The vertex description supports multiple tags with properties: ("VID" :T1{p:0} :T2{q: "string"}) ;
  2. The edge description supports src, dst and rank: [:type "src"->"dst"@rank {p:0, q:"string"}] ;
  3. A path is just a combination of the vertex and edge notations above, the same as in TCK.

With the vertex and edge extensions described above, the notation remains compatible with TCK's existing use cases while also fitting the design of Nebula Graph. With the notation settled, the next problem was how to convert it efficiently and without errors into concrete data structures that can be compared with the real query results. After considering regular-expression matching, a dedicated parser, and other options, we chose to build a parser to process these strings according to specific grammar rules. The advantages are as follows:

  1. With specific grammar rules, the parsed AST can be made to match the data structure of the query result, so comparing the two becomes a check of specific fields in specific structures;
  2. It avoids handling complex regular-expression matching and reduces parsing errors;
  3. It can support other string parsing needs, such as regular expressions, lists, sets, etc.

With the help of two libraries, ply.yacc and ply.lex, we can meet these complex requirements with a small amount of code. For the specific implementation, see the corresponding source file.
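As an illustration of the parsing idea, here is a small self-contained sketch for the extended edge notation. To stay within the standard library it uses a hand-written tokenizer and recursive-descent parser rather than ply, so it is not the implementation in the repo. The names (`EdgeParser`, `parse_edge`), the dict returned, the restriction to string and integer property values, and the reading of `"a"<-"b"` as an edge from b to a are all assumptions of this sketch.

```python
import re

TOKEN_RE = re.compile(r'''
    \s*(?:
        (?P<STRING>"(?:[^"\\]|\\.)*")
      | (?P<INT>-?\d+)
      | (?P<ARROW>->|<-)
      | (?P<NAME>[A-Za-z_]\w*)
      | (?P<PUNCT>[\[\]{}:,@])
    )''', re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs."""
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class EdgeParser:
    """edge := "[" ":" NAME [STRING ARROW STRING ["@" INT]] [props] "]" """

    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def accept(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind == kind and (value is None or tok_value == value):
            self.pos += 1
            return True
        return False

    def expect(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if not self.accept(kind, value):
            raise ValueError(f"expected {value or kind}, got {tok_value!r}")
        return tok_value

    def parse_string(self):
        return self.expect("STRING")[1:-1].replace('\\"', '"')

    def parse_value(self):
        if self.peek()[0] == "STRING":
            return self.parse_string()
        return int(self.expect("INT"))

    def parse_props(self):
        props = {}
        self.expect("PUNCT", "{")
        while not self.accept("PUNCT", "}"):
            key = self.expect("NAME")
            self.expect("PUNCT", ":")
            props[key] = self.parse_value()
            self.accept("PUNCT", ",")
        return props

    def parse_edge(self):
        self.expect("PUNCT", "[")
        self.expect("PUNCT", ":")
        edge = {"type": self.expect("NAME"), "src": None, "dst": None,
                "rank": None, "props": {}}
        if self.peek()[0] == "STRING":
            left = self.parse_string()
            arrow = self.expect("ARROW")
            right = self.parse_string()
            # Assumption: "a"<-"b" denotes an edge whose source is "b".
            edge["src"], edge["dst"] = (left, right) if arrow == "->" else (right, left)
            if self.accept("PUNCT", "@"):
                edge["rank"] = int(self.expect("INT"))
        if self.peek() == ("PUNCT", "{"):
            edge["props"] = self.parse_props()
        self.expect("PUNCT", "]")
        if self.pos != len(self.tokens):
            raise ValueError("unexpected trailing input")
        return edge


def parse_edge(text):
    return EdgeParser(text).parse_edge()
```

Fields that the expected result does not write, such as rank or properties, come back as `None` or an empty dict, which is what allows a relaxed comparison to skip them.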

Test process

The testing process now becomes:

1) Write Feature file

At present, all feature use cases of Nebula Graph are located in the tests/tck/features directory in the repo.

2) Start nebula graph service

$ cd /path/to/nebula-graph/tests
$ make up # start the nebula graph service

3) Perform tests locally

$ make fmt # format the feature files
$ make tck # run the TCK tests

4) Stop nebula graph service

$ make down


When a use case you have written needs debugging, you can use the methods pytest supports for further debugging, for example re-running the use cases that failed in the last run:

$ pytest --last-failed tck/ # rerun the use cases in the tck directory that failed in the last run
$ pytest -k "match" tck/    # run the use cases whose names contain "match"

You can also mark a specific scenario in the feature file and run only the marked use cases, for example:

# in feature file
  @testmark
  Scenario: both direction expand with properties
    Given a graph with space named "basketballplayer"
# in nebula-graph/tests directory
$ pytest -m "testmark" tck/ # run the test cases marked with testmark

Summary

Standing on the shoulders of our predecessors allowed us to find a more suitable test solution for Nebula Graph. We would also like to thank all the open source tools and projects mentioned in this article.

While putting pytest-bdd into practice, we also found some imperfections, such as its compatibility with pytest-xdist and other plug-ins (gherkin-reporter), and the fact that pytest does not natively provide a global-scope fixture. But in the end, the benefits it brings to Nebula Graph far outweigh these difficulties.

The previous article mentioned that users do not need to write code. This is not wishful thinking: once the above pattern is fixed, we can develop scaffolding for adding test cases that lets users "fill in the blanks" on a page and automatically generates the corresponding feature file, making things even more convenient. We leave this for interested community users to try.

