Nebula integration test framework refactoring based on BDD theory (part 2)

NebulaGraph
This article was first published in the NebulaGraph public account NebulaGraphCommunity. Follow it to see the technical practice of large-scale graph databases.


In the last article, we introduced the evolution of integration testing in Nebula Graph. This article describes the process of adding a test case to the test set and successfully running all the test cases.

Environment preparation

At the beginning of building the 2.0 test framework, we wrote some tool classes to help the test framework quickly start and stop a single-node nebula service, including functions such as checking for port conflicts and modifying some configuration options. The original execution process was as follows:

  1. Start the nebula service through a Python script;
  2. Call pytest.main to execute all test cases concurrently;
  3. Stop the nebula service.

The inconvenience is that when certain parameter options need to be specified for pytest, they have to be passed through to the pytest.main function, and each run of a single test case has to go through the script generated by cmake, which is not very convenient. We want to be able to "execute the test wherever the test case is".

Service startup

In the transformation of this test framework, apart from changing the program entry, most of the originally encapsulated logic was reused. Since nebula has by now accumulated many test cases, single-process execution can no longer meet the needs of rapid iteration. After trying other parallelization plugins and weighing compatibility, we finally chose the pytest-xdist plugin to speed up the whole testing process.

However, pytest only provides four fixture scopes: session, module, class and function, while we want a global-scope fixture to complete the startup and initialization of the nebula service. At present the highest scope, session, is still executed once per runner, so with 8 runners, 8 nebula services would be started, which is not what we expect.

According to the pytest-xdist documentation, file locks are required to coordinate the different runners. To keep the control logic simple enough, we separated the service start/stop and preparation logic from the test execution process and control nebula's startup in a separate step. When some tests have problems, you can also connect to the test service separately through nebula-console for further verification and debugging.
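The file-lock coordination suggested by the pytest-xdist documentation can be sketched with a stdlib-only helper. The function below is a hypothetical illustration, not the framework's actual code: each worker races to create a marker file atomically, the winner performs the setup (e.g. starting the nebula service), and the others wait for a completion flag.

```python
import os
import time

def run_global_setup_once(marker_path, setup, timeout=60.0):
    """Run `setup` exactly once across concurrent workers on one machine."""
    done_path = marker_path + ".done"
    try:
        # O_EXCL makes creation atomic: only one worker wins the race.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
    except FileExistsError:
        # Another worker is (or was) doing the setup; wait until it signals.
        deadline = time.time() + timeout
        while not os.path.exists(done_path):
            if time.time() > deadline:
                raise TimeoutError("global setup did not finish in time")
            time.sleep(0.1)
        return
    setup()                        # e.g. start the nebula service here
    open(done_path, "w").close()   # signal completion to the other workers
```

Moving the service start into its own step, as the framework does, avoids this coordination entirely.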

Data import

Before this, Nebula's test data were imported by directly executing concatenated nGQL INSERT statements, which has the following problems:

  1. With a large test data set, the INSERT statement becomes lengthy and client execution times out;
  2. It is not easy to add a new test data set: a ready-made CSV data file has to be converted into a corresponding nGQL statement file;
  3. The same data set cannot be reused: for example, to import the same CSV into spaces with different VID types for testing, different INSERT statements have to be constructed.

To solve the above problems, we referred to nebula-importer: we completely separated the import logic from the data set and re-implemented a Python version of the import module. It currently only supports importing CSV data files, and each CSV file can store only one tag/edge type.
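The core idea behind the fix for the first problem can be sketched as follows. This is a hypothetical illustration, not the repo's actual import module, and the function and parameter names are invented: one tag's CSV rows are turned into INSERT statements in bounded batches, so no single statement grows long enough to time out.

```python
import csv
import io

def csv_to_insert_batches(csv_text, tag, prop_names, batch_size=128):
    """Split one tag's CSV rows into bounded INSERT VERTEX statements."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    stmts = []
    for i in range(0, len(rows), batch_size):
        values = ", ".join(
            '"{}":({})'.format(
                row[0],
                # quote everything except bare integers (simplified)
                ", ".join(v if v.isdigit() else '"{}"'.format(v) for v in row[1:]),
            )
            for row in rows[i:i + batch_size]
        )
        stmts.append("INSERT VERTEX {}({}) VALUES {}".format(
            tag, ", ".join(prop_names), values))
    return stmts
```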

After refactoring the import logic, the layout of nebula's current test data sets becomes clear:

nebula-graph/tests/data
├── basketballplayer
│   ├── bachelor.csv
│   ├── config.yaml
│   ├── like.csv
│   ├── player.csv
│   ├── serve.csv
│   ├── team.csv
│   └── teammate.csv
├── basketballplayer_int_vid
│   └── config.yaml
└── student
    ├── config.yaml
    ├── is_colleagues.csv
    ├── is_friend.csv
    ├── is_schoolmate.csv
    ├── is_teacher.csv
    ├── person.csv
    ├── student.csv
    └── teacher.csv
 
3 directories, 16 files

Each directory contains all the CSV data files for one space. The description of each file and the details of the space are configured in config.yaml. Through this configuration, the basketballplayer and basketballplayer_int_vid spaces share the same data files. To add a new test data set in the future, just add a data directory similar to basketballplayer. For the specific contents of config.yaml, see the repo.
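For orientation only, such a configuration might look roughly like the sketch below. This is hypothetical, with invented field names; the real schema is defined by the config.yaml files in the repo.

```yaml
# Hypothetical sketch, not the actual schema: describes the space and
# maps each CSV file to a tag or edge type.
space:
  name: basketballplayer
  vidType: FIXED_STRING(32)
files:
  - path: ./player.csv
    type: vertex
    tag: player
    props: [name, age]
  - path: ./like.csv
    type: edge
    edge: like
    props: [likeness]
```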

Installing dependencies

In addition to the commonly used pytest and nebula-python libraries, the current testing framework also uses plugins such as pytest-bdd and pytest-xdist. Furthermore, to better unify the format of the test case feature files, we introduced the community's reformat-gherkin tool and, based on it, adjusted part of the formatting to stay consistent with the openCypher TCK feature files.

Currently, the nebula-python and reformat-gherkin plugins are installed directly from source. We provide a Makefile in nebula-graph/tests to simplify the process; all the environment preparation for running the tests requires only one command:

$ cd nebula-graph/tests && make init-all

We have also integrated the above format check into the GitHub Actions CI process. If the format of a test file modified by a user is not as expected, the check fails; in that case, run make fmt locally to format it.

Writing test cases

As mentioned in the previous article, Nebula's integration tests have now become "black box" tests. Users no longer need to care about how the statements they write are called or which functions are invoked; they only need to follow the agreed conventions and describe their test cases in a feature file in a near "natural language" manner. The following is an example of a test case:

Feature: Variable length pattern match (m to n)
  Scenario: both direction expand with properties
    Given a graph with space named "basketballplayer"
    When executing query:
      """
      MATCH (:player{name:"Tim Duncan"})-[e:like*2..3{likeness: 90}]-(v)
      RETURN e, v
      """
    Then the result should be, in any order, with relax comparison:
      | e                                                                                  | v                  |
      | [[:like "Tim Duncan"<-"Manu Ginobili"], [:like "Manu Ginobili"<-"Tiago Splitter"]] | ("Tiago Splitter") |

Given provides the initial conditions of the test; here it initializes a space named "basketballplayer". When describes the input of the test, i.e. the nGQL statement. Then gives the expected result and the comparison method; here the result is compared with the table in an unordered, relaxed manner.

Feature file structure

A feature file is a file format described by the Gherkin language and is mainly composed of the following parts:

  • Feature: the "title" of the current file, or a detailed description of its content;
  • Background: steps shared by the subsequent scenarios;
  • Scenario: describes each test case scenario step by step;
  • Examples: further separates the test scenario from the test data, simplifying the writing of Scenarios in the current Feature file;

Each scenario is divided into different steps, and each step has a special meaning:

  • Given: sets up the initial conditions of the current test scenario; the Background above can only contain Given-type steps;
  • When: gives the input of the current test scenario;
  • Then: describes the expected result after completing the When step;
  • And: can follow any of the Given/When/Then steps to further supplement that step's actions;
  • Examples: similar to the Examples described above, but its scope is limited to a single Scenario and does not affect the other Scenarios in the same Feature file.

Steps

As can be seen from the above, a Scenario is composed of steps. On top of the steps compatible with the openCypher TCK, Nebula has customized some steps of its own to make writing test cases easier:

  1. Given a graph with space named "basketballplayer" : use the space into which the "basketballplayer" data has been imported in advance;
  2. creating a new space with following options : create a new space with the following options; the name, partition_num, replica_factor, vid_type, charset and collate parameters can be specified;
  3. load "basketballplayer" csv data to a new space : import the "basketballplayer" data set into the new space;
  4. profiling query : execute PROFILE on the query statement; the returned result will contain the execution plan;
  5. wait 3 seconds : wait 3 seconds; schema-related operations often need some time for data synchronization, and this step can be used in such cases;
  6. define some list variables : define variables to represent List values with many elements, making it convenient to write the corresponding List in the expected result;
  7. the result should be, in any order, with relax comparison : compare the execution result in an unordered, relaxed way, meaning that only what the user writes in the expected result is compared; parts not written are ignored even if present in the returned result;
  8. the result should contain : the returned result must contain the expected result;
  9. the execution plan should be : compare the execution plan in the returned result.

In addition to the above steps, more steps can be defined as needed to accelerate the development of test cases.

Parser

According to the TCK description, openCypher defines a set of graph-semantic representations to express the expected return results. Their format draws on the patterns in the MATCH clause, so if you are familiar with openCypher queries, you can easily understand the results in TCK test scenarios. For example, part of the graph-semantic format is as follows:

  1. Vertex description: (:L {p:1, q:"string"}) ;
  2. Edge description: [:T {p:0, q:"string"}] ;
  3. Path description: <(:L)-[:T]->(:L2)> .

However, there are still some differences between the graph models of Nebula Graph and Neo4j. For example, each tag in Nebula Graph can have its own properties, so the existing notation cannot describe a vertex containing multiple tags with properties. The representation of edges also differs: the edge key of Nebula Graph is the quadruple <src, type, rank, dst>, and the existing notation cannot describe the src, dst and rank values of an edge. Therefore, after considering these differences, we extended the existing TCK expected-result notation:

  1. The vertex description supports multiple tags with properties: ("VID" :T1{p:0} :T2{q: "string"}) ;
  2. The edge description supports the representation of src, dst and rank: [:type "src"->"dst"@rank {p:0, q:"string"}] ;
  3. A path is just a combination of the above vertices and edges, the same as in TCK.

Through the vertex and edge notation extensions described above, we remain compatible with TCK's existing test cases while fitting the design of Nebula Graph. With the notation problem solved, the next problem is how to transform these expressions, efficiently and without error, into concrete data structures that can be compared with the real query results. After considering regular-expression matching, parser-based parsing and other solutions, we chose to build a parser to process these strings with specific grammar rules. The advantages are as follows:

  1. According to the specific grammar rules, the parsed AST can be made to match the data structure of the query results, so comparing the two becomes a verification of specific fields in specific structures;
  2. It avoids processing complex regular-expression strings and reduces parsing errors;
  3. It can support other string-parsing needs, such as regular expressions, lists, sets, etc.

With the help of the ply.yacc and ply.lex libraries, a small amount of code implements the complex requirements above. For the specific implementation, see the nbv.py file.
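nbv.py builds the real parser with ply.lex and ply.yacc. To illustrate the idea without that dependency, here is a hypothetical, hand-written recursive-descent sketch (not the actual implementation) that turns an extended vertex pattern into a plain structure that can then be compared field by field with a query result:

```python
import re

# One token per match: quoted string, integer, identifier, or punctuation.
TOKEN_RE = re.compile(r'"[^"]*"|\d+|[A-Za-z_]\w*|[(){}:,]')

class VertexParser:
    """Parses patterns like ("player100" :player{name:"Tim Duncan", age:42})."""

    def __init__(self, text):
        self.toks = TOKEN_RE.findall(text)
        self.pos = 0

    def next(self):
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.next()
        if got != tok:
            raise ValueError("expected %r, got %r" % (tok, got))

    def parse_vertex(self):
        self.expect("(")
        vid = self.next().strip('"')
        tags = []
        while self.toks[self.pos] == ":":   # zero or more tags
            tags.append(self.parse_tag())
        self.expect(")")
        return {"vid": vid, "tags": tags}

    def parse_tag(self):
        self.expect(":")
        name = self.next()
        props = {}
        if self.toks[self.pos] == "{":      # optional property map
            self.next()
            while True:
                key = self.next()
                self.expect(":")
                val = self.next()
                props[key] = int(val) if val.isdigit() else val.strip('"')
                if self.toks[self.pos] == ",":
                    self.next()
                else:
                    break
            self.expect("}")
        return {"name": name, "props": props}
```

Parsing ("VID" :T1{p:0} :T2{q:"string"}) this way yields the vid plus a list of tag dicts, ready for structural comparison against the returned vertex.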

Test process

The current testing process becomes:

1) Write Feature file

At present, all feature test cases of Nebula Graph are located in the tests/tck/features directory of the github.com/vesoft-inc/nebula-graph repo.

2) Start nebula graph service

$ cd /path/to/nebula-graph/tests
$ make up # start the nebula graph service

3) Perform tests locally

$ make fmt # format the feature files
$ make tck # run the TCK tests

4) Stop nebula graph service

$ make down

Debugging

When a written test case needs debugging, you can use the methods pytest supports, such as rerunning the cases that failed in the last run:

$ pytest --last-failed tck/ # rerun the cases in the tck directory that failed last time
$ pytest -k "match" tck/    # run the cases whose names contain "match"

You can also mark a specific scenario in the feature file, and only run the marked use case, such as:

# in feature file
  @testmark
  Scenario: both direction expand with properties
    Given a graph with space named "basketballplayer"
    ...
 
# in nebula-graph/tests directory
$ pytest -m "testmark" tck/ # run the test cases marked with testmark

Summary

Standing on the shoulders of our predecessors allowed us to find a more suitable test solution for Nebula Graph. We would also like to thank all the open source tools and projects mentioned in this article.

In the process of practicing pytest-bdd, we also found some imperfections, such as its compatibility with pytest-xdist and other plugins (gherkin-reporter), and the fact that pytest does not natively provide a global-scope fixture. But in the end, the benefits it brings to Nebula Graph far outweigh these difficulties.

The previous article mentioned that users would not need to program; that is not an empty vision. Once the above pattern is fixed, a scaffolding tool can be developed for adding test cases, letting users "fill in the blanks" on a page and automatically generating the corresponding feature file. That would make things even more convenient for users, and is left for interested community users to try.


Want to exchange graph database technology with others? The NUC 2021 conference is waiting for you: NUC 2021 registration portal.
