Editor's note: This article describes the workflow and implementation details of the Milvus 2.0 quality assurance system, as well as the optimizations made to improve testing efficiency.
Content outline:
- General introduction to quality assurance
  - Test content focus
  - How the development team and the quality assurance team work together
  - Issue management process
  - Release standard
- Introduction to test modules
  - General introduction
  - Unit testing
  - Functional testing
  - Deployment testing
  - Reliability testing
  - Stability and performance testing
- Efficiency methods and tools
  - GitHub Actions
  - Performance testing tools
General introduction to quality assurance
The architectural design diagram matters for quality assurance as well: only by fully understanding the system under test can a reasonable and efficient test plan be formulated. Milvus 2.0 has a cloud-native, distributed architecture. Its main entry point is the SDK, with many layers of logic inside, so the SDK side deserves particular attention from the user's perspective. When testing Milvus, functional testing starts from the SDK, and problems inside Milvus are surfaced through it. Since Milvus is also a database, the usual system tests for databases are involved as well.
A cloud-native, distributed architecture brings both benefits and challenges to testing. The benefit is that, unlike traditional on-premises deployment, deploying and running in a Kubernetes cluster keeps the software environment as consistent as possible between development and testing. The challenge is that a distributed system is more complex and introduces more uncertainty: there are more services, more machine nodes, and more intermediate states, so there are more opportunities for errors, and each situation needs to be covered by testing.
Test content focus
Based on the nature of the product and the needs of users, the test content and priorities for Milvus are shown in the figure below.
First, on function (Function), the focus is whether the interfaces behave as designed.
Second, on deployment (Deployment), the focus is whether restart and upgrade succeed in both standalone and cluster modes.
Third, on performance (Performance), since Milvus is a real-time analytics database that unifies stream and batch processing, speed matters, with particular attention to the performance of inserting, index building, and querying.
Fourth, on stability (Stability), the focus is whether Milvus stays up under a normal load; the expected target is 5 to 10 days of continuous running.
Fifth, on reliability (Reliability), the focus is how Milvus behaves when a fault occurs and whether it works normally again once the fault is removed.
Sixth, on configuration (Configuration), each exposed configuration item needs to be verified to work, and its changes need to take effect.
Finally, on compatibility (Compatibility), the focus is mainly on different hardware and software configurations.
As the figure shows, function and deployment sit at the highest priority level; performance, stability, and reliability at the second; and configuration and compatibility at the third. These priorities are relative, however.
How the development team and the quality assurance team work together
It is commonly assumed that quality assurance is solely the quality assurance team's job, but in the software development process quality can only be guaranteed through cooperation across teams.
In the initial stage, the development team writes the design document, and the quality assurance team writes the test plan based on it. Both documents need to be reviewed by the two teams together to reduce misunderstanding. Before a release, the goals of the version are set, including performance, stability, and the extent to which the number of bugs must converge. During development, the development team focuses on implementing features, and the quality assurance team tests and verifies each feature once it is complete. These two phases alternate many times, and the two teams need to stay in sync daily. In addition to developing and verifying its own features, an open source project also receives many issues from the community, which are resolved according to priority.
In the final stage, if the product meets the release criteria, the team picks a time to release a new image. Before the release, a release tag and release notes are prepared, covering which features the version implements and which issues it fixes; the quality assurance team will also publish a test report for the version afterwards.
Issue management process
The quality assurance team pays close attention to issues during product development. Besides members of the quality assurance team, a large share of issues are filed by external users, so the information in each issue needs to be standardized. Each issue comes with a template asking the author to provide information such as the Milvus version in use, the machine configuration, the expected behavior, the actual result, and how to reproduce the problem; the quality assurance and development teams then follow up.
After an issue is created, it is first assigned to a person in charge on the quality assurance team, who then moves the issue through its states. If the issue is valid and contains enough information, the follow-up covers questions such as: whether it is resolved, whether it can be reproduced, whether it duplicates an earlier issue, how often it occurs, and how high its priority is. If a defect is confirmed, the development team submits a PR linked to the issue and fixes it. Once the fix is verified the issue is closed, and it can be reopened if the problem persists. In addition, to improve the efficiency of issue management, labels and bots are used to classify issues and move them between states.
Release standard
Whether a version can be released mainly depends on whether it meets the expected requirements. For example, the figure above roughly shows the release standards from RC6 to RC7, RC8, and GA. As the versions progress, higher requirements are placed on the quality of Milvus:
- The supported data scale grows from the order of 50M to the order of 1B
- In stability runs, a single-type workload becomes a mixed workload, and the duration grows from hours to days
- Code coverage requirements are also raised gradually
- ...
- In addition, new test items are added as the versions evolve. For example, RC7 introduced a compatibility requirement for upgrades, and GA introduced more chaos engineering tests
Introduction to Test Modules
The second part covers the specific details of each test module.
General introduction
There is a joke in the industry that writing code is writing bugs. As you can see from the figure below, 85% of bugs are introduced in the coding phase.
From the testing perspective, bugs can be found in turn by unit tests, functional tests, and system tests between code writing and version release; but the later the stage, the higher the cost of fixing a bug, so the preference is to detect and fix bugs early. Each stage of testing has its own focus, however, and no single testing method can find all bugs.
From writing code to merging it into the main branch, developers ensure code quality with unit tests, code coverage, and code review, all of which are reflected in CI. Between submitting a PR and merging the code, it must pass static code analysis, unit tests, the code coverage threshold, and the reviewers' code review.
Merging code also requires passing integration tests. To keep the overall CI time reasonable, the integration tests mainly run the high-priority cases labeled L0 and L1. After all checks pass, the image built from the PR is published to the milvusdb/milvus-dev repository. Once the image is released, scheduled tasks run the various tests mentioned above against the latest image: full regression tests of existing functions, tests of new features, deployment tests, performance tests, stability tests, chaos tests, and so on.
Unit test
Unit tests can find bugs at the earliest possible stage and also serve as a verification baseline for code refactoring. In Milvus's PR admission criteria, the unit test coverage target is set to 80%.
https://app.codecov.io/gh/milvus-io/milvus/
Functional test
Functional testing of Milvus mainly uses the pymilvus SDK as the entry point.
Functional testing is primarily concerned with whether the interface works as expected:
- With normal parameters and operations, does the SDK return the expected results?
- With abnormal parameters or operations, does the SDK handle the errors and return reasonable error messages?
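As a minimal sketch of these two kinds of cases (written directly against pymilvus for illustration; the collection name, field names, and dimension are assumptions, and the actual cases in the repository go through the test framework described below):

import random

import pytest
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
from pymilvus.exceptions import MilvusException

DIM = 128  # assumed vector dimension for this example

@pytest.fixture(scope="module")
def demo_collection():
    # Connect to a local Milvus instance and create a throwaway collection
    connections.connect(host="localhost", port="19530")
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=DIM),
    ]
    collection = Collection("qa_demo", CollectionSchema(fields))
    yield collection
    collection.drop()

def test_insert_returns_expected_count(demo_collection):
    # Positive case: a normal insert should report the number of inserted rows
    vectors = [[random.random() for _ in range(DIM)] for _ in range(100)]
    result = demo_collection.insert([vectors])
    assert result.insert_count == 100

def test_insert_wrong_dim_raises(demo_collection):
    # Negative case: a vector with the wrong dimension should be rejected
    bad_vectors = [[0.0] * (DIM + 1)]
    with pytest.raises(MilvusException):
        demo_collection.insert([bad_vectors])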
The figure below shows the current functional testing framework. It is built on pytest, the mainstream testing framework, and wraps pymilvus to provide automated interface testing.
This framework is used instead of calling the native pymilvus interface directly because common methods and frequently used functions need to be factored out during testing. It also encapsulates a check module, which makes it easier to verify expected versus actual values.
Currently there are 2,700+ functional test cases in the tests/python_client/testcases directory, basically covering all pymilvus interfaces with both positive and negative cases. As the basic functional guarantee of Milvus, functional testing strictly controls the quality of every PR through automation and continuous integration.
Deployment test
Deployment tests cover both supported Milvus deployment modes, standalone and cluster, and both deployment methods, Docker and Helm. After the deployment is complete, restart and upgrade operations are performed on the system.
The restart test mainly verifies data persistence, that is, whether data written before a restart is still usable afterwards; the upgrade test mainly verifies data compatibility, preventing incompatible data formats from being introduced unnoticed.
The restart test and upgrade test can be unified into the following test process:
For a restart test, the same image is used for both deployments; for an upgrade test, the old image is used for the first deployment and the new image for the second. In either case, the test data (the Volumes folder or PVC) from the first deployment is retained for the second deployment. In the Run first test step, multiple collections are created, and different operations are performed on each collection to leave it in a different state, for example:
- create collection
- create collection --> insert data
- create collection --> insert data --> load
- create collection --> insert data --> flush
- create collection --> insert data --> flush --> load
- create collection --> insert data --> flush --> create index
- create collection --> insert data --> flush --> create index --> load
- ......
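As a rough illustration of how such states might be prepared with the pymilvus ORM (the collection and field names here are assumptions, not the actual test code):

import random
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

DIM = 128

def new_collection(name):
    # Every state starts from a freshly created collection
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=DIM),
    ]
    return Collection(name, CollectionSchema(fields))

connections.connect(host="localhost", port="19530")
vectors = [[random.random() for _ in range(DIM)] for _ in range(3000)]

# State: create collection only
new_collection("deploy_create_only")

# State: create --> insert --> flush --> create index --> load
c = new_collection("deploy_index_loaded")
c.insert([vectors])
c.flush()
c.create_index("embedding", {"index_type": "IVF_FLAT",
                             "metric_type": "L2",
                             "params": {"nlist": 128}})
c.load()

After the second deployment, the second test run checks that each of these collections still supports the operations that are valid for its state.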
In the Run second test step, there are two verifications:
- The collections created previously still support all of their operations
- New collections can be created and support the same operations
Reliability test
At present, most companies test the reliability of cloud-native, distributed products with chaos engineering, which aims to nip failures in the bud, that is, to identify failures before they cause outages. Failures are created proactively to exercise the system under various kinds of stress, so that problems can be identified and fixed before they have serious consequences.
For chaos testing, Chaos Mesh was chosen as the fault injection tool. Chaos Mesh was incubated by PingCAP while testing the reliability of TiDB, and it is well suited to reliability testing of cloud-native distributed databases.
The following fault types are currently implemented:
- Pod kill: applied to all components, simulating node downtime
- Pod failure: mainly applied to worker nodes with multiple replicas, verifying that the system keeps working when one pod fails
- Memory stress: memory and CPU pressure, mainly injected into worker nodes
- Network partition: isolating communication between pods. Milvus has a multi-layer architecture that separates storage from computation and worker nodes from coordinator nodes, with a lot of communication between components, so their interdependence needs to be tested with network partitions
A framework was built so that the chaos tests can run largely automatically. The process is as follows:
- Initialize a Milvus cluster by reading the deployment configuration
- After the cluster state is ready, an e2e test will be run first to verify that Milvus functions are available
- Run hello_milvus.py, which mainly verifies data persistence. Before fault injection it creates a hello_milvus collection and performs data insertion, flush, index creation, load, search, and query. Note that the collection is not released or dropped at this stage
- Create monitoring objects, which start 6 threads that continuously execute create, insert, flush, index, search, and query operations respectively:
checkers = {
Op.create: CreateChecker(),
Op.insert: InsertFlushChecker(),
Op.flush: InsertFlushChecker(flush=True),
Op.index: IndexChecker(),
Op.search: SearchChecker(),
Op.query: QueryChecker()
}
- First assertion, before fault injection: all operations are expected to succeed
- Inject the fault: parse the yaml file that defines the fault and inject it into the Milvus system through Chaos Mesh, for example killing the query node every 5s (see the sketch after this list)
- Second assertion, during fault injection: check whether the result of each operation performed on Milvus during the fault matches expectations
- Remove the fault: delete the previously injected fault via Chaos Mesh
- Third assertion, after the fault is removed and Milvus has restored service (all pods are ready): all operations are expected to succeed
- Run an e2e test to verify that Milvus functions are available. Some operations are blocked during chaos injection and may remain blocked even after the fault is removed, so the third assertion may not fully succeed as expected; this step is added to assist the third assertion, and for now this e2e test serves as the criterion for Milvus recovery
- Run hello_milvus.py again, load the collection created earlier, and perform a series of operations on it to determine whether the data written before the failure is still available after recovery
- Collect logs
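As a rough sketch of the fault injection and removal steps, assuming the Kubernetes Python client is used to apply a Chaos Mesh PodChaos resource (the namespaces, label selectors, and object name below are assumed values; the actual faults are defined in yaml files, and periodic injection such as "every 5s" is configured through Chaos Mesh's scheduling, which differs between Chaos Mesh versions):

from kubernetes import client, config

# Talk to the cluster's custom resources through the Kubernetes API
config.load_kube_config()
api = client.CustomObjectsApi()

# A minimal PodChaos object that kills one query node pod
pod_chaos = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {"name": "test-querynode-pod-kill", "namespace": "chaos-testing"},
    "spec": {
        "action": "pod-kill",
        "mode": "one",
        "selector": {
            "namespaces": ["milvus"],
            "labelSelectors": {"app.kubernetes.io/instance": "milvus-chaos",
                               "component": "querynode"},
        },
    },
}

# Create the chaos resource to inject the fault ...
api.create_namespaced_custom_object(
    group="chaos-mesh.org", version="v1alpha1",
    namespace="chaos-testing", plural="podchaos", body=pod_chaos)

# ... and delete it later to remove the fault
api.delete_namespaced_custom_object(
    group="chaos-mesh.org", version="v1alpha1",
    namespace="chaos-testing", plural="podchaos", name="test-querynode-pod-kill")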
Stability and performance testing
Stability test
The purpose of stability testing:
- Milvus can run smoothly for a set period of time under a normal level of load
- The resources used by the system remain stable during the run, and the Milvus service works normally
Two load scenarios are mainly considered:
- Read-intensive: 90% search requests, 5% insert requests, 5% others. This is mainly an offline scenario: after the data is imported it is rarely updated, and the system mainly serves queries
- Write-intensive: 50% insert requests, 40% search requests, 10% others. This is mainly an online scenario where insertion and querying are served at the same time (a minimal sketch of such a request mix follows)
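A minimal sketch of how such a weighted request mix could be generated in the client (the do_* callables stand in for the pymilvus requests and are placeholders; the weights follow the read-intensive scenario above):

import random
import time

# Read-intensive mix: 90% search, 5% insert, 5% other operations
OPERATIONS = ["search", "insert", "other"]
WEIGHTS = [0.90, 0.05, 0.05]

def run_mixed_load(duration_seconds, do_search, do_insert, do_other):
    # Issue requests according to the weighted mix until the time is up
    actions = {"search": do_search, "insert": do_insert, "other": do_other}
    counts = {op: 0 for op in OPERATIONS}
    deadline = time.time() + duration_seconds
    while time.time() < deadline:
        op = random.choices(OPERATIONS, weights=WEIGHTS, k=1)[0]
        actions[op]()
        counts[op] += 1
    return counts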
Check items:
- Memory usage stays smooth
- CPU usage stays smooth
- IO latency stays smooth
- Milvus pods stay in a normal state
- Milvus service response time stays smooth
Performance test
Purpose of performance test:
- Perform a performance analysis of each interface of Milvus
- Find the best parameter configuration of the interface through performance comparison
- As a performance benchmark to prevent performance degradation in future releases
- Find performance bottlenecks and provide references for performance tuning
The main performance scenarios considered:
- Data insertion performance. Metrics: throughput. Variables: number of vectors inserted per batch, ...
- Index building performance. Metrics: index building time. Variables: index type, number of index nodes, ...
- Vector query performance. Metrics: response time, query vectors per second, requests per second, recall. Variables: nq, topK, dataset size, dataset type, index type, number of query nodes, deployment mode, ...
- ......
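For instance, a bare-bones sketch of timing search requests with pymilvus (the collection name, field name, and search parameters are assumptions; in the real setup these variables come from client-configmap):

import random
import time
from pymilvus import connections, Collection

NQ, TOP_K, DIM = 10, 10, 128  # assumed test variables

connections.connect(host="localhost", port="19530")
collection = Collection("perf_test")  # assumed to be already indexed and loaded
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

latencies = []
for _ in range(100):
    vectors = [[random.random() for _ in range(DIM)] for _ in range(NQ)]
    t0 = time.time()
    collection.search(data=vectors, anns_field="embedding",
                      param=search_params, limit=TOP_K)
    latencies.append(time.time() - t0)

print(f"avg latency: {sum(latencies) / len(latencies):.4f}s, "
      f"query vectors per second: {NQ * len(latencies) / sum(latencies):.1f}")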
Test Framework and Process
- Parse and update the configuration, and define the metrics
  - server-configmap corresponds to the Milvus standalone or cluster configuration
  - client-configmap corresponds to the test case configuration
- Configure the server and client
- Prepare the data
- Run the request interaction between client and server
- Report and display the metric data
Efficiency methods and tools
As can be seen above, many steps in the test process are the same; what changes is mainly the Milvus server configuration, the client configuration, and the parameters passed to the interfaces. With many configurations to permute and combine, a large number of runs are needed to cover the various test scenarios, so code reuse, process reuse, and test efficiency become very important.
- The original methods are wrapped with an api_request decorator, which works somewhat like an API gateway: it receives all API requests in a unified way, sends them to Milvus, receives the responses, and returns them to the client. This makes it easier to capture log information such as the parameters passed in and the results returned. The returned results can also be verified by the checker module, so all check methods can be defined in that single module (a minimal sketch follows after this list)
- Default parameters are set, and the necessary initialization steps are encapsulated into a single function, so what used to take a lot of code can be done through one interface. This removes a lot of redundant, repetitive code and makes each test case simpler and clearer
- Each test case uses its own uniquely named collection, which isolates the data between test cases. A new collection is created at the start of each test case and dropped after the test finishes (see the fixture in the sketch after this list)
- Because the test cases are independent of each other, they can be run concurrently with the pytest plugin pytest-xdist (for example, pytest -n 4) to improve execution efficiency
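A minimal sketch of what the api_request wrapper and the per-case collection isolation might look like (the names and the logging are simplified placeholders, not the actual framework code):

import functools
import logging
import uuid

import pytest
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

log = logging.getLogger("milvus_test")

def api_request(func):
    # Unified entry point: log the parameters, call pymilvus,
    # then log the response and return (result, succeeded)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("request: %s args=%s kwargs=%s", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
            log.info("response: %s", result)
            return result, True
        except Exception as e:
            log.error("error: %s", e)
            return e, False
    return wrapper

@pytest.fixture
def unique_collection():
    # Data isolation: each test case gets a uniquely named collection,
    # which is dropped when the test finishes
    connections.connect(host="localhost", port="19530")
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=128),
    ]
    collection = Collection(f"test_{uuid.uuid4().hex}", CollectionSchema(fields))
    yield collection
    collection.drop()

A checker module can then assert on the (result, succeeded) pair returned by the wrapper, keeping all verification logic in one place.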
GitHub Actions
Advantages of GitHub Actions:
- Deeply integrated with GitHub as its native CI/CD tool
- Uniformly configured machine environments, pre-installed with a wealth of common development tools
- Supports multiple operating systems and versions: Ubuntu, macOS, and Windows Server
- Has a rich plugin marketplace that provides a variety of out-of-the-box functionality
- Matrix builds allow the same set of test processes to be reused across combinations of parameters, with concurrent jobs improving efficiency
Both deployment tests and reliability tests need separate, isolated environments, which makes them well suited to small-scale data testing on GitHub Actions. Run on a daily schedule against the latest master image, they serve as a daily health check.
Performance testing tools
- Argo workflow: by creating workflows, tasks can be scheduled and the different stages chained together. As the figure on the right shows, multiple tasks can run at the same time through Argo
- Kubernetes Dashboard: Visualize server-configmap and client-configmap
- NAS: Mount commonly used ann-benchmark datasets
- InfluxDB and MongoDB: Saving performance metrics results
- Grafana: server-side resource indicator monitoring, client-side performance indicator monitoring
- Redash: Performance chart display
For the full video explanation, please click:
https://www.bilibili.com/video/BV1nF411h7nJ?spm_id_from=333.999.0.0
If you have any improvements or suggestions while using Milvus, feel free to keep in touch with us on GitHub or through our official channels.
With a vision to redefine data science, Zilliz is committed to building a global leader in open source technology innovation and unlocking the hidden value of unstructured data for enterprises through open source and cloud-native solutions.
Zilliz built the Milvus vector database to accelerate the development of a next-generation data platform. The Milvus database is a graduated project of the LF AI & Data Foundation. It can manage massive unstructured datasets and has a wide range of applications in new drug discovery, recommendation systems, chatbots, and more.