Editor's note: This article describes the workflow and implementation details of the Milvus 2.0 quality assurance system, as well as the optimizations made to improve testing efficiency.
Content outline:
- General introduction to quality assurance
  - Test content focus
  - How the development team and the quality assurance team work together
  - Issue management process
  - Release standard
- Introduction to test modules
  - General introduction
  - Unit testing
  - Functional testing
  - Deployment testing
  - Reliability testing
  - Stability and performance testing
- Efficiency methods and tools
  - GitHub Actions
  - Performance testing tools
General introduction to quality assurance
The architectural design diagram matters for quality assurance as well: only by fully understanding the system under test can a reasonable and efficient test plan be formulated. Milvus 2.0 has a cloud-native, distributed architecture. Its main entry point is the SDK, with many layers of logic inside, so the SDK side deserves particular attention from the user's perspective. When testing Milvus, functional testing starts from the SDK, and problems inside Milvus are surfaced through it. Since Milvus is also a database, the usual system tests for databases are involved as well.
A cloud-native, distributed architecture brings both benefits and challenges to testing. The benefit is that, unlike traditional on-premises deployment, deploying and running in a Kubernetes cluster keeps the software environment as consistent as possible between development and testing. The challenge is that a distributed system is more complex and introduces more uncertainty: there are more services, more machine nodes, and more intermediate states, so there are more opportunities for errors, and each situation needs to be covered by testing.
Test content focus
Based on the nature of the product and the needs of users, the test content and priorities for Milvus are shown in the figure below.
First, on function (Function), the focus is whether the interfaces behave as designed.
Second, on deployment (Deployment), the focus is whether restart and upgrade succeed in both standalone and cluster modes.
Third, on performance (Performance), since Milvus is a real-time analytics database that unifies stream and batch processing, speed matters, with particular attention to the performance of inserting, index building, and querying.
Fourth, on stability (Stability), the focus is whether Milvus stays up under a normal load; the expected target is 5 to 10 days of continuous running.
Fifth, on reliability (Reliability), the focus is how Milvus behaves when a fault occurs and whether it works normally again once the fault is removed.
Sixth, on configuration (Configuration), each exposed configuration item needs to be verified to work, and its changes need to take effect.
Finally, on compatibility (Compatibility), the focus is mainly on different hardware and software configurations.
As the figure shows, function and deployment sit at the highest priority level; performance, stability, and reliability at the second; and configuration and compatibility at the third. These priorities are relative, however.
How the development team and the quality assurance team work together
It is commonly assumed that quality assurance is solely the quality assurance team's job, but in the software development process quality can only be guaranteed through cooperation across teams.
In the initial stage, the development team writes the design document, and the quality assurance team writes the test plan based on it. Both documents need to be reviewed by the two teams together to reduce misunderstanding. Before a release, the goals of the version are set, including performance, stability, and the extent to which the number of bugs must converge. During development, the development team focuses on implementing features, and the quality assurance team tests and verifies each feature once it is complete. These two phases alternate many times, and the two teams need to stay in sync daily. In addition to developing and verifying its own features, an open source project also receives many issues from the community, which are resolved according to priority.
In the final stage, if the product meets the release criteria, the team picks a time to release a new image. Before the release, a release tag and release notes are prepared, covering which features the version implements and which issues it fixes; the quality assurance team will also publish a test report for the version afterwards.
Issue management process
The quality assurance team pays close attention to issues during product development. Besides members of the quality assurance team, a large share of issues are filed by external users, so the information in each issue needs to be standardized. Each issue comes with a template asking the author to provide information such as the Milvus version in use, the machine configuration, the expected behavior, the actual result, and how to reproduce the problem; the quality assurance and development teams then follow up.
After an issue is created, it is first assigned to a person in charge on the quality assurance team, who then moves the issue through its states. If the issue is valid and contains enough information, the follow-up covers questions such as: whether it is resolved, whether it can be reproduced, whether it duplicates an earlier issue, how often it occurs, and how high its priority is. If a defect is confirmed, the development team submits a PR linked to the issue and fixes it. Once the fix is verified the issue is closed, and it can be reopened if the problem persists. In addition, to improve the efficiency of issue management, labels and bots are used to classify issues and move them between states.
Release standard
Whether a version can be released mainly depends on whether it meets the expected requirements. For example, the figure above roughly shows the release standards from RC6 to RC7, RC8, and GA. As the versions progress, higher requirements are placed on the quality of Milvus:
- The supported data scale grows from the order of 50M to the order of 1B
- In stability runs, a single-type workload becomes a mixed workload, and the duration grows from hours to days
- Code coverage requirements are also raised gradually
- ...
- In addition, new test items are added as the versions evolve. For example, RC7 introduced a compatibility requirement for upgrades, and GA introduced more chaos engineering tests
Introduction to Test Modules
The second part covers the specific details of each test module.
General introduction
There is a joke in the industry that writing code is writing bugs. As you can see from the figure below, 85% of bugs are introduced in the coding phase.
From the testing perspective, bugs can be found in turn by unit tests, functional tests, and system tests between code writing and version release; but the later the stage, the higher the cost of fixing a bug, so the preference is to detect and fix bugs early. Each stage of testing has its own focus, however, and no single testing method can find all bugs.
From writing code to merging it into the main branch, developers ensure code quality with unit tests, code coverage, and code review, all of which are reflected in CI. Between submitting a PR and merging the code, it must pass static code analysis, unit tests, the code coverage threshold, and the reviewers' code review.
Merging code also requires passing integration tests. To keep the overall CI time reasonable, the integration tests mainly run the high-priority cases labeled L0 and L1. After all checks pass, the image built from the PR is published to the milvusdb/milvus-dev repository. Once the image is released, scheduled tasks run the various tests mentioned above against the latest image: full regression tests of existing functions, tests of new features, deployment tests, performance tests, stability tests, chaos tests, and so on.
Unit test
Unit tests can find bugs at the earliest possible stage and also serve as a verification baseline for code refactoring. In Milvus's PR admission criteria, the unit test coverage target is set to 80%.
https://app.codecov.io/gh/milvus-io/milvus/
Functional test
Functional testing of Milvus mainly uses the pymilvus SDK as the entry point.
Functional testing is primarily concerned with whether the interface works as expected:
- With normal parameters and operations, does the SDK return the expected results?
- With abnormal parameters or operations, does the SDK handle the errors and return reasonable error messages?
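As a minimal sketch of these two kinds of cases (written directly against pymilvus for illustration; the collection name, field names, and dimension are assumptions, and the actual cases in the repository go through the test framework described below):

import random

import pytest
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType
from pymilvus.exceptions import MilvusException

DIM = 128  # assumed vector dimension for this example

@pytest.fixture(scope="module")
def demo_collection():
    # Connect to a local Milvus instance and create a throwaway collection
    connections.connect(host="localhost", port="19530")
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=DIM),
    ]
    collection = Collection("qa_demo", CollectionSchema(fields))
    yield collection
    collection.drop()

def test_insert_returns_expected_count(demo_collection):
    # Positive case: a normal insert should report the number of inserted rows
    vectors = [[random.random() for _ in range(DIM)] for _ in range(100)]
    result = demo_collection.insert([vectors])
    assert result.insert_count == 100

def test_insert_wrong_dim_raises(demo_collection):
    # Negative case: a vector with the wrong dimension should be rejected
    bad_vectors = [[0.0] * (DIM + 1)]
    with pytest.raises(MilvusException):
        demo_collection.insert([bad_vectors])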
The figure below shows the current functional testing framework. It is built on pytest, the mainstream testing framework, and wraps pymilvus to provide automated interface testing.
This framework is used instead of calling the native pymilvus interface directly because common methods and frequently used functions need to be factored out during testing. It also encapsulates a check module, which makes it easier to verify expected versus actual values.
Currently there are 2,700+ functional test cases in the tests/python_client/testcases directory, basically covering all pymilvus interfaces with both positive and negative cases. As the basic functional guarantee of Milvus, functional testing strictly controls the quality of every PR through automation and continuous integration.
Deployment test
Deployment tests cover both supported Milvus deployment modes, standalone and cluster, and both deployment methods, Docker and Helm. After the deployment is complete, restart and upgrade operations are performed on the system.
The restart test mainly verifies data persistence, that is, whether data written before a restart is still usable afterwards; the upgrade test mainly verifies data compatibility, preventing incompatible data formats from being introduced unnoticed.
The restart test and upgrade test can be unified into the following test process:
For a restart test, the same image is used for both deployments; for an upgrade test, the old image is used for the first deployment and the new image for the second. In either case, the test data (the Volumes folder or PVC) from the first deployment is retained for the second deployment. In the Run first test step, multiple collections are created, and different operations are performed on each collection to leave it in a different state, for example:
- create collection
- create collection --> insert data
- create collection --> insert data --> load
- create collection --> insert data --> flush
- create collection --> insert data --> flush --> load
- create collection --> insert data --> flush --> create index
- create collection --> insert data --> flush --> create index --> load
- ......
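As a rough illustration of how such states might be prepared with the pymilvus ORM (the collection and field names here are assumptions, not the actual test code):

import random
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

DIM = 128

def new_collection(name):
    # Every state starts from a freshly created collection
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=DIM),
    ]
    return Collection(name, CollectionSchema(fields))

connections.connect(host="localhost", port="19530")
vectors = [[random.random() for _ in range(DIM)] for _ in range(3000)]

# State: create collection only
new_collection("deploy_create_only")

# State: create --> insert --> flush --> create index --> load
c = new_collection("deploy_index_loaded")
c.insert([vectors])
c.flush()
c.create_index("embedding", {"index_type": "IVF_FLAT",
                             "metric_type": "L2",
                             "params": {"nlist": 128}})
c.load()

After the second deployment, the second test run checks that each of these collections still supports the operations that are valid for its state.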
In the Run second test step, there are two verifications:
- The collections created previously still support all of their operations
- New collections can be created and support the same operations
Reliability test
At present, most companies test the reliability of cloud-native, distributed products with chaos engineering, which aims to nip failures in the bud, that is, to identify failures before they cause outages. Failures are created proactively to exercise the system under various kinds of stress, so that problems can be identified and fixed before they have serious consequences.
For chaos testing, Chaos Mesh was chosen as the fault injection tool. Chaos Mesh was incubated by PingCAP while testing the reliability of TiDB, and it is well suited to reliability testing of cloud-native distributed databases.
The following fault types are currently implemented:
- Pod kill: applied to all components, simulating node downtime
- Pod failure: mainly applied to worker nodes with multiple replicas, verifying that the system keeps working when one pod fails
- Memory stress: memory and CPU pressure, mainly injected into worker nodes
- Network partition: isolating communication between pods. Milvus has a multi-layer architecture that separates storage from computation and worker nodes from coordinator nodes, with a lot of communication between components, so their interdependence needs to be tested with network partitions
A framework was built so that the chaos tests can run largely automatically. The process is as follows:
- Initialize a Milvus cluster by reading the deployment configuration
- After the cluster state is ready, an e2e test will be run first to verify that Milvus functions are available
- Run hello_milvus.py, which mainly verifies data persistence. Before fault injection it creates a hello_milvus collection and performs data insertion, flush, index creation, load, search, and query. Note that the collection is not released or dropped at this stage
- Create monitoring objects, which start 6 threads that continuously execute create, insert, flush, index, search, and query operations respectively:
checkers = {
Op.create: CreateChecker(),
Op.insert: InsertFlushChecker(),
Op.flush: InsertFlushChecker(flush=True),
Op.index: IndexChecker(),
Op.search: SearchChecker(),
Op.query: QueryChecker()
}
- First assertion, before fault injection: all operations are expected to succeed
- Inject the fault: parse the yaml file that defines the fault and inject it into the Milvus system through Chaos Mesh, for example killing the query node every 5s (see the sketch after this list)
- Second assertion, during fault injection: check whether the result of each operation performed on Milvus during the fault matches expectations
- Remove the fault: delete the previously injected fault via Chaos Mesh
- Third assertion, after the fault is removed and Milvus has restored service (all pods are ready): all operations are expected to succeed
- Run an e2e test to verify that Milvus functions are available. Some operations are blocked during chaos injection and may remain blocked even after the fault is removed, so the third assertion may not fully succeed as expected; this step is added to assist the third assertion, and for now this e2e test serves as the criterion for Milvus recovery
- Run hello_milvus.py again, load the collection created earlier, and perform a series of operations on it to determine whether the data written before the failure is still available after recovery
- Collect logs
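As a rough sketch of the fault injection and removal steps, assuming the Kubernetes Python client is used to apply a Chaos Mesh PodChaos resource (the namespaces, label selectors, and object name below are assumed values; the actual faults are defined in yaml files, and periodic injection such as "every 5s" is configured through Chaos Mesh's scheduling, which differs between Chaos Mesh versions):

from kubernetes import client, config

# Talk to the cluster's custom resources through the Kubernetes API
config.load_kube_config()
api = client.CustomObjectsApi()

# A minimal PodChaos object that kills one query node pod
pod_chaos = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {"name": "test-querynode-pod-kill", "namespace": "chaos-testing"},
    "spec": {
        "action": "pod-kill",
        "mode": "one",
        "selector": {
            "namespaces": ["milvus"],
            "labelSelectors": {"app.kubernetes.io/instance": "milvus-chaos",
                               "component": "querynode"},
        },
    },
}

# Create the chaos resource to inject the fault ...
api.create_namespaced_custom_object(
    group="chaos-mesh.org", version="v1alpha1",
    namespace="chaos-testing", plural="podchaos", body=pod_chaos)

# ... and delete it later to remove the fault
api.delete_namespaced_custom_object(
    group="chaos-mesh.org", version="v1alpha1",
    namespace="chaos-testing", plural="podchaos", name="test-querynode-pod-kill")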
Stability and performance testing
Stability test
The purpose of stability testing:
- Milvus can run smoothly for a set period of time under a normal level of load
- The resources used by the system remain stable during the run, and the Milvus service works normally
Two load scenarios are mainly considered:
- Read-intensive: 90% search requests, 5% insert requests, 5% others. This is mainly an offline scenario: after the data is imported it is rarely updated, and the system mainly serves queries
- Write-intensive: 50% insert requests, 40% search requests, 10% others. This is mainly an online scenario where insertion and querying are served at the same time (a minimal sketch of such a request mix follows)
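A minimal sketch of how such a weighted request mix could be generated in the client (the do_* callables stand in for the pymilvus requests and are placeholders; the weights follow the read-intensive scenario above):

import random
import time

# Read-intensive mix: 90% search, 5% insert, 5% other operations
OPERATIONS = ["search", "insert", "other"]
WEIGHTS = [0.90, 0.05, 0.05]

def run_mixed_load(duration_seconds, do_search, do_insert, do_other):
    # Issue requests according to the weighted mix until the time is up
    actions = {"search": do_search, "insert": do_insert, "other": do_other}
    counts = {op: 0 for op in OPERATIONS}
    deadline = time.time() + duration_seconds
    while time.time() < deadline:
        op = random.choices(OPERATIONS, weights=WEIGHTS, k=1)[0]
        actions[op]()
        counts[op] += 1
    return counts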
Check items:
- Memory usage stays smooth
- CPU usage stays smooth
- IO latency stays smooth
- Milvus pods stay in a normal state
- Milvus service response time stays smooth
Performance test
Purpose of performance test:
- Perform a performance analysis of each interface of Milvus
- Find the best parameter configuration of the interface through performance comparison
- As a performance benchmark to prevent performance degradation in future releases
- Find performance bottlenecks and provide references for performance tuning
The main performance scenarios considered:
- Data insertion performance. Metrics: throughput. Variables: number of vectors inserted per batch, ...
- Index building performance. Metrics: index building time. Variables: index type, number of index nodes, ...
- Vector query performance. Metrics: response time, query vectors per second, requests per second, recall. Variables: nq, topK, dataset size, dataset type, index type, number of query nodes, deployment mode, ...
- ......
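For instance, a bare-bones sketch of timing search requests with pymilvus (the collection name, field name, and search parameters are assumptions; in the real setup these variables come from client-configmap):

import random
import time
from pymilvus import connections, Collection

NQ, TOP_K, DIM = 10, 10, 128  # assumed test variables

connections.connect(host="localhost", port="19530")
collection = Collection("perf_test")  # assumed to be already indexed and loaded
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

latencies = []
for _ in range(100):
    vectors = [[random.random() for _ in range(DIM)] for _ in range(NQ)]
    t0 = time.time()
    collection.search(data=vectors, anns_field="embedding",
                      param=search_params, limit=TOP_K)
    latencies.append(time.time() - t0)

print(f"avg latency: {sum(latencies) / len(latencies):.4f}s, "
      f"query vectors per second: {NQ * len(latencies) / sum(latencies):.1f}")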
Test Framework and Process
- Parse and update the configuration, and define the metrics
  - server-configmap corresponds to the Milvus standalone or cluster configuration
  - client-configmap corresponds to the test case configuration
- Configure the server and client
- Prepare the data
- Run the request interaction between client and server
- Report and display the metric data
Efficiency methods and tools
As can be seen above, many steps in the test process are the same; what changes is mainly the Milvus server configuration, the client configuration, and the parameters passed to the interfaces. With many configurations to permute and combine, a large number of runs are needed to cover the various test scenarios, so code reuse, process reuse, and test efficiency become very important.
- The original methods are wrapped with an api_request decorator, which works somewhat like an API gateway: it receives all API requests in a unified way, sends them to Milvus, receives the responses, and returns them to the client. This makes it easier to capture log information such as the parameters passed in and the results returned. The returned results can also be verified by the checker module, so all check methods can be defined in that single module (a minimal sketch follows after this list)
- Default parameters are set, and the necessary initialization steps are encapsulated into a single function, so what used to take a lot of code can be done through one interface. This removes a lot of redundant, repetitive code and makes each test case simpler and clearer
- Each test case uses its own uniquely named collection, which isolates the data between test cases. A new collection is created at the start of each test case and dropped after the test finishes (see the fixture in the sketch after this list)
- Because the test cases are independent of each other, they can be run concurrently with the pytest plugin pytest-xdist (for example, pytest -n 4) to improve execution efficiency
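A minimal sketch of what the api_request wrapper and the per-case collection isolation might look like (the names and the logging are simplified placeholders, not the actual framework code):

import functools
import logging
import uuid

import pytest
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

log = logging.getLogger("milvus_test")

def api_request(func):
    # Unified entry point: log the parameters, call pymilvus,
    # then log the response and return (result, succeeded)
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.info("request: %s args=%s kwargs=%s", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
            log.info("response: %s", result)
            return result, True
        except Exception as e:
            log.error("error: %s", e)
            return e, False
    return wrapper

@pytest.fixture
def unique_collection():
    # Data isolation: each test case gets a uniquely named collection,
    # which is dropped when the test finishes
    connections.connect(host="localhost", port="19530")
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=128),
    ]
    collection = Collection(f"test_{uuid.uuid4().hex}", CollectionSchema(fields))
    yield collection
    collection.drop()

A checker module can then assert on the (result, succeeded) pair returned by the wrapper, keeping all verification logic in one place.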
GitHub Actions
Advantages of GitHub Actions:
- Deeply integrated with GitHub as its native CI/CD tool
- Uniformly configured machine environments, pre-installed with a wealth of common development tools
- Supports multiple operating systems and versions: Ubuntu, macOS, and Windows Server
- Has a rich plugin marketplace that provides a variety of out-of-the-box functionality
- Matrix builds allow the same set of test processes to be reused across combinations of parameters, with concurrent jobs improving efficiency
Both deployment tests and reliability tests need separate, isolated environments, which makes them well suited to small-scale data testing on GitHub Actions. Run on a daily schedule against the latest master image, they serve as a daily health check.
Performance testing tools
- Argo workflow: by creating workflows, tasks can be scheduled and the different stages chained together. As the figure on the right shows, multiple tasks can run at the same time through Argo
- Kubernetes Dashboard: Visualize server-configmap and client-configmap
- NAS: Mount commonly used ann-benchmark datasets
- InfluxDB and MongoDB: Saving performance metrics results
- Grafana: server-side resource indicator monitoring, client-side performance indicator monitoring
- Redash: Performance chart display
For the full video explanation, please click:
https://www.bilibili.com/video/BV1nF411h7nJ?spm_id_from=333.999.0.0
If you have any improvements or suggestions while using Milvus, feel free to keep in touch with us on GitHub or through our official channels.
With a vision to redefine data science, Zilliz is committed to building a global leader in open source technology innovation and unlocking the hidden value of unstructured data for enterprises through open source and cloud-native solutions.
Zilliz built the Milvus vector database to accelerate the development of a next-generation data platform. The Milvus database is a graduated project of the LF AI & Data Foundation. It can manage massive unstructured datasets and has a wide range of applications in new drug discovery, recommendation systems, chatbots, and more.