
Author: Cai Yudong

Editor's note:
Milvus 2.0 brings many new features. Three of them, time travel, attribute filtering, and deletion, are closely related because they share one underlying mechanism: Bitset. In this article, Milvus R&D engineer Cai Yudong explains the concept of Bitset in Milvus and uses three examples to show how it supports deletion, time travel, and attribute filtering.

What is Bitset?

Since version 0.7.0, Milvus has used a Bitset to support the Delete function: it marks whether the entity in each row of a segment has been deleted. If the bit corresponding to a row is 1, that entity has been deleted and does not participate in the calculation at query time.
In Milvus 2.0, we extended the use of Bitset further. Its basic semantics remain unchanged: if a bit is 1, the corresponding row entity does not participate in the query operation. But Bitset is no longer limited to the Delete operation. Three operations now affect it:

  1. attribute filtering
  2. data deletion
  3. time travel
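
To make the basic semantics concrete, here is a minimal Python sketch of a brute-force search that skips every row whose bit is 1. The `search` function, the toy vectors, and the squared-L2 scoring are hypothetical illustrations and do not reflect Milvus internals.

```python
def search(vectors, query, bitset, top_k):
    """Return the row indices of the top_k nearest vectors, skipping masked rows."""
    scored = []
    for row, vec in enumerate(vectors):
        if bitset[row] == 1:
            # A 1 bit means this row must not take part in the query.
            continue
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, row))
    scored.sort()
    return [row for _, row in scored[:top_k]]


vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
bitset = [1, 0, 0, 1]  # rows 0 and 3 are excluded
hits = search(vectors, [0.0, 0.0], bitset, top_k=2)
print(hits)  # [1, 2]
```

Row 0 is the exact match for the query, yet it never appears in the result because its bit is set.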

How is Bitset calculated in Milvus?

Next, we use three examples to illustrate how Bitset is calculated.
Assume there is one segment, and we perform three DML (data manipulation language) operations on it in turn:
[Figure: Order of DML events]

  • At ts = 100, the first insert operation adds 4 entities with primary_keys [1, 2, 3, 4];
  • At ts = 200, the second insert operation adds another 4 entities with primary_keys [5, 6, 7, 8];
  • At ts = 300, the delete operation removes 2 entities with primary_keys [7, 8];
  • In addition, assume that attribute filtering at query time matches the entities with primary_key = [1, 3, 5, 7].
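
The setup above can be written down as per-row timestamps. This is only a sketch: the names `INSERT_TS`, `DELETE_TS`, `inserted_by`, and `deleted_by` are made up for illustration, and treating an operation as visible when its timestamp is less than or equal to the query timestamp is an assumption.

```python
# Row i of the segment holds the entity with primary key PRIMARY_KEYS[i].
PRIMARY_KEYS = [1, 2, 3, 4, 5, 6, 7, 8]
INSERT_TS = [100, 100, 100, 100, 200, 200, 200, 200]
DELETE_TS = {7: 300, 8: 300}  # primary key -> timestamp of its delete
FILTER_HITS = {1, 3, 5, 7}    # primary keys matched by attribute filtering


def inserted_by(ts):
    """1 for every row whose insert is visible at ts (assumed: insert_ts <= ts)."""
    return [1 if t <= ts else 0 for t in INSERT_TS]


def deleted_by(ts):
    """1 for every row whose delete is visible at ts (assumed: delete_ts <= ts)."""
    return [1 if DELETE_TS.get(pk, float("inf")) <= ts else 0 for pk in PRIMARY_KEYS]


# 1 means the row passed attribute filtering.
filter_bitset = [1 if pk in FILTER_HITS else 0 for pk in PRIMARY_KEYS]

print(filter_bitset)     # [1, 0, 1, 0, 1, 0, 1, 0]
print(inserted_by(150))  # [1, 1, 1, 1, 0, 0, 0, 0]
print(deleted_by(150))   # [0, 0, 0, 0, 0, 0, 0, 0]
```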

Case 1

[Figure: Search with time_travel = 150]
Suppose the user executes the first query, specifying time_travel = 150. The calculation process of Bitset is shown in the figure above.
The initial state of filter_bitset is [1, 0, 1, 0, 1, 0, 1, 0] (here a 1 bit means the entity is valid, that is, it passed attribute filtering). Because the query specifies time_travel = 150, the entities with pk = [5, 6, 7, 8] have not yet been inserted at that moment, so their bits must be cleared. This gives filter_bitset_2 = [1, 0, 1, 0, 0, 0, 0, 0]. And because a 1 in the final result bitset must indicate that the row entity is invalid, all bits of filter_bitset_2 are inverted to get filter_bitset_3 = [0, 1, 0, 1, 1, 1, 1, 1].
The initial state of del_bitset is [0, 0, 0, 0, 0, 0, 1, 1]. With time_travel = 150, the delete operation has not yet been executed, so all bits of del_bitset are cleared, indicating that no entity counts as deleted. This gives del_bitset_2 = [0, 0, 0, 0, 0, 0, 0, 0].
OR filter_bitset_3 with del_bitset_2 to get the final result_bitset = [0, 1, 0, 1, 1, 1, 1, 1].
Finally, result_bitset is passed to the query operation, so only entities [1, 3] participate.
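
The steps of this case can be replayed in a few lines of Python. This is a sketch that uses plain lists of 0 and 1. The variable names follow the text, while `inserted` and `participating` are helper names introduced here for illustration.

```python
PRIMARY_KEYS = [1, 2, 3, 4, 5, 6, 7, 8]

# 1 = the entity passed attribute filtering.
filter_bitset = [1, 0, 1, 0, 1, 0, 1, 0]
# 1 = the entity was already inserted at ts = 150 (only the insert at ts = 100).
inserted = [1, 1, 1, 1, 0, 0, 0, 0]

# Clear the bits of entities that do not exist yet at ts = 150.
filter_bitset_2 = [f & i for f, i in zip(filter_bitset, inserted)]
# Invert, so that 1 now means "does not participate".
filter_bitset_3 = [1 - b for b in filter_bitset_2]

del_bitset = [0, 0, 0, 0, 0, 0, 1, 1]
# The delete at ts = 300 has not happened at ts = 150, so clear every bit.
del_bitset_2 = [0] * len(del_bitset)

result_bitset = [f | d for f, d in zip(filter_bitset_3, del_bitset_2)]
participating = [pk for pk, bit in zip(PRIMARY_KEYS, result_bitset) if bit == 0]

print(result_bitset)  # [0, 1, 0, 1, 1, 1, 1, 1]
print(participating)  # [1, 3]
```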

Case 2

[Figure: Search with time_travel = 250]
Suppose the user executes a second query, specifying time_travel = 250. The calculation process of Bitset is shown in the figure above.
As in the first case, the initial state of filter_bitset should be [1, 0, 1, 0, 1, 0, 1, 0].
When ts = 250, all entities [1, 2, 3, 4, 5, 6, 7, 8] have been inserted into Milvus. Therefore, filter_bitset_2 is identical to filter_bitset. As before, we invert all bits of filter_bitset_2 to get filter_bitset_3 = [0, 1, 0, 1, 0, 1, 0, 1].
The initial state of del_bitset is [0, 0, 0, 0, 0, 0, 1, 1]. With time_travel = 250, the delete operation (ts = 300) has still not been executed, so all bits of del_bitset are cleared, indicating that no entity counts as deleted. This gives del_bitset_2 = [0, 0, 0, 0, 0, 0, 0, 0].
OR filter_bitset_3 with del_bitset_2 to get the final result_bitset = [0, 1, 0, 1, 0, 1, 0, 1].
In the end, only entities [1, 3, 5, 7] will participate in subsequent query operations.
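
Because every step here is a bitwise AND, NOT, or OR, the same calculation also works on bits packed into a single machine word. The sketch below packs the 8 rows into a Python integer, with row i stored in bit i. The `pack` and `unpack` helpers and this layout are assumptions made for illustration and say nothing about how Milvus stores its Bitset.

```python
NUM_ROWS = 8
ALL_ROWS = (1 << NUM_ROWS) - 1  # 0b11111111, one bit per row


def pack(bits):
    """Pack a list of 0/1 values into an int, row i going to bit i."""
    word = 0
    for row, bit in enumerate(bits):
        word |= bit << row
    return word


def unpack(word):
    """Inverse of pack: expand an int back into a list of 0/1 values."""
    return [(word >> row) & 1 for row in range(NUM_ROWS)]


filter_bitset = pack([1, 0, 1, 0, 1, 0, 1, 0])
inserted = ALL_ROWS  # at ts = 250 both inserts have happened

filter_bitset_2 = filter_bitset & inserted
filter_bitset_3 = ~filter_bitset_2 & ALL_ROWS  # invert, then keep only 8 bits

del_bitset = pack([0, 0, 0, 0, 0, 0, 1, 1])
del_bitset_2 = 0  # the delete at ts = 300 is not visible at ts = 250

result_bitset = filter_bitset_3 | del_bitset_2
print(unpack(result_bitset))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

The mask with `ALL_ROWS` after the inversion matters in Python, where `~` on an int yields a negative number rather than an 8-bit value.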

Case 3

Suppose the user executes a third query, specifying time_travel = 350. What does the Bitset calculation look like in this case? Try working it out from the two cases above.
[Figure: Search with time_travel = 350]
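
One way to check an answer is to fold the rules from the two cases above into a single function. This is a sketch under stated assumptions: the function `compute_result_bitset` and the constants are hypothetical names, and an insert or delete is assumed to be visible when its timestamp is less than or equal to time_travel.

```python
PRIMARY_KEYS = [1, 2, 3, 4, 5, 6, 7, 8]
INSERT_TS = [100, 100, 100, 100, 200, 200, 200, 200]
DELETE_TS = {7: 300, 8: 300}  # primary key -> timestamp of its delete
FILTER_HITS = {1, 3, 5, 7}    # primary keys matched by attribute filtering


def compute_result_bitset(time_travel):
    """Return the result bitset; a 1 bit means the row does not participate."""
    bits = []
    for pk, insert_ts in zip(PRIMARY_KEYS, INSERT_TS):
        # filter_bitset_2: passed filtering AND already inserted.
        valid = pk in FILTER_HITS and insert_ts <= time_travel
        # del_bitset_2: deleted AND the delete is already visible.
        deleted = pk in DELETE_TS and DELETE_TS[pk] <= time_travel
        # result = NOT(filter_bitset_2) OR del_bitset_2
        bits.append(1 if (not valid) or deleted else 0)
    return bits


print(compute_result_bitset(150))  # [0, 1, 0, 1, 1, 1, 1, 1]
print(compute_result_bitset(250))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(compute_result_bitset(350))  # [0, 1, 0, 1, 0, 1, 1, 1]
```

Under these assumptions, the delete at ts = 300 is visible at time_travel = 350, so entity 7 drops out even though it passed attribute filtering, and only entities [1, 3, 5] take part in the query.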

What's next

Not long ago, we also covered the data deletion design in Milvus 2.0.
In the next article, we will continue to introduce the logic behind data compression, dynamic load balancing, and other functions in Milvus 2.0. Stay tuned!

Milvus project address: https://github.com/milvus-io/milvus
Milvus homepage and document address: https://milvus.io/
Milvus Slack Channel: milvusio.slack.com
