Background
In the product circle selection and voting scenario, each tag id is currently bound to a product set of a certain size, determined by rules and indicators. When the circle selection rule conditions change or a scheduled task fires, the product set is refreshed: products that newly meet the rules are added, and products that no longer meet them are deleted.
However, most product sets contain hundreds of thousands of SPUs, and many reach millions. If we loaded all of those SPUs, both before and after the refresh, into memory and diffed the two collections directly, memory could easily overflow when many tags are refreshed at the same time, bringing the whole service down.
In addition, the underlying store for product sets is HBase, so the label side currently refreshes product sets with an all-add-all-delete strategy. The refreshed product set is first saved in full, relying on the idempotence of HBase writes (saving data under an existing rowkey overwrites it, so there is no need to check whether the data exists before saving), and the modity_time of each row is updated to now(). The product set is then scanned from HBase in batches, and rows with modity_time<now are deleted, which completes one label refresh task.
In practice, the number of SPUs that actually change in a product set across a refresh is small; data analysis shows that the change does not exceed 10% of the product set. With the all-add-all-delete strategy, however, a large amount of existing data is re-inserted on every refresh, which both lengthens the refresh and increases the pressure on the underlying storage. Moreover, when a label changes, the changed SPUs must be pushed to the voting platform to be re-labeled, and the label data is also stored in the SPU ES for back-office display. The repeated insertion of existing data is therefore synchronized to the voting platform and the SPU ES as well, wasting their performance and processing capacity, so a rework is urgently needed.
Optimization
As mentioned above, the traditional Java Set structure occupies a lot of memory when it holds a large amount of data, so the massive product sets from before and after the refresh cannot all be kept in memory for the calculation.
So is there a data structure that can maintain a low memory footprint when storing massive data, support deduplication, and support various operations such as intersection and difference?
Bitmap meets the requirements perfectly.
Bitmap is a data structure that stores data in a bit array: a run of consecutive binary digits (0 and 1) in which each element is located by its offset. Each bit, the smallest unit of storage, is set to 0 or 1 to indicate the value or state of one element, and reading or writing a bit takes O(1) time.
Because data is stored one bit per element, a Bitmap saves a great deal of space. For example, 5 million entries need only 5000000/8/1024/1024 ≈ 0.6 MB of memory.
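As a minimal sketch of these properties, the snippet below uses Java's built-in java.util.BitSet. The class name BitmapBasics and its helper methods are hypothetical names chosen for this illustration, and ids are assumed to be non-negative ints below the given capacity.

```java
import java.util.BitSet;

// Hypothetical helper class for illustration; only java.util.BitSet is a real API here.
class BitmapBasics {
    // Builds a bitmap with room for ids in [0, capacity) and sets one bit per id.
    static BitSet bitmapOf(int capacity, int... ids) {
        BitSet bits = new BitSet(capacity);
        for (int id : ids) {
            bits.set(id); // setting the same id twice changes nothing: deduplication is free
        }
        return bits;
    }

    // Size of the backing bit array in bytes (object overhead not included).
    static int approxBytes(BitSet bits) {
        return bits.size() / 8;
    }
}
```

For a capacity of 5,000,000 the backing array is 625,000 bytes, which is 5000000/8 bytes or roughly 0.6 MB.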
Therefore, we plan to store the product sets from before and after the refresh in Bitmaps, compute the difference between the old and new Bitmaps in each direction, and then perform the add and delete operations only on those differences.
Solution Feasibility Analysis
Take the label scenario as an example.
A label can be bound to the voting platform, and the labeling system marks every product in the set circled by the voting platform. At this point the product set under the label is denoted oldSet(X+Y).
After the voting platform is refreshed, a new batch of products that meet its indicators is selected. The product set under the voting platform is now denoted newSet(Y+Z).
The labeling system then needs to mark newSet(Y+Z) and, at the same time, delete from oldSet(X+Y) the products (X) that fall outside this circle selection.
The underlying store for the tag product set is HBase. When existing data is inserted again, HBase overwrites it as long as the rowkey (tag id+spuId) is unchanged and keeps the data with the latest timestamp. In effect, the old Y is overwritten by the new Y (old Y data = new Y data, only the timestamp differs), so even under the old all-add-all-delete scheme the deletion magnitude is X, not X+Y.
As shown in the figure above, each refresh only needs to delete X and add Z.
Compared with the old all-add-all-delete logic, the new Bitmap diff solution leaves the query and deletion magnitudes unchanged, while the insert magnitude and the notification magnitude for the voting platform and the SPU ES each drop by Y.
In addition, because of the way a Bitmap stores data, a set of 5 million SPUs occupies only about 0.6 MB, so there is no need to worry about memory overflow.
Therefore, using Bitmaps to hold the full product data from before and after the refresh and diffing them in memory is feasible in theory.
Technical selection
Now that we have chosen to use Bitmap as the storage for the new scheme, which Bitmap implementation should we choose?
There are many Bitmap implementations, such as Java's native BitSet, JavaEWAH's EWAHCompressedBitmap, the third-party RoaringBitmap, and the Redis Bitmap. Since the Redis Bitmap is mainly used for remote storage, it does not suit the in-memory diff scenario here and is excluded.
We therefore compare the memory usage and CPU time of BitSet, EWAHCompressedBitmap, and RoaringBitmap at various data sparsity levels, to select the implementation that best fits the current scenario.
Memory usage test
By adding 1, N+1, 2N+1...
As the figure below shows, EWAHCompressedBitmap uses the least memory only when the sparsity is 1; at every other sparsity level the memory usage is ordered RoaringBitmap<EWAHCompressedBitmap<BitSet.
CPU time test
We add the values 1, N+1, 2N+1... to each Bitmap and take the maximum time over 2000 runs, which gives the time consumed by each Bitmap at each sparsity level.
As the following figure shows, the CPU time at every sparsity level is ordered:
RoaringBitmap≈EWAHCompressedBitmap<BitSet.
Final conclusion of selection
Considering the memory usage test, the CPU time test, and the data sparsity in our actual scenarios, RoaringBitmap performs best. RoaringBitmap is therefore selected as the Bitmap implementation for the new scheme.
Introduction and principle of RoaringBitmap
RoaringBitmap storage principle
RoaringBitmap divides the 32-bit unsigned integer space into 2^16 large Containers, numbered by the high 16 bits of a value. Each large Container holds a small Container that stores the low 16 bits of the values that fall into it.
To store or query a value k, split it into its high 16 bits and low 16 bits, use the high 16 bits to find the corresponding Container, and then store or look up the low 16 bits inside that Container.
This can be abstract, so here is an example.
Suppose we want to put the number 31 into a RoaringBitmap. In hexadecimal it is 0000 001F: the high 16 bits are 0000 and the low 16 bits are 001F. We first use the high 16 bits (0) to find Container number 0, and then use the low 16 bits (31) to determine where the value goes inside that Container, as shown in the following figure.
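The split itself is one shift and one mask. The sketch below illustrates it; the class and method names (RoaringKeySplit, containerKey, lowBits) are hypothetical and do not come from the RoaringBitmap library.

```java
// Hypothetical helper names; the split is a plain shift and mask on a 32-bit value.
class RoaringKeySplit {
    // High 16 bits: the number of the large Container the value belongs to.
    static int containerKey(int value) {
        return value >>> 16;
    }

    // Low 16 bits: the position stored inside that Container.
    static int lowBits(int value) {
        return value & 0xFFFF;
    }
}
```

For 31 (0x0000001F) this yields Container 0 and position 31; for 0x0002001F it yields Container 2 and the same position 31.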
Note that the small Containers are allocated only when they are needed, not all up front, and that they are arranged inside the large Container in order of their serial number.
Introduction to the four containers
To keep memory usage and performance good across scenarios and sparsity levels, RoaringBitmap defines four kinds of small Container: ArrayContainer (array container), BitmapContainer (bitmap container), RunContainer (run-length container), and SharedContainer (shared container). The usage scenario and principle of each container are introduced below.
ArrayContainer
When a new container is created and only one element is inserted, RBM (RoaringBitmap) stores it in an ArrayContainer by default. Each element of an ArrayContainer is a short (two bytes), and the elements are kept sorted in ascending order. When the number of elements exceeds 4096 (4096 shorts, or 8 KB), the ArrayContainer is automatically converted to a BitmapContainer, which always occupies 8 KB. The threshold of 4096 is well chosen: below it an ArrayContainer takes less space, and above it a BitmapContainer does. In other words, ArrayContainer stores sparse data and BitmapContainer stores dense data, which minimizes wasted memory, as shown in the following figure.
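The break-even point follows from simple arithmetic, sketched below. ContainerCost is a hypothetical cost model written for this article; it counts payload bytes only and ignores object headers.

```java
// Hypothetical cost model for one container; payload bytes only.
class ContainerCost {
    // A bitmap container always holds 2^16 bits, that is 8192 bytes.
    static final int BITMAP_BYTES = 65_536 / 8;

    // An array container stores each low 16-bit value as a 2-byte short.
    static int arrayBytes(int cardinality) {
        return cardinality * Short.BYTES;
    }

    // True while the array form is no larger than the fixed-size bitmap form.
    static boolean arrayIsNoLarger(int cardinality) {
        return arrayBytes(cardinality) <= BITMAP_BYTES;
    }
}
```

At exactly 4096 elements both forms cost 8192 bytes; one more element makes the array form larger, which is why the conversion happens at that point.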
BitmapContainer
This container is what we usually call a bitmap, except that it always has 65536 bits, that is 2^16 bits, which occupy 8 KB of memory. Each bit is 0 or 1 to indicate that the corresponding number is absent or present, as shown in the following figure:
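To make the layout concrete, here is a minimal sketch of such a container backed by 1024 longs. BitmapContainerSketch is a hypothetical class for illustration, not the library's BitmapContainer, and it assumes its arguments are already the low 16 bits of a value (0 to 65535).

```java
// Hypothetical sketch: 65536 bits held in 1024 64-bit words (8 KB of payload).
class BitmapContainerSketch {
    private final long[] words = new long[65_536 / 64];
    private int cardinality;

    // Marks low16 as present; the cardinality grows only on a 0 -> 1 transition.
    void add(int low16) {
        int wordIndex = low16 >>> 6;      // low16 / 64 selects the word
        long mask = 1L << (low16 & 63);   // low16 % 64 selects the bit
        if ((words[wordIndex] & mask) == 0) {
            words[wordIndex] |= mask;
            cardinality++;
        }
    }

    boolean contains(int low16) {
        return (words[low16 >>> 6] & (1L << (low16 & 63))) != 0;
    }

    int cardinality() {
        return cardinality;
    }

    int payloadBytes() {
        return words.length * Long.BYTES;
    }
}
```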
RunContainer
This container compresses space by storing runs of consecutive values.
For example, the integer sequence 11, 12, 13, 14, 15, 27, 28, 29 is compressed into the two pairs (11, 4) and (27, 2): 11 is followed by 4 consecutively increasing values, and 27 is followed by 2. The original 16 bytes shrink to 8 bytes, a considerable saving. However, this container is not commonly used, so we need to call the relevant conversion function, which decides whether an ArrayContainer or BitmapContainer should be converted to a RunContainer.
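The encoding in this example can be sketched as follows. RunEncodingSketch and encode are hypothetical names, and the input is assumed to be sorted in ascending order with no duplicates; this illustrates the idea only, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of run-length encoding for sorted, distinct values.
class RunEncodingSketch {
    // Returns (start, length) pairs, where length counts the values that
    // follow start within the same run of consecutive integers.
    static List<int[]> encode(int[] sorted) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Extend the run while the next value is exactly one larger.
            while (j + 1 < sorted.length && sorted[j + 1] == sorted[j] + 1) {
                j++;
            }
            runs.add(new int[] {sorted[i], j - i});
            i = j + 1;
        }
        return runs;
    }
}
```

Encoding 11, 12, 13, 14, 15, 27, 28, 29 produces the two pairs (11, 4) and (27, 2), that is four 16-bit numbers instead of eight.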
SharedContainer
This container stores no data itself. It only points to an ArrayContainer, BitmapContainer, or RunContainer, much like a pointer. The pointer can be held by several objects, and the container it points to is shared among them. When one RoaringBitmap is copied to another, it is sometimes unnecessary to duplicate a container; a SharedContainer can point to the actual container and be held by several RoaringBitmap objects, each of which finds the container that actually stores the data through the SharedContainer. This avoids wasting space.
The relationship between these containers can be represented by the following picture:
Here roaring_array is a RoaringBitmap object, and the SharedContainer in the figure is shared by the small Containers of several roaring_arrays.
RoaringBitmap Advantages
Memory
A plain Bitmap suits storage scenarios where the data is densely distributed. To store uint32 values, a plain Bitmap needs a bit array of length 2^32, which works out to 2^32 / 8 bytes = 512 MB. Even if we store only a few values, it still occupies 512 MB, which wastes space. RoaringBitmap, a compressed bitmap, reduces the memory and improves efficiency.
Performance
Besides using less memory than a plain Bitmap, RoaringBitmap also performs union and intersection faster. The reasons are as follows:
Computational optimization
RoaringBitmap essentially divides one large Bitmap into small blocks, and a block exists only when it has data to store. For intersection or union, RoaringBitmap therefore only processes the blocks that exist, rather than the entire range as a plain Bitmap does. If a block is very sparse, the AND and OR operations only have to run over short sorted lists of integers, which reduces the amount of computation further. This trades neither space for time nor time for space; it trades logical complexity for both space and time.
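The effect of working block by block can be sketched with a simplified two-level structure. ChunkedBitmapSketch is a hypothetical class written for this article: a TreeMap keyed by the high 16 bits plays the role of the first-level index, and java.util.BitSet stands in for the real container types.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical two-level sketch: high 16 bits -> block, low 16 bits -> bit in block.
class ChunkedBitmapSketch {
    final TreeMap<Integer, BitSet> chunks = new TreeMap<>();

    void add(int value) {
        // A block is allocated only when the first value with that key arrives.
        chunks.computeIfAbsent(value >>> 16, k -> new BitSet()).set(value & 0xFFFF);
    }

    boolean contains(int value) {
        BitSet chunk = chunks.get(value >>> 16);
        return chunk != null && chunk.get(value & 0xFFFF);
    }

    // Intersection: only keys present in both inputs are processed at all.
    static ChunkedBitmapSketch and(ChunkedBitmapSketch a, ChunkedBitmapSketch b) {
        ChunkedBitmapSketch out = new ChunkedBitmapSketch();
        for (Map.Entry<Integer, BitSet> e : a.chunks.entrySet()) {
            BitSet other = b.chunks.get(e.getKey());
            if (other == null) {
                continue; // no matching block on the other side: skip without scanning
            }
            BitSet both = (BitSet) e.getValue().clone();
            both.and(other);
            if (!both.isEmpty()) {
                out.chunks.put(e.getKey(), both);
            }
        }
        return out;
    }
}
```

In this sketch, blocks whose key appears on only one side are never scanned, so the cost of an intersection depends on the blocks the two inputs share rather than on the full 2^32 range.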
In addition, each 32-bit value in RoaringBitmap is split into high 16 bits and low 16 bits. The high 16 bits give the block offset, and the low 16 bits give the position within the block. A single block covers 64K bits, which is 8 KB. A single block can therefore fit entirely in the L1 cache, which improves performance significantly.
Program logic optimization
RoaringBitmap maintains a sorted first-level index and sorted ArrayContainers. For an intersection, it only fetches the containers that need to be merged according to the matching keys in the first-level index; containers that do not need to be merged are filtered out directly.
When the ArrayContainers being merged differ greatly in size, the intersection is computed with a method based on binary search, avoiding the unnecessary time cost of a linear merge.
For a union, only containers with the same key in the first-level index are merged; containers whose keys differ are added to the new RoaringBitmap directly, without traversing them.
When merging containers, RoaringBitmap first predicts the result and creates the matching container type, avoiding unnecessary container conversions.
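The binary-search idea for two sorted arrays of very different sizes can be sketched as below. This is an illustration under stated assumptions, not the library's actual code: SkewedIntersectionSketch is a hypothetical name, and both inputs are assumed to be sorted in ascending order with no duplicates.

```java
import java.util.Arrays;

// Hypothetical sketch of a binary-search-based intersection of sorted arrays.
class SkewedIntersectionSketch {
    // Looks up each element of the small array in the large one, which costs
    // about small.length * log(large.length) steps instead of a full linear merge.
    static int[] intersect(int[] small, int[] large) {
        int[] out = new int[small.length];
        int n = 0;
        int from = 0; // later matches can only lie to the right of earlier ones
        for (int v : small) {
            if (from >= large.length) {
                break; // the large array is exhausted
            }
            int pos = Arrays.binarySearch(large, from, large.length, v);
            if (pos >= 0) {
                out[n++] = v;
                from = pos + 1;
            } else {
                from = -pos - 1; // insertion point: first element greater than v
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

When one array has a handful of elements and the other has thousands, this visits far fewer elements than walking both arrays in step.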
Show me the code
The code logic is fairly simple: build the old and new Bitmaps, compute the difference in each direction, then add the new SPUs and delete the SPUs that need to be removed in this refresh.
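A minimal sketch of that diff step follows. It is not the production code: java.util.BitSet stands in for the bitmap implementation so that the example needs no third-party dependency, ProductSetDiff and its method names are hypothetical, and SPU ids are assumed to be non-negative ints. The RoaringBitmap Java library offers a comparable static andNot operation on two bitmaps.

```java
import java.util.BitSet;

// Sketch only: BitSet stands in for the bitmap; SPU ids are assumed to be non-negative ints.
class ProductSetDiff {
    // Ids present in left but not in right; neither input is modified.
    static BitSet minus(BitSet left, BitSet right) {
        BitSet result = (BitSet) left.clone();
        result.andNot(right);
        return result;
    }

    // X: in the old set but no longer in the new set, so these rows are deleted.
    static BitSet toDelete(BitSet oldSet, BitSet newSet) {
        return minus(oldSet, newSet);
    }

    // Z: in the new set but not in the old set, so only these rows are written.
    static BitSet toAdd(BitSet oldSet, BitSet newSet) {
        return minus(newSet, oldSet);
    }
}
```

With oldSet = {1, 2, 3, 4} and newSet = {3, 4, 5}, toDelete returns {1, 2} and toAdd returns {5}; the shared part {3, 4} is never rewritten.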
Optimization effect
Refresh speed
We measured the refresh time of every tag under the old and new logic. Most improvement ratios fall in the range of 40%-60%.
After removing the highest and lowest values, the average improvement is 52.42%.
Write magnitude and impact on other scenarios
We measured the write magnitude of every tag under the old and new logic. Most reduction ratios fall in the range of 85%-99%.
After removing the highest and lowest values, the average reduction is 86.98%.
Summary
Java's Set structure occupies a large amount of memory when it holds a lot of data, so in the refresh scenario for circled product sets we cannot simply hold the full product sets from before and after the refresh in Sets in memory and compute their difference.
We therefore turned to Bitmap, a data structure that stores data in a bit array; because it stores one bit per element, it saves a great deal of space.
After comparing the Bitmap implementations available in the industry against the current scenario, we chose RoaringBitmap as the final implementation.
RoaringBitmap is an evolution of the bitmap, namely a compressed bitmap. It includes not only the Bitmap data structure but also several storage formats (containers). Through computational and logical optimizations, it keeps memory usage lower and operations faster than a traditional Bitmap at every sparsity level.
The results after launch are good. Refresh speed improved by about 52%, and the write magnitude dropped by 87% on average, which speeds up refreshes and relieves the pressure on the storage DB and on the other scenario domains.
This solution also applies to similar scenarios, such as refreshes on the voting platform side and product refreshes under theme sets bound to the voting platform.
*Text/Creed
@德物科技 public account