In mobile application business scenarios, we often need to store this kind of information: one key associated with a collection of data.
Common scenarios are as follows:
- Given a userId, determine the user's login status;
- Display the number of check-ins and the first check-in date of a user in a given month;
- Record the check-in status of 100 million users over the last 7 days, and count the users who checked in on all 7 days.
The number of users and visits we face is usually huge: users in the millions or tens of millions, and visit records in the tens or even hundreds of millions.
Therefore, we must choose a collection type that can count large amounts of data (for example, hundreds of millions of records) very efficiently.
To choose a suitable collection type, we must first understand the commonly used statistical models, and then pick the right data type to solve the practical problem.
There are four types of statistics:
- Binary status statistics;
- Aggregate statistics;
- Sorted statistics;
- Cardinality statistics.
This article opens the hands-on series with binary status statistics. Beyond String, Set, Zset, List, and Hash, it uses the extended data type Bitmap.
The commands in this article can be run and debugged in the online Redis client at https://try.redis.io/, which is very convenient.
Message
Share more and give more. In the early stage, create more value for others and do not fixate on rewards. In the long run, these efforts will repay you twice over.
Especially when you are just starting to work with others, don't worry about short-term returns. They don't matter much. What matters more is training your vision, perspective, and problem-solving ability.
Binary status statistics
Brother Code, what is binary status statistics?
It means that each element in the set can only take the value 0 or 1. In check-in and login scenarios, we only need to record checked in (1) or not checked in (0), logged in (1) or not logged in (0).
Suppose we implement the login check with Redis's String type (key -> userId, value -> 0 for offline, 1 for logged in). Storing the login status of 1 million users this way requires 1 million strings, and the memory overhead is too large.
Brother Code, why does the String type have a large memory overhead?
In addition to recording the actual data, the String type also requires additional memory to record data length, space usage and other information.
When the saved data contains characters, the String type is stored in a simple dynamic string (SDS) structure with the following fields:
- len : occupies 4 bytes, indicating the used length of buf.
- alloc : occupies 4 bytes, indicating the allocated length of buf, usually greater than len.
- buf : byte array that holds the actual data. Redis automatically appends a "\0" at the end of the array, which costs one extra byte.
Therefore, apart from buf, which stores the actual data, the len and alloc fields of an SDS are additional overhead.
In addition, there is a RedisObject structure. Redis has many data types, and all of them need to record the same metadata (such as the last access time and the reference count).
Redis therefore uses a RedisObject structure to record this metadata uniformly, together with a pointer to the actual data.
For binary status scenarios, we can use Bitmap instead. If one bit represents the login status, 100 million users occupy only 100 million bits, that is 100000000/8/1024/1024 ≈ 12 MB.
The approximate formula for the space used is: ($offset/8/1024/1024) MB
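The formula is easy to check with a few lines of Python. This is only a sketch of the arithmetic: the function name `bitmap_size_mb` is made up for illustration, and the result ignores the SDS and RedisObject overhead described earlier.

```python
def bitmap_size_mb(max_offset: int) -> float:
    """Approximate size in MB of a bitmap whose highest offset is max_offset."""
    # One bit per offset, 8 bits per byte, then bytes -> KB -> MB.
    return max_offset / 8 / 1024 / 1024


if __name__ == "__main__":
    for users in (1_000_000, 50_000_000, 100_000_000):
        print(f"{users:>11,} offsets -> {bitmap_size_mb(users):.2f} MB")
```

For 100 million offsets this gives roughly 11.92 MB, which the article rounds to 12 MB.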
What is Bitmap?
Under the hood, Bitmap uses the SDS structure of the String type to store a bit array. Redis uses all 8 bits of each byte in the array, and each bit represents the binary status of one element (either 0 or 1).
You can think of Bitmap as an array whose unit is the bit. Each slot of the array can only store 0 or 1, and the array index is called the offset in Bitmap.
To visualize it, picture each byte of the buf array as one row of 8 cells, where each cell is one bit of that byte.
Because 8 bits fit in a single byte, Bitmap saves a great deal of storage space. This is its main advantage.
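To make the "array of bits" picture concrete, here is a toy in-memory model in Python. It is not how Redis is implemented: the class name `BitArray` is hypothetical, and the only behavior borrowed from Redis is that offset 0 maps to the most significant bit of the first byte, that the buffer grows with zero bytes on demand, and that setting a bit returns its previous value.

```python
class BitArray:
    """Toy in-memory model of a Redis bitmap (not the Redis implementation)."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        """Set the bit at offset to 0 or 1 and return the previous bit."""
        if value not in (0, 1):
            raise ValueError("bit value must be 0 or 1")
        byte_index, bit_index = divmod(offset, 8)
        # Grow the buffer with zero bytes so the offset becomes addressable.
        if byte_index >= len(self.buf):
            self.buf.extend(b"\x00" * (byte_index + 1 - len(self.buf)))
        mask = 1 << (7 - bit_index)  # offset 0 is the most significant bit
        previous = 1 if self.buf[byte_index] & mask else 0
        if value:
            self.buf[byte_index] |= mask
        else:
            self.buf[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, offset: int) -> int:
        """Return the bit at offset; bits beyond the buffer read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.buf):
            return 0
        return (self.buf[byte_index] >> (7 - bit_index)) & 1


if __name__ == "__main__":
    bits = BitArray()
    bits.setbit(10, 1)
    print(len(bits.buf), bits.getbit(10), bits.getbit(11))
```

Setting offset 10 only needs 2 bytes of buffer, which is the space saving the text describes.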
Determine user login status
How do we use Bitmap to determine whether a user is online among a huge number of users?
Bitmap provides the GETBIT and SETBIT operations, which read and write the bit at a given offset of the bit array. Note that the offset starts from 0.
A single key, login_status, is enough to store the login status of all users. The user ID is used as the offset, and the bit is set to 1 for online and 0 for offline. GETBIT then tells us whether the corresponding user is online. 50 million users need only about 6 MB of space.
SETBIT command
SETBIT <key> <offset> <value>
Sets or clears the bit at offset in the value stored at key. The bit value can only be 0 or 1.
GETBIT command
GETBIT <key> <offset>
Returns the bit at offset in the value stored at key, or 0 when the key does not exist.
Suppose we want to determine the login status of the user with ID = 10086:
The first step is to execute the following instructions to indicate that the user has logged in.
SETBIT login_status 10086 1
The second step is to check whether the user is logged in, and the return value of 1 means logged in.
GETBIT login_status 10086
The third step is to log out and set the value corresponding to offset to 0.
SETBIT login_status 10086 0
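The three steps can be wrapped in small helper functions. The sketch below is an assumption-heavy illustration: `InMemoryBits` is a hypothetical stand-in that only imitates `setbit(name, offset, value)` and `getbit(name, offset)`, the method names and argument order used by Redis clients such as redis-py. The helper names `log_in`, `log_out`, and `is_online` are invented here, and nothing was run against a real server.

```python
class InMemoryBits:
    """Hypothetical stand-in for a Redis client; supports setbit/getbit only."""

    def __init__(self) -> None:
        self._ones = {}  # key -> set of offsets whose bit is 1

    def setbit(self, name, offset, value):
        ones = self._ones.setdefault(name, set())
        previous = int(offset in ones)
        if value:
            ones.add(offset)
        else:
            ones.discard(offset)
        return previous  # like SETBIT, report the old bit

    def getbit(self, name, offset):
        # A missing key behaves like a bitmap full of zeros.
        return int(offset in self._ones.get(name, set()))


LOGIN_KEY = "login_status"  # the single key used in the article


def log_in(client, user_id: int) -> None:
    client.setbit(LOGIN_KEY, user_id, 1)


def log_out(client, user_id: int) -> None:
    client.setbit(LOGIN_KEY, user_id, 0)


def is_online(client, user_id: int) -> bool:
    return client.getbit(LOGIN_KEY, user_id) == 1


if __name__ == "__main__":
    client = InMemoryBits()
    log_in(client, 10086)
    print(is_online(client, 10086))
    log_out(client, 10086)
    print(is_online(client, 10086))
```

Keeping the key name in one constant and the user ID as the offset means the helpers stay one line each, whichever client object is passed in.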
User's monthly check-in status
For check-in statistics, each daily check-in of a user takes 1 bit, so a full year needs only 365 bits. A month has at most 31 days, so 31 bits are enough.
For example, how do we record and count the check-ins of user 89757 in May 2021?
The key can be designed as uid:sign:{userId}:{yyyyMM}, and the day of the month minus 1 is used as the offset (because the offset starts from 0, offset = day - 1).
The first step is to execute the following command to record the user's check-in on May 16, 2021.
SETBIT uid:sign:89757:202105 15 1
The second step is to determine whether user number 89757 has clocked in on May 16, 2021.
GETBIT uid:sign:89757:202105 15
The third step is to count how many times the user checked in during May, using the BITCOUNT command. This command counts the number of bits set to 1 in the given bit array.
BITCOUNT uid:sign:89757:202105
In this way we can track each user's check-ins month by month. Isn't that great?
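The key format and the day-to-offset rule are the parts that are easiest to get wrong, so here is a small Python sketch of them. It keeps one month's bits in a local `bytearray` instead of talking to Redis, and the function names (`sign_key`, `sign_offset`, `check_in`, `has_checked_in`, `count_check_ins`) are made up for this example.

```python
from datetime import date


def sign_key(user_id: int, day: date) -> str:
    """Build the monthly key, e.g. uid:sign:89757:202105."""
    return f"uid:sign:{user_id}:{day:%Y%m}"


def sign_offset(day: date) -> int:
    """Day of the month minus 1, because offsets start from 0."""
    return day.day - 1


def check_in(month_bits: bytearray, day: date) -> None:
    """Set the bit for the given day (the effect of SETBIT key offset 1)."""
    byte_index, bit_index = divmod(sign_offset(day), 8)
    month_bits[byte_index] |= 0x80 >> bit_index


def has_checked_in(month_bits: bytearray, day: date) -> bool:
    """Read the bit for the given day (the effect of GETBIT key offset)."""
    byte_index, bit_index = divmod(sign_offset(day), 8)
    return bool(month_bits[byte_index] & (0x80 >> bit_index))


def count_check_ins(month_bits: bytearray) -> int:
    """Count the bits set to 1 (the effect of BITCOUNT key)."""
    return sum(bin(byte).count("1") for byte in month_bits)


if __name__ == "__main__":
    may = bytearray(4)  # 4 bytes = 32 bits, enough for a 31-day month
    day = date(2021, 5, 16)
    check_in(may, day)
    print(sign_key(89757, day), sign_offset(day))
    print(has_checked_in(may, day), count_check_ins(may))
```

For May 16, 2021 the key is uid:sign:89757:202105 and the offset is 15, matching the commands above.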
How do we find the date of the first check-in this month?
Redis provides the BITPOS key bitValue [start] [end] command, which returns the offset of the first bit in the Bitmap whose value is bitValue.
By default the command scans the entire Bitmap. The optional start and end arguments restrict the range to scan, and by default they are byte indexes rather than bit offsets.
So we can get the first check-in date of userID = 89757 in May 2021 by executing the following command:
BITPOS uid:sign:89757:202105 1
Note that we need to add 1 to the returned value, because the offset starts from 0.
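One detail worth handling in application code: when BITPOS searches for a 1 and no bit is set, it returns -1, so blindly adding 1 would report day 0. The Python sketch below models this on a local byte string. The names `bitpos_one` and `first_check_in_day` are hypothetical, and the scan is a plain loop rather than anything Redis-specific.

```python
from typing import Optional


def bitpos_one(buf: bytes) -> int:
    """Offset of the first bit set to 1, or -1 if there is none."""
    for byte_index, byte in enumerate(buf):
        if byte:
            # Walk the byte from its most significant bit (lowest offset).
            for bit_index in range(8):
                if byte & (0x80 >> bit_index):
                    return byte_index * 8 + bit_index
    return -1


def first_check_in_day(month_bits: bytes) -> Optional[int]:
    """Day of the month of the first check-in, or None if there was none."""
    offset = bitpos_one(month_bits)
    return None if offset == -1 else offset + 1


if __name__ == "__main__":
    may = bytes([0x00, 0x01, 0x00, 0x00])  # only offset 15 is set
    print(first_check_in_day(may))
    print(first_check_in_day(bytes(4)))
```

Returning `None` for a month without check-ins keeps the "+1" rule from producing a meaningless date.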
Total number of users who have checked in consecutively
After recording 7 days of check-in data for 100 million users, how do we count the users who checked in on all 7 consecutive days?
We use the date as the key of the Bitmap and the userId as the offset. When a user checks in, the bit at that offset is set to 1.
Each bit in the Bitmap stored at a key is therefore the check-in record of one user on that date.
There are 7 such Bitmaps in total, and we can AND their corresponding bits together.
The same userId always maps to the same offset. If the bit at a user's offset is 1 in all 7 Bitmaps, that user checked in on 7 consecutive days.
The result is saved in a new Bitmap, and we then use BITCOUNT to count the bits set to 1, which gives the number of users who checked in on all 7 days.
Redis provides the BITOP operation destkey key [key ...] command, which performs a bitwise operation on one or more Bitmaps (the key arguments) and stores the result in destkey.
operation can be AND, OR, NOT, or XOR (NOT accepts exactly one input key). When BITOP processes strings of different lengths, the missing part of the shorter string is treated as 0. An empty key is also treated as a string of 0 bits.
For example, with three Bitmaps the corresponding bits are ANDed together and the result is saved in a new Bitmap.
The following commands perform an AND operation on three bitmaps, save the result in destmap, and then run BITCOUNT on destmap.
// AND operation
BITOP AND destmap bitmap:01 bitmap:02 bitmap:03
// Count the bits set to 1
BITCOUNT destmap
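To see why AND followed by BITCOUNT answers the question, here is a small Python model of both steps on plain byte strings. It is a sketch, not the Redis implementation: the function names (`bitmap_from_user_ids`, `bitop_and`, `bitcount`) and the sample user IDs are invented, and the only rule taken from the description above is that shorter inputs are padded with 0.

```python
def bitmap_from_user_ids(user_ids) -> bytes:
    """Build a bitmap with one bit set per user ID (user ID = offset)."""
    buf = bytearray()
    for uid in user_ids:
        byte_index, bit_index = divmod(uid, 8)
        if byte_index >= len(buf):
            buf.extend(bytes(byte_index + 1 - len(buf)))
        buf[byte_index] |= 0x80 >> bit_index
    return bytes(buf)


def bitop_and(*bitmaps: bytes) -> bytes:
    """Bitwise AND of all inputs; shorter inputs are zero-padded."""
    if not bitmaps:
        return b""
    width = max(len(b) for b in bitmaps)
    padded = [b.ljust(width, b"\x00") for b in bitmaps]
    result = bytearray(padded[0])
    for other in padded[1:]:
        for i in range(width):
            result[i] &= other[i]
    return bytes(result)


def bitcount(buf: bytes) -> int:
    """Number of bits set to 1."""
    return sum(bin(byte).count("1") for byte in buf)


if __name__ == "__main__":
    # Seven daily bitmaps; only users 3 and 12 checked in every day.
    week = [
        bitmap_from_user_ids(ids)
        for ids in ([3, 12, 20], [3, 12], [1, 3, 12], [3, 12, 20],
                    [3, 5, 12], [3, 12], [3, 7, 12])
    ]
    print(bitcount(bitop_and(*week)))
```

A user who missed even one day has a 0 in that day's bitmap, so the AND clears the bit and the user drops out of the count.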
A quick estimate of the memory cost: a Bitmap of 100 million bits occupies about 12 MB (10^8/8/1024/1024), so 7 daily Bitmaps take about 84 MB. It is also a good idea to set an expiration time on each Bitmap, so that Redis deletes expired check-in data and frees the memory.
Summary
Thinking is the most important thing. When we meet statistical scenarios that only need a binary status, such as whether a user exists, whether an IP is blacklisted, or check-in and clock-in statistics, we can consider using Bitmap.
One bit is enough to represent 0 or 1, which greatly reduces memory usage when counting massive amounts of data.