After reading the earlier article on the clever use of data types for statistics over billions of records, I learned how to use the different data types (String, Hash, List, Set, Sorted Set, HyperLogLog, Bitmap) with ease to solve statistical problems in different scenarios.
The product manager said he has an idea: give boys and girls an opportunity to connect with each other.
Let boys and girls in the best years of their lives encounter that special someone at any hour of the day.
So I want to develop an app in which users can find that special someone nearby and connect with each other.
How do I implement "people nearby"? I also hope to meet my goddess through the app.
I remember one night after work: she moved lightly through the crowd, her tall and slender figure like an ethereal note floating in the air. Her eyes were full of clear sunlight and vitality, with the stars of the Milky Way reflected in them.
Opening message
Exercise your expressive skills, especially at work. Many people say that "those who do the work are not valued as much as those who make the PPT." In fact, bosses are not stupid. Why do they approve more of the people who make the PPT?
Because those people consider the problem from the boss's perspective: what he needs is a "solution." Consider problems more from the perspective of a creator, instead of confining yourself to the perspective of a programmer.
Think more about what value this thing provides to people, not "how do I realize it." Of course, how to achieve it is necessary, but usually not the most important.
What is an LBS application
Longitude and latitude together form a coordinate system, also known as the geographic coordinate system. It is a spherical coordinate system that describes positions on the Earth's surface and can mark any location on Earth (with 7 digits after the decimal point, the precision is about 1 cm).
The range of longitude is (-180, 180] and the range of latitude is [-90, 90]. The sign of latitude is determined by the equator (north is positive, south is negative), and the sign of longitude is determined by the prime meridian (Greenwich Observatory, UK): east is positive and west is negative.
"People nearby" features are often referred to as LBS (Location Based Services): services that are based on the user's current geographic location data and provide users with accurate encounter services.
The core ideas of "people nearby" are as follows:
- With "I" as the center, search for nearby users;
- Calculate the distance between the other person and "I" based on the current geographic location of "I";
- Sort by the distance between "I" and others, and filter out the users closest to me.
MySQL implementation
To compute "people nearby", we take one coordinate, find the other data points near it, and sort them by distance. Where do we start?
Taking the user as the center and drawing a circle with a radius of 1000 meters, the users inside the circular area are the "nearby people" we want to meet.
Store the longitude and latitude in MySQL:
CREATE TABLE `nearby_user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL COMMENT 'name',
  `longitude` double DEFAULT NULL COMMENT 'longitude',
  `latitude` double DEFAULT NULL COMMENT 'latitude',
  `create_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it is impractical to traverse the coordinates of every "goddess", compute each distance, and then sort all the data by distance. That is far too much computation.
We can first filter out a limited set of "goddess" coordinates by region, and then perform the full distance calculation and sorting only on the data in that rectangular region, which significantly reduces the amount of computation.
How do we define the rectangular region?
Circumscribe a square around the circle, and use the maximum and minimum longitude and latitude of the square (the user's longitude and latitude plus or minus the distance) as the filter condition. It is then easy to find the "goddess" records inside the square.
What about the extra areas?
Users in the extra areas (the corners of the square that lie outside the circle) are farther from the center than the radius of the circle. So we calculate the distance between the center and every user in the square, and keep only the users whose distance is less than or equal to the radius. The users inside the circular area are the "people nearby" who meet the requirement.
For the rectangular filter to perform well, the table needs a composite index on (longitude, latitude), so that the query can be optimized.
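The "square first, circle second" idea can be sketched without any library. The following is only an illustration under stated assumptions: the class name `NearbyFilter`, the `double[]{lng, lat}` user representation, and the mean Earth radius of 6371 km are all hypothetical choices made for this example, and the sketch ignores the poles and the 180° meridian.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Library-free sketch of the bounding-square filter followed by an exact distance check. */
class NearbyFilter {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean Earth radius

    /** Great-circle distance in km between two points (haversine formula). */
    static double distanceKm(double lng1, double lat1, double lng2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Returns {minLng, maxLng, minLat, maxLat} of a box that encloses the search circle. */
    static double[] boundingBox(double lng, double lat, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle in radians
        double dLat = Math.toDegrees(angular);
        double dLng = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] {lng - dLng, lng + dLng, lat - dLat, lat + dLat};
    }

    /** Indices of the users within radiusKm of the center, nearest first. */
    static List<Integer> nearby(double[][] users, double lng, double lat, double radiusKm) {
        double[] box = boundingBox(lng, lat, radiusKm);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < users.length; i++) {
            double uLng = users[i][0];
            double uLat = users[i][1];
            // Cheap test first: is the user inside the square?
            boolean inSquare = uLng >= box[0] && uLng <= box[1] && uLat >= box[2] && uLat <= box[3];
            // Exact test second: removes users in the corners of the square.
            if (inSquare && distanceKm(lng, lat, uLng, uLat) <= radiusKm) {
                hits.add(i);
            }
        }
        hits.sort(Comparator.comparingDouble(i -> distanceKm(lng, lat, users[i][0], users[i][1])));
        return hits;
    }

    public static void main(String[] args) {
        double[][] users = {
            {116.405, 39.903},  // inside the circle
            {116.401, 39.9005}, // inside the circle, closest
            {116.410, 39.908},  // inside the square but outside the circle
            {117.000, 39.900}   // far away
        };
        System.out.println(nearby(users, 116.40, 39.90, 1.0));
    }
}
```

In the database version, the square test is what the indexed `BETWEEN` conditions do, and the exact test runs in the application on the much smaller result set.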
In practice
A third-party library is used to compute the maximum and minimum longitude and latitude of the bounding rectangle from the center coordinates and the distance, and to calculate the distance between two coordinates:
<dependency>
<groupId>com.spatial4j</groupId>
<artifactId>spatial4j</artifactId>
<version>0.5</version>
</dependency>
After obtaining the bounding rectangle, search for users inside the square using its maximum and minimum longitude and latitude values, and then exclude the users who exceed the specified distance. What remains is the final list of "people nearby".
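// Uses spatial4j 0.5 (com.spatial4j.core.context.SpatialContext,
// com.spatial4j.core.distance.DistanceUtils, com.spatial4j.core.shape.Rectangle)
// plus java.util.Comparator and java.util.stream.Collectors.
// spatialContext is assumed to be SpatialContext.GEO.

/**
 * Find the users within the given distance of the current user.
 *
 * @param distance search radius, in km
 * @param userLng  longitude of the current user
 * @param userLat  latitude of the current user
 */
public String nearBySearch(double distance, double userLng, double userLat) {
    // 1. Get the bounding square of the search circle
    Rectangle rectangle = getRectangle(distance, userLng, userLat);
    // 2. Fetch all users located inside the square
    List<User> users = userMapper.selectUser(rectangle.getMinX(), rectangle.getMaxX(), rectangle.getMinY(), rectangle.getMaxY());
    // 3. Drop the users farther away than the given distance, then sort nearest first
    users = users.stream()
            .filter(a -> getDistance(a.getLongitude(), a.getLatitude(), userLng, userLat) <= distance)
            .sorted(Comparator.comparingDouble((User a) -> getDistance(a.getLongitude(), a.getLatitude(), userLng, userLat)))
            .collect(Collectors.toList());
    return JSON.toJSONString(users);
}

// Get the bounding rectangle of the circle around the user
private Rectangle getRectangle(double distance, double userLng, double userLat) {
    return spatialContext.getDistCalc()
            .calcBoxByDistFromPt(spatialContext.makePoint(userLng, userLat),
                    distance * DistanceUtils.KM_TO_DEG, spatialContext, null);
}

/**
 * Distance between two points on the sphere.
 *
 * @param longitude longitude of point 1
 * @param latitude  latitude of point 1
 * @param userLng   longitude of point 2
 * @param userLat   latitude of point 2
 * @return the distance, in km
 */
private double getDistance(Double longitude, Double latitude, double userLng, double userLat) {
    return spatialContext.calcDistance(spatialContext.makePoint(userLng, userLat),
            spatialContext.makePoint(longitude, latitude)) * DistanceUtils.DEG_TO_KM;
}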
Since the distance calculation between users is implemented in the business code, the SQL statement is very simple.
SELECT * FROM nearby_user
WHERE 1=1
AND (longitude BETWEEN #{minlng} AND #{maxlng})
AND (latitude BETWEEN #{minlat} AND #{maxlat})
However, database query performance is limited after all. If there are many query requests for "people nearby", this may not be a good solution in high concurrency situations.
A failed attempt with Redis Hash
Let's analyze the characteristics of LBS data together:
- Each "goddess" has an ID number, and each ID corresponds to latitude and longitude information.
- When the "otaku" logs in to the app and looks for people nearby, the app searches for nearby "goddesses" based on the latitude and longitude of the "otaku".
- After obtaining the list of "goddess" IDs that match the location, the "goddess" information corresponding to each ID is fetched from the database and returned to the user.
The data feature is that a goddess (user) corresponds to a set of latitude and longitude, which reminds me of the Hash structure of Redis. That is, a key (goddess ID) corresponds to a value (latitude and longitude).
Hash seems workable, but besides recording latitude and longitude, an LBS application also needs to run range queries over the data in the Hash and sort the results by the distance computed from the coordinates.
The data in a Hash is unordered, so this approach is clearly unsuitable.
A first look at Sorted Set
Is the Sorted Set type suitable? After all, it can sort.
A Sorted Set also maps a key to a value: the key is the element content, and the value is the weight score of the element.
A Sorted Set can sort elements by their weight scores, which seems to meet our needs.
For example, the element of Sorted Set is "Goddess ID", and the corresponding weight score of the element is latitude and longitude information.
Here comes the question. The weight value of the Sorted Set element is a floating point number, and the latitude and longitude are two values of longitude and latitude. What should I do? Can the latitude and longitude be converted into a floating point number?
The idea is right. To make latitude and longitude comparable, Redis uses GeoHash encoding, which is widely used in the industry: longitude and latitude are encoded separately, and the two codes are then combined into one final code.
In this way, latitude and longitude are converted into a single value, and the GEO type is implemented with Sorted Set as its underlying data structure.
Let's take a look at how GeoHash encodes latitude and longitude.
GeoHash encoding
For details, please refer to: https://en.wikipedia.org/wiki/Geohash
The GeoHash algorithm maps two-dimensional latitude and longitude data to one-dimensional integers, so that all elements are placed on a single line, and points that are close in two-dimensional coordinates are generally also close to each other after being mapped onto the line.
When we want to calculate "people nearby", we first map the target location to this line, and then get nearby points on this one-dimensional line.
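The "look up nearby points on a line" step can be illustrated with an ordered map standing in for the Sorted Set. This is a simplified sketch, not how Redis is implemented: the class `GeoLine`, the 10-bit toy precision, and the sample codes are assumptions made for the example, and a real implementation also has to inspect neighbouring cells because two adjacent cells can have codes that are far apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model: members ordered on a line by their integer code, queried by code range. */
class GeoLine {
    static final int TOTAL_BITS = 10; // hypothetical toy precision

    /** Members whose TOTAL_BITS-bit code starts with the given prefix, in code order. */
    static List<String> inCell(TreeMap<Integer, String> line, int prefix, int prefixBits) {
        int shift = TOTAL_BITS - prefixBits;
        int from = prefix << shift;     // smallest code inside the cell
        int to = (prefix + 1) << shift; // first code after the cell
        return new ArrayList<>(line.subMap(from, to).values());
    }

    public static void main(String[] args) {
        // One member per code keeps the sketch short; a Sorted Set allows equal scores.
        TreeMap<Integer, String> line = new TreeMap<>();
        line.put(0b1010011011, "a");
        line.put(0b1010011000, "b");
        line.put(0b1110000000, "c");
        // All members in the cell identified by the 6-bit prefix 101001
        System.out.println(inCell(line, 0b101001, 6));
    }
}
```

A shorter prefix means a larger cell, so the search radius is controlled by how many leading bits are compared.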
GeoHash encodes a longitude value as an N-bit binary value by bisecting the longitude range [-180,180] N times, where N can be customized.
In the first bisection, the longitude range [-180,180] is divided into two sub-intervals: [-180,0) and [0,180] (I call them the left and right partitions).
At this point, we check whether the longitude value to be encoded falls in the left or the right partition. If it falls in the left partition, we represent it with 0; if it falls in the right partition, we represent it with 1.
In this way, each bisection yields a 1-bit code (either 0 or 1).
Then bisect the partition to which the longitude value belongs, check again whether the value falls in the left or the right partition, and produce another bit according to the same rule. After N bisections, the longitude value is represented by an N-bit number.
The coordinates of every map element fall into exactly one grid cell. The smaller the cell, the more precise the coordinates. The cells are then given integer codes, and cells that are close together generally receive codes that are close.
After encoding, the coordinates of each map element will become an integer, and the coordinates of the element can be restored through this integer. The longer the integer, the smaller the loss of the restored coordinate value. For the "people nearby" function, the loss of accuracy is negligible.
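How a code is restored to coordinates, and why a longer code loses less, can be sketched by replaying the bisections. This is a minimal illustration for a single axis; the class name `GeoDecode` and the bit-string representation are assumptions made for readability and do not reflect how Redis stores scores.

```java
/** Replays the bisections encoded in a bit string to recover the interval it stands for. */
class GeoDecode {
    /** Returns {low, high}: the interval that all values with this code fall into. */
    static double[] decode(String bits, double low, double high) {
        for (char bit : bits.toCharArray()) {
            double mid = (low + high) / 2;
            if (bit == '1') {
                low = mid;  // value was in the right half
            } else {
                high = mid; // value was in the left half
            }
        }
        return new double[] {low, high};
    }

    public static void main(String[] args) {
        // Each extra bit halves the interval, so the restored value gets more precise.
        String code = "1101";
        for (int n = 1; n <= code.length(); n++) {
            double[] range = decode(code.substring(0, n), -180, 180);
            System.out.println(n + " bit(s): [" + range[0] + ", " + range[1] + "]");
        }
    }
}
```

The restored coordinate is usually taken as the midpoint of the interval, so the worst-case error is half the interval width.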
For example, to encode the longitude value 169.99 with 4 bits (N = 4, i.e. 4 bisections), first divide the longitude interval [-180,180] into the left partition [-180,0) and the right partition [0,180].
- 169.99 belongs to the right partition, so the first bit is '1'.
- Divide [0, 180], the interval 169.99 fell into after the first bisection, into [0, 90) and [90, 180]. 169.99 is still in the right interval, so the bit is '1'.
- Divide [90, 180] into [90, 135) and [135, 180]. 169.99 is again in the right partition, so the bit is '1'.
- Divide [135, 180] into [135, 157.5) and [157.5, 180]. 169.99 is in the right partition, so the bit is '1'.
In this way, we end up with the 4-bit code 1111.
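The bisection procedure can be sketched in a few lines. The class and method names below are hypothetical, and the code is returned as a bit string only to make the result easy to read. For 169.99, every one of the four bisections lands in the right half.

```java
/** Encodes a value as N bits by repeatedly bisecting its range. */
class GeoEncode {
    /** Works for longitude (low = -180, high = 180) and latitude (low = -90, high = 90). */
    static String encode(double value, double low, double high, int bits) {
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < bits; i++) {
            double mid = (low + high) / 2;
            if (value >= mid) {
                code.append('1'); // right partition
                low = mid;
            } else {
                code.append('0'); // left partition
                high = mid;
            }
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(169.99, -180, 180, 4)); // prints 1111
        System.out.println(encode(116.37, -180, 180, 4)); // prints 1101
    }
}
```

The same method handles latitude by passing -90 and 90 as the range.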
The coding idea of latitude is the same as that of longitude, so I won’t repeat it.
Combine longitude and latitude codes
Suppose the computed longitude and latitude codes are 11011 and 00101 respectively. Bit 0 of the combined code takes bit 0 of the longitude code (1), bit 1 of the combined code takes bit 0 of the latitude code (0), bit 2 takes bit 1 of the longitude code, and so on, alternating between the two codes.
In this way, the pair of codes becomes the single value 1010011011, and this value can be used as the weight score of a Sorted Set element.
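The combining step is a plain bit interleave. Below is a small sketch using the codes from the text; the class name `GeoInterleave` and the bit-string representation are assumptions for the example, and the longitude-first ordering simply follows the description above.

```java
/** Interleaves a longitude code and a latitude code, longitude bit first. */
class GeoInterleave {
    static String interleave(String lngBits, String latBits) {
        if (lngBits.length() != latBits.length()) {
            throw new IllegalArgumentException("both codes must have the same number of bits");
        }
        StringBuilder combined = new StringBuilder();
        for (int i = 0; i < lngBits.length(); i++) {
            combined.append(lngBits.charAt(i)); // even positions: longitude
            combined.append(latBits.charAt(i)); // odd positions: latitude
        }
        return combined.toString();
    }

    public static void main(String[] args) {
        String combined = interleave("11011", "00101");
        System.out.println(combined);                    // prints 1010011011
        System.out.println(Long.parseLong(combined, 2)); // the same code as an integer
    }
}
```

Because the leading bits of both axes end up at the front of the combined code, two points that share a long common prefix lie in the same small cell.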
Redis GEO implementation
The GEO type uses the GeoHash-encoded combined value of the latitude and longitude as the weight score of the Sorted Set element. What commands does the Redis GEO type provide?
We need to save the ID of the girl who logged in to the app and the corresponding latitude and longitude in the Sorted Set.
For more GEO type commands, please refer to: https://redis.io/commands#geo
GEOADD
Redis provides the GEOADD key longitude latitude member command, which records a pair of longitude and latitude values together with the corresponding "goddess ID" in a GEO collection. The following records the longitude and latitude of multiple users (Sora Aoi, Yui Hatano) at one time.
GEOADD girl:locations 13.361389 38.115556 "苍井空" 15.087269 37.502669 "波多野结衣"
GEORADIUS
I logged into the app to get my own latitude and longitude information. How can I find other users within a certain range centered on this latitude and longitude?
The Redis GEO type provides the GEORADIUS command: given a longitude and latitude, it searches for other elements within a certain range centered on that position.
Assuming that your longitude and latitude are (15.087269, 37.502669), you need to get the "goddesses" within 10 km and return them to the LBS application:
GEORADIUS girl:locations 15.087269 37.502669 10 km ASC COUNT 10
The ASC option sorts the "goddess" results by distance from the given position, from nearest to farthest.
The COUNT option specifies the number of "goddesses" to return, so that bandwidth is not wasted when there are too many of them nearby.
If you feel that you need more goddesses, there is no limit, but you need to pay attention to your body and eat more eggs to make up for it.
After a user goes offline, how do we delete the longitude and latitude of the offline "goddess"?
This is a good question. The GEO type is built on Sorted Set, so you can borrow the ZREM command to delete the geographic location information.
For example, delete the location information of "Sora Aoi":
ZREM girl:locations "苍井空"
Summary
GEO does not define a new underlying data structure; it directly uses the Sorted Set collection type.
The GEO type uses GeoHash encoding to convert latitude and longitude into the weight score of a Sorted Set element. The two key mechanisms are partitioning the two-dimensional map into intervals and encoding those intervals.
Once a pair of latitude and longitude values falls into an interval, it is represented by that interval's code, and the code is used as the weight score of the Sorted Set element.
In a map application, there may be millions of records for cars, restaurants, and people. If you use the Redis Geo data structure, they will all be placed in one zset collection.
In a Redis cluster environment, a collection may be migrated from one node to another. If the data under a single key is too large, it has a considerable impact on cluster migration. In a cluster environment, the amount of data under a single key should not exceed 1 MB; otherwise cluster migration may stall and affect the normal operation of online services.
Therefore, it is recommended to deploy Geo data on a separate, dedicated Redis instance.
If the amount of data reaches 100 million or more, you need to split the Geo data by country, province, or city, and in cities with very large populations even by district.
This can significantly reduce the size of a single zset collection.
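One way to picture the split is a key per region, so that each zset only holds the members of one city. The naming scheme below is purely hypothetical (the `girl:locations` prefix, the class name, and the region codes are assumptions for illustration); a search near a region boundary would also have to query the keys of the neighbouring regions.

```java
/** Builds one Geo key per region so that no single zset grows too large. */
class GeoShardKey {
    static final String PREFIX = "girl:locations"; // assumed base key

    /** Example scheme: girl:locations:<country>:<province>:<city>. */
    static String shardKey(String country, String province, String city) {
        return String.join(":", PREFIX, country, province, city);
    }

    public static void main(String[] args) {
        // Each returned key would be passed to GEOADD / GEORADIUS in place of a single global key.
        System.out.println(shardKey("cn", "guangdong", "shenzhen"));
    }
}
```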