GEO of Redis source code analysis-How does Redis efficiently retrieve geographic locations?

Redis GEO is used to store geographic location information and operate on the stored information. Through geo-related commands, it is easy to store and use latitude and longitude coordinate information in redis. The Geo commands provided in Redis are as follows:

geoadd: Add the latitude and longitude coordinates and the name of the corresponding geographic location.
geopos: Obtain the latitude and longitude coordinates of a geographic location.
geodist: Calculate the distance between two geographic locations.
georadius: According to the longitude and latitude coordinates given by the user to obtain the geographic location set within the specified range.
georadiusbymember: Get a geographic location set within a specified range according to a location stored in the location set.
geohash: Calculate the geohash value of one or more latitude and longitude coordinate points.

To understand how Redis's GEO-related commands are implemented, you must first understand the principles of geohash. In essence, these commands are just encapsulation of geohash data.

geohash

Geohash is a coding method invented by Gustavo Niemeye in 2008 to encode latitude and longitude information. For example, the latitude and longitude coordinates of Beijing city center are 116.404844,39.912279 wx4g0cg3vknd after being encoded with 12-bit geohash. How does it work? In fact, the principle is very simple, that is, divided into . The entire encoding process can be divided into the following steps.

1. Convert to binary

Those who have gone to junior high school geography know how a point on the earth can be identified as a certain latitude and longitude coordinate. The range of longitude is 0-180 degrees east longitude and 0-180 degrees west longitude, and the value range of latitude is 0 north latitude. To 90 and 0-90 degrees south latitude. Excluding the east, west, south, and north, it can be considered that the value ranges of longitude and latitude are [-180,180] and [-90,90], respectively.

Let’s look at the longitude first. [-180,180] can be simply divided into two parts [-180,0] and [0,180]. For a given specific value, we use a bit to identify whether it is in [-180,0] or [0,180] In the interval. Then we can continue to subdivide these two sub-intervals, using more bits to identify which sub-interval the value is in. It is like using a binary search to record the path of each search, from 0 to the left and 1 to the right. After searching, we will get a 0101 string, which can be used to identify the longitude value.

The same is true for the dimension, except that its return value becomes [-90,90]. After encoding in these two ways, we can get two strings of 0 and 1 at any latitude and longitude.
For example, the longitude and latitude coordinates of the center of Beijing are 116.404844,39.912279 . We first encode 116.404844 and get the binary:

11010010110001101101

Then we 39.912279 get the binary as:

10111000110000111001

2. Binary merge of latitude and longitude

Next, we only need to merge the above binary interleaving into one. Here, note that occupies even digits for longitude and latitude occupies for odd digits to get the final binary.

1101101110000200111100000001111011010011

3. Encode the merged binary into base32

Finally, we encode the merged binary into base32, convert the consecutive 5 digits into a 0-31 decimal number, and then replace it with the corresponding character. After all the binary digits are processed, we complete the base32 encoding. The coding table is as follows:
在这里插入图片描述
Finally, the geohash value wx4g0cg3vknd .

Geohash divides the space continuously, then converts the bisection path into base32 encoding, and finally saves it. From the principle, it can be seen that geohash represents an interval, not a point . The longer the geohash value, the interval will be The smaller the size, the more accurate the location of the logo. The figure below shows the latitude and longitude errors under different length geohash in Wikipedia (lat: latitude, lng: longitude)
在这里插入图片描述

Uses and problems of geohash

Geohash successfully encodes a two-dimensional information into a one-dimensional information. I think there are two advantages to encoding in this way: 1. The data length becomes shorter after encoding, which is conducive to saving storage. 2. Facilitate the use of prefix search. Let's talk about the second point in detail.

From the implementation of two coordinate points have a common prefix, you and we can be sure that these two points are in the same area (the area size depends on the length of the common prefix). The advantage of this feature to us is that we can index all coordinate points by geohash in increasing order, and then filter by prefix when searching, which greatly improves the performance of retrieval.

For example, suppose I'm looking for a restaurant within 3 kilometers of Beijing International Trade Center. It is known that the geohash of China World Trade Center is wx4g41. Then I can easily calculate which points in the area I need to scan. But there is one point to note. As I mentioned above, the geohash value actually represents an area, not a point. After finding a batch of candidate points, you need to traverse once to calculate the precise distance.
在这里插入图片描述
There is a problem with geohash that needs attention. Geohash encodes the two-dimensional coordinate points offline (as shown in the figure below). Sometimes it may give people a misunderstanding that if the binary difference between two geohash is smaller, the distance between the two intervals is closer. This is completely It is wrong, such as the following figure 0111 and 1000, the binary difference between these two intervals is only 0001 but the actual physical distance is relatively far.
在这里插入图片描述
If the above picture is not obvious, I got a picture from Wikipedia. The dotted line is the path of linear indexing. The geohash values of the two blocks linked by the dotted line are very similar, as shown in the following figure (7,3) and (0 ,. 4), geohash value very similar, but the actual physical distance is very far away, which is abrupt change of geohash , which also results in the size can not be directly determined from the two regions is directly based on the value geohash.
在这里插入图片描述
But in the actual use of geohash, I often encounter cross-domain search situations. For example, I want to search for all other point sets that are 1 distance unit away from a point in the interval of the above figure (3,3). The point set may span (3,3) plus 9 intervals of 8 neighborhoods around it. The problem of mutation will cause the geohash of these 9 intervals to not jump linearly, but it is not impossible to calculate. In fact You can easily calculate the 8 neighborhoods of a certain geohash through special bit operations. For details, please refer to the specific implementation of geohashNeighbors() in src/geohash.c in the redis source code. GeohashNeighbors uses two functions geohash_move_x and geohash_move_y. The geohash moves left and right and up and down, so that the geohash values of 8 neighborhoods can be easily combined.

static void geohash_move_x(GeoHashBits *hash, int8_t d) {
    if (d == 0)
        return;

    uint64_t x = hash->bits & 0xaaaaaaaaaaaaaaaaULL;
    uint64_t y = hash->bits & 0x5555555555555555ULL;

    uint64_t zz = 0x5555555555555555ULL >> (64 - hash->step * 2);

    if (d > 0) {
        x = x + (zz + 1);
    } else {
        x = x | zz;
        x = x - (zz + 1);
    }

    x &= (0xaaaaaaaaaaaaaaaaULL >> (64 - hash->step * 2));
    hash->bits = (x | y);
}

static void geohash_move_y(GeoHashBits *hash, int8_t d) {
    if (d == 0)
        return;

    uint64_t x = hash->bits & 0xaaaaaaaaaaaaaaaaULL;
    uint64_t y = hash->bits & 0x5555555555555555ULL;

    uint64_t zz = 0xaaaaaaaaaaaaaaaaULL >> (64 - hash->step * 2);
    if (d > 0) {
        y = y + (zz + 1);
    } else {
        y = y | zz;
        y = y - (zz + 1);
    }
    y &= (0x5555555555555555ULL >> (64 - hash->step * 2));
    hash->bits = (x | y);
}

Geo in redis

The above has spent a lot of space explaining the implementation of geohash. In fact, seeing this, you have basically understood the implementation of geohash in redis. In essence, the geo in redis is an encapsulation of geohash. The specific geohash-related code will not be listed for everyone (you can check it yourself), but I will introduce you to the general process in redis geo.
First of all, you may be most curious about how geohash is stored in redis. You can get a glimpse from the implementation of the geoadd command.

/* GEOADD key [CH] [NX|XX] long lat name [long2 lat2 name2 ... longN latN nameN] */
void geoaddCommand(client *c) {
    int xx = 0, nx = 0, longidx = 2;
    int i;

    /* 解析可选参数 */
    while (longidx < c->argc) {
        char *opt = c->argv[longidx]->ptr;
        if (!strcasecmp(opt,"nx")) nx = 1;
        else if (!strcasecmp(opt,"xx")) xx = 1;
        else if (!strcasecmp(opt,"ch")) {}
        else break;
        longidx++;
    }

    if ((c->argc - longidx) % 3 || (xx && nx)) {
        /* 解析所有的经纬度值和member，并对其个数做校验 */
            addReplyErrorObject(c,shared.syntaxerr);
        return;
    }

    /* 构建zadd的参数数组 */
    int elements = (c->argc - longidx) / 3;
    int argc = longidx+elements*2; /* ZADD key [CH] [NX|XX] score ele ... */
    robj **argv = zcalloc(argc*sizeof(robj*));
    argv[0] = createRawStringObject("zadd",4);
    for (i = 1; i < longidx; i++) {
        argv[i] = c->argv[i];
        incrRefCount(argv[i]);
    }

    /* 以3个参数为一组，将所有的经纬度和member信息从参数列表里解析出来，并放到zadd的参数数组中 */
    for (i = 0; i < elements; i++) {
        double xy[2];

        if (extractLongLatOrReply(c, (c->argv+longidx)+(i*3),xy) == C_ERR) {
            for (i = 0; i < argc; i++)
                if (argv[i]) decrRefCount(argv[i]);
            zfree(argv);
            return;
        }

        /* 将经纬度坐标转化成score信息 */
        GeoHashBits hash;
        geohashEncodeWGS84(xy[0], xy[1], GEO_STEP_MAX, &hash);
        GeoHashFix52Bits bits = geohashAlign52Bits(hash);
        robj *score = createObject(OBJ_STRING, sdsfromlonglong(bits));
        robj *val = c->argv[longidx + i * 3 + 2];
        argv[longidx+i*2] = score;
        argv[longidx+1+i*2] = val;
        incrRefCount(val);
    }

    /* 转化成zadd命令所需要的参数格式*/
    replaceClientCommandVector(c,argc,argv);
    zaddCommand(c);
}

It turns out that the storage of geo is just that zset is wrapped in a shell (is it a little disappointing). For the specific implementation of zset, please refer to the article I wrote The implementation of skiplist in redis .

Let's take a closer look at the general execution flow of georadius (the code is too long, so delete a lot of detailed code).

void georadiusGeneric(client *c, int srcKeyIndex, int flags) {
    robj *storekey = NULL;
    int storedist = 0; /* 0 for STORE, 1 for STOREDIST. */

    /* 根据key找找到对应的zojb */
    robj *zobj = NULL;
    if ((zobj = lookupKeyReadOrReply(c, c->argv[srcKeyIndex], shared.emptyarray)) == NULL ||
        checkType(c, zobj, OBJ_ZSET)) {
        return;
    }

    /* 解析请求中的经纬度值 */
    int base_args;
    GeoShape shape = {0};
    if (flags & RADIUS_COORDS) {
    /*
     * 各种必选参数的解析，省略细节代码，主要是解析坐标点信息和半径   
     */ 
    }

    /* 解析所有的可选参数. */
    int withdist = 0, withhash = 0, withcoords = 0;
    int frommember = 0, fromloc = 0, byradius = 0, bybox = 0;
    int sort = SORT_NONE;
    int any = 0; /* any=1 means a limited search, stop as soon as enough results were found. */
    long long count = 0;  /* Max number of results to return. 0 means unlimited. */
    if (c->argc > base_args) {
    /*
     * 各种可选参数的解析，省略细节代码   
     */ 
    }
    
    /* Get all neighbor geohash boxes for our radius search
     * 获取到要查找范围内所有的9个geo邻域 */
    GeoHashRadius georadius = geohashCalculateAreasByShapeWGS84(&shape);

    /* 创建geoArray存储结果列表 */
    geoArray *ga = geoArrayCreate();
    /* 扫描9个区域中是否有满足条的点，有就放到geoArray中 */
    membersOfAllNeighbors(zobj, georadius, &shape, ga, any ? count : 0);

    /* 如果没有匹配结果，返回空对象 */
    if (ga->used == 0 && storekey == NULL) {
        addReply(c,shared.emptyarray);
        geoArrayFree(ga);
        return;
    }

    long result_length = ga->used;
    long returned_items = (count == 0 || result_length < count) ?
                          result_length : count;
    long option_length = 0;

    /* 
     * 后续一些参数逻辑，比如处理排序，存储……
     */
    // 释放geoArray占用的空间 
    geoArrayFree(ga);
}

The above code has deleted a lot of details, and interested students can check it out by themselves. However, it can be seen that the overall process of georadius is very clear.

Parse the request parameters.
Calculate the geohash and 8 neighbors where the target coordinates are located.
Find all point sets that meet the distance limit in these 9 areas in zset.
Process the subsequent logic such as sorting.
Clean up temporary storage space.

Concluding remarks

Due to the limited space of the article and the emphasis on the implementation of geohash, it did not explain the various details related to geo in redis. If readers are interested, they can read src/geo.c in redis for details.

Reference

This article is a series of blog posts on Redis source code analysis, and there is also a corresponding Redis Chinese annotation version. For students who want to learn Redis in depth, welcome stars and attention.
Redis Chinese annotation version warehouse: https://github.com/xindoo/Redis
Redis source code analysis column: https://zxs.io/s/1h

GEO of Redis source code analysis-How does Redis efficiently retrieve geographic locations?

geohash

1. Convert to binary

2. Binary merge of latitude and longitude

3. Encode the merged binary into base32

Uses and problems of geohash

Geo in redis

Concluding remarks

Reference

xindoo

引用和评论

OpenAI的结构化浅析

Bitmap 和布隆过滤器傻傻分不清？你这不应该啊

Jerry和您聊聊Chrome开发者工具

Spring 实现 3 种异步流式接口，干掉接口超时烦恼

💢线上高延迟请求排查

这些年

一文讲清楚static关键字

GEO of Redis source code analysis-How does Redis efficiently retrieve geographic locations?

geohash

1. Convert to binary

2. Binary merge of latitude and longitude

3. Encode the merged binary into base32

Uses and problems of geohash

Geo in redis

Concluding remarks

Reference

xindoo

引用和评论

OpenAI的结构化浅析

Bitmap 和 布隆过滤器傻傻分不清？你这不应该啊

Jerry和您聊聊Chrome开发者工具

Spring 实现 3 种异步流式接口，干掉接口超时烦恼

💢线上高延迟请求排查

这些年

一文讲清楚static关键字

Bitmap 和布隆过滤器傻傻分不清？你这不应该啊