1. Overview

Recently, the official Redis website published a performance benchmark report for RedisJSON (with RediSearch), and the results can fairly be described as crushing the other NoSQL options. The core findings are below, conclusions first:

  • For isolated writes, RedisJSON is 5.4 times faster than MongoDB and more than 200 times faster than ElasticSearch.
  • For isolated reads, RedisJSON is 12.7 times faster than MongoDB and more than 500 times faster than ElasticSearch.

In mixed workload scenarios, real-time updates do not affect RedisJSON's search and read performance, while ElasticSearch's is affected. The specific data:

  • The number of operations per second supported by RedisJSON* is about 50 times higher than MongoDB and 7 times higher than ElasticSearch.
  • The latency of RedisJSON* is about 90 times lower than MongoDB and 23.7 times lower than ElasticSearch.

In addition, RedisJSON's read, write, and search latencies are far more stable than those of ElasticSearch and MongoDB at the higher percentiles. As the write ratio increases, RedisJSON sustains ever higher overall throughput, whereas ElasticSearch's achievable overall throughput decreases.

2. The query engine

As mentioned earlier, the development of RediSearch and RedisJSON places great emphasis on performance. For each release, we want to make sure developers get a stable, performant product. To this end, we provide profiling tools and probes for performance analysis.

And with every new release we keep improving performance. In particular, RediSearch 2.2 is 1.7 times faster than 2.0 in both loading and query performance, while also improving throughput and data-loading latency.

2.1 Load optimization

The next two figures show the results of running the New York City taxi benchmark (detailed data can be viewed here). This benchmark measures basic metrics such as throughput and loading time.

As can be seen from these charts, every new version of RediSearch brings a substantial performance improvement.

2.2 Full-text search optimization

To evaluate search performance, we indexed 5.9 million Wikipedia abstracts, then ran a panel of full-text search queries; the results are shown in the figures below (details here).
As the figures show, migrating from v2.0 to v2.2 greatly improves write, read, and search latency on the same data, and thereby the throughput achievable with Search and JSON.

3. Comparison with other frameworks

To evaluate the performance of RedisJSON, we decided to compare it with MongoDB and ElasticSearch. To make the comparison fair, we compare comprehensively across document storage, local availability, cloud availability, professional support, scalability, and performance.

We use the well-established YCSB benchmark for testing and comparison. It can evaluate different products under common workloads and measure latency and throughput curves up to saturation. In addition to the CRUD YCSB operations, we also added a two-word search operation to help developers, system architects, and DevOps practitioners find the best search engine for their use cases.

3.1 Benchmarking

In this test, we used the following software environments:

  • MongoDB v5.0.3
  • ElasticSearch 7.15
  • RedisJSON* (RediSearch 2.2 + RedisJSON 2.0)

We ran the benchmark tests on Amazon Web Services instances. All three solutions are distributed databases and are most commonly run in production in distributed deployments. That is why all products use the same general-purpose m5d.8xlarge VMs with local SSDs, and each setup consists of four VMs: one client plus three database servers. Both the benchmark client and the database servers run on separate m5d.8xlarge instances under optimal network conditions, packed tightly within a single availability zone to achieve the low latency and stable network performance required for steady-state analysis.

The test was performed on a three-node cluster, and the deployment details are as follows:

  • MongoDB 5.0.3: a three-member replica set (Primary-Secondary-Secondary). Replicas are used to increase read capacity and allow lower-latency reads. To support text-search queries over string content, a text index was created on the search field.
  • ElasticSearch 7.15: a 15-shard setup with the query cache enabled, and a RAID 0 array of the two local NVMe-based SSDs for higher performance on filesystem-bound Elastic operations. The 15-shard setup gave the best achievable performance of all the shard variants we tried for Elastic.
  • RedisJSON*: RediSearch 2.2 and RedisJSON 2.0 on an OSS Redis Cluster v6.2.6, with 27 shards evenly distributed across the three nodes, loaded with the RediSearch 2.2 and RedisJSON 2.0 OSS modules.

In addition to this main benchmark/performance-analysis scenario, we also ran benchmarks on the network, memory, CPU, and I/O to understand the underlying network and virtual-machine characteristics. Throughout the entire benchmark set, network performance remained below the measured bandwidth and packets-per-second limits, producing stable, ultra-low-latency network transfers (p99 < 100 μs per packet).

Next, we present isolated operation performance [100% write] and [100% read], and end with a set of mixed workloads that simulate real-world application scenarios.

3.2 100% write benchmark

As shown in the figure below, RedisJSON*'s ingest speed is 8.8 times faster than ElasticSearch and 1.8 times faster than MongoDB, while maintaining sub-millisecond latency for each operation. It is worth noting that 99% of Redis requests complete in less than 1.5 milliseconds.

In addition, RedisJSON* is the only solution we tested that automatically updates its index on every write, meaning any subsequent search query will find the updated document. ElasticSearch does not have this fine-grained capability: it places ingested documents in an internal queue, which the server (not the client) flushes every N documents or every M seconds. They call this approach Near Real Time (NRT). The Apache Lucene library (which implements ElasticSearch's full-text functionality) is designed for fast search, but its indexing process is complex and heavyweight. As the WRITE benchmark charts show, ElasticSearch pays a huge price for this design limitation.
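The difference between per-write index updates and NRT batch refresh can be illustrated with a toy model. Note that all class and parameter names here are illustrative stand-ins, not real APIs of either product:

```python
class ImmediateIndex:
    """Index updated synchronously on every write (RedisJSON*-style)."""
    def __init__(self):
        self.docs = {}

    def write(self, doc_id, text):
        self.docs[doc_id] = text  # visible to search immediately

    def search(self, word):
        return [d for d, t in self.docs.items() if word in t.split()]


class NRTIndex:
    """Writes land in a queue; search sees them only after a refresh
    (ElasticSearch NRT-style, refreshed every N docs or M seconds)."""
    def __init__(self, refresh_every=3):
        self.docs = {}
        self.queue = []
        self.refresh_every = refresh_every

    def write(self, doc_id, text):
        self.queue.append((doc_id, text))
        if len(self.queue) >= self.refresh_every:
            self.refresh()

    def refresh(self):
        self.docs.update(self.queue)
        self.queue.clear()

    def search(self, word):
        return [d for d, t in self.docs.items() if word in t.split()]


immediate, nrt = ImmediateIndex(), NRTIndex(refresh_every=3)
for idx in (immediate, nrt):
    idx.write("doc1", "redis json benchmark")

print(immediate.search("redis"))  # ['doc1'] — visible right away
print(nrt.search("redis"))        # []      — still queued until refresh
```

The toy model only shows the visibility gap; the real ElasticSearch refresh interval and queue behavior are configurable on the server side.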

Combining the latency and throughput improvements, RedisJSON* is 5.4 times faster than MongoDB and more than 200 times faster than ElasticSearch for isolated writes.

3.3 100% read benchmark

Similar to writes, we can observe that Redis performs best on reads, sustaining 15.8 times more reads than ElasticSearch and 2.8 times more than MongoDB, while maintaining sub-millisecond latency across the entire latency range, as shown below.

When combined with latency and throughput improvements, RedisJSON* is 12.7 times faster than MongoDB and more than 500 times faster than ElasticSearch for isolated reads.


3.4 Hybrid read/write/search benchmarks

Real-world application workloads are almost always a mix of read, write, and search queries. Therefore, it is more important to understand the resulting mixed workload throughput curve when approaching saturation.

As a starting point, we considered a 65% search and 35% read scenario, representing a common real-world case in which more searches/queries are performed than direct reads. The initial combination of 65% search, 35% read, and 0% update also produced equal throughput for ElasticSearch and RedisJSON*. Of course, the YCSB workload lets you specify whatever search/read/update ratio meets your requirements.

"Search performance" can refer to different types of searches, such as "matching query search", "faceted search", "fuzzy search" and so on. The search workload that we initially added to YCSB only focused on "matching query search", which mimics paging two-word query matching, sorted by numeric fields. "Matching Query Search" is the starting point for search analysis by any supplier that has the search function enabled. Therefore, every database/driver that supports YCSB should be able to easily enable this function on its benchmark driver.

In each test variant, we added 10% of writes to the mix and reduced the search and read percentages in the same proportion. The goal of these variants is to understand how each product handles real-time updates of data. We believe this is the de facto architectural goal: writes are immediately committed to the index, and reads are always up to date.
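Assuming "in the same proportion" means the original 65:35 search:read split is preserved as updates are added, the variant mixes can be sketched as:

```python
def workload_mix(update_pct):
    """Split the non-update traffic between search and read in the
    original 65:35 ratio, as each test variant adds more updates."""
    remaining = 100 - update_pct
    return {
        "search": remaining * 65 / 100,
        "read": remaining * 35 / 100,
        "update": update_pct,
    }

# The benchmark variants step the update ratio from 0% up to 50%.
for upd in range(0, 60, 10):
    print(workload_mix(upd))
```

For example, the 10%-update variant works out to 58.5% search and 31.5% read.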

As the graph shows, continuously updating the data and increasing the write ratio on RedisJSON* does not affect read or search performance, and increases overall throughput. The more updates made to the data, the greater the impact on ElasticSearch performance, ultimately resulting in slower reads and searches.

ElasticSearch's achievable ops/sec degrades as the update ratio grows from 0% to 50%: it starts at 10k ops/sec in the 0% update benchmark and is severely affected, dropping by 5 times in the 50% update-rate benchmark.

Similar to what we observed in the single-operation benchmarks above, MongoDB's search performance is far below that of RedisJSON* and ElasticSearch: MongoDB's maximum total throughput is 424 ops/sec, versus a maximum of 16K ops/sec for RedisJSON*.

Finally, for mixed workloads, RedisJSON* achieves 50.8 times more ops/sec than MongoDB and 7 times more than ElasticSearch. If we focus the analysis on per-operation latency during the mixed workloads, RedisJSON* reduces latency by up to 91 times compared with MongoDB and 23.7 times compared with ElasticSearch.

3.5 Full latency analysis

In addition to measuring the throughput curve up to each solution's saturation point, it is also important to perform a full latency analysis under a sustainable load common to all solutions. This shows which solution is most stable in terms of latency across all issued operations, and which solutions are not susceptible to latency spikes caused by application logic (for example, Elastic's query cache misses). For a deeper understanding of why we do this, Gil Tene provides an in-depth overview of latency-measurement considerations.
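To make the percentile terminology concrete, here is a minimal nearest-rank percentile sketch (not the benchmark's actual tooling), showing how a small tail of slow requests is invisible at p50 and p99 but dominates p999:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of all samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 1000 synthetic latencies (ms): 99% fast, a 1% tail of slow outliers.
latencies = [1.0] * 990 + [10.0] * 10

print(percentile(latencies, 50))    # 1.0  — median hides the tail
print(percentile(latencies, 99))    # 1.0  — even p99 hides it here
print(percentile(latencies, 99.9))  # 10.0 — the tail appears at p999
```

This is why the report looks at percentiles up to p999/p9999 rather than averages: the mean of this sample is only 1.09 ms, which says nothing about the 10 ms outliers.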

Looking at the throughput charts in the previous section, and focusing on the 10% update benchmark so that all three operation types are included, we ran two different sustainable-load variants:

  • 250 ops/sec: comparing MongoDB, ElasticSearch, and RedisJSON*, below MongoDB's saturation rate.
  • 6K ops/sec: comparing ElasticSearch and RedisJSON*, below ElasticSearch's saturation rate.

3.5.1 Latency analysis of MongoDB, ElasticSearch, and RedisJSON*

The first figure below shows percentiles from p0 to p9999. Clearly, at every search percentile, MongoDB performs far worse than Elastic and RedisJSON*. Moreover, comparing ElasticSearch and RedisJSON*, it is obvious that ElasticSearch is susceptible to higher latencies, likely caused by garbage-collection (GC) triggers or search-query cache misses. RedisJSON*'s search p99 stays below 2.61 milliseconds, while ElasticSearch's p999 search reaches 10.28 milliseconds.
In the read and update charts below, we can see that RedisJSON* performs best across all latency ranges, followed by MongoDB and ElasticSearch.

RedisJSON* is the only solution that maintains sub-millisecond latency at every analyzed percentile. At p99, RedisJSON* had a latency of 0.23 milliseconds, followed by MongoDB at 5.01 milliseconds and ElasticSearch at 10.49 milliseconds.

For writes, MongoDB and RedisJSON* maintain sub-millisecond latency even at p99. ElasticSearch, on the other hand, shows high tail latency (> 10 ms), most likely for the same reason (GC) that caused the spikes in its search latencies.


3.5.2 Latency analysis of ElasticSearch and RedisJSON*

Focusing only on ElasticSearch and RedisJSON*, while sustaining a load of 6K ops/sec, we can observe results consistent with the analysis performed at 250 ops/sec: RedisJSON* is the more stable solution, with a p99 read latency of 3 milliseconds versus Elastic's p99 read latency of 162 milliseconds.

For updates, RedisJSON* maintained a p99 of 3 milliseconds, while ElasticSearch's p99 was 167 milliseconds.

Focusing on search operations, both ElasticSearch and RedisJSON* start with single-digit p50 latencies (RedisJSON*'s p50 is 1.13 milliseconds, ElasticSearch's p50 is 2.79 milliseconds), but ElasticSearch pays a higher price for GC triggers and query-cache misses at the higher percentiles, clearly visible at >= p90.

RedisJSON* keeps p99 below 33 milliseconds, while ElasticSearch's p99 reaches 163 milliseconds, 5 times higher.


4. How to get started

To start using RedisJSON*, you can create a free database on Redis Cloud in any region, or use the RedisJSON docker container. We have updated the RedisJSON documentation to help developers quickly start using the query and search features. In addition, as noted in our client-library documentation, the following client drivers for several popular languages can help you get started quickly.

RedisJSON* client drivers:

  • Node.js: node-redis
  • Java: Jedis
  • .NET: NRedisJSON / NRediSearch
  • Python: redis-py
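As a sketch of what a first session might look like, the following uses a stand-in client that merely records commands instead of talking to a server; with redis-py you would issue the same commands via `execute_command` on a real connection. The index and key names (`userIdx`, `user:1`) are illustrative:

```python
class RecordingClient:
    """Stand-in for a Redis client: records commands instead of
    sending them, so this sketch runs without a server."""
    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(" ".join(map(str, args)))


r = RecordingClient()

# Create a search index over JSON documents (RediSearch 2.2 syntax).
r.execute_command(
    "FT.CREATE", "userIdx", "ON", "JSON",
    "SCHEMA", "$.name", "AS", "name", "TEXT",
)
# Store a JSON document at key user:1 (RedisJSON syntax).
r.execute_command("JSON.SET", "user:1", "$", '{"name": "Ada"}')
# RedisJSON* indexes each write immediately, so this search
# would already find the document just written.
r.execute_command("FT.SEARCH", "userIdx", "@name:Ada")

print(r.sent[1])  # JSON.SET user:1 $ {"name": "Ada"}
```

Against a real server, replace `RecordingClient` with `redis.Redis(host="localhost", port=6379)` from redis-py and the same `execute_command` calls apply.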

References: "RedisJSON: Public Preview & Performance Benchmarking"; "RedisJson releases official performance report, performance crushes ES and Mongo".


Author: xiangzhihong