Scaling ArangoDB to gigabytes per second on Mesosphere’s DCOS

In this blog post, we explain how an ArangoDB cluster (ArangoDB on Github) with 640 virtual CPUs can sustain a write load of 1.1 million JSON documents per second, which amounts to approximately 1GB of data per second. It takes a single short command and a few minutes to deploy this system, running on 80 nodes, using the Mesosphere Datacenter Operating System (DCOS).

Introduction

This blog post is an abridged version of our whitepaper, which contains all the results and details of the setup. The purpose of this work is to show that an ArangoDB cluster scales out well when used as a distributed document (JSON) store. It complements the benchmark series here, which looked into the performance of a single instance of ArangoDB as a multi-model database, comparing it with pure document stores, graph databases and other multi-model solutions.

“Scaling out well” means two things: (1) linear growth of performance and (2) ease of deployment and operational convenience. We achieve the latter by being the first and, so far, only operational database that is fully certified for the Mesosphere DCOS.

We would like to put the experiments at hand into a slightly broader context. In the age of polyglot persistence, software architects carefully devise their microservice-based designs around different data models like key/value, documents or graphs, considering the different scalability capabilities. With ArangoDB’s native multi-model approach, they can deploy a tailor-made cluster, or single-instance, of ArangoDB for each microservice.

In this context, it is important to emphasize that the multi-model idea does not imply a monolithic design! To the contrary, some services might require a large document-store cluster, while others might employ a single-server graph database with asynchronous replicas and a load balancer for reads. Even services using a mixture of data models are easily possible.

The multi-model approach avoids the additional operational complexity of using and deploying multiple different data stores in a single project. One can use a single uniform query language across all instances and data models, and has the additional option to combine multiple data models within a single service.

We have published all test code at https://github.com/ArangoDB/1...

Test setup

To simulate such a high load, we use a special-purpose C++ program that performs the HTTP/REST requests and does the statistical analysis. We start multiple instances of this client on several machines to produce a large amount of traffic. We test read-only, write-only and 50-50 read-write in a pseudo-random pattern. The tested loads are typical for an application dealing with single-document reads and writes. The type of use case we had in mind when we designed the benchmark was that of an application or web server that performs a large volume of requests to the database, coming from multiple instances.
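The real load generator is a special-purpose C++ program; as a rough illustration of the request pattern only, here is a minimal Python sketch that replays a seeded pseudo-random 50-50 read/write mix. An in-memory dict stands in for the database, and the seed and key scheme are assumptions of the sketch, not details of the actual benchmark client:

```python
import random

def run_mixed_load(store, n_ops, read_ratio=0.5, seed=42):
    """Issue a pseudo-random mix of single-document reads and writes.

    The benchmark's real client is a C++ program speaking HTTP/REST to
    the cluster; here a dict stands in for the database so the sketch
    is self-contained.
    """
    rng = random.Random(seed)
    reads = writes = 0
    for i in range(n_ops):
        key = "doc%d" % rng.randrange(n_ops)
        if rng.random() < read_ratio and key in store:
            store.get(key)                            # single-document read
            reads += 1
        else:
            store[key] = {"_key": key, "value": i}    # single-document write
            writes += 1
    return reads, writes

store = {}
reads, writes = run_mixed_load(store, 10_000)
```

Because the pattern is seeded, a run is reproducible, which matters when comparing latency distributions across cluster sizes.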

The atomicity of the operations is at the (JSON) document level, and ArangoDB offers either “last write wins” or “compare and swap” semantics. That is, the application uses document operations as its notion of transactions. This is a powerful concept that uses the particular capabilities of a document store well. If one used a distributed key/value store instead, one would have to perform about 20 times as many individual operations and wrap each group of 20 in a transaction.
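To illustrate the difference between the two write semantics, here is a toy in-memory model. It is not ArangoDB's actual API (there, compare-and-swap is driven by the document's `_rev` value, supplied for example via an HTTP `If-Match` header); the class and revision counter below are assumptions of the sketch:

```python
class DocumentStore:
    """Toy model of document-level atomicity with two write semantics:
    unconditional "last write wins" and revision-checked "compare and swap"."""

    def __init__(self):
        self._docs = {}  # key -> (revision, document)

    def write(self, key, doc):
        """Last write wins: overwrite regardless of the current revision."""
        rev = self._docs.get(key, (0, None))[0] + 1
        self._docs[key] = (rev, doc)
        return rev

    def cas_write(self, key, doc, expected_rev):
        """Compare and swap: write only if the revision is unchanged."""
        rev = self._docs.get(key, (0, None))[0]
        if rev != expected_rev:
            return None                # conflict: the caller must retry
        self._docs[key] = (rev + 1, doc)
        return rev + 1

store = DocumentStore()
r1 = store.write("order1", {"state": "new"})
ok = store.cas_write("order1", {"state": "paid"}, expected_rev=r1)
stale = store.cas_write("order1", {"state": "shipped"}, expected_rev=r1)
```

The `stale` write fails because the revision moved on, which is exactly the conflict signal an application needs to implement optimistic concurrency on top of single-document operations.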

We scale the database cluster from 8 instances in steps of 8 instances, up to 80 instances each with 8 virtual CPUs. At the same time, we scale the load servers from 4 instances in steps of 4 instances, up to 40 instances each with 8 virtual CPUs. That is, we use from 64 to 640 virtual CPUs for ArangoDB and another 32 to 320 virtual CPUs for the load servers.

Deployment on Mesosphere

With the new Apache Mesos framework for ArangoDB, the deployment is fantastically convenient, provided you already have a Mesos or Mesosphere DCOS cluster running. We use the DCOS command line utility, which in turn taps into the ArangoDB framework and the DCOS subcommand. Starting up an ArangoDB cluster is as simple as typing:

dcos package install --options=config.json arangodb

Apache Mesos manages the cluster resources and finds a good place to deploy the individual “tasks.” If the machines in your Mesos cluster have the right sizes, then Apache Mesos will find the exact distribution you intend to use.
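The config.json controls the sizing of the cluster. The option names below are illustrative assumptions rather than the package's verified schema; consult the ArangoDB package documentation on DCOS for the exact keys:

```json
{
  "arangodb": {
    "nr-dbservers": 80,
    "nr-coordinators": 80,
    "dbserver-cpus": 7.0,
    "dbserver-memory": 24000
  }
}
```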

Deploying the load servers is equally easy: we send a JSON document to Apache Marathon with a single HTTP/REST POST request, using curl.
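Concretely, such a deployment boils down to a single request against Marathon's `/v2/apps` endpoint; the hostname and file name here are placeholders for the actual cluster:

```shell
# POST the load-generator app definition (instance count, image,
# resources) to Marathon; Marathon then schedules the instances via Mesos.
curl -X POST http://marathon.mesos:8080/v2/apps \
     -H "Content-Type: application/json" \
     -d @loadtest.json
```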

For the different cluster sizes, we use an Amazon CloudFormation template to scale the Mesosphere cluster up or down, and then deploy the ArangoDB cluster and the load servers as conveniently as before.

The results

Here we only show the results for the pure-write load. Throughput is measured in thousands of documents per second for the whole cluster. Latency times are in milliseconds, and we give the median (50th percentile) and the 95th percentile. All graphs in a figure are aligned horizontally, such that the different values for the same cluster size appear stacked vertically. The actual absolute numbers follow in the table below the plot.
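For reference, the two latency statistics can be computed from raw samples as follows. The whitepaper does not state which percentile scheme was used, so this sketch uses the common nearest-rank definition, and the sample values are made up:

```python
def percentile(samples, p):
    """Nearest-rank percentile: p=50 gives the median, p=95 the 95th
    percentile of a list of latency samples."""
    ranked = sorted(samples)
    # nearest rank = ceil(n * p / 100), converted to a 0-based index
    idx = max(0, -(-len(ranked) * p // 100) - 1)
    return ranked[idx]

latencies_ms = [2.1, 2.4, 2.2, 3.0, 2.6, 2.3, 28.0, 2.5, 2.7, 2.2]
median = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

The spread between these two numbers is what the latency discussion below refers to as variance.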

(See the section titled “Detailed results” in the whitepaper for all of the graphs, tables and further background information.)

(Figure: write throughput and latency, plotted against cluster size)

(Table: absolute throughput and latency numbers for each cluster size)

We observe very good scalability indeed: already with the second-largest cluster size of 72 nodes, we hit our performance target of 1,000,000 document writes per second, using 576 virtual CPUs. The peak is 1.1 million writes per second with 80 nodes and 640 virtual CPUs.

The latency is reasonable: the median is between 2 and 5 milliseconds, and the 95th percentile is still below 30 milliseconds. Looking at these latency numbers, it’s clear we as developers of ArangoDB have some more homework to do in order to decrease the variance in latency.

Discussion and comparisons

For the pure write load, this means a peak total network transfer of 0.95GB per second from the load servers to the distributed database, and 140MB per second for the answers. Roughly the same amount of network transfers happens within the ArangoDB cluster for the primary storage servers, since each JSON document write has to be forwarded from the client-facing instance to the actual database server holding the data. For the asynchronous replication, this amount must be transferred another time.
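The back-of-envelope arithmetic behind these figures, using only the numbers quoted in the text (the average document size is derived here, not measured):

```python
docs_per_sec = 1_100_000      # peak write throughput of the whole cluster
ingress_gb_per_sec = 0.95     # load servers -> database, from the text

# Implied average document size on the wire:
avg_doc_bytes = ingress_gb_per_sec * 1e9 / docs_per_sec   # roughly 860 bytes

# Each write is forwarded once from the client-facing instance to the
# database server holding the data, and once more to the asynchronous
# replica, so roughly twice the ingress volume crosses the internal network.
internal_gb_per_sec = 2 * ingress_gb_per_sec
```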

The total amount of data each of the 80 ArangoDB nodes has to write to local SSD is about 46MB per second, without taking any write amplification into account. This is why we use the rather expensive i2.2xlarge instance type on AWS: only these instances offer enough I/O bandwidth to pull off the presented benchmark results.

Let us now consider throughput per virtual CPU and compare this to other published benchmarks from competitors. Our peak throughput performance in the largest configuration (pure write load) is around 1,730 documents per second per virtual CPU, and we keep the median latency below 5 milliseconds and the 95th percentile below 30 milliseconds.
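The per-vCPU figures in this and the following paragraphs are simple quotients of the quoted numbers; with exactly 1.1 million writes per second the ArangoDB quotient comes out slightly below the 1,730 derived from the exact peak value:

```python
# Writes per second divided by virtual CPUs, for the quoted configurations.
arangodb_per_vcpu = 1_100_000 / 640        # about 1,719 docs/s per vCPU

# FoundationDB bundled 20 key/value writes into each transaction:
foundationdb_tx_per_sec = 14_400_000 / 20  # 720,000 write transactions/s
```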

The FoundationDB benchmark achieved 14.4 million key/value writes per second, bundled into transactions of 20 operations each. Taking this as 720,000 write transactions per second results in about 750 write transactions per second per virtual CPU, without any replication.

Netflix, in its Cassandra benchmark, claimed 1.1 million writes per second using Datastax’s enterprise edition of Cassandra. This amounts to writing 965 documents per second per virtual CPU with a median latency of 5 milliseconds and a 95th percentile of 10 milliseconds. Google’s benchmark of Cassandra uses more virtual CPUs and a replication factor of 3 with a write quorum of 2. As a consequence, it only achieves 394 documents per second per virtual CPU.

The Couchbase benchmark sustained 1.1 million writes per second, which amounts to around 1,375 documents per second per virtual CPU, with a median latency of 15 milliseconds and a 95th percentile of 27 milliseconds.

Last but not least, the Aerospike benchmark claimed 1 million writes per second, which amounts to 2,500 documents per second per virtual CPU. Aerospike claims 83 percent of all requests took less than 16 milliseconds, and 96.5 percent of all requests took less than 32 milliseconds. Its publication does not mention any replication or disk synchronization, but claims low costs by using slow disks.

ArangoDB actually outperforms the competition considerably in the document-writes-per-virtual-CPU comparison.

Conclusion

As initially planned, this performance test only considers single-document reads and writes (that is, using the ArangoDB cluster simply as a scalable document store). Note that it is possible to add secondary indexes to the sharded collection, and for ArangoDB this means that each shard maintains a corresponding secondary index and updates it with each document write or update. This means that operations like range queries for indexed attributes will be efficient.

If the sharding is done by the primary key, then key lookups by other attributes will have to be sent to each server and will thus be less efficient. As a consequence, complicated join operations will potentially not scale well to large clusters, similar to complex graph queries, which are a fundamental challenge for large sharded graphs.
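The routing asymmetry described above can be sketched in a few lines. Hash sharding and the toy data below are assumptions of the sketch, not ArangoDB internals:

```python
class ShardedCollection:
    """Sketch of why sharding by primary key makes key lookups cheap but
    forces lookups by any other attribute to fan out to every shard."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_of(self, key):
        return hash(key) % len(self.shards)

    def insert(self, key, doc):
        self.shards[self._shard_of(key)][key] = doc

    def get_by_key(self, key):
        # One shard contacted: the shard key routes the request exactly.
        return self.shards[self._shard_of(key)].get(key)

    def get_by_attribute(self, attr, value):
        # Every shard contacted: the shard key gives no routing information.
        return [d for shard in self.shards
                for d in shard.values() if d.get(attr) == value]

coll = ShardedCollection(4)
for i in range(100):
    coll.insert("k%d" % i,
                {"_key": "k%d" % i, "color": "red" if i % 2 else "blue"})
```

A `get_by_key` touches a single shard regardless of cluster size, while `get_by_attribute` costs one request per shard, which is why such queries do not scale in the same way.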

This complete benchmark test was done with the current development version of ArangoDB as of November 2015. Everything will work as already described in version 2.8, which is due to be released very soon. This release will contain the next iteration of our Mesosphere DCOS integration and will thus offer convenient setup of asynchronous replication with automatic failover.

In the first quarter of 2016, we will release version 3.0 of ArangoDB, which will contain synchronous replication and full automatic failover, as well as a lot of cluster improvements for performance, stability and convenience of scaling. It, too, will be fully integrated with the Mesosphere DCOS.

The full whitepaper with all results of the ArangoDB benchmark test is available here.

Original English article:
https://mesosphere.com/blog/2...
