In this article, we introduce improvements to the scalability of MQTT broker clusters. We focus mainly on the database engine used inside EMQ X and how it has been improved in EMQ X 5.0.

Before we start, we need to understand how data is replicated in an EMQ X cluster: the EMQ X broker stores topic and client runtime information in the Mnesia database, which replicates that data across the nodes of the cluster.

Introduction to Mnesia

Mnesia is an open-source database management system developed by Ericsson as part of the Open Telecom Platform (OTP). It was originally used to process configuration and runtime data in ISP-level telecom switches. Versions before EMQ X 4.3 used it to store various runtime data, such as topics, routes, ACL rules, alarms, and so on.

You are probably familiar with databases such as MySQL, Postgres, and MongoDB, and with in-memory stores such as Redis and memcached, but you may not know much about Mnesia. It has unique advantages of its own: it combines many features of the above products in a single, simple application.

Mnesia has a fairly academic definition: an embedded, distributed, transactional NoSQL (non-relational) database. This sounds complicated, so we will explain each term in turn.

Embedded

The most widely used databases, such as MySQL and Postgres, generally adopt a client-server model: the database runs in a separate process (usually on a dedicated server), and business applications interact with it by sending requests over the network or UNIX domain sockets and waiting for responses. This model is convenient in many ways because it allows business logic and storage to be separated and managed independently. It also has a disadvantage: interacting with a remote process inevitably adds latency to each request.

In contrast, an embedded database runs in the same process as the business application. SQLite is a typical example of an embedded database. Mnesia also falls into this category: it runs in the same process as the other EMQ X applications. Reading data from Mnesia tables can be as fast as reading local variables, so we can read database data in hot code paths without affecting performance.
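As a small illustration of the embedded model, the following Python sketch uses SQLite (through the standard-library `sqlite3` module), not Mnesia. The `routes` table and its rows are made up for this example. The point is only that the database engine lives inside the application process, so a query is an ordinary function call with no network round trip.

```python
import sqlite3

def build_route_table():
    """Create an in-process, in-memory database with a hypothetical routes table."""
    # No server and no socket: the SQLite engine runs inside this process.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE routes (topic TEXT PRIMARY KEY, node TEXT)")
    conn.executemany(
        "INSERT INTO routes VALUES (?, ?)",
        [("sensors/temp", "node1"), ("sensors/humidity", "node2")],
    )
    conn.commit()
    return conn

def lookup_node(conn, topic):
    """Return the node that owns a topic, or None if the topic is unknown."""
    row = conn.execute(
        "SELECT node FROM routes WHERE topic = ?", (topic,)
    ).fetchone()
    return row[0] if row else None

conn = build_route_table()
print(lookup_node(conn, "sensors/temp"))  # prints "node1"
```

Mnesia applies the same idea inside the Erlang VM; the table layout EMQ X actually uses is not shown here.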

Distributed

We mentioned earlier that Mnesia is a distributed database, which means that its tables are replicated to different physical locations on the network. When the nodes of a distributed database do not share any physical resources (such as RAM or disk) and coordinate only at the application level, the design is called a shared-nothing (SN) architecture. This type is usually preferred because it does not require any specialized hardware and can be scaled horizontally.

The Mnesia application runs alongside EMQ X and replicates table updates to all nodes in the cluster through the Erlang distribution protocol. This means that business applications can read up-to-date data locally. It also improves fault tolerance: as long as at least one node in the cluster is alive, the data is safe. EMQ X relies on this feature to replicate routing information across the cluster.

Transactional

Mnesia supports ACID transactions, which is an unusual feature for an embedded database. This means that multiple read and update operations can be combined into one unit. A Mnesia transaction has atomicity (it either takes effect completely or not at all), consistency (although the guarantee is more relaxed than in Postgres), isolation (it does not interfere with other transactions), and durability. All these guarantees hold across the whole cluster.

EMQ X uses Mnesia transactions in scenarios where data consistency is critical.
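To make atomicity concrete, here is a minimal sketch that again uses SQLite from Python rather than Mnesia. The `counters` table and the `move` helper are hypothetical. Two updates are grouped into one transaction; if a check fails between them, the first update is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counters VALUES ('a', 10), ('b', 0)")
conn.commit()

def move(conn, src, dst, amount):
    """Move `amount` from one counter to another as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE counters SET value = value - ? WHERE name = ?", (amount, src)
            )
            (remaining,) = conn.execute(
                "SELECT value FROM counters WHERE name = ?", (src,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("source counter would become negative")
            conn.execute(
                "UPDATE counters SET value = value + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

def snapshot(conn):
    """Return the whole table as a dict for easy inspection."""
    return dict(conn.execute("SELECT name, value FROM counters"))

print(move(conn, "a", "b", 4), snapshot(conn))    # both updates applied
print(move(conn, "a", "b", 100), snapshot(conn))  # neither update applied
```

The second call leaves the table exactly as it was, which is the "complete or not at all" behaviour described above. Mnesia additionally provides this guarantee across cluster nodes, which this single-process sketch does not attempt to model.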

NoSQL

Traditional relational databases are queried with a dedicated query language, SQL. Applications built on them often use an ORM (object-relational mapping) layer to speed up development. Mnesia, on the other hand, does not have a dedicated query language: it uses Erlang (or Elixir) as the query language, so no ORM is required. Queries operate directly on Erlang terms, and the integration with business logic is very smooth.

Architecture

In a Mnesia cluster, all nodes are equal. Each node can store a copy of any table, start a transaction, and access these tables. The Mnesia cluster uses a full mesh topology: each node talks to all other nodes in the cluster. Each transaction is replicated to all nodes in the cluster, as shown in the following figure:

Figure: Mnesia cluster

In terms of the CAP theorem (choose two of consistency, availability, and partition tolerance), Mnesia defaults to AP (availability and partition tolerance).

Challenges

In summary, the Mnesia database has a set of unique features, and EMQ X uses all of them. Now we want to talk about its shortcomings and the reasons why we improved it.

Although Mnesia is hardware-agnostic, it was originally designed with a specific cluster architecture in mind: a group of servers interconnected through a fast, low-latency local area network.

Under ideal conditions, the mesh topology reduces transaction replication latency: all communication between nodes can happen in parallel without any intermediary. However, it limits the horizontal scalability of the cluster, because the number of links between nodes grows quadratically with the number of nodes. As the number of nodes increases, the cost of keeping all nodes fully synchronized becomes higher and higher, and transaction performance decreases.
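The quadratic growth follows from simple counting: in a full mesh of n nodes, every pair of nodes needs a link, which gives n(n-1)/2 links. The short Python sketch below (the function name is ours, chosen for illustration) prints the link count for a few cluster sizes.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of node-to-node links when every node talks to every other node."""
    return nodes * (nodes - 1) // 2

for n in (3, 10, 50, 100):
    print(f"{n:>3} nodes -> {full_mesh_links(n):>4} links")
```

Going from 10 to 100 nodes multiplies the node count by 10 but the link count by 110 (from 45 to 4950).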

Because all nodes are equal and the cluster follows the traditional paradigm, replacing a single node is easy, but the number of nodes that can be part of the cluster at the same time is limited.

This is quite different from the situation we face today: clusters are deployed in geographically redundant cloud environments where everything is dynamic and ephemeral, and nodes run in auto-scaling groups that we expect to be in constant flux.

To meet these challenges, we extended Mnesia and named the result Mria.

Solution: Introducing Mria

Mria is an open-source extension of Mnesia that adds eventual consistency to Mnesia.

Mria moves from a full mesh topology to a mesh + star topology. Each node assumes one of two roles: core or replicant.

Core nodes behave like regular Mnesia nodes: they are connected in a full mesh, and each node can initiate write transactions, hold locks, etc. The core nodes are largely static and persistent.

Replicant nodes, on the other hand, do not participate in transactions. Each one connects to one of the core nodes and passively replicates transactions from it. This means that a replicant node is not allowed to perform any write operations on its own; instead, it asks a core node to update the data on its behalf. At the same time, replicants hold a complete local copy of the data, so read access is just as fast as on a core node.

Figure: Mria cluster

You can think of Mria as a combination of client-server and embedded database: write through the server, but read locally.
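The "write through the server, read locally" idea can be sketched as a toy in-memory model in Python. This is not Mria's implementation or API: the class and method names are hypothetical, there is only one core node, and updates are pushed synchronously, whereas Mria replicates to replicants with eventual consistency.

```python
class CoreNode:
    """Toy core node: owns writes and pushes every update to its replicants."""

    def __init__(self):
        self.table = {}
        self.replicants = []

    def attach(self, replicant):
        # A new replicant starts from a full copy of the core's data.
        replicant.table = dict(self.table)
        replicant.core = self
        self.replicants.append(replicant)

    def write(self, key, value):
        self.table[key] = value
        for replicant in self.replicants:
            replicant.apply(key, value)


class ReplicantNode:
    """Toy replicant: reads from its local copy, forwards writes to the core."""

    def __init__(self):
        self.table = {}
        self.core = None

    def apply(self, key, value):
        # Called by the core to replay an update locally.
        self.table[key] = value

    def read(self, key):
        return self.table.get(key)  # local read, no call to the core

    def write(self, key, value):
        self.core.write(key, value)  # the replicant never writes on its own


core = CoreNode()
core.write("sensors/temp", "node1")
r1, r2 = ReplicantNode(), ReplicantNode()
core.attach(r1)
core.attach(r2)
r1.write("sensors/humidity", "node2")  # routed through the core
print(r2.read("sensors/humidity"))     # prints "node2"
```

A write issued on `r1` becomes visible on `r2` only because the core applied it and replayed it to every attached replicant.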

This cluster topology solves two problems:

  • Horizontal scalability
  • Support for cluster auto-scaling

Since replicant nodes are not involved in writes, transaction latency is not affected when more replicant nodes are added to the cluster, which makes it possible to build larger EMQ X clusters.

In addition, replicant nodes are designed to be ephemeral. Adding or removing them does not change the data redundancy, so they can be placed in an auto-scaling group to enable better DevOps practices.
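A rough way to see the effect on topology is to count links. The Python sketch below assumes that core nodes form a full mesh and that each replicant keeps exactly one link to a core node, as described above; the function names and the choice of 3 core nodes are illustrative, not a recommendation.

```python
def mesh_links(nodes: int) -> int:
    """Links in a full mesh: one per pair of nodes."""
    return nodes * (nodes - 1) // 2

def mria_links(cores: int, replicants: int) -> int:
    """Links in a mesh + star layout: core mesh plus one link per replicant."""
    return mesh_links(cores) + replicants

CORES = 3  # hypothetical fixed number of core nodes
for replicants in (0, 7, 97):
    total = CORES + replicants
    print(
        f"{total:>3} nodes: full mesh = {mesh_links(total):>4} links, "
        f"mesh + star = {mria_links(CORES, replicants):>3} links"
    )
```

With a fixed set of core nodes, each additional replicant adds one link, so the link count grows linearly with cluster size instead of quadratically.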

In the next article, we will discuss in more detail how to configure EMQ X to take full advantage of Mria.

Copyright notice: This is an original EMQ article. Please credit the source when reprinting.

Original link: https://www.emqx.com/zh/blog/mqtt-broker-clustering-part-3-challenges-and-solutions-of-emqx-horizontal-scalability


