MQTT Broker Clustering in Detail (3): Challenges and Countermeasures for the Horizontal Scalability of EMQ X

In this article, we introduce some scalability improvements to the MQTT broker cluster. We focus mainly on the database engine used inside EMQ X and on how it has been improved in EMQ X 5.0.

Before we begin, we need to understand how data is replicated in an EMQ X cluster: the EMQ X broker stores topic and client runtime information in the Mnesia database, which replicates the data across the cluster.

Introduction to Mnesia

Mnesia is an open source database management system developed by Ericsson as part of the Open Telecom Platform (OTP). It was originally used to handle configuration and runtime data in ISP-grade telecom switches. Versions before EMQ X 4.3 used it to store various runtime data, such as topics, routes, ACL rules, alarms, and so on.

You are probably familiar with databases such as MySQL, Postgres, and MongoDB, and with in-memory stores such as Redis and memcached, but you may not know much about Mnesia. It has advantages of its own: it combines many features of the products above in a single, simple application.

Mnesia has a fairly academic definition: an embedded, distributed, transactional NoSQL (non-relational) database. That sounds complicated, so we will explain each term in turn.

Embedded

The most widely used databases, such as MySQL and Postgres, generally adopt a client-server model: the database runs in a separate process (usually on a dedicated server), and business applications interact with it by sending requests over the network or UNIX domain sockets and waiting for responses. This model is convenient in many ways because it allows business logic and storage to be separated and managed independently. It also has a disadvantage: interacting with a remote process inevitably adds latency to every request.

In contrast, an embedded database runs in the same process as the business application. SQLite is a typical example of an embedded database. Mnesia also falls into this category: it runs in the same process as the other EMQ X applications. Reading data from Mnesia tables can be as fast as reading local variables, so we can read database data in hot paths without hurting performance.

Distributed

We mentioned earlier that Mnesia is a distributed database, which means that its tables are replicated to different physical locations on the network. When the nodes of a distributed database share no physical resources (such as RAM or disk) and coordinate only at the application level, the design is called a shared-nothing (SN) architecture. This type is usually preferred because it does not require any specialized hardware and can be scaled horizontally.

The Mnesia application runs alongside EMQ X and replicates table updates to all nodes in the cluster through the Erlang distribution protocol. This means that business applications can read up-to-date data locally. It also improves fault tolerance: as long as at least one node in the cluster is alive, the data is safe. EMQ X relies on this feature to replicate routing information across the cluster.

Transactional

Mnesia supports ACID transactions, which is a rare feature among embedded databases. This means that multiple read and update operations can be grouped together. A Mnesia transaction has atomicity (it either completes in full or has no effect), consistency (although the guarantee is more relaxed than in Postgres), isolation (concurrent transactions do not interfere with each other), and durability. All these guarantees hold across the whole cluster.
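
The atomicity guarantee can be illustrated with a small Python sketch. It is only an analogy and not Mnesia's API: the `Table` class and the `transaction` helper are hypothetical names, and the sketch covers a single in-process table with no replication or locking.

```python
import copy
from contextlib import contextmanager


class Table:
    """Hypothetical in-memory key-value table (not a Mnesia API)."""

    def __init__(self):
        self.rows = {}


@contextmanager
def transaction(table):
    """Apply all writes made inside the block, or none of them."""
    snapshot = copy.deepcopy(table.rows)  # state to restore on failure
    try:
        yield table.rows
    except Exception:
        table.rows = snapshot  # roll back every partial update
        raise


routes = Table()

# Both writes succeed, so both become visible.
with transaction(routes) as rows:
    rows["sensors/temp"] = "node1"
    rows["sensors/humidity"] = "node2"

# This transaction fails midway, so its first write is undone.
try:
    with transaction(routes) as rows:
        rows["sensors/pressure"] = "node3"
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(sorted(routes.rows))
```

After the failed transaction, the table still holds only the two routes written by the first one.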

EMQ X uses Mnesia transactions wherever data consistency is critical.

NoSQL

Traditional relational databases use a dedicated query language, SQL, to interact with the database. Applications built on them often use an ORM (Object Relational Mapping) layer to speed up development. Mnesia, on the other hand, does not have a dedicated query language: it uses Erlang (or Elixir) as the query language, so no ORM is required. Queries operate directly on Erlang terms, and integration with business logic is very smooth.

Architecture

In the Mnesia cluster, all nodes are equal. Each node can store a copy of any table, start a transaction, and access these tables. The Mnesia cluster uses a full mesh topology: each node talks to all other nodes in the cluster. Each transaction is replicated to all nodes in the cluster, as shown in the following figure:

[Figure: Mnesia cluster]

In terms of the CAP theorem (choose two of the three properties: consistency, availability, and partition tolerance), Mnesia defaults to AP (availability and partition tolerance).

Challenges

In summary, Mnesia has a set of unique features, and EMQ X uses all of them. Now we turn to its shortcomings and the reasons why we improved it.

Although Mnesia is hardware-agnostic, it was originally designed with a specific cluster architecture in mind: a group of servers interconnected through a fast, low-latency local area network.

Under ideal conditions, the mesh topology reduces transaction replication latency: all communication between nodes can happen in parallel without any intermediary. However, it limits the horizontal scalability of the cluster, because the number of links between nodes grows quadratically with the number of nodes. As the number of nodes increases, the cost of keeping all nodes fully synchronized rises, and transaction performance degrades.
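
To make the quadratic growth concrete, here is a minimal Python sketch. It assumes one link per pair of nodes, so a full mesh of n nodes has n(n-1)/2 links; the function name `full_mesh_links` is introduced here only for illustration.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of pairwise links when every node connects to every other."""
    return nodes * (nodes - 1) // 2


for n in (3, 10, 50, 100):
    print(f"{n:>3} nodes -> {full_mesh_links(n):>4} links")
```

Growing the cluster from 10 to 100 nodes multiplies the node count by 10 but the link count by 110 (from 45 to 4950).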

Because all nodes are equal and the cluster follows the traditional paradigm, any single node is easy to replace, but the number of nodes that can join the cluster at the same time is limited.

So we face the following situation: the cluster is deployed in a geographically redundant cloud environment where everything is dynamic and ephemeral, the nodes run in auto-scaling groups, and we expect them to be in a constant state of flux.

To meet these challenges, we have extended Mnesia and called it Mria.

Countermeasure: Introduction of Mria

Mria is an open source extension of Mnesia that adds eventual consistency to it.

Mria moves from a full mesh topology to a mesh + star topology. Each node assumes one of two roles: core or replicant.

Core nodes behave like regular Mnesia nodes: they are connected in a full mesh, and each node can initiate write transactions, hold locks, etc. The core nodes are largely static and persistent.

Replicant nodes, on the other hand, do not participate in transactions. Each one connects to one of the core nodes and passively replicates transactions from it. This means that a replicant node is not allowed to perform any write operation on its own. Instead, it asks a core node to update the data on its behalf. At the same time, replicant nodes hold a complete local copy of the data, so read access is just as fast.

[Figure: Mria cluster]

You can think of Mria as a combination of client-server and embedded database: write through the server, but read locally.
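
The "write through the server, read locally" idea can be sketched in a few lines of Python. This is a toy model rather than Mria's implementation: the `CoreNode` and `ReplicantNode` classes are hypothetical, there is a single core, and changes are pushed to replicants synchronously, whereas real replication is asynchronous (hence eventual consistency).

```python
class CoreNode:
    """Hypothetical core node: accepts writes and pushes them to replicants."""

    def __init__(self):
        self.table = {}
        self.replicants = []

    def attach(self, replicant):
        # A new replicant first receives a full copy of the current data.
        replicant.table = dict(self.table)
        self.replicants.append(replicant)

    def write(self, key, value):
        self.table[key] = value
        # Replicants passively receive the committed change.
        for replicant in self.replicants:
            replicant.table[key] = value


class ReplicantNode:
    """Hypothetical replicant: reads locally, forwards writes to its core."""

    def __init__(self, core):
        self.table = {}
        self.core = core
        core.attach(self)

    def read(self, key):
        return self.table.get(key)  # served from the local copy

    def write(self, key, value):
        self.core.write(key, value)  # the core performs the update


core = CoreNode()
r1 = ReplicantNode(core)
r2 = ReplicantNode(core)

r1.write("sensors/temp", "node1")  # forwarded to the core
print(r2.read("sensors/temp"))     # local read on another replicant
```

A replicant attached later starts from a full copy of the core's table, so adding or removing replicants never changes what the core stores.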

This cluster topology solves two problems:

Horizontal scalability
Support for cluster auto-scaling

Since replicant nodes are not involved in writes, transaction latency is not affected when more replicant nodes are added to the cluster, which makes it possible to build larger EMQ X clusters.
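
A rough link count shows why this topology scales better. The Python sketch below assumes that the core nodes form a full mesh and that each replicant keeps exactly one link to a core node; the node counts (3 cores, 20 replicants) are illustrative and not a recommendation.

```python
def mesh_links(nodes: int) -> int:
    """Links in a full mesh: one per pair of nodes."""
    return nodes * (nodes - 1) // 2


def mesh_star_links(cores: int, replicants: int) -> int:
    """Cores form a full mesh; each replicant holds one link to a core."""
    return mesh_links(cores) + replicants


total = 23
cores = 3
print(mesh_links(total))                      # all 23 nodes in a full mesh
print(mesh_star_links(cores, total - cores))  # 3 cores plus 20 replicants
```

Under these assumptions, every additional replicant adds a single link, while every additional node in a full mesh adds one link per existing node.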

In addition, replicant nodes are designed to be ephemeral. Adding or removing them does not change data redundancy, so they can be placed in an auto-scaling group, which fits DevOps practices better.

In the next article, we will discuss in more detail how to configure EMQ X to take full advantage of Mria.
