This article introduces improvements to the scalability of MQTT broker clusters. It focuses on the database engine used inside EMQ X and how that engine has been improved in EMQ X 5.0.
Before going further, we need to understand how data is replicated in an EMQ X cluster: the EMQ X broker stores topic and client runtime information in the Mnesia database, which replicates the data across the cluster.
Introduction to Mnesia
Mnesia is an open source database management system developed by Ericsson as part of the Open Telecom Platform. It was originally used to process configuration and runtime data in ISP-grade telecom switches. Versions before EMQ X 4.3 used it to store various runtime data, such as topics, routes, ACL rules, alarms, and so on.
You are probably familiar with databases such as MySQL, Postgres, and MongoDB, and with in-memory stores such as Redis and memcached, but you may not know much about Mnesia. It has unique advantages of its own, and it combines many features of the products above in a single, simple application.
Mnesia has a fairly academic definition: an embedded, distributed, transactional NoSQL (non-relational) database. That sounds complicated, so we will explain each term in turn.
Embedded
The most widely used databases, such as MySQL and Postgres, generally adopt a client-server model: the database runs in a separate process (usually on a dedicated server), and business applications interact with it by sending requests over the network or UNIX domain sockets and waiting for responses. This model is convenient in many ways because it allows business logic and storage to be separated and managed independently. It also has a disadvantage: interacting with a remote process inevitably adds latency to each request.
In contrast, an embedded database runs in the same process as the business application. SQLite is a typical example. Mnesia also falls into this category: it runs in the same process as the other EMQ X applications. Reading data from a Mnesia table can be as fast as reading a local variable, so we can read database data in hot paths without affecting performance.
Distributed
We mentioned earlier that Mnesia is a distributed database, which means that its tables are replicated to different physical locations on the network. When the nodes of a distributed database do not share any physical resources (such as RAM or disk) and coordinate only at the application level, the design is called a shared-nothing (SN) architecture. This type is usually preferred because it does not require any specialized hardware and can be scaled horizontally.
Mnesia runs alongside EMQ X and replicates table updates to all nodes in the cluster through the Erlang distribution protocol. This means that business applications can read up-to-date data locally. It also improves fault tolerance: as long as at least one node in the cluster is alive, the data is safe. EMQ X relies on this feature to replicate routing information across the cluster.
Transactional
Mnesia supports ACID transactions, which is rare for an embedded database. This means that multiple read and update operations can be grouped together. A Mnesia transaction has atomicity (it either completes entirely or has no effect), consistency (although the guarantee is more relaxed than in Postgres), isolation (it does not affect other transactions), and durability. All these guarantees hold across the whole cluster.
EMQ X uses Mnesia transactions wherever data consistency is critical.
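To make the atomicity property concrete, here is a minimal Python sketch of an all-or-nothing update. It is only an illustration: `ToyTable`, `add_route`, and the route limit are hypothetical names invented for this example, not Mnesia or EMQ X APIs, and the sketch ignores distribution, isolation, and durability. The `("atomic", ...)` and `("aborted", ...)` results loosely mirror the `{atomic, Result}` and `{aborted, Reason}` tuples that `mnesia:transaction/1` returns.

```python
import copy


class ToyTable:
    """A toy in-memory table (hypothetical, not Mnesia's API)."""

    def __init__(self, rows):
        self.rows = rows

    def transaction(self, fun):
        # Run the function against a private copy, so that partial
        # updates are never visible in self.rows.
        working_copy = copy.deepcopy(self.rows)
        try:
            result = fun(working_copy)
        except Exception as exc:
            # Abort: self.rows was never touched.
            return ("aborted", exc)
        # Commit: publish all updates in one step.
        self.rows = working_copy
        return ("atomic", result)


def add_route(rows, topic, node, max_routes=1):
    # Two updates that must succeed or fail together.
    rows["routes"].append((topic, node))
    rows["route_count"] += 1
    # Hypothetical failure that happens after both updates were made.
    if rows["route_count"] > max_routes:
        raise RuntimeError("route limit reached")
    return rows["route_count"]


table = ToyTable({"routes": [], "route_count": 0})
first = table.transaction(lambda rows: add_route(rows, "sensors/temp", "node1"))
second = table.transaction(lambda rows: add_route(rows, "sensors/hum", "node2"))
```

The second call fails after it has already modified its working copy, yet the table still contains only the first route: the failed transaction has no effect at all.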
NoSQL
Traditional relational databases are queried through a dedicated language called SQL, and applications often use an ORM (Object-Relational Mapping) layer on top of it to speed up development. Mnesia, on the other hand, does not have a dedicated query language: it uses Erlang (or Elixir) as the query language, so no ORM is required. Queries operate directly on Erlang terms, so integration with business logic is very smooth.
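The idea of querying with the host language instead of SQL can be illustrated with a rough Python analogy. This is not Mnesia code: the `Route` record and the sample data are assumptions made up for the example. The point is only that the records are ordinary values of the language, so a query is an ordinary expression and no mapping layer is needed.

```python
from typing import NamedTuple


class Route(NamedTuple):
    """A hypothetical routing record: which node serves which topic."""
    topic: str
    node: str


# The "table" is a plain collection of native records.
routes = [
    Route("sensors/temp", "node1"),
    Route("sensors/temp", "node2"),
    Route("sensors/hum", "node1"),
]

# SQL equivalent: SELECT node FROM routes WHERE topic = 'sensors/temp'
nodes_for_topic = [r.node for r in routes if r.topic == "sensors/temp"]
```

The result is already a native list of strings, ready to be used by the surrounding business logic without any conversion step.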
Architecture
In the Mnesia cluster, all nodes are equal. Each node can store a copy of any table, start a transaction, and access these tables. The Mnesia cluster uses a full mesh topology: each node talks to all other nodes in the cluster. Each transaction is replicated to all nodes in the cluster, as shown in the following figure:
In terms of the CAP theorem (choose two of the three properties: consistency, availability, and partition tolerance), Mnesia defaults to AP (availability and partition tolerance).
Challenges
In summary, Mnesia has a set of unique features, and EMQ X uses all of them. Now we want to talk about its shortcomings and the reasons why we improved it.
Although Mnesia is hardware-agnostic, it was originally designed with a specific cluster architecture in mind: a group of servers interconnected through a fast, low-latency local area network.
Under ideal conditions, the mesh topology reduces transaction replication latency: all communication between nodes can happen in parallel without any intermediary. However, it limits the horizontal scalability of the cluster, because the number of links between nodes grows quadratically with the number of nodes. As the number of nodes increases, the cost of keeping all nodes fully synchronized becomes higher and higher, and transaction performance decreases.
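The quadratic growth follows from simple counting: in a full mesh every pair of nodes shares one link, so n nodes need n(n - 1)/2 links. The short Python sketch below just evaluates that formula; the function name is chosen for this example.

```python
def full_mesh_links(nodes: int) -> int:
    """Number of links when every node is connected to every other node."""
    return nodes * (nodes - 1) // 2


for n in (3, 10, 50):
    print(f"{n} nodes: {full_mesh_links(n)} links")
```

Going from 10 to 50 nodes multiplies the node count by 5, but the link count rises from 45 to 1,225, roughly 27 times more.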
Because the nodes are homogeneous and the cluster follows the traditional paradigm, it is easy to replace a single node, but the number of nodes that can join the cluster at the same time is limited.
Today, however, we face a different situation: the cluster is deployed in a geo-redundant cloud environment, everything is dynamic and ephemeral, nodes run in autoscaling groups, and we expect them to be in a constant state of flux.
To meet these challenges, we have extended Mnesia and called it Mria.
Solution: Introducing Mria
Mria is an open source extension of Mnesia that adds eventual consistency to it.
Mria moves from a full mesh topology to a mesh + star topology. Each node assumes one of two roles: core or replicant.
Core nodes behave like regular Mnesia nodes: they are connected in a full mesh, and each node can initiate write transactions, hold locks, etc. The core nodes are largely static and persistent.
Replicant nodes, on the other hand, do not participate in transactions. Each one connects to one of the core nodes and passively replicates transactions from it. This means that a replicant node is not allowed to perform any write operations on its own. Instead, it asks a core node to update the data on its behalf. At the same time, it holds a complete local copy of the data, so read access is just as fast.
You can think of Mria as a combination of client-server and embedded database: write through the server, but read locally.
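The "write through the server, read locally" idea can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not Mria's implementation: the class names, the in-memory log, and the explicit `catch_up` step are all hypothetical, and a single core node stands in for the whole core mesh.

```python
class CoreNode:
    """Toy core node: accepts writes and keeps an ordered log of them."""

    def __init__(self):
        self.table = {}
        self.log = []  # committed writes, in commit order

    def write(self, key, value):
        self.table[key] = value
        self.log.append((key, value))


class ReplicantNode:
    """Toy replicant: forwards writes to its core, serves reads locally."""

    def __init__(self, core):
        self.core = core
        self.table = {}   # complete local copy, possibly slightly stale
        self.applied = 0  # how many log entries were applied so far

    def write(self, key, value):
        # A replicant never writes on its own; it delegates to the core.
        self.core.write(key, value)

    def read(self, key):
        # Reads never leave the node.
        return self.table.get(key)

    def catch_up(self):
        # Stand-in for passive replication: apply the entries not seen yet.
        for key, value in self.core.log[self.applied:]:
            self.table[key] = value
        self.applied = len(self.core.log)


core = CoreNode()
replicant = ReplicantNode(core)
replicant.write("sensors/temp", "node1")
before = replicant.read("sensors/temp")  # replication has not happened yet
replicant.catch_up()
after = replicant.read("sensors/temp")
```

The gap between `before` and `after` is the eventual consistency trade-off: a replicant may briefly serve stale data, but it converges once the replicated transactions arrive.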
This cluster topology solves two problems:
- Horizontal scalability
- Support for cluster autoscaling
Since replicant nodes are not involved in writes, transaction latency is not affected when more replicant nodes are added to the cluster, which makes it possible to build larger EMQ X clusters.
In addition, replicant nodes are designed to be ephemeral. Adding or removing them does not change the data redundancy, so they can be placed in an autoscaling group, which enables better DevOps practices.
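A rough way to see the scalability gain is to count links in both topologies. The Python sketch below assumes that each replicant keeps exactly one connection, to a single core node, as described above. Link count is only a proxy for replication cost, not a latency measurement, and the function names and the 3 + 47 split are chosen for this example.

```python
def full_mesh_links(nodes: int) -> int:
    """Links when every node is connected to every other node."""
    return nodes * (nodes - 1) // 2


def mesh_plus_star_links(cores: int, replicants: int) -> int:
    """Links when cores form a full mesh and each replicant
    connects to exactly one core (an assumption of this sketch)."""
    return full_mesh_links(cores) + replicants


# Hypothetical 50-node cluster, arranged in two different ways.
all_equal = full_mesh_links(50)
cores_and_replicants = mesh_plus_star_links(cores=3, replicants=47)
print(all_equal, cores_and_replicants)
```

With 50 equal nodes the cluster needs 1,225 links, while 3 cores plus 47 replicants need only 50, and every additional replicant adds a single link.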
In the next article, we will discuss in more detail how to configure EMQ X to take full advantage of Mria.
Other articles in this series
- MQTT Broker cluster detailed explanation (1): Load balancing
- MQTT Broker cluster detailed explanation (2): Sticky session load balancing
Copyright statement: This article is EMQ original, please indicate the source for reprinting.
Original link: https://www.emqx.com/zh/blog/mqtt-broker-clustering-part-3-challenges-and-solutions-of-emqx-horizontal-scalability
Technical support: If you have any questions about this article or EMQ-related products, you can ask them in the EMQ Q&A community at https://askemq.com, and we will reply promptly.
For more technical content, follow our official account [EMQ Chinese Community].