Introduction: Native MQTT Session Persistence Support

The MQTT specification requires the broker to store messages for offline clients that hold a persistent session. In previous versions, the open source edition of EMQX kept sessions in memory, while the enterprise edition additionally offered storage in an external database to achieve data persistence.

Although memory-based, non-persistent session storage is a sound trade-off between throughput and latency, it still restricts users in some scenarios.

Guided by community feedback and our aim of making the product easier to use, we have added native MQTT session persistence based on RocksDB to the EMQX 5.x roadmap. The feature is now in active development and is expected to ship in version 5.1.0.

This article is a preview of the feature. It introduces the concepts behind MQTT sessions and the design principles of EMQX session persistence to help readers understand this more reliable, low-latency persistence scheme. We will also look at further features that RocksDB's persistence capabilities make possible.

Understanding MQTT sessions

In the protocol specification, QoS 1 and QoS 2 messages are stored by both the client and the broker, and are not deleted until the final acknowledgement has been received. This requires the broker to associate state with each client, which is called the session state. Besides stored messages, subscription information (the list of topics the client is subscribed to) is also part of the session state.

For more information about QoS, please refer to MQTT QoS (Quality of Service) introduction.

Session state in the client includes:

  • QoS 1 and QoS 2 messages sent to the server but not yet fully acknowledged
  • QoS 2 messages received from the server but not yet fully acknowledged

Session state in the server includes:

  • The existence state of the session, even if the session is empty
  • The client's subscriptions
  • QoS 1 and QoS 2 messages sent to the client but not yet fully acknowledged
  • QoS 0 (optional), QoS 1, and QoS 2 messages pending delivery to the client
  • QoS 2 messages received from the client but not yet fully acknowledged
  • The Will Message and the Will Delay Interval
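
The two lists above can be summarised as a pair of data structures. The following is a minimal Python sketch; the class, field, and method names are illustrative assumptions and do not reflect EMQX's internal representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Message:
    packet_id: int
    topic: str
    payload: bytes
    qos: int


@dataclass
class ClientSessionState:
    # QoS 1 and QoS 2 messages sent to the server, not yet fully acknowledged
    sent_unacked: Dict[int, Message] = field(default_factory=dict)
    # Packet ids of QoS 2 messages received from the server, not yet fully acknowledged
    received_qos2: Set[int] = field(default_factory=set)


@dataclass
class ServerSessionState:
    client_id: str  # the session exists even when every field below is empty
    subscriptions: Dict[str, int] = field(default_factory=dict)  # topic filter -> QoS
    sent_unacked: Dict[int, Message] = field(default_factory=dict)
    pending: List[Message] = field(default_factory=list)  # waiting for delivery
    received_qos2: Set[int] = field(default_factory=set)
    will_message: Optional[Message] = None
    will_delay_interval: int = 0  # seconds

    def enqueue(self, msg: Message) -> None:
        """Queue a message; it accumulates here while the client is offline."""
        self.pending.append(msg)

    def deliver_next(self) -> Optional[Message]:
        """Send the oldest pending message; QoS 1/2 messages await acknowledgement."""
        if not self.pending:
            return None
        msg = self.pending.pop(0)
        if msg.qos > 0:
            self.sent_unacked[msg.packet_id] = msg
        return msg

    def acknowledge(self, packet_id: int) -> None:
        """Drop a message once the client has fully acknowledged it."""
        self.sent_unacked.pop(packet_id, None)
```

A QoS 0 message leaves the session as soon as it is delivered, while QoS 1 and QoS 2 messages stay in `sent_unacked` until `acknowledge` is called, which is why they are the ones that matter for persistence.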

Session Lifecycle and Session Storage

Sessions are central to MQTT communication. The protocol requires session state to be kept while the network connection is open. After the connection closes, the point at which the session is discarded is controlled by Clean Session (MQTT 3.1.1) or by Clean Start together with the Session Expiry Interval (MQTT 5.0).
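
As a rough illustration of how these settings determine session lifetime, here is a small Python sketch. The function names are made up for this example, and broker-side limits (for instance a configured maximum expiry) are deliberately ignored.

```python
from typing import Optional

# MQTT 5.0: this Session Expiry Interval value means the session does not expire
NEVER_EXPIRES = 0xFFFFFFFF


def retention_v311(clean_session: bool) -> Optional[int]:
    """Seconds the session is kept after disconnect; None means indefinitely.

    MQTT 3.1.1: Clean Session = 1 ends the session with the connection,
    Clean Session = 0 keeps it after the connection closes.
    """
    return 0 if clean_session else None


def retention_v5(session_expiry_interval: int = 0) -> Optional[int]:
    """Seconds the session is kept after disconnect; None means indefinitely.

    MQTT 5.0: an absent interval is treated as 0, so the session ends
    when the network connection closes.
    """
    if session_expiry_interval == NEVER_EXPIRES:
        return None
    return session_expiry_interval


def resumes_session_v5(clean_start: bool, session_exists: bool) -> bool:
    """Clean Start = 1 discards any existing session at connect time."""
    return session_exists and not clean_start
```

In MQTT 5.0 the two concerns are separate: Clean Start decides what happens to an old session when the client connects, and the Session Expiry Interval decides how long the session survives after it disconnects.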

This article does not go into how the session life cycle differs between these mechanisms. For details, please refer to the article Clean Start and Session Expiry Interval - New Features of MQTT 5.0.

In short, as long as a session exists in the broker, messages keep flowing into it. When the client that owns the session is disconnected or cannot keep up with processing, messages accumulate in the session.

The MQTT protocol does not specify how session persistence is implemented, so the client and the broker are free to store session state in memory or on disk, depending on the scenario and their own design.

EMQX session persistence design in previous versions

Previous versions of EMQX did not support message persistence inside the broker. This was an architectural choice and a trade-off between throughput and latency:

  1. The core problem EMQX solves is connection and routing. Messages only rarely need to be stored persistently, and retained messages are supported as a special case that can be stored on disk.
  2. EMQX typically runs as a cloud service, and that environment is stable and reliable enough that keeping all messages in memory carries little risk of loss.
  3. Built-in persistence has to balance memory and disk use under high throughput, and has to address data storage and replication in a multi-node distributed cluster. In a rapidly evolving project it is difficult to get such a design right in one step.

Although keeping all messages in memory benefits performance, memory-based session storage inevitably causes problems. A large number of connections, together with messages accumulating in sessions, drives up memory usage, which limits large-scale use of persistent sessions (Clean Session = 0). In addition, session data may be lost when EMQX restarts or goes down unexpectedly, which affects data reliability.

With the large-scale adoption of SSDs in the server market, the performance gap between memory-based and disk-based solutions can be very small in practice. In addition, the maturing of LevelDB and RocksDB and their proven use from Erlang laid the foundation for native session persistence.

EMQX 5.0 has officially opened the era of billion-level IoT connections, and both its functionality and its performance are planned and designed to match the latest needs of the industry. A new design for session persistence has therefore been put on the agenda.

Why RocksDB: New Session Layer Selection

After comparing several storage engines against EMQX's data access patterns, we chose RocksDB as the new persistence layer.

Introduction to RocksDB

RocksDB is an embedded, persistent key-value storage engine optimized for fast, low-latency storage with high write throughput. It supports a write-ahead log, range scans, and prefix search, and can provide consistency guarantees under highly concurrent reads and writes and with large data volumes.

Selection basis

In the EMQX session layer design, sessions are stored on the local node, and we prefer to keep data inside EMQX rather than use EMQX as a front end to an external database. The selection was therefore limited to embedded databases. Besides RocksDB, we mainly evaluated the following:

  • Mnesia: Mnesia is a distributed real-time database system that comes with Erlang/OTP. In a Mnesia cluster, all nodes are equal. Each of them can store a copy of the data and can also initiate transactions or perform read and write operations. Mnesia can support extremely high read throughput due to its replication feature, but this feature also limits its write throughput because it means that MQTT messages are basically broadcast within the cluster, and broadcasting does not scale horizontally.

  • LevelDB: RocksDB is an improved fork of LevelDB, and the two are largely equivalent in functionality. However, LevelDB lacks an actively maintained Erlang driver (Erlang NIF), so it was not adopted.

By comparison, RocksDB has clear advantages:

  • Extremely high write throughput: RocksDB is built on an LSM-tree structure optimized for writes, which can sustain EMQX's massive message throughput and the high-frequency writes that occur during rapid subscription.
  • Iterators and fast range queries: RocksDB supports iterating over sorted keys, and EMQX can build further features on this capability.
  • Erlang support: the NIF library for RocksDB is mature and actively maintained.
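
To make the value of sorted keys concrete, the sketch below uses a small in-memory, pure-Python stand-in for an ordered key-value store. It is not the RocksDB API, and the `msg/<client id>/<sequence>` key layout is a hypothetical choice for illustration only.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


class OrderedKV:
    """Tiny in-memory stand-in for an ordered key-value store (not RocksDB itself)."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # always kept sorted
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def delete(self, key: bytes) -> None:
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield entries whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)  # seek to first candidate
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are contiguous in sorted order
            yield key, self._values[key]


def message_key(client_id: str, seq: int) -> bytes:
    # Hypothetical layout: zero-padding makes byte order match numeric order
    return f"msg/{client_id}/{seq:020d}".encode()
```

With such a layout, all pending messages of one client can be read back in order with a single seek followed by a sequential scan, without touching the keys of other clients.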

In preliminary tests of the RocksDB session persistence scheme, RocksDB's performance advantages showed clearly: it matched the publish rate of in-memory storage until other modules became the bottleneck.

EMQX's session persistence design based on RocksDB

The new implementation will replace all modules under the current apps/emqx/src/persistent_session directory and use RocksDB to store MQTT session data.

Persistence can be enabled for all clients, or restricted with filters such as QoS and topic prefix to the clients and topics that need it. Where disk performance is insufficient, or where message loss is acceptable and maximum performance is required, users can turn persistence off and use in-memory storage instead.
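
One way to picture such a filter is as a small policy object that is consulted for each message. The Python sketch below is purely illustrative: `PersistencePolicy` and its fields are hypothetical names and are not EMQX configuration keys.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PersistencePolicy:
    """Hypothetical filter deciding which messages go to disk."""

    enabled: bool = True
    min_qos: int = 1                       # QoS 0 is not persisted by default
    topic_prefixes: Tuple[str, ...] = ()   # empty tuple means all topics

    def should_persist(self, topic: str, qos: int) -> bool:
        if not self.enabled or qos < self.min_qos:
            return False
        if not self.topic_prefixes:
            return True
        return any(topic.startswith(prefix) for prefix in self.topic_prefixes)


# Persist only commands and OTA traffic; everything else stays in memory.
scoped = PersistencePolicy(topic_prefixes=("ota/", "cmd/"))
```

Setting `enabled=False` corresponds to the case described above where persistence is switched off entirely in favour of in-memory storage.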

Data distribution

As an embedded database, RocksDB cannot distribute data within the cluster by itself. Operations that require data transfer between nodes, such as a session moving from one node to another, are handled through EMQX's message distribution mechanism.

We combine Mnesia's replication with RocksDB's persistence. Sessions are stored in RocksDB but accessed through Mnesia's API, so RocksDB acts purely as a storage backend for Mnesia.

What data can be persisted through RocksDB

  1. Session records for clients connected with Clean Start = 0
  2. Subscription data (Subscriptions), written to RocksDB on subscribe and deleted from RocksDB on unsubscribe
  3. QoS 1 and QoS 2 messages published by clients, written to RocksDB on publish and deleted once they have been acknowledged
  4. Data for other high-throughput, low-latency scenarios, such as retained messages and data bridge cache queues
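
The write and delete points for the first three items can be sketched as follows. This is a simplified Python illustration in which a plain dict stands in for the on-disk store; the class, the method names, and the key layout are assumptions made for this example.

```python
from typing import Dict


class SessionStore:
    """Illustrative write paths; a dict stands in for the on-disk store."""

    def __init__(self) -> None:
        self.db: Dict[str, bytes] = {}

    def on_connect(self, client_id: str, clean_start: bool) -> None:
        # Only sessions that should outlive the connection are recorded
        if not clean_start:
            self.db.setdefault(f"session/{client_id}", b"")

    def on_subscribe(self, client_id: str, topic_filter: str, qos: int) -> None:
        self.db[f"sub/{client_id}/{topic_filter}"] = bytes([qos])

    def on_unsubscribe(self, client_id: str, topic_filter: str) -> None:
        self.db.pop(f"sub/{client_id}/{topic_filter}", None)

    def on_publish(self, client_id: str, packet_id: int, payload: bytes, qos: int) -> None:
        # QoS 0 messages are fire-and-forget and are not stored
        if qos >= 1:
            self.db[f"msg/{client_id}/{packet_id}"] = payload

    def on_ack(self, client_id: str, packet_id: int) -> None:
        # The message is no longer needed once it has been acknowledged
        self.db.pop(f"msg/{client_id}/{packet_id}", None)
```

Every stored entry has a clear point at which it is deleted, so the amount of data on disk tracks the number of live sessions, subscriptions, and unacknowledged messages.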

Persistence capability extension

The introduction of RocksDB provides EMQX with a high-performance and reliable persistence layer, on which EMQX can expand more functions.

Message replay

In some scenarios the publisher does not care whether the subscriber is online, but requires that the message eventually reach the subscriber, even if the subscriber is offline or its session does not exist yet.

With the persistence layer in place, EMQX can extend its MQTT implementation with a message replay function similar to Kafka's. When a message is published, a special flag can mark it to be persisted under its target topic. A subscriber that carries a non-standard subscription attribute can then fetch the messages after a specified position in the topic.

Message replay suits scenarios such as device initialization and OTA upgrades, where the timeliness of instructions is not critical, and it lets publishers and subscribers exchange data more flexibly.

Typical flow of message replay

  1. The publisher publishes a persistent message
  2. EMQX stores the message in the replay queue without caring whether the subscriber is online or not
  3. The subscriber initiates a subscription
  4. EMQX reads messages starting from the specified position
  5. EMQX publishes the replayed messages to the subscriber
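
The flow above can be sketched as an append-only log per topic, addressed by offset. The Python below is a minimal illustration under that assumption: `ReplayLog`, the `persist` flag, and `from_offset` are hypothetical names, since the actual flags and subscription attributes are not specified in this article.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ReplayLog:
    """Append-only log per topic; a message's offset is its position in the log."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[bytes]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes, persist: bool = True) -> Optional[int]:
        """Store a persistent message regardless of whether anyone is subscribed."""
        if not persist:
            return None  # ordinary message, not kept for replay
        log = self._topics[topic]
        log.append(payload)
        return len(log) - 1

    def replay(self, topic: str, from_offset: int = 0) -> List[Tuple[int, bytes]]:
        """Return (offset, payload) pairs at or after from_offset."""
        log = self._topics.get(topic, [])
        return [(i, p) for i, p in enumerate(log) if i >= from_offset]
```

A device that comes online late could call `replay("ota/fw", from_offset=0)` to receive every stored instruction, while a device that already processed the first message would start from offset 1.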

Data bridge cache queue

The persistence layer can also back the cache queue of a data bridge. When the bridged resource is unavailable, data is stored in the cache queue and transmission resumes once the resource recovers, which avoids accumulating large amounts of data in memory.
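
The buffering behaviour can be sketched in a few lines of Python. Here an in-memory deque stands in for the disk-backed queue, and `BufferedBridge` and its `send` callback are hypothetical names used only for this example.

```python
from collections import deque
from typing import Callable, Deque


class BufferedBridge:
    """Queue outgoing data and drain it in order whenever the resource accepts it."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        self._send = send                      # returns False while the resource is down
        self._queue: Deque[bytes] = deque()    # stand-in for a disk-backed queue

    def forward(self, payload: bytes) -> None:
        self._queue.append(payload)
        self.flush()

    def flush(self) -> int:
        """Send queued items oldest first; stop at the first failure."""
        sent = 0
        while self._queue and self._send(self._queue[0]):
            self._queue.popleft()  # remove only after a successful send
            sent += 1
        return sent

    def backlog(self) -> int:
        return len(self._queue)
```

An item is removed from the queue only after `send` reports success, so data is neither dropped nor reordered while the resource is unavailable.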

Epilogue

Native MQTT session persistence based on RocksDB is one of the most significant functional changes since EMQX was first released. It will give open source users more reliable guarantees for their business and let them make full use of MQTT protocol features for IoT application development without restrictions. Enterprise users who currently rely on external data storage can migrate to RocksDB for a lower-latency persistence solution.

At the same time, drawing on real-world IoT use cases, EMQX will build further features on top of this persistence capability to meet the increasingly diverse data requirements of the Internet of Things.

Copyright statement: This article is an original work by EMQ. Please credit the source when reprinting.

Original link: https://www.emqx.com/zh/blog/mqtt-persistence-based-on-rocksdb

