HStreamDB v0.6 is officially released: horizontal scalability and real-time data distribution are improved

HStreamDB v0.6, the distributed, cloud-native streaming database open-sourced by EMQ, is now officially released!

HStreamDB is the first cloud-native streaming database designed for streaming data, dedicated to the efficient storage and management of large-scale data streams. It supports complex real-time analysis of dynamically changing data streams, as well as one-stop management of the ingestion, storage, processing, and distribution of large-scale data streams. In the future, it will play an important role in real-time streaming data analysis and processing scenarios in IoT, the Internet, finance, and other fields.

In the new v0.6 version, we have enabled the cluster mode for HServer, which can elastically expand the computing layer nodes according to client requests and the scale of computing tasks. At the same time, the new shared subscription function allows multiple clients to consume in parallel on the same subscription, which greatly improves the ability to distribute real-time data.

Download the latest version: Docker Hub

Quick overview of the new version

Supports cluster mode, the horizontal scalability of HServer is improved

HStreamDB v0.6 officially supports cluster mode for HServer. With cluster mode, HServer can scale horizontally quickly and supports node health detection and failure recovery, which improves the fault tolerance and scalability of HStreamDB. HServer also supports load balancing: by monitoring the real-time load of all nodes in the cluster, it allocates computing tasks across nodes so that cluster resources are used efficiently.
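This post does not describe HServer's actual scheduling policy. As an illustration of load-aware task allocation in general, here is a minimal sketch that assumes a greedy "least-loaded node first" policy. The node names, task names, and the notion of a numeric load and cost are all hypothetical and are not taken from HStreamDB.

```python
import heapq


def assign_tasks(node_loads, task_costs):
    """Greedily assign each task to the node with the lowest current load.

    node_loads: dict of node name -> current load (hypothetical metric)
    task_costs: list of (task name, estimated cost) pairs
    Returns a dict of task name -> node name.
    """
    # Min-heap of (load, node) so the least-loaded node is always on top.
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)

    placement = {}
    for task, cost in task_costs:
        load, node = heapq.heappop(heap)
        placement[task] = node
        # Put the node back with its load increased by the new task's cost.
        heapq.heappush(heap, (load + cost, node))
    return placement


if __name__ == "__main__":
    loads = {"hserver-1": 5, "hserver-2": 1, "hserver-3": 3}
    tasks = [("query-a", 2), ("query-b", 2), ("query-c", 2)]
    print(assign_tasks(loads, tasks))
```

A real scheduler would also have to react to nodes joining, failing, and reporting fresh load figures; the sketch only shows the placement decision itself.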

To start and deploy a cluster, refer to the following documentation:

HStream

Support for shared subscriptions, enhanced real-time data distribution

In HStreamDB v0.6, we refactored the previous subscription model and introduced a new shared subscription function.

In the previous version, a subscription could only be consumed by one client at a time, which limited HStreamDB's ability to distribute data in real time. The new shared subscription function introduces the concept of a consumer group, which manages the consumption of a stream in a unified manner. All consumers of a stream join the same consumer group, and a client can join or leave the consumer group at any time.

HStreamDB currently supports at-least-once consumption semantics. HServer distributes data to the consumers in a consumer group in round-robin order. Any message for which HServer has not received the client's Ack is automatically resent to an available consumer after a timeout. All members of the same consumer group share the consumption progress, which HServer maintains on behalf of the group. The high fault tolerance of HStreamDB ensures that the failure of any single node does not affect the consumption of a stream.
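The delivery behaviour described above can be modelled with a small in-memory simulation. This is only a sketch of round-robin dispatch with ack timeouts, not HServer's implementation: the `ConsumerGroup` class, its method names, and the integer clock are all hypothetical.

```python
from collections import deque


class ConsumerGroup:
    """Toy model of a shared subscription: round-robin delivery with ack timeout."""

    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout
        self.consumers = deque()  # rotation order for round-robin
        self.pending = deque()    # records waiting to be delivered
        self.in_flight = {}       # record id -> (consumer, delivery time)

    def join(self, consumer):
        self.consumers.append(consumer)

    def leave(self, consumer):
        self.consumers.remove(consumer)

    def publish(self, record_id):
        self.pending.append(record_id)

    def ack(self, record_id):
        # An acknowledged record is done and will never be resent.
        self.in_flight.pop(record_id, None)

    def dispatch(self, now):
        """Deliver pending records round-robin; return [(record id, consumer)]."""
        # Requeue every record whose ack did not arrive within the timeout.
        for rid, (_, sent_at) in list(self.in_flight.items()):
            if now - sent_at >= self.ack_timeout:
                del self.in_flight[rid]
                self.pending.append(rid)

        delivered = []
        while self.pending and self.consumers:
            rid = self.pending.popleft()
            consumer = self.consumers[0]
            self.consumers.rotate(-1)  # next record goes to the next consumer
            self.in_flight[rid] = (consumer, now)
            delivered.append((rid, consumer))
        return delivered


if __name__ == "__main__":
    group = ConsumerGroup(ack_timeout=5)
    group.join("c1")
    group.join("c2")
    for rid in (1, 2, 3):
        group.publish(rid)
    print(group.dispatch(now=0))  # records alternate between c1 and c2
    group.ack(1)
    group.ack(3)
    group.leave("c2")             # c2 disappears without acking record 2
    print(group.dispatch(now=5))  # record 2 is resent to the remaining consumer
```

Because an unacknowledged record is redelivered rather than dropped, a consumer may see the same record more than once, which is exactly what at-least-once semantics permits.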

HStreamDB's Java client has also been updated to v0.6 and fully supports HStreamDB's cluster and shared subscription functions. The new Java client refactors the subscription API and makes the client easier to use. For usage examples of the HStreamDB Java client, see: hstreamdb-java/examples at main · hstreamdb/hstreamdb-java

Added HStream Metrics, enhanced system observability

HStreamDB v0.6 adds basic metrics, such as the write rate and consumption rate of streams.

Users can view these metrics in the HStream CLI as follows:

-- Find the 5 streams with the highest throughput over the last minute.
sql>  
SELECT streams.name, sum(append_throughput.throughput_1min) AS total_throughput 
FROM append_throughput 
LEFT JOIN streams ON streams.name = append_throughput.stream_name  
GROUP BY stream_name 
ORDER BY total_throughput DESC 
LIMIT 0, 5;

The query result is shown in the figure below:

[Figure: query result]
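For readers less familiar with SQL, the aggregation performed by the query above (sum the per-minute throughput per stream, sort in descending order, keep the first five) can be sketched in Python. The function name and the sample rows are hypothetical; real values would come from the `append_throughput` statistics.

```python
from collections import defaultdict


def top_streams(rows, n=5):
    """Return the n streams with the highest total throughput.

    rows: iterable of (stream_name, throughput_1min) pairs (hypothetical data).
    """
    totals = defaultdict(float)
    for name, throughput in rows:
        totals[name] += throughput  # equivalent to SUM(...) GROUP BY stream_name
    # Equivalent to ORDER BY total_throughput DESC followed by a limit of n rows.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    sample = [("orders", 10), ("clicks", 30), ("orders", 25), ("logs", 5)]
    for name, total in top_streams(sample):
        print(name, total)
```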

New REST API for writing data, more possibilities based on HStreamDB

You can now write data to HStreamDB from any language through the REST API. In the future, we plan to open up more REST APIs to make it easier for open source users to build on top of HStreamDB. For example, by combining the HStream REST API with the Webhook function of the open source version of EMQ X, you can quickly integrate EMQ X with HStreamDB.

HStreamDB REST API

Development plan

In subsequent versions of HStreamDB, we will continue to iterate mainly around the following goals:

Improve the stability of the cluster: add more integration tests and fault injection tests, improve the code design, and fix bugs.
Improve usability and operations and maintenance capabilities: improve the CLI tools, configuration, REST API, and Java client.
Increase the scalability of streams: HStreamDB can currently support concurrent reads and writes of a large number of streams efficiently, but when a single stream becomes a hot spot, it faces a performance bottleneck. In the future, we plan to introduce transparent partitioning. The core principle is to keep user-level concepts as simple as possible and to encapsulate the complexity of partitions in the internal implementation, which we expect will greatly improve the user experience compared with other existing solutions.
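Transparent partitioning is a planned feature and its design is not specified here. Purely to illustrate the idea of hiding partitions behind a single stream, the following sketch assumes records are routed by a stable hash of a key. The `PartitionedStream` class and all of its methods are hypothetical and do not reflect any HStreamDB API.

```python
import zlib


class PartitionedStream:
    """Hypothetical facade: callers see one stream, records land in hidden partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self._partitions = [[] for _ in range(num_partitions)]

    def _partition_for(self, key):
        # CRC32 is stable across runs, so a key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self._partitions)

    def append(self, key, payload):
        # The caller never chooses or sees a partition; routing is internal.
        self._partitions[self._partition_for(key)].append((key, payload))

    def partition_sizes(self):
        return [len(p) for p in self._partitions]


if __name__ == "__main__":
    stream = PartitionedStream("sensor-readings", num_partitions=4)
    for i in range(100):
        stream.append(f"device-{i}", {"value": i})
    print(stream.partition_sizes())
```

The caller only ever calls `append` on the stream, so the number of partitions could change internally without affecting user code, which is the simplicity the plan above aims for.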

HStreamDB is a pioneering attempt to move data infrastructure toward the era of real-time data. As research and development continues, we believe that large-scale streaming data continuously generated from multiple data sources will be stored, managed, and analyzed in real time more efficiently through HStreamDB, and that the process of obtaining insights and generating value from data will be greatly accelerated. Stay tuned for further progress on HStreamDB.
