HStreamDB v0.6, the distributed cloud-native streaming database open-sourced by EMQ, is now officially released!

HStreamDB is the first cloud-native streaming database designed for streaming data, dedicated to the efficient storage and management of large-scale data streams. It supports complex real-time analysis on dynamically changing data streams as well as one-stop management of the access, storage, processing, and distribution of large-scale data streams. It is expected to play an important role in real-time streaming data analysis and processing scenarios in IoT, the Internet, finance, and other fields.

In the new v0.6 version, we have enabled the cluster mode for HServer, which can elastically expand the computing layer nodes according to client requests and the scale of computing tasks. At the same time, the new shared subscription function allows multiple clients to consume in parallel on the same subscription, which greatly improves the ability to distribute real-time data.

Download the latest version: Docker Hub

Quick overview of the new version

Cluster mode support, improved horizontal scalability of HServer

HStreamDB v0.6 officially supports cluster mode for HServer. With cluster mode, HServer can scale horizontally quickly and supports node health detection and failure recovery, which improves the fault tolerance and scalability of HStreamDB. HServer also supports load balancing: by monitoring the real-time load of all nodes in the cluster, it allocates computing tasks across nodes so that cluster resources are used efficiently.
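The load-balancing idea can be illustrated with a minimal sketch. It assumes a simple "pick the least-loaded node" policy; the node names, load units, and the `assign_tasks` function are hypothetical and do not describe HServer's actual scheduler.

```python
# Illustrative sketch only: node names, load units and the "least loaded"
# policy are assumptions, not HServer's actual scheduling algorithm.
import heapq


def assign_tasks(node_loads, task_costs):
    """Assign each task to the node with the lowest current load.

    node_loads: dict of node name -> current load (hypothetical units)
    task_costs: list of (task name, estimated cost) pairs
    Returns a dict of task name -> node name.
    """
    # Min-heap keyed by load; the node name breaks ties deterministically.
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)

    placement = {}
    for task, cost in task_costs:
        load, node = heapq.heappop(heap)
        placement[task] = node
        # The chosen node now carries the extra cost of the new task.
        heapq.heappush(heap, (load + cost, node))
    return placement


if __name__ == "__main__":
    loads = {"hserver-1": 5, "hserver-2": 1, "hserver-3": 3}
    tasks = [("query-a", 2), ("query-b", 2), ("query-c", 2)]
    print(assign_tasks(loads, tasks))
```

A real scheduler would refresh the load figures from live monitoring instead of estimating them from task costs, but the selection step would look similar.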

For the startup and deployment of the cluster, you can refer to the following documents:

HStream

Support for shared subscriptions, enhanced real-time data distribution

In HStreamDB v0.6, we refactored the previous subscription model and introduced a new shared subscription function.

In previous versions, a subscription could only be consumed by one client at a time, which limited HStreamDB's ability to distribute data in real time. The new shared subscription function introduces the concept of a consumer group, which manages the consumption of streams in a unified manner. All consumers of a stream join the same consumer group, and a client can join or leave the consumer group at any time.

HStreamDB currently supports at-least-once consumption semantics. HServer distributes data to the consumers in a consumer group in round-robin order. Any message for which HServer has not received the client's Ack is automatically resent to an available consumer after a timeout. All members of the same consumer group share the consumption progress, and HServer is responsible for maintaining it. The high fault tolerance of HStreamDB ensures that the failure of any single node does not affect the consumption of a stream.
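The behaviour described above (round-robin dispatch, Ack tracking, resend after a timeout, shared progress) can be sketched as a small in-memory simulation. The `SharedSubscription` class and all of its method names are hypothetical; they are not part of HServer or the HStreamDB Java client, and time is passed in explicitly to keep the example deterministic.

```python
# Illustrative simulation only: the class and its methods are hypothetical
# and do not mirror HServer internals or the Java client API.
from collections import deque


class SharedSubscription:
    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout
        self.consumers = deque()  # round-robin order of group members
        self.pending = deque()    # records waiting to be dispatched
        self.in_flight = {}       # record id -> (consumer, send time)
        self.acked = set()        # progress shared by the whole group

    def join(self, consumer):
        self.consumers.append(consumer)

    def leave(self, consumer):
        self.consumers.remove(consumer)

    def publish(self, record_id):
        self.pending.append(record_id)

    def dispatch(self, now):
        """Send every pending record to the next consumer in turn."""
        sent = []
        while self.pending and self.consumers:
            record_id = self.pending.popleft()
            consumer = self.consumers[0]
            self.consumers.rotate(-1)  # move this consumer to the back
            self.in_flight[record_id] = (consumer, now)
            sent.append((record_id, consumer))
        return sent

    def ack(self, record_id):
        if self.in_flight.pop(record_id, None) is not None:
            self.acked.add(record_id)

    def expire(self, now):
        """Re-queue records whose Ack did not arrive within the timeout."""
        for record_id, (_, sent_at) in list(self.in_flight.items()):
            if now - sent_at >= self.ack_timeout:
                del self.in_flight[record_id]
                self.pending.append(record_id)


if __name__ == "__main__":
    sub = SharedSubscription(ack_timeout=5)
    sub.join("c1")
    sub.join("c2")
    for record_id in (1, 2, 3):
        sub.publish(record_id)
    print(sub.dispatch(now=0))  # records alternate between c1 and c2
    sub.ack(1)
    sub.ack(2)
    sub.leave("c1")             # c1 goes away without acking record 3
    sub.expire(now=5)
    print(sub.dispatch(now=5))  # record 3 is resent to the remaining consumer
```

Because an unacknowledged record is re-queued rather than dropped, a consumer may see the same record more than once, which is what at-least-once delivery implies.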

At the same time, HStreamDB's Java client has been updated to v0.6, which fully supports HStreamDB's cluster and shared subscription functions. The new Java client refactors the subscription API and is easier to use. For usage of the HStreamDB Java client, please refer to hstreamdb-java/examples at main · hstreamdb/hstreamdb-java

Added HStream Metrics, enhanced system observability

HStreamDB v0.6 adds basic metrics such as stream write rate and consumption rate.

Users can view these metrics in the HStream CLI as follows:

-- Find the top 5 streams with the highest throughput in the last minute.
sql>  
SELECT streams.name, sum(append_throughput.throughput_1min) AS total_throughput 
FROM append_throughput 
LEFT JOIN streams ON streams.name = append_throughput.stream_name  
GROUP BY stream_name 
ORDER BY total_throughput DESC 
LIMIT 0, 5;

The query result is shown in the figure below:

[Figure: query result]
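To make the semantics of the query concrete, the sketch below runs the same statement against an in-memory SQLite database. SQLite is used here purely as a stand-in and is an assumption, not a statement about how the HStream CLI executes queries. Only the table and column names that appear in the query come from the original; the rest of the schema and every row are made up for illustration.

```python
# Illustrative sketch: SQLite is a stand-in engine and all rows are made up.
import sqlite3

TOP_STREAMS_QUERY = """
SELECT streams.name, sum(append_throughput.throughput_1min) AS total_throughput
FROM append_throughput
LEFT JOIN streams ON streams.name = append_throughput.stream_name
GROUP BY stream_name
ORDER BY total_throughput DESC
LIMIT 0, 5;
"""

# Hypothetical (stream name, throughput over the last minute) samples.
SAMPLES = [
    ("s1", 10.0), ("s1", 5.0), ("s2", 30.0), ("s3", 1.0),
    ("s4", 20.0), ("s5", 7.0), ("s6", 2.0),
]


def build_demo_db():
    """Create the two tables named in the query and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE streams (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE append_throughput (stream_name TEXT, throughput_1min REAL)"
    )
    names = sorted({name for name, _ in SAMPLES})
    conn.executemany("INSERT INTO streams VALUES (?)", [(n,) for n in names])
    conn.executemany("INSERT INTO append_throughput VALUES (?, ?)", SAMPLES)
    return conn


def top_streams(conn):
    """Return up to five (name, total_throughput) rows, highest first."""
    return conn.execute(TOP_STREAMS_QUERY).fetchall()


if __name__ == "__main__":
    for name, total in top_streams(build_demo_db()):
        print(name, total)
```

With six streams in the sample data, the stream with the lowest summed throughput is cut off by the `LIMIT 0, 5` clause, which reads as "skip 0 rows, return at most 5".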

New REST API for writing data, more possibilities based on HStreamDB

You can now write data to HStreamDB from any language through the REST API. In the future, we plan to open up more REST APIs to make it easier for open source users to build on top of HStreamDB. For example, by combining the HStream REST API with the Webhook function of the open source version of EMQ X, you can quickly integrate EMQ X with HStreamDB.

HStreamDB Rest API

Development plan

In subsequent versions of HStreamDB, we will continue to iterate mainly around the following goals:

  • Improve the stability of the cluster: add more integration tests, error injection tests, improve code design and fix bugs
  • Improve usability and operation and maintenance capabilities: improve the CLI tools, configuration, REST API, and Java client
  • Increase the scalability of streams: HStreamDB can currently support concurrent reads and writes of a large number of streams efficiently, but a single stream that becomes a hot spot will hit performance bottlenecks. In the future, we plan to implement transparent partitioning. The core principle is to keep the user-level concepts as simple as possible and to encapsulate the complexity of partitions in the internal implementation, which should greatly improve the user experience compared with other existing solutions.

HStreamDB is a pioneering attempt to move data infrastructure towards the real-time data era. As research and development continue, we believe that large-scale streaming data continuously generated from multiple data sources will be stored, managed, and analyzed in real time more efficiently through HStreamDB, and that the process of obtaining insights from data and generating value will be greatly accelerated. Stay tuned for the follow-up progress of HStreamDB.


EMQX