HStreamDB v0.6, a distributed cloud-native streaming database open-sourced by EMQ, is now officially released!
HStreamDB is the first cloud-native streaming database designed for streaming data, dedicated to the efficient storage and management of large-scale data streams. It supports complex real-time analysis on dynamically changing data streams, as well as one-stop management of large-scale data stream access, storage, processing, and distribution. It is expected to play an important role in real-time streaming data analysis and processing scenarios in IoT, Internet, finance and other fields.
In v0.6, we have enabled cluster mode for HServer, so the computing-layer nodes can be scaled elastically according to client requests and the scale of computing tasks. In addition, the new shared subscription function allows multiple clients to consume in parallel on the same subscription, which greatly improves real-time data distribution.
Download the latest version: Docker Hub
Quick overview of the new version
Cluster mode support, improved horizontal scalability of HServer
HStreamDB v0.6 officially supports cluster mode for HServer. In cluster mode, HServer can quickly scale horizontally and supports node health detection and failure recovery, which improves the fault tolerance and scalability of HStreamDB. HServer also supports load balancing: by monitoring the real-time load of all nodes in the cluster, it allocates computing tasks across nodes so that cluster resources are used efficiently.
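The load balancing and failure recovery described above can be pictured with a small simulation. This is only a sketch of the general idea (place each task on the least-loaded healthy node, and reassign tasks when a node fails), not HServer's actual scheduler; the `Node` class, the `assign_task` and `fail_node` helpers, the node names and the task costs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one HServer node in a cluster."""
    name: str
    healthy: bool = True
    tasks: dict = field(default_factory=dict)  # task name -> assumed cost

    @property
    def load(self):
        # Load is modelled as the sum of the costs of the assigned tasks.
        return sum(self.tasks.values())


def assign_task(nodes, task, cost):
    """Place a task on the least-loaded healthy node and return that node."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy node available")
    target = min(candidates, key=lambda n: n.load)
    target.tasks[task] = cost
    return target


def fail_node(nodes, failed):
    """Mark a node as failed and move its tasks to the remaining nodes."""
    failed.healthy = False
    orphaned, failed.tasks = failed.tasks, {}
    for task, cost in orphaned.items():
        assign_task(nodes, task, cost)


nodes = [Node("hserver-0"), Node("hserver-1"), Node("hserver-2")]
for i, cost in enumerate([5, 3, 4, 2]):
    assign_task(nodes, f"query-{i}", cost)

# Simulate a failed health check on the first node.
fail_node(nodes, nodes[0])
for n in nodes:
    print(n.name, n.healthy, n.tasks)
```

A real scheduler would derive load from live measurements rather than fixed costs; the sketch only shows why monitoring per-node load lets new and orphaned tasks land on the nodes with the most spare capacity.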
For the startup and deployment of the cluster, you can refer to the following documents:
- https://hstream.io/docs/en/latest/start/start-hserver-cluster.html#start-local-hserver-cluster-with-docker
- https://hstream.io/docs/en/latest/deployment/deploy-k8s.html
Support for shared subscriptions, enhanced real-time data distribution
In HStreamDB v0.6, we refactored the previous subscription model and introduced a new shared subscription function.
In the previous version, a subscription could only be consumed by one client at a time, which limited HStreamDB's ability to distribute data in real time. The new shared subscription function introduces the concept of a consumer group, which manages the consumption of streams in a unified manner. All consumers of a stream join the same consumer group, and a client can join or leave the consumer group at any time.
HStreamDB currently supports at-least-once consumption semantics. HServer distributes data to the consumers in a consumer group in round-robin fashion. Any message for which the client's Ack has not been received is automatically resent by HServer to an available consumer after a timeout. All members of the same consumer group share the consumption progress, and HServer is responsible for maintaining it. The high fault tolerance of HStreamDB ensures that the failure of any single node does not affect the consumption of streams.
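To make the delivery model concrete, here is a minimal in-memory sketch of round-robin dispatch with Ack timeouts. It is an illustration under stated assumptions, not HServer's implementation or the client API: the `SharedSubscription` class, its method names, the integer record ids and the logical clock passed to `tick` are all hypothetical.

```python
from collections import deque


class SharedSubscription:
    """Hypothetical model of one subscription shared by a consumer group."""

    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout
        self.consumers = []      # current members of the consumer group
        self.pending = deque()   # records waiting to be (re)delivered
        self.in_flight = {}      # record id -> (consumer, ack deadline)
        self.cursor = 0          # round-robin position
        self.delivered = []      # (record id, consumer) log, for the demo

    def join(self, consumer):
        self.consumers.append(consumer)

    def leave(self, consumer):
        self.consumers.remove(consumer)

    def publish(self, record_id):
        self.pending.append(record_id)

    def ack(self, record_id):
        # An acknowledged record is never resent.
        self.in_flight.pop(record_id, None)

    def tick(self, now):
        """Requeue records whose Ack timed out, then dispatch round-robin."""
        expired = [rid for rid, (_, deadline) in self.in_flight.items()
                   if deadline <= now]
        for rid in expired:
            del self.in_flight[rid]
            self.pending.append(rid)
        while self.pending and self.consumers:
            rid = self.pending.popleft()
            consumer = self.consumers[self.cursor % len(self.consumers)]
            self.cursor += 1
            self.in_flight[rid] = (consumer, now + self.ack_timeout)
            self.delivered.append((rid, consumer))


sub = SharedSubscription(ack_timeout=5)
sub.join("c1")
sub.join("c2")
for rid in range(4):
    sub.publish(rid)
sub.tick(now=0)           # records 0..3 alternate between c1 and c2

for rid in (0, 1, 3):     # record 2 (sent to c1) is never acknowledged
    sub.ack(rid)
sub.leave("c1")           # c1 exits the group
sub.tick(now=5)           # record 2 times out and is resent to c2
sub.ack(2)
print(sub.delivered)
```

Record 2 appears twice in the delivery log, which is what at-least-once means in practice: consumers should be prepared to see the same record more than once. The sketch keeps all state in one process and does not model how the shared consumption progress is persisted.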
HStreamDB's Java client has also been updated to v0.6 and fully supports HStreamDB's cluster and shared subscription functions. The new Java client refactors the subscription part of the API and is easier to use. For usage of the HStreamDB Java client, please refer to the examples directory on the main branch of the hstreamdb/hstreamdb-java repository.
Added HStream Metrics, enhanced system observability
HStreamDB v0.6 adds basic metrics, such as the write rate and consumption rate of streams.
Users can view these metrics in HStream CLI as follows:
-- Find the top 5 streams that have had the highest throughput in the last 1 minute.
sql> SELECT streams.name, sum(append_throughput.throughput_1min) AS total_throughput
FROM append_throughput
LEFT JOIN streams ON streams.name = append_throughput.stream_name
GROUP BY stream_name
ORDER BY total_throughput DESC
LIMIT 0, 5;
The query result is shown in the figure below:
New REST API for writing data, more possibilities based on HStreamDB
You can now write data to HStreamDB from any language through the REST API. In the future, we plan to open more REST APIs to make it easier for open-source users to build on top of HStreamDB. For example, by combining the HStream REST API with the Webhook function of the open-source version of EMQ X, you can quickly integrate EMQ X with HStreamDB.
Development plan
In subsequent versions of HStreamDB, we will continue to iterate mainly around the following goals:
- Improve the stability of the cluster: add more integration tests and fault injection tests, improve code design and fix bugs
- Improve usability and operation and maintenance capabilities: improve CLI tools, configuration, REST API, Java Client
- Increase the scalability of streams: HStreamDB can already efficiently support concurrent reads and writes of a large number of streams, but when a single stream becomes a hot spot, it faces performance bottlenecks. In the future, we plan to introduce transparent partitioning. The core principle is to keep the user-level concepts as simple as possible and to encapsulate the complexity of partitions in the internal implementation, which will greatly enhance the user experience compared to other existing solutions.
HStreamDB is a pioneering attempt to move data infrastructure into the real-time data era. As development continues, we believe that large-scale streaming data continuously generated from multiple data sources will be stored, managed and analyzed in real time more efficiently through HStreamDB, and that the process of obtaining insights and generating value from data will be greatly accelerated. Stay tuned for the follow-up progress of HStreamDB.