HStreamDB v0.7 Released: Transparent Partitioning, Hash Algorithms, and New Attempts to Improve Performance

Spring is coming, and we are happy to announce that v0.7, the latest version of the cloud-native distributed stream database HStreamDB, has been officially released!

HStreamDB is the first cloud-native streaming database designed specifically for streaming data. It aims to provide one-stop management of large-scale data streams, covering ingestion, storage, processing, and distribution, and it supports complex real-time operations on dynamically changing data streams. It is intended for real-time stream analysis and processing in fields such as IoT, the Internet, and finance.

The main focus of v0.7 is greater stability, scalability, and availability. In this version, we found and fixed a large number of issues through integration tests, Jepsen tests, and other testing, which improved the stability of the system. We also added several new features and improvements: transparent partitioning, a new operations and maintenance (O&M) tool, a new version of hstreamdb-java, a refactored cluster load balancing algorithm, and simpler deployment and usage.

GitHub project address: https://github.com/hstreamdb/hstream

A quick look at the new version

Added transparent partitioning to improve Stream scalability

Previous versions of HStreamDB already supported storing and managing large-scale data streams (Streams). To further improve the scalability and read/write performance of a single Stream while preserving the order of data, HStreamDB v0.7 adds transparent partitioning:

From the perspective of scalability, a stream may now contain multiple partitions (the number of partitions changes dynamically), and read and write traffic is load-balanced across the cluster through these internal partitions, giving a single stream higher throughput.
From the perspective of ordering, each record written carries a user-specified orderingKey. Each orderingKey conceptually corresponds to a logical partition, and the data in the same logical partition is delivered to the same consumer client in the order in which it was written, as shown in the following figure.

[Figure: ordered delivery of records by orderingKey in HStreamDB]

It is worth noting that in HStreamDB v0.7, partitioning is completely transparent to users. Users do not need to specify the number of partitions or the partitioning logic in advance, nor do they need to worry about data redistribution or out-of-order data caused by adding or removing partitions. From the standpoint of system implementation, partitioning is an effective way to remove single-point bottlenecks and improve horizontal scalability. From the user's standpoint, however, exposing partitions directly not only breaks the upper-level abstraction but also greatly increases the cost of learning, using, and maintaining the system. Transparent partitioning delivers scalability and guarantees ordering without exposing this extra complexity to users, which greatly improves the user experience.
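To make the orderingKey guarantee concrete, here is a minimal Java sketch under simplifying assumptions. ToyPartitionedStream and its methods are hypothetical names, not HStreamDB or hstreamdb-java APIs. The partition count is fixed, and a plain hash of the orderingKey picks the partition. The sketch only shows why routing every record with the same key to one append-only partition preserves write order for that key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an orderingKey pins records to one internal partition.
// Class and method names are illustrative, not HStreamDB APIs.
class ToyPartitionedStream {
    // One appended record: the user-supplied orderingKey plus its payload.
    static final class Entry {
        final String orderingKey;
        final String payload;

        Entry(String orderingKey, String payload) {
            this.orderingKey = orderingKey;
            this.payload = payload;
        }
    }

    private final List<List<Entry>> partitions = new ArrayList<>();

    ToyPartitionedStream(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Same key -> same partition; floorMod keeps the index non-negative.
    int partitionOf(String orderingKey) {
        return Math.floorMod(orderingKey.hashCode(), partitions.size());
    }

    // Partitions are append-only, so records for one key keep their write order.
    void append(String orderingKey, String payload) {
        partitions.get(partitionOf(orderingKey)).add(new Entry(orderingKey, payload));
    }

    // Payloads a consumer would receive for one key, in delivery order.
    List<String> deliveredFor(String orderingKey) {
        List<String> out = new ArrayList<>();
        for (Entry e : partitions.get(partitionOf(orderingKey))) {
            if (e.orderingKey.equals(orderingKey)) {
                out.add(e.payload);
            }
        }
        return out;
    }
}
```

In this toy, changing the partition count would move keys between partitions and could reorder a key's records. Per the description above, HStreamDB handles such partition changes internally so that users do not have to.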

For a more detailed introduction to transparent partitioning, please refer to: HStreamDB Docs

Improved the cluster load balancing algorithm for more efficient allocation

To make good use of the resources of every node in the cluster, client read and write traffic must be distributed across the nodes as evenly as possible. The load balancing strategy in HStreamDB v0.6 was based on each node's hardware resource load. Its main drawback was that nodes had to communicate with each other to exchange hardware resource information, including CPU, memory, and network usage. This information also lagged behind the actual load, and the overall implementation was relatively complex and inefficient.

To address this, we reimplemented the load balancing module in HStreamDB v0.7 based on the consistent hashing algorithm. Consistent hashing is an elegant and powerful algorithm used by many distributed systems, such as DynamoDB. An allocation strategy built on it no longer needs to maintain hardware resource information in real time. Its core algorithm is also simpler, and it handles redistribution when cluster membership changes. It is flexible and easy to extend and optimize. For example, heterogeneous nodes can be handled by assigning them different weights, and there are recent research improvements such as Google's Consistent Hashing with Bounded Loads.
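The core idea of consistent hashing fits in a few lines. The following Java sketch is a generic ring with virtual nodes and per-node weights, assuming MD5 as the hash function. It is not HStreamDB's load balancing module, and ConsistentHashRing, addNode, removeNode, and nodeFor are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring with weighted virtual nodes;
// not HStreamDB's actual load balancing module.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodesPerWeight;

    ConsistentHashRing(int vnodesPerWeight) {
        this.vnodesPerWeight = vnodesPerWeight;
    }

    // Position on the ring: the first 8 bytes of the MD5 digest as a long.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
    }

    // A node with weight w is placed at w * vnodesPerWeight points, so a
    // heavier node owns proportionally more of the ring.
    void addNode(String node, int weight) {
        for (int i = 0; i < weight * vnodesPerWeight; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Removing a node only frees its own points; all other points stay put.
    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    // Walk clockwise to the first point at or after the key's hash,
    // wrapping around to the start of the ring if needed.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Removing a node reassigns only the keys that node owned, and every other key keeps its server. This property makes redistribution cheap when cluster membership changes. A bounded-loads variant would additionally skip past a node on the ring once that node exceeds a capacity limit derived from the average load.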

Added the HStream Admin tool to simplify operations and maintenance

We provide a new management tool, HAdmin, to help users operate and manage HStreamDB. HAdmin can be used to monitor and manage HStreamDB resources, including Streams, Subscriptions, and Server nodes. HStream Metrics, previously embedded in the HStream SQL Shell, have also been migrated to HAdmin. In short, HAdmin is for HStreamDB operators, and the SQL Shell is for HStreamDB end users.

Example:

docker run -it --rm --name some-hstream-admin --network host hstreamdb/hstream:v0.7.0 bash
> hadmin --help
======= HStream Admin CLI =======

Usage: hadmin COMMAND

Available options:
  -h,--help                Show this help text

Available commands:
  server                   Admin command
  store                    Internal store admin command
> hadmin server status 
+---------+---------+-------------------+
| node_id |  state  |      address      |
+---------+---------+-------------------+
| 100     | Running | 192.168.64.4:6570 |
| 101     | Running | 192.168.64.5:6572 |
+---------+---------+-------------------+

For detailed usage, please refer to: HStreamDB Docs

hstreamdb-java v0.7 released with support for the new features of HStreamDB v0.7

hstreamdb-java is currently the main HStreamDB client and will always support the latest HStreamDB features, so the new functionality of HStreamDB v0.7 is also available in hstreamdb-java v0.7. Compared with hstreamdb-java v0.6, and in addition to several bug fixes, hstreamdb-java v0.7 includes the following notable new features and improvements:

Added support for the transparent partitioning feature of HStreamDB v0.7.
Improved cluster support: requests can now be retried across multiple nodes in the cluster when recoverable failures occur.
Added the BufferedProducer interface and implementation. Because users have different write latency and throughput requirements in different scenarios, we split the original Producer into two independent interfaces, BufferedProducer and Producer, for clarity. BufferedProducer is intended mainly for high-throughput scenarios, and Producer mainly for low-latency scenarios.
Added two new flush modes to BufferedProducer. The original Producer could only trigger a flush based on the number of records in batch mode. BufferedProducer adds size-triggered and time-triggered flushing, and all three trigger conditions can be active at the same time, which meets users' needs more flexibly.
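Count-based, size-based, and time-based triggers can be combined as in the following sketch. It is a hypothetical Java illustration, not the hstreamdb-java BufferedProducer API. ToyBufferedWriter, its constructor parameters, and the tick method are assumptions. The caller passes the current time explicitly so that the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of combined flush triggers; NOT the hstreamdb-java
// BufferedProducer API. All names here are illustrative.
class ToyBufferedWriter {
    private final int maxRecords;      // count trigger
    private final long maxBytes;       // size trigger
    private final long maxAgeMillis;   // time trigger, measured from the oldest buffered record
    private final Consumer<List<byte[]>> sink; // stands in for sending one batch to the server

    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private long oldestMillis = 0;

    ToyBufferedWriter(int maxRecords, long maxBytes, long maxAgeMillis,
                      Consumer<List<byte[]>> sink) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
    }

    // Buffer one record, then flush if any of the three limits is reached.
    synchronized void write(byte[] record, long nowMillis) {
        if (buffer.isEmpty()) {
            oldestMillis = nowMillis;
        }
        buffer.add(record);
        bufferedBytes += record.length;
        if (buffer.size() >= maxRecords
                || bufferedBytes >= maxBytes
                || nowMillis - oldestMillis >= maxAgeMillis) {
            flush();
        }
    }

    // Meant to be called periodically (e.g. by a timer) so that the time
    // trigger fires even when no new records arrive.
    synchronized void tick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - oldestMillis >= maxAgeMillis) {
            flush();
        }
    }

    // Hand the buffered batch to the sink and reset the counters.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
    }
}
```

A flush happens as soon as any one limit is reached. The size limit bounds the memory held by the buffer, and the time limit bounds how long a record can wait when traffic is light.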

hstreamdb-java GitHub repository: https://github.com/hstreamdb/hstreamdb-java

Simplify the deployment and use process and improve the user experience

To help users try HStreamDB quickly, we have added a quick start guide based on docker-compose: https://hstream.io/docs/en/latest/start/quickstart-with-docker.html
To help users quickly deploy HStreamDB clusters across multiple machines, we have developed a dedicated cluster deployment script, which can be downloaded from https://github.com/hstreamdb/hstream/blob/main/script/dev-deploy
As the number of HStreamDB configuration options keeps growing, passing configuration through command-line options is no longer sufficient. We have therefore introduced configuration files for managing all options in one place. Please refer to: https://hstream.io/docs/en/latest/reference/config.html#configuration-table

Future plans

In upcoming development, we will focus on the following goals:

Continuously improve system stability to reach production readiness
Continuously improve availability and O&M monitoring capabilities, and strengthen security support
Upgrade the existing stream processing engine to provide more powerful real-time processing and analysis capabilities

Stay tuned!


EMQX
