Spring is coming, and we are happy to announce that v0.7 of HStreamDB, the cloud-native distributed streaming database, has been officially released!

HStreamDB is the first cloud-native database designed specifically for streaming data. It provides one-stop management of large-scale data streams, covering access, storage, processing, and distribution, and supports complex real-time operations on dynamically changing data streams. It will play an important role in real-time streaming data analysis and processing scenarios in IoT, the Internet, and finance.

Version 0.7 focuses on stability, scalability, and availability. In this release we found and fixed a large number of problems through integration tests, Jepsen tests, and other testing, which improved the stability of the system. We also added several new features and improvements: transparent partitioning, a new operations and maintenance (O&M) management tool, a new version of hstreamdb-java, a refactored cluster load balancing algorithm, and improvements to usage and deployment.

GitHub project address: https://github.com/hstreamdb/hstream

A quick look at the new version

Added transparent partitioning to improve the scalability of Streams

In previous versions, HStreamDB already supported the storage and management of large-scale data streams (Streams). To further improve the scalability and read/write performance of a single Stream while preserving the ordering of its data, HStreamDB v0.7 adds transparent partitioning:

  • From the perspective of scalability, a stream may now contain multiple partitions (the number of partitions changes dynamically), and read and write traffic is load balanced across the cluster through these internal partitions, which gives a single stream higher throughput.
  • From the perspective of ordering, each written record carries an orderingKey specified by the user. Each orderingKey conceptually corresponds to a logical partition, and the data in the same logical partition is delivered to the consumer client in the order in which it was written, as shown in the following figure.

[Figure: HStreamDB transparent partitioning]

It is worth noting that in HStreamDB v0.7, partitioning is completely transparent to users. Users do not need to specify the number of partitions or the partitioning logic in advance, nor do they need to worry about data redistribution or out-of-order data caused by adding and removing partitions. From the perspective of system implementation, partitioning is an effective way to remove single-point bottlenecks and improve horizontal scalability. From the perspective of users, however, exposing partitions directly not only breaks the upper-layer abstraction but also greatly increases the cost of learning, using, and maintaining the system. Transparent partitioning achieves scalability and guarantees ordering without exposing additional complexity to users, which will greatly improve the user experience.
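
The per-key ordering described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not HStreamDB's implementation: the names `ToyStream` and `partition_for` are hypothetical, the number of internal partitions is fixed here (in HStreamDB it changes dynamically), and a stable hash is assumed as the way an orderingKey is mapped to a partition.

```python
import hashlib


def partition_for(ordering_key: str, num_partitions: int) -> int:
    """Map an ordering key to a partition index with a stable hash (assumption)."""
    digest = hashlib.sha256(ordering_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class ToyStream:
    """Hypothetical in-memory stream; callers only ever see ordering keys."""

    def __init__(self, num_partitions: int = 4):
        # Internal partitions are an implementation detail hidden from callers.
        self._partitions = [[] for _ in range(num_partitions)]

    def append(self, ordering_key: str, payload: str) -> None:
        idx = partition_for(ordering_key, len(self._partitions))
        self._partitions[idx].append((ordering_key, payload))

    def read_key(self, ordering_key: str) -> list:
        """Return the payloads for one ordering key in the order they were written."""
        idx = partition_for(ordering_key, len(self._partitions))
        return [p for k, p in self._partitions[idx] if k == ordering_key]


stream = ToyStream(num_partitions=4)
for i in range(3):
    stream.append("sensor-a", f"a{i}")
    stream.append("sensor-b", f"b{i}")

print(stream.read_key("sensor-a"))
print(stream.read_key("sensor-b"))
```

Because all records with the same key land in the same internal partition and each partition is append-only, interleaved writes from different keys do not disturb the order seen for any single key.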

For a more detailed introduction to transparent partitioning, please refer to: HStreamDB Docs

Improve cluster load balancing algorithm to improve allocation efficiency

To make good use of the resources of every node in the cluster, the read and write traffic of clients needs to be distributed across the nodes as evenly as possible. The load balancing strategy of HStreamDB v0.6 was based on the hardware resource load of the nodes. Its main problem is that nodes need to communicate with each other to exchange hardware resource information, including CPU, memory, and network usage. This approach also reacts with a certain lag, and the overall implementation is relatively complicated and inefficient.

To this end, in HStreamDB v0.7 we reimplemented the load balancing module on top of the consistent hashing algorithm. Consistent hashing is an elegant and powerful algorithm used by various distributed systems, such as DynamoDB. With an allocation strategy based on it, the load balancing module no longer needs to maintain hardware resource information in real time, the core algorithm is more concise, and redistribution when cluster membership changes is handled naturally. It is also flexible and easy to extend and optimize, for example by configuring different weights to deal with heterogeneous nodes. There is also recent research on further optimizations, such as Google's Consistent Hashing with Bounded Loads.
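
To make the idea concrete, here is a minimal consistent hashing ring. It is a generic sketch, not HStreamDB's load balancing module: the `HashRing` class, its method names, and the node names are hypothetical, and the use of virtual nodes to express weights is an assumption (it is one common way to handle heterogeneous nodes).

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash used to place nodes and keys on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent hashing ring with weighted virtual nodes."""

    def __init__(self, vnodes_per_weight: int = 100):
        self._vnodes_per_weight = vnodes_per_weight
        self._ring = []    # sorted list of (point, node)
        self._points = []  # the points only, kept parallel to _ring for bisect

    def _rebuild(self) -> None:
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    def add_node(self, node: str, weight: int = 1) -> None:
        # A node with a higher weight gets more points, hence more keys.
        for i in range(self._vnodes_per_weight * weight):
            self._ring.append((_hash(f"{node}#{i}"), node))
        self._rebuild()

    def remove_node(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]
        self._rebuild()

    def lookup(self, key: str) -> str:
        """Return the node that owns the first point clockwise from the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


ring = HashRing()
for name in ("server-0", "server-1", "server-2"):
    ring.add_node(name)

keys = [f"stream-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove_node("server-2")
moved = [k for k in keys if before[k] != ring.lookup(k)]

# Only keys that lived on the removed node are reassigned.
print(all(before[k] == "server-2" for k in moved))
```

The property that matters for cluster membership changes is visible at the end: removing a node moves only the keys that node owned, while every other key keeps its assignment.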

Added HStream Admin tool to facilitate operation and maintenance management

We provide a new management tool, HStream Admin (HAdmin), to help users maintain and manage HStreamDB. HAdmin can be used to monitor and manage the resources of HStreamDB, including Streams, Subscriptions, and Server nodes. HStream Metrics, previously embedded in the HStream SQL Shell, has also been migrated to HAdmin. In short, HAdmin is for HStreamDB operators, and the SQL Shell is for HStreamDB end users.

Example:

docker run -it --rm --name some-hstream-admin --network host hstreamdb/hstream:v0.7.0 bash
> hadmin --help
======= HStream Admin CLI =======

Usage: hadmin COMMAND

Available options:
  -h,--help                Show this help text

Available commands:
  server                   Admin command
  store                    Internal store admin command
> hadmin server status 
+---------+---------+-------------------+
| node_id |  state  |      address      |
+---------+---------+-------------------+
| 100     | Running | 192.168.64.4:6570 |
| 101     | Running | 192.168.64.5:6572 |
+---------+---------+-------------------+

For detailed usage, please refer to: HStreamDB Docs

hstreamdb-java v0.7 released with support for new HStreamDB v0.7 features

hstreamdb-java is currently the main HStreamDB client and will always support the latest features of HStreamDB. The new functions of HStreamDB v0.7 are therefore also supported in hstreamdb-java v0.7. Compared with hstreamdb-java v0.6, in addition to several bug fixes, hstreamdb-java v0.7 includes the following notable new features and improvements:

  • Added support for HStreamDB v0.7 transparent partitioning feature.
  • Improved support for clusters, adding the ability for requests to be retried across multiple nodes in the cluster in the case of recoverable failures.
  • Added the BufferedProducer interface and implementation. Because users have different requirements for write latency and throughput in different scenarios, we split the original Producer into two independent interfaces for the sake of clarity: BufferedProducer, which targets high-throughput scenarios, and Producer, which targets low-latency scenarios.
  • Added two new flush modes to BufferedProducer. The original Producer in batch mode only supported triggering a flush by the number of records. BufferedProducer now adds two more flush modes: size-triggered and time-triggered. All three trigger conditions can be active at the same time, which meets users' requirements more flexibly.
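
How the three flush triggers can work together is sketched below. This is a standalone illustration and not the hstreamdb-java API: the `FlushBuffer` class, its parameter names, and the `poll` method are hypothetical, and the injected clock exists only to make the example deterministic.

```python
import time


class FlushBuffer:
    """Hypothetical buffer that flushes on record count, byte size, or age."""

    def __init__(self, flush, max_records=100, max_bytes=4096,
                 max_age_seconds=1.0, clock=time.monotonic):
        self._flush = flush              # callback that receives a list of records
        self._max_records = max_records
        self._max_bytes = max_bytes
        self._max_age = max_age_seconds
        self._clock = clock
        self._records = []
        self._bytes = 0
        self._first_at = 0.0             # arrival time of the oldest buffered record

    def append(self, record: bytes) -> None:
        if not self._records:
            self._first_at = self._clock()
        self._records.append(record)
        self._bytes += len(record)
        self._maybe_flush()

    def poll(self) -> None:
        """Call periodically so the time trigger fires even when writes stop."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self._records:
            return
        if (len(self._records) >= self._max_records
                or self._bytes >= self._max_bytes
                or self._clock() - self._first_at >= self._max_age):
            self.flush_now()

    def flush_now(self) -> None:
        if self._records:
            batch, self._records = self._records, []
            self._bytes = 0
            self._flush(batch)


# Usage with a fake clock so the time trigger can be shown without sleeping.
now = [0.0]
sent = []
buf = FlushBuffer(sent.append, max_records=3, max_bytes=10,
                  max_age_seconds=5.0, clock=lambda: now[0])

for rec in (b"a", b"b", b"c"):   # the third record hits the count trigger
    buf.append(rec)
buf.append(b"0123456789")        # a 10-byte record hits the size trigger
buf.append(b"x")                 # too small and too few: stays buffered
now[0] = 6.0
buf.poll()                       # age exceeds 5 seconds: the time trigger fires
print(len(sent))
```

Whichever condition is met first causes the flush, so a small count limit favors latency while larger size and time limits favor throughput.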

hstreamdb-java GitHub repository: https://github.com/hstreamdb/hstreamdb-java

Simplify the deployment and use process and improve the user experience

Future plans

In upcoming development work, we will focus on the following goals:

  • Continuously improve the stability of the system to achieve production availability
  • Continuously improve system availability and O&M monitoring capabilities, and enhance security support
  • Upgrade the existing stream processing engine to bring more powerful real-time processing and analysis capabilities

Stay tuned!

