This month, the HStreamDB team has mainly been preparing for the final development and release of v0.9, further improving and testing the new features it will bring, such as the stream partition model improvements, the new cluster mechanism, and HStream IO. The major clients have also been upgraded to adapt to v0.9.

Stream partition model improvements

In previous versions, HStreamDB adopted a transparent partition model: the number of partitions in each stream was dynamically adjusted according to the write load, and the partitions inside a stream were invisible to users. The advantage of this model is that it keeps the user-facing concepts simple while retaining implementation flexibility: the number of partitions can scale dynamically with the load, and the required data ordering is preserved during scaling.

The main disadvantage of this model is that users cannot directly perform partition-level operations or exercise fine-grained control; for example, they cannot read the data of a specific partition starting from an arbitrary position. To address this, we decided to open up partitions and give users direct control over them, so that users can:

  • Control the routing of data between partitions via a partitionKey
  • Read the data of any shard directly from a specified position
  • Manually control the dynamic scaling of partitions within a stream

In terms of implementation, HStreamDB adopts a key-range-based partitioning mechanism: all shards under a stream jointly cover the entire key space, each shard owns a contiguous subspace (key range), and scaling shards out or in corresponds to splitting or merging these subspaces. Scaling does not copy or migrate old data; instead, the parent shard is sealed, new data automatically flows into the child shards, and the data of the parent shard remains readable. With this design, the dynamic scaling of partitions is more controllable and fast, and it avoids the inefficiency and data-ordering problems that redistributing old data would bring. This is, in fact, also the internal working mechanism behind the previous transparent partitioning.
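
As a minimal, illustrative sketch of this mechanism (not HStreamDB's actual internals), the Java code below routes a record to the shard whose key range contains the hash of its partitionKey; the class and method names, the choice of hash, and the assumption that the shards jointly cover the whole key space are all made up for the example.

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Each shard owns a contiguous, inclusive key range within a hashed key space.
final class Shard {
    final long shardId;
    final long startKey;
    final long endKey;

    Shard(long shardId, long startKey, long endKey) {
        this.shardId = shardId;
        this.startKey = startKey;
        this.endKey = endKey;
    }
}

final class ShardRouter {
    // Index shards by the start of their key range; the owning shard of a hashed
    // key is the one with the greatest startKey that is <= the key.
    private final TreeMap<Long, Shard> byStartKey = new TreeMap<>();

    ShardRouter(List<Shard> shards) {
        for (Shard s : shards) {
            byStartKey.put(s.startKey, s);
        }
    }

    // Route a record to the shard whose key range contains hash(partitionKey).
    // Assumes the shards cover the key space starting at 0, so a floor entry always exists.
    Shard lookupShard(String partitionKey) {
        long hashed = hash(partitionKey);
        Map.Entry<Long, Shard> entry = byStartKey.floorEntry(hashed);
        return entry.getValue();
    }

    // Placeholder hash for the example; the real mapping from partitionKey to the
    // key space may differ.
    private static long hash(String partitionKey) {
        return Integer.toUnsignedLong(partitionKey.hashCode());
    }
}

Under this scheme, splitting a shard amounts to dividing its key range into two child ranges and sealing the parent shard, without moving the data already written.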

The partition model improvements described above will be included in the upcoming v0.9 release (the ability to manually control partition splitting and merging is not included for now).

HStream IO update

HStream IO is an internal data integration framework that will be released with HStreamDB v0.9. It includes source connectors, sink connectors, the IO runtime, and other components, and it connects HStreamDB with various external systems, helping data circulate efficiently across the entire enterprise data stack and release its real-time value.

After adding CDC source connectors for multiple databases last month, this month we added sink connector support for MySQL and PostgreSQL. The embedded IO runtime has also been improved and enhanced in several aspects, including connector parameter checking, configuration document generation, and safe task exit. SQL commands are also provided so that users can conveniently create and manage IO tasks through the CLI. Examples are as follows:

create source connector source01 from mysql with ("host" = "127.0.0.1", "port" = 3306, "user" = "root", "password" = "password", "database" = "d1", "table" = "t1", "stream" = "stream01");

create sink connector sink01 to postgresql with ("host" = "127.0.0.1", "port" = 5432, "user" = "postgres", "password" = "postgres", "database" = "d1", "table" = "t1", "stream" = "stream01");

show connectors;

pause connector source01;

resume connector source01;

drop connector source01;

HStream MetaStore

Currently, HStreamDB uses ZooKeeper to store system metadata, such as the replication properties of shards and the task allocation and scheduling information of cluster nodes. This brings additional complexity to deploying, operating, and maintaining HStreamDB, for example the need to manage a separate ZooKeeper cluster.

To this end, we plan to remove HStreamDB's direct dependency on ZooKeeper and introduce a dedicated HStream MetaStore component (HMeta for short). HMeta will provide a set of abstract metadata storage interfaces that can, in principle, be implemented on top of a variety of storage systems. We are currently developing a default implementation based on rqlite (https://github.com/rqlite/rqlite). rqlite is built on SQLite and Raft, written in Go, and is very lightweight and easy to deploy and manage.
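
To give a sense of what such an abstraction might look like, here is a hypothetical sketch of a metadata storage interface; the actual HMeta interface is still under development and will likely differ.

import java.util.Optional;

// Hypothetical metadata storage abstraction; not the actual HMeta API.
interface MetaStore {
    // Read the value stored under a metadata key, if any.
    Optional<byte[]> get(String key);

    // Write a value, using an expected version for optimistic concurrency control.
    void put(String key, byte[] value, long expectedVersion);

    // Remove a metadata entry.
    void delete(String key);
}

A default implementation of such an interface could be backed by rqlite, while ZooKeeper or other storage systems remain possible behind the same abstraction.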

The development of HMeta is still in progress. As mentioned in our previous newsletter, HServer's new cluster mechanism no longer relies on ZooKeeper, and this month we have also migrated HStore's EpochStore to HMeta. This feature will not be included in the upcoming v0.9 release; it needs more testing, and we plan to release it in v0.10.

Client update

This month, the clients have also received a number of upgrades to adapt to HStreamDB v0.9. Taking hstreamdb-java as an example, the main changes include the following (a usage sketch follows the list):

  • createStream now supports specifying the initial number of partitions
  • Added a listShards method
  • producer and bufferedProducer have been adapted to the new partition model
  • Added a Reader class, which can be used to read any partition
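
The snippet below is a hypothetical sketch of how these additions might be used together; the exact method names and signatures are assumptions based on the list above and may differ from the released hstreamdb-java API.

import io.hstream.HStreamClient;

public class ClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical usage; signatures are assumptions and may differ from the released client.
        try (HStreamClient client =
                 HStreamClient.builder().serviceUrl("hstream://127.0.0.1:6570").build()) {
            // Create a stream with replication factor 3 and an assumed initial shard count of 4.
            client.createStream("stream01", (short) 3, 4);

            // List the shards of the stream (assumed to expose shard ids and key ranges).
            client.listShards("stream01").forEach(shard -> System.out.println(shard));

            // The new Reader class (omitted here) can then be built to read a chosen
            // shard starting from a specified position.
        }
    }
}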

Clients for other languages (Golang, Python) will also include support for v0.9.

Other

Some other noteworthy features completed this month include:

  • Added the advertised-listeners configuration to HServer, which solves the problem of external clients accessing HStreamDB when it is deployed in a complex network environment.
  • Improved the bootstrap process when the HServer cluster starts.

Copyright statement: This article is original content by EMQ; please indicate the source when reprinting.

Original link: https://hstream.io/zh/blog/hstreamdb-newsletter-202207

