HStreamDB Newsletter 2022-05 | Decentralized cluster mechanism, new data integration framework, new clients and deployment methods

This month, the HStreamDB team officially released v0.8 and started work on v0.9, which will bring major improvements in clustering, external system integration, partitioning, and more. Our main work this month was the design and preliminary development of a new cluster mechanism and of HStream IO, a new data integration framework. We also started developing a new Python client. In addition, version 0.1 of the Erlang client was officially released, and support for deploying with Helm and on Alibaba Cloud was added.

HServer cluster mechanism improvement

In v0.8 and earlier versions, the HServer cluster uses a centralized clustering mechanism based on ZooKeeper. ZooKeeper registers and discovers HServer nodes and coordinates between them, and HServer nodes do not communicate with each other directly. Many distributed systems use this clustering scheme, and it is relatively mature. Its main disadvantage is the dependency on an external system such as ZooKeeper. This reduces flexibility and places some limits on scalability.

To support larger clusters and better scalability, and to reduce dependence on external systems, v0.9 will adopt a decentralized clustering mechanism. The new scheme is based mainly on the SWIM paper [1]. Its core is an efficient failure detection algorithm combined with a gossip-style mechanism for propagating cluster messages. Similar approaches are used in distributed systems such as Consul and Cassandra. The new cluster features are still under development and will be officially released in v0.9.
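The failure detection and gossip dissemination described above can be illustrated with a small in-process simulation. This is a minimal sketch of the SWIM ideas under stated assumptions, not HServer's implementation. The `Cluster` and `Node` classes, the `k` and `fanout` parameters, and the simulated `reachable` flag are all hypothetical. The sketch also makes three simplifications:

* SWIM's suspicion timeout is replaced by a simpler rule: a second failed probe of a suspect node marks it dead.
* Incarnation numbers, which SWIM uses to let a wrongly suspected node refute the suspicion, are omitted.
* Updates are pushed by a separate epidemic broadcast instead of being piggybacked on ping messages.

```python
import random
from dataclasses import dataclass, field

# Hypothetical simulation of SWIM-style membership; not HServer's actual code.
ALIVE, SUSPECT, DEAD = "alive", "suspect", "dead"
RANK = {ALIVE: 0, SUSPECT: 1, DEAD: 2}  # merge rule: the more severe state wins


@dataclass
class Node:
    name: str
    reachable: bool = True  # simulation flag: False means the process has crashed
    view: dict = field(default_factory=dict)  # peer name -> ALIVE/SUSPECT/DEAD


class Cluster:
    """In-process simulation of SWIM-style failure detection plus gossip."""

    def __init__(self, names, k=2, fanout=2, seed=0):
        self.nodes = {n: Node(n) for n in names}
        for node in self.nodes.values():
            node.view = {p: ALIVE for p in names if p != node.name}
        self.k = k            # helpers asked to relay an indirect ping (ping-req)
        self.fanout = fanout  # peers each informed node pushes an update to
        self.rng = random.Random(seed)

    def ping(self, src, dst):
        # Simulated network: a ping is acknowledged iff the destination is up.
        return self.nodes[dst].reachable

    def probe(self, name):
        """One protocol period for `name`: a direct ping, then k indirect pings."""
        view = self.nodes[name].view
        candidates = [p for p, s in view.items() if s != DEAD]
        if not candidates:
            return
        target = self.rng.choice(candidates)
        if self.ping(name, target):
            view[target] = ALIVE
            return
        others = [p for p in candidates if p != target]
        for helper in self.rng.sample(others, min(self.k, len(others))):
            # A helper can relay the ping only if the helper itself is up.
            if self.nodes[helper].reachable and self.ping(helper, target):
                view[target] = ALIVE
                return
        # No acknowledgement at all: first suspect the target, then declare it dead.
        new_state = DEAD if view[target] == SUSPECT else SUSPECT
        view[target] = new_state
        self.gossip(name, target, new_state)

    def gossip(self, origin, subject, state):
        """Spread the update (subject, state) epidemically, starting at `origin`."""
        informed, frontier = {origin}, [origin]
        while frontier:
            sender = frontier.pop()
            peers = [p for p, s in self.nodes[sender].view.items()
                     if s != DEAD and p != subject and p not in informed]
            for p in self.rng.sample(peers, min(self.fanout, len(peers))):
                receiver = self.nodes[p]
                if not receiver.reachable:
                    continue  # message lost: the receiver is down
                if RANK[state] > RANK[receiver.view[subject]]:
                    receiver.view[subject] = state
                informed.add(p)
                frontier.append(p)

    def run(self, periods):
        for _ in range(periods):
            for name, node in self.nodes.items():
                if node.reachable:
                    self.probe(name)
```

For example, build `Cluster([f"n{i}" for i in range(5)])`, crash one node with `cluster.nodes["n4"].reachable = False`, and call `cluster.run(100)`. The live nodes should then all record `n4` as dead and keep each other as alive. The indirect ping-req step is the design choice that keeps false positives down: a target is suspected only when neither the prober nor any of the `k` helpers can reach it. A single congested link between prober and target is therefore not enough to trigger suspicion.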

New data integration framework HStream IO

To meet diverse business needs, an enterprise often runs several data systems or data platforms. These include, but are not limited to:

* online transactional databases
* offline analytical databases
* caching systems
* search systems
* batch processing systems
* real-time processing systems
* data lakes

As an emerging streaming database, HStreamDB focuses on simplifying and reshaping the real-time data stack. It also aims to keep data flowing efficiently across the entire data stack, helping enterprises modernize their data stacks and make them real-time. The ability to integrate with many external systems is therefore very important to HStreamDB.

HStream IO is the data integration framework built into HStreamDB. Its components include source connectors, sink connectors, and the IO Runtime. Source connectors import data from external systems into HStreamDB, and sink connectors export data from HStreamDB to external systems. Notably, HStream IO will be implemented on top of the Airbyte spec. This means it can reuse the large number of open-source connectors in the Airbyte community to integrate HStreamDB quickly with a wide range of systems. The design and preliminary development of HStream IO were completed this month, and it will be officially released in v0.9.
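To make the Airbyte dependence more concrete: under the Airbyte protocol, a source connector's `read` command writes one JSON message per line to stdout. `RECORD` messages carry data rows in `record.stream` and `record.data`, and `STATE` messages carry a checkpoint for resuming a later sync. The sketch below shows how a runtime could forward such output into streams. It is an illustration under assumptions, not HStream IO Runtime's actual code. The function `pump_source_output`, the `append` callback, and the `stream_prefix` naming rule are hypothetical.

```python
import json
from typing import Callable, Iterable, Optional


def pump_source_output(
    lines: Iterable[str],
    append: Callable[[str, dict], None],
    stream_prefix: str = "",
) -> Optional[dict]:
    """Forward Airbyte RECORD messages to `append` and return the last STATE payload.

    `lines` stands for the stdout of a source connector's `read` command, with one
    JSON-encoded Airbyte message per line. `append(stream, payload)` is a
    hypothetical hook standing in for a write into an HStreamDB stream.
    """
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # connectors may print non-protocol noise; skip it
        if not isinstance(msg, dict):
            continue
        kind = msg.get("type")
        if kind == "RECORD":
            rec = msg["record"]
            append(stream_prefix + rec["stream"], rec["data"])
        elif kind == "STATE":
            # Checkpoint: persisting it lets the next `read` resume from here.
            last_state = msg["state"]
        # LOG, TRACE and other message types are ignored in this sketch.
    return last_state
```

The function returns the last checkpoint instead of saving it as soon as it arrives. A checkpoint should only be committed after the records before it have been durably written. Otherwise, a crash between the two steps could silently drop data on the next resume.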

Client update

Add Python client

This month we also started developing hstreamdb-py, the Python client for HStreamDB. It supports Python 3.7 and above and will be officially released next month.

hstreamdb-erlang v0.1 released

This month, HStreamDB's Erlang client hstreamdb-erlang officially released v0.1. For details, please refer to https://github.com/hstreamdb/hstreamdb-erlang/blob/main/README.md

Deployment method update

Added Helm-based deployment support

Helm (https://helm.sh/) makes it easier for users to install and manage Kubernetes applications. This month, HStreamDB added Helm-based deployment support. For details, see https://hstream.io/docs/en/latest/deployment/deploy-helm.html#building-your-kubernetes-cluster

Added Alibaba Cloud Terraform deployment support

We previously provided tutorials on deploying HStreamDB on AWS and HUAWEI CLOUD with Terraform. This month, we added support for deploying on Alibaba Cloud. For details, see https://hstream.io/docs/zh/latest/deployment/deploy-terraform-aliyun.html

[1]: Das, A., Gupta, I. and Motivala, A., 2002, June. SWIM: Scalable weakly-consistent infection-style process group membership protocol. In Proceedings International Conference on Dependable Systems and Networks (pp. 303-312). IEEE.

Copyright statement: This is an original article by EMQ. Please credit the source when reprinting.

Original link: https://hstream.io/zh/blog/hstreamdb-newsletter-202205

EMQ (杭州映云科技有限公司) is an open-source IoT data infrastructure software vendor, delivering the world's leading open-source MQTT messaging ...
