At the TGIP event held on March 13, Zhai Jia, Apache Pulsar PMC member and co-founder of StreamNative, reviewed Apache Pulsar's achievements and progress in 2021 and gave an in-depth look at Pulsar's technical direction and community plans for 2022. This article is an edited transcript of that TGIP live session, "Foreseeing 2022! Apache Pulsar Technology Progress and Community Dynamics".


2021 in retrospect - Apache Pulsar has achieved great results

2021 marked the third anniversary of Apache Pulsar becoming an ASF top-level project. Pulsar grew rapidly during the year: the community welcomed its 400th contributor, the number of monthly active contributors surpassed Kafka's, and the main repository passed 10,000 GitHub stars.

Figure: Star growth of the main Apache Pulsar GitHub repository

Figure: Apache Pulsar surpasses Kafka in monthly active contributors

These 10,000+ followers are spread across more than 5,700 regions worldwide, mostly in Europe and North America but also in Africa and South America.


Figure: Apache Pulsar has followers all over the world

Pulsar is also highly active within the Apache community, ranking among the top five projects by commit activity in the Apache Software Foundation's annual ranking.

In 2021, three years after Pulsar became an ASF top-level project, StreamNative, the company behind Pulsar, was also named Best Open Source Software Company by InfoWorld.

Looking at the project and the community themselves, 2021 was also a rewarding year for Apache Pulsar. In keeping with "The Apache Way", the Project Management Committee voted in 4 new PMC members and 16 new committers, who are likewise spread around the world. Pulsar also kept up a steady release cadence, with 7 releases in 2021. Among them, version 2.8.0 made Pulsar transactions generally available. Transactions let users achieve exactly-once semantics and guarantee that producing and acknowledging messages across topics happen atomically.

Pulsar's upstream and downstream ecosystem also continued to grow and mature. For example, the Source and Sink of the Pulsar Flink Connector have been merged upstream into Flink, and StreamNative has led or partnered on open-sourcing a number of peripheral projects, including Function Mesh, the SQS Connector, the AMQP 1.0 Connector, and RoP.

The Pulsar community also ran a wide range of events:

  • Pulsar 2.8.0 Release Parties were held in several cities (Beijing, Guangzhou, Shenzhen, and others);
  • Regular monthly developer and user group meetings were held;
  • Online and offline meetups took place in Beijing, Shanghai, Hangzhou, Guangzhou, and Shenzhen;
  • Three online Pulsar Summits were held, covering North America, Europe, and Asia, with 90+ talks in total. The Asia Summit alone drew 1,000+ registrations and 40,000+ live views. At the summits, Apache Pulsar PMC members explained Pulsar's roadmap in detail, and users from many industries showed how they use Pulsar to solve their own pain points.

Recordings and materials from these community events are available on Bilibili and YouTube.

In terms of books and tutorials, the first Chinese-language Apache Pulsar book, written by Apache Pulsar PMC member Lin Lin, was officially published in 2021. In addition, StreamNative worked with the Heima Programmer community of Chuanzhi Education to release a Chinese-language Apache Pulsar video course, which can be watched for free on Bilibili.

The continued growth of the Apache Pulsar community would not be possible without the sustained attention and active participation of companies across industries. At Pulsar Summit Asia 2021, we presented two awards, the Pioneer Award and the Excellent Case Award. E-Pay and Lakala received the Pioneer Award, while Kingsoft Cloud, Didi, Zhihu, WeChat, China Mobile Cloud Competence Center, Banyu, Ketuo Parking, Tencent Cloud Middleware, and FATE were recognized as the outstanding cases of the year.

Important Features of Apache Pulsar 2.10

The upcoming Apache Pulsar 2.10 release makes significant progress in both features and performance. New functionality includes pluggable metadata service support, automatic cluster failover, global topic policies, a pluggable message filtering extension, redelivery backoff, chunk message ID, table view, and lazy-loading producers.

On pluggable metadata services: Pulsar has always been tightly coupled to ZooKeeper, but at large scale ZooKeeper runs into problems with high concurrency and access pressure.

The community has therefore been exploring how to make the metadata service a more native part of Pulsar, so that the problems users hit at the metadata layer can be solved more effectively. Users can now switch to etcd or other metadata services. Inside Pulsar, the metadata APIs are essentially complete and are still being optimized and refined.
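As a rough illustration of what "pluggable" means here, the Python sketch below hides the metadata backend behind a small interface, so callers never depend on one specific service. Every name in it (`MetadataStore`, `InMemoryStore`, `create_store`, the `memory://` URL scheme) is hypothetical and chosen for illustration; this is not Pulsar's actual metadata API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class MetadataStore(ABC):
    """Hypothetical minimal interface that every metadata backend implements."""

    @abstractmethod
    def put(self, path: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...


class InMemoryStore(MetadataStore):
    """Stand-in backend; a real one would talk to ZooKeeper, etcd, and so on."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, path: str, value: bytes) -> None:
        self._data[path] = value

    def get(self, path: str) -> Optional[bytes]:
        return self._data.get(path)

    def delete(self, path: str) -> None:
        self._data.pop(path, None)


# Registry mapping a URL scheme to a backend factory (illustrative only).
_BACKENDS: Dict[str, Callable[[str], MetadataStore]] = {
    "memory": lambda url: InMemoryStore(),
}


def create_store(url: str) -> MetadataStore:
    """Pick a backend from the URL scheme, e.g. 'memory://local'."""
    scheme, sep, _rest = url.partition("://")
    if not sep or scheme not in _BACKENDS:
        raise ValueError(f"unsupported metadata store URL: {url!r}")
    return _BACKENDS[scheme](url)
```

With this shape, switching backends comes down to registering another factory and changing the configured URL; code that calls `put`, `get`, and `delete` stays the same.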

On automatic cluster failover: Pulsar already supports geo-replication and lets multiple clusters interconnect and back each other up, but switching between those clusters can be problematic in practice. Within a single cluster, Pulsar gives users several ways to reach the cluster's brokers. Across multiple clusters, users typically switch by changing DNS, which is not automated enough. The community has therefore optimized how brokers are accessed so that switching between clusters happens more automatically.
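The idea behind automating the switch can be sketched as a small probe-driven selector: keep using the primary cluster, move to a healthy secondary after several consecutive failed probes, and move back once the primary recovers. This is a minimal Python sketch under assumed names (`FailoverSelector`, `probe`, `failure_threshold`) and is not the Pulsar client's implementation.

```python
from typing import Callable, List


class FailoverSelector:
    """Chooses which cluster URL to use based on health probes.

    Hypothetical helper for illustration; not part of any Pulsar client.
    """

    def __init__(self, primary: str, secondaries: List[str],
                 probe: Callable[[str], bool],
                 failure_threshold: int = 3) -> None:
        self.primary = primary
        self.secondaries = secondaries
        self.probe = probe  # returns True if the cluster at the URL is reachable
        self.failure_threshold = failure_threshold
        self.current = primary
        self._failures = 0

    def check(self) -> str:
        """Run one probe round and return the URL clients should use."""
        # If we are on a secondary and the primary is back, switch back.
        if self.current != self.primary and self.probe(self.primary):
            self.current = self.primary
            self._failures = 0
            return self.current

        # The current cluster is healthy: reset the failure counter.
        if self.probe(self.current):
            self._failures = 0
            return self.current

        # The current cluster failed this round; switch only after enough
        # consecutive failures, to avoid flapping on a transient error.
        self._failures += 1
        if self._failures >= self.failure_threshold:
            for url in [self.primary] + self.secondaries:
                if url != self.current and self.probe(url):
                    self.current = url
                    self._failures = 0
                    break
        return self.current
```

Calling `check()` on a timer and reconnecting whenever the returned URL changes replaces the manual DNS update described above. The threshold trades failover speed against sensitivity to short network glitches.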

From 2.9 to 2.10, StreamNative put a lot of effort into improving system stability and performance in key scenarios. These feature iterations and performance improvements will be summarized in articles and reports on the community's official account and covered in detail in the release blog. You can also sign up for the TGIP live session on March 27, where Pulsar PMC member Li Penghui will explain the key features of Pulsar 2.10.0 in detail.

Ecosystem roadmap: connectors and protocol plugins

Even before it graduated from the Apache Incubator in 2018, Pulsar already had a solid foundation as cloud-native storage for data pipelines.

Many community users have been building on that foundation, for example by integrating Pulsar with other big data ecosystems and compute engines in their data pipelines. The surrounding ecosystem has therefore been an ongoing part of Pulsar's development ever since graduation. In this year's plan, ecosystem work falls into two main areas: connectors and protocol plugins.

On the connector side, StreamNative is leading the integration of Pulsar with Snowflake (sinking data into Snowflake), the integration with the Lakehouse architecture, and the merging of the Pulsar Source and Sink into the Flink community.

The Lakehouse integration has two steps. The first is a connector that makes it easy to exchange data between Pulsar and Lakehouse data formats, so that data can flow between Pulsar, Hudi, and Iceberg and users of those existing ecosystems can connect directly.

The second is to use Pulsar's built-in tiered (secondary) storage to convert the data in Pulsar directly and automatically into the format required by Hudi or Iceberg, so that users can truly benefit from Pulsar's unified batch and stream capabilities. Presenting users with a single unified view of the data reduces the problems they face when adopting new data technologies.

On the protocol plugin side, StreamNative embeds server-side parsers for other messaging protocols into Pulsar, which makes it easier for users to connect their existing applications. KoP, MoP, and AoP are representative examples.

KoP (Kafka on Pulsar) was StreamNative's first exploration of protocol plugins.

Kafka and Pulsar share a common abstraction: underneath, a topic is treated as a log. Many of their upper-level designs are similar as a result, which makes implementing Kafka on Pulsar both simple and feasible. Kafka relies on the file system at the bottom layer for data storage and the log abstraction, while Pulsar uses BookKeeper to implement the log abstraction. KoP is currently the most widely used protocol plugin in the community and will continue to improve in stability, transaction support, and operations tooling.
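To make the shared log abstraction concrete, here is a deliberately simplified Python sketch: records are appended to fixed-size segments (a toy stand-in for ledgers), yet callers still see a single flat, Kafka-style offset. The class name `SegmentedLog` and the fixed segment size are assumptions made for illustration; this is not how KoP actually maps offsets.

```python
from typing import List, Tuple


class SegmentedLog:
    """Append-only log stored as fixed-size segments (a toy stand-in for ledgers)."""

    def __init__(self, entries_per_segment: int = 3) -> None:
        self.entries_per_segment = entries_per_segment
        self.segments: List[List[bytes]] = [[]]

    def append(self, record: bytes) -> int:
        """Append a record and return its flat, Kafka-style offset."""
        if len(self.segments[-1]) == self.entries_per_segment:
            self.segments.append([])  # current segment is full: roll over
        self.segments[-1].append(record)
        segment_index = len(self.segments) - 1
        entry_index = len(self.segments[-1]) - 1
        return segment_index * self.entries_per_segment + entry_index

    def locate(self, offset: int) -> Tuple[int, int]:
        """Translate a flat offset into a (segment, entry) position."""
        return divmod(offset, self.entries_per_segment)

    def read(self, offset: int) -> bytes:
        """Read the record stored at a flat offset."""
        segment_index, entry_index = self.locate(offset)
        return self.segments[segment_index][entry_index]
```

Because both systems expose the same append and read-by-position contract, a protocol layer only has to translate between the two addressing schemes, which is what `locate` does in this toy version.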

MoP (MQTT on Pulsar) is Pulsar's support for MQTT. Many community users are following MoP closely and already run it in their production systems. Going forward, MoP will add support for different versions of MQTT and SQS, with a focus on stability and multi-protocol support.

AoP (AMQP on Pulsar) is Pulsar's support for AMQP. Some users have already deployed AoP in production to serve AMQP traffic. Going forward, the community will add more support for AMQP 1.0 based on user needs and keep improving AoP's scalability, readability, and maintainability.

Outlook for Apache Pulsar community events in 2022

The Pulsar community has an equally full calendar of events planned for 2022:

  • Release Party: a celebration for each new release, both to thank the contributors who took part in it and to give everyone a chance to explore and discuss the new version in depth;
  • Online and offline Meetup: once the epidemic is under control, meetups will cover more cities and regions so that developers can talk face to face with Pulsar contributors;
  • Monthly meeting of Chinese developers and user groups: held on the last Wednesday of every month. In addition to meetings organized on demand, it will cover more topics and be run in a more focused way to improve communication efficiency;
  • Pulsar Summit 2022: the European and North American Summits will merge into a single Global Summit in August, and the Asia Summit will be held in November. Whether each is online or offline will depend on the epidemic situation.

In terms of books and tutorials, more high-quality foreign books will be translated and introduced in 2022. "Apache Pulsar in Action" has been acquired by Turing Books and "Mastering Apache Pulsar" by Blog Viewpoint, and both are expected to be published this year. In addition, more online Pulsar tutorials will be released through the community's channels (WeChat official account, Bilibili, mailing list, Slack, and GitHub).

Looking ahead to 2022, both Apache Pulsar and StreamNative hope that developers and community followers will share more feedback, speak up more, and communicate and build together, so that Pulsar can solve pain points in more industries and we can look forward to a better 2022 together.

Selected Q&A

Q: For big data workloads, what are the main advantages of replacing Kafka with Pulsar? What should we watch out for?

A: Pulsar's architecture is completely different from Kafka's. Pulsar has been a cloud-native messaging platform from the start, covering both message queuing (MQ) and data pipelines. Its main advantages are that it unifies MQ and data pipeline use cases, is easier to manage and schedule in cloud-native environments, and is lighter to operate. Pulsar's other strengths include consistency, unified batch and stream storage, support for large clusters, geo-replication, interconnection between public and private clouds, and integration with cloud storage services.

Q: Does the community have plans to integrate Pulsar with data lakes, Hudi and Iceberg?

A: The first step is to move data between Pulsar, Hudi, and Iceberg through connectors. The second step is to use Pulsar's tiered (secondary) storage to automatically migrate Pulsar data into the Hudi or Iceberg format according to the user's settings, and to provide a unified data access layer through Pulsar.


About StreamNative

StreamNative is an open source infrastructure software company founded by the original creators of Apache Pulsar, a top-level project of the Apache Software Foundation. It is building a next-generation cloud-native data platform around Pulsar that unifies batch and stream processing. As the commercial company behind Apache Pulsar, StreamNative focuses on the open source ecosystem and community building and is committed to innovation in cutting-edge technology. Members of the founding team previously worked at well-known companies such as Yahoo, Twitter, Splunk, and EMC.
