ApacheCon Asia 2021: Apache Pulsar technical sessions at a glance

About Apache Pulsar
Apache Pulsar is a top-level project of the Apache Software Foundation and a next-generation, cloud-native distributed messaging and streaming platform. It integrates messaging, storage, and lightweight function computing. Its architecture separates compute from storage, and it supports multi-tenancy, persistent storage, and multi-datacenter and cross-region data replication, offering the strong consistency, high throughput, low latency, and high scalability expected of a streaming data store.

GitHub address: http://github.com/apache/pulsar/

About ApacheCon Asia

ApacheCon Asia is the first online ApacheCon organized by the ApacheCon committee for the Asia-Pacific region. Its main goal is to better serve the rapidly growing community of Apache users and contributors in the region. ApacheCon Asia 2021 will be held online from August 6 to 8, 2021.

The ApacheCon Asia 2021 team recently announced the official conference schedule. The Apache Pulsar community is actively participating in this annual open source event, and Pulsar community members are presenting in several tracks, including Messaging, Big Data, and Stream Processing. The talks cover a wide range of topics and are well worth following. The Pulsar-related sessions are listed below for easy reference.

Big Data

2021-08-08 13:30

Building an authentication and authorization system with HashiCorp Vault

Abstract: Learn how to use HashiCorp Vault to build an authentication and authorization system for Apache Pulsar. Vault provides a secure way to generate tokens and store sensitive data, while Pulsar has a pluggable architecture for authentication, authorization, and key management. This talk shows how to build an authentication and authorization system for Pulsar on top of Vault, covering the following points:

How to build flexible authentication with Vault so that the Pulsar cluster can easily integrate with other systems, such as LDAP
How to implement service accounts using Vault application roles (AppRole)

Speaker: Guangning, Apache Pulsar Committer and main contributor to and maintainer of Apache Pulsar IO and Apache Pulsar Manager, is currently a senior software engineer at StreamNative, specializing in cloud platforms, cloud computing, and big data.

Stream Processing

2021-08-08 14:10

Structured data streams

Abstract: Type safety is extremely important in any application built around streams or queues. Type definitions and their evolution can be handled inside the application, or they can be supported by the data layer so that the application can focus on business logic without worrying about how data is stored and evolved. This property is a large part of why traditional relational databases have held their ground against modern NoSQL databases. In modern software architecture, asynchronous communication (via streams and queues) is essential, and as data storage and query design shift toward asynchronous communication, type safety remains just as important.

In this talk, we will discuss how to build structure (schemas) on top of streaming data, using Apache Pulsar as an example. Apache Pulsar provides both server-side and client-side support for structured streams. We have used Pulsar in production for asynchronous communication between microservices for more than 1.5 years.

This talk covers what a schema is, how schemas are represented, what the Apache Pulsar server and client provide, how we use Pulsar's schema support to build our use cases, and the lessons and technical details we learned along the way.

Speaker: Shivji Kumar Jha. Shiv is a senior software developer at Nutanix, working on the Beam team to help Nutanix customers minimize the cloud costs and security risks of hybrid cloud use. Shiv likes to spend time on data storage (databases, data streams, analytics, and so on) and has contributed to the MySQL and Pulsar code bases. He is an avid reader (technology, fiction, economics, and more) and is always looking for ways to simplify software architecture.

2021-08-08 15:30

Real-time machine learning with Pulsar Functions

Abstract: In this talk, I will introduce a technique that uses Apache Pulsar Functions to deploy machine learning models that serve real-time predictions. To serve real-time predictions, a model typically receives a single data point from the caller and is expected to return an accurate prediction within a few milliseconds. Throughout the talk, I will walk through the steps required to deploy a fully trained ML model that predicts delivery time based on real-time traffic information, the customer's location, and the restaurant that will fulfill the order.

Speaker: David Kjerrumgaard, author of "Pulsar in Action", is a principal software engineer on Splunk's messaging team, responsible for Splunk's internal Pulsar-as-a-Service platform. Before joining Splunk, he was Director of Solution Architecture at Streamlio, where he developed best practices and solutions based on Apache Pulsar.

Messaging

2021-08-06 13:30

Apache BookKeeper (as a key-value store) and its use cases

Abstract: To take full advantage of the performance characteristics of a streaming backend, it is important to understand the details of how streaming servers store data. With that understanding, you can design solutions suited to your scenario, make full use of the resources at hand, and achieve the best consistency, availability, latency, and throughput those resources allow.

In this talk, we will discuss the storage layer of Apache Pulsar (Apache BookKeeper): the basics of BookKeeper's storage semantics, how it is used in different scenarios (even outside of Pulsar), Pulsar's storage object model, the data structures and algorithms Pulsar uses, and how they map onto the storage semantics Pulsar provides by default. You can also swap in a different storage backend with some additional code. This talk gives you the background needed to use Pulsar to process data correctly. Because it focuses on the storage backend, the principles it covers also apply to other data storage and streaming systems beyond Pulsar.

Speaker: Shivji Kumar Jha, senior software developer at Nutanix, works on the Beam team to help Nutanix customers minimize the cloud costs and security risks of hybrid cloud use. Shiv's work covers all of Pulsar at Nutanix, including managing 4 Pulsar clusters (30 nodes) and the use cases built on them. Shiv likes to spend time on data storage (databases, data streams, analytics, and so on) and has contributed to the MySQL and Pulsar code bases. He is an avid reader (technology, fiction, economics, and more) and is always looking for ways to simplify software architecture.

2021-08-06 14:50

Apache Pulsar best practices at BIGO

Abstract: Powered by artificial intelligence, BIGO's video products and services, such as Bigo Live (live streaming) and Likee (short video), have gained huge popularity with users around the world. Bigo Live provides services in more than 150 countries and regions, and Likee has more than 100 million users and is very popular among Gen Z. Over the past few years, we deployed a large number of Kafka clusters to support real-time ETL and short video recommendations. Apache Pulsar's layered architecture and features such as low latency, horizontal scalability, and multi-tenancy have helped us solve many problems in production. We have adopted Apache Pulsar to build our messaging system, especially for real-time ETL, short video recommendation, and real-time data reporting.

In this talk, I will share our experience using KoP (Kafka-on-Pulsar) and discuss how to migrate seamlessly from Kafka to Pulsar, with a focus on improving performance and stability. I will also share other major Apache Pulsar use cases at BIGO, such as running millions of topics, real-time machine learning, and integration with Flink and Flink SQL.

Speaker: Chen Hang, Apache Pulsar Committer and head of the BIGO messaging platform team, is responsible for building a centralized pub-sub messaging platform that carries a large volume of service and application traffic. He introduced Apache Pulsar into the BIGO messaging platform and integrated it with upstream and downstream systems, such as Flink, ClickHouse, and other internal systems, for real-time recommendation and analytics. He focuses on Pulsar performance tuning, new feature development, and Pulsar ecosystem integration.

2021-08-06 15:30

From Apache Kafka to Apache Pulsar: A System Migration Survival Guide

Abstract: After a brief, high-level architecture comparison between Kafka and Pulsar, this talk focuses on comparing their message publish/consume models, the similarities and differences between them, and the corresponding impact on application design and implementation. Finally, we will introduce the migration options, patterns, and tools available for a seamless application migration path from Kafka to Pulsar.

Speaker: Meng Yabin, lead architect at DataStax. In recent years, his focus has been on designing and consulting on large-scale distributed database and stream processing solutions. Before joining DataStax, he spent most of his career on system design, implementation, and consulting in relational databases, data warehousing, business intelligence, and NoSQL databases.

2021-08-06 16:10

Apache Pulsar in federated learning: a detailed case study

Abstract: Federated Learning (FL) is a machine learning technique that allows multiple decentralized organizations to jointly train a model without exposing their local data samples. During federated training, participants exchange a large amount of encrypted information, which is aggregated to form a global model. Because these messages are critical and must be delivered in real time and in order, transmitting them poses several challenges. In this talk, we will discuss how to use Apache Pulsar to address these challenges and describe in detail how Pulsar is used for federated training in FATE (https://github.com/FederatedAI/FATE).

Speaker: Chen Jiahao, engineer at VMware

2021-08-08 13:30

Apache Pulsar best practices in logging scenarios

Abstract: ELK + Apache Kafka is a common architecture for logging. Today, however, the landscape has changed: cloud native has become mainstream and microservice architectures are everywhere. This has brought more services and a growing number and variety of logs. Apache Kafka cannot meet all the requirements of cloud-native logging scenarios, such as simple operations, managing millions of topics, and tenant resource isolation. With its cloud-native architecture and better performance, Apache Pulsar is a better fit. This talk presents Apache Pulsar as a new logging message solution, covering the requirements for a log messaging system, a comparison of the Kafka and Pulsar solutions, Pulsar best practices, and an introduction to Pulsar Functions and connectors.

Speaker: Wei Bin, solution engineer at StreamNative, has rich experience with big data technologies such as ELK, Apache Kafka, Apache Pulsar, and Prometheus.

2021-08-08 14:10

Apache Pulsar: cloud-native message queue practice at Tencent Cloud

Abstract: Apache Pulsar is currently used at large scale on Tencent Cloud. Message queues face many challenges in cloud-native environments, and Pulsar is a better solution. In this talk, we will share practical experience running Pulsar in a cloud-native environment, such as how to scale capacity out and in quickly and dynamically, how to improve cluster resource utilization, and how clusters are deployed.

Speaker: Lin Lin, senior engineer at Tencent Cloud and Apache Pulsar Committer, focuses on middleware and has rich experience with message queues and microservices. Lin Lin joined Tencent in 2019 and is now responsible for building Tencent Cloud TDMQ, with a commitment to creating stable, efficient, and scalable foundational components and services.

2021-08-08 14:50

Applying Apache Pulsar at Tencent with millions of topics

Abstract: Apache Pulsar, a next-generation cloud-native distributed messaging and streaming platform, integrates messaging, storage, and function computing, and adopts an architecture that separates storage from compute. Apache Pulsar has successfully supported a large number of high-volume, high-traffic business scenarios within Tencent Cloud. This talk shares Tencent Cloud's best practices and operations experience running Apache Pulsar with millions of topics.

Speaker: Ran Xiaolong joined Tencent in 2020 and is now responsible for building Tencent Cloud TDMQ, with a commitment to creating stable, efficient, and scalable foundational components and services.

2021-08-08 15:30

RBAC authorization in Apache Pulsar

Abstract: RBAC (role-based access control) is a method of controlling system access based on the roles of individual users. RBAC uses the mapping between users and roles, together with the permissions granted to each role, to determine whether a user can perform an operation on a given resource. Apache Pulsar uses Casbin to implement RBAC authorization. With RBAC authorization enabled, you can manage which roles a user belongs to and what permissions each role has on a given resource. This talk introduces RBAC authorization in Apache Pulsar. I will explain basic RBAC concepts and the principles behind Casbin, and show how to use the Casbin provider to enable RBAC authorization in Pulsar, how to use RBAC to set and manage permissions in Pulsar, and how to use the ZooKeeper adapter for RBAC in Pulsar.

Speaker: Yang Zike, software engineer at StreamNative, has been involved in the Pulsar community since 2020.
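The user-to-role-to-permission check described in the RBAC abstract can be illustrated with a small, self-contained model. This is a plain-Python sketch of the general RBAC idea only, under the assumption that a permission is a (resource, action) pair; it is not Casbin's API or Pulsar's authorization provider. The class name `RbacPolicy`, the role `order-writer`, the users, and the topic name are hypothetical examples, and `produce`/`consume` are used here simply as illustrative action strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RbacPolicy:
    """Hypothetical toy RBAC model (not Casbin, not Pulsar's provider).

    Users map to roles, and roles map to (resource, action) grants.
    """

    user_roles: dict[str, set[str]] = field(default_factory=dict)
    role_permissions: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def assign_role(self, user: str, role: str) -> None:
        # Record that `user` belongs to `role`.
        self.user_roles.setdefault(user, set()).add(role)

    def grant(self, role: str, resource: str, action: str) -> None:
        # Allow every member of `role` to perform `action` on `resource`.
        self.role_permissions.setdefault(role, set()).add((resource, action))

    def is_allowed(self, user: str, resource: str, action: str) -> bool:
        # A user may act on a resource if ANY of the user's roles holds that grant.
        return any(
            (resource, action) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )


if __name__ == "__main__":
    policy = RbacPolicy()
    topic = "persistent://public/default/orders"  # hypothetical topic name
    policy.grant("order-writer", topic, "produce")
    policy.assign_role("alice", "order-writer")
    print(policy.is_allowed("alice", topic, "produce"))  # True
    print(policy.is_allowed("alice", topic, "consume"))  # False
    print(policy.is_allowed("bob", topic, "produce"))    # False: bob has no role
```

Because permissions attach to roles rather than to individual users, granting a new user access is a single `assign_role` call, and revoking a permission from a role affects every member at once; that indirection is the main reason RBAC scales better than per-user access lists.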

2021-08-08 16:10

Apache Pulsar's journey on the Huawei Cloud IoT platform

Abstract: The Huawei Cloud IoT platform is currently the leading IoT platform in China, managing more than 300 million devices. This talk covers:
Why Huawei Cloud IoT switched its message queue from Kafka to Pulsar
How Huawei Cloud IoT uses Pulsar, the problems encountered along the way, and the corresponding solutions

Speaker: He Zhangjian graduated from Xidian University in 2017 and has worked in Huawei's IoT department since then.

Sign up for ApacheCon Asia 2021

Registration for ApacheCon Asia 2021 is now open. Register at http://hdxu.cn/Q7LkI.
