The author of this article is Axel Sirota, and the translators are Liu Yu, Jennifer, and Sijia of StreamNative.

Original link: https://streamnative.io/en/blog/tech/2021-02-10-migrate-to-serverless-with-pulsar-functions

About Apache Pulsar

Apache Pulsar is a top-level project of the Apache Software Foundation. It is a next-generation, cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight function-based computing. It supports multi-datacenter and cross-region data replication, and offers streaming data storage with strong consistency, high throughput, low latency, and high scalability.
GitHub address: http://github.com/apache/pulsar/

Pulsar 2.0 introduced Pulsar Functions, which let users migrate to serverless applications easily and smoothly. This article covers the basics of Pulsar Functions and how to develop them, and lists some considerations for migrating applications to Pulsar Functions.

A simple scenario

Suppose we have the following use case: we run an e-commerce company whose main business is processing payment invoices. In Pulsar, this workflow consists of three steps:

  • Import invoices into the order topic;
  • Run code that splits each invoice's values on commas;
  • Write the invoice values to PostgreSQL.

This article focuses on the second step. Typically, the code we run would be a serverless function in AWS Lambda or a full-fledged microservice. Both approaches have several disadvantages.

First, we develop a full service for a short piece of code. Due to the complexity of the development work, full implementation may take two weeks.

Second, because the source data schema changes constantly, maintenance becomes increasingly difficult. Every change requires versioning and redeploying the service and the underlying PostgreSQL tables, which takes at least a day.

Additionally, the AWS Lambda function requires authentication in both directions when exchanging data with Pulsar. Pulsar first invokes the Lambda function, and the Lambda function must then authenticate to Pulsar itself. This unnecessary two-way message passing costs performance.

Introducing Pulsar Functions

Pulsar Functions is a lightweight computing framework for processing data between topics. Pulsar Functions run in Pulsar, so there is no need to deploy microservices separately, saving time and simplifying troubleshooting.

Pulsar Functions can be as simple or as complex as the task requires. They can transform or move data from one topic to another, and they can also send data to multiple topics, perform complex routing, batch requests, and more.

Pulsar Functions are easy to debug. A function can be deployed in debug mode, so that you can attach to the code and debug it while it executes in real time.

Develop Pulsar Functions

Creating a Pulsar Function in a familiar programming language is as easy as implementing a single function interface. The following code is written in Java; Pulsar Functions also supports Python and Go.

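package demo; // matches --classname demo.SplitFunction used at deployment
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Splits a comma-separated invoice line into its individual values.
public class SplitFunction implements Function<String, List<String>> {
   @Override
   public List<String> apply(String input) {
       return Arrays.asList(input.split(","));
   }
}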
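
Because SplitFunction only depends on the JDK's java.util.function.Function interface, its logic can be exercised locally before packaging. The following is a minimal sketch of such a check, not part of the original article; the class name SplitFunctionCheck and the sample invoice string are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical local check; the class name and sample input are assumptions.
class SplitFunctionCheck {
    // Same logic as SplitFunction, expressed against the JDK interface.
    static final Function<String, List<String>> SPLIT =
            input -> Arrays.asList(input.split(","));

    public static void main(String[] args) {
        List<String> fields = SPLIT.apply("INV-001,2021,19.99");
        System.out.println(fields); // prints [INV-001, 2021, 19.99]
    }
}
```

One detail worth knowing: String.split(",") drops trailing empty strings, so an input such as "a,b," yields two values, not three. If empty trailing fields matter for the invoice format, split(",", -1) keeps them.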

After compiling and packaging the code, deploy the function to the Pulsar instance with the functions create command. The arguments to this command are the packaged code, the class name, and the function's input and output topics.

bin/pulsar-admin functions create --jar target/split.jar --classname demo.SplitFunction --input input-topic --output output-topic

It can take up to two days to develop and deploy a custom Pulsar Function. Once deployed, Pulsar Functions greatly reduce the workload for users and shorten product release time. Pulsar allows users to deploy any number of Pulsar Functions that fetch data from topics and send it to other topics, and that can easily write status messages to the Pulsar logs. Pulsar Functions simplify deployment on Pulsar, make Pulsar more flexible, and extend what Pulsar can do.

Develop complete Pulsar Functions

How to take full advantage of the rich features of Pulsar Functions?

Developing a complete Pulsar Function is as easy as implementing the Function interface in a class. First implement the process() method, which is the entry point into Pulsar. Through the context object we can access the logger, track output, send messages to topics, and more.

The example below uses a Pulsar Function to read the input data and extract the invoice price. We can use the context object to send this data to another topic. (To send data to the output topic specified when the function was deployed, simply return it as the function's return value. This example instead shows how to use Pulsar Functions for routing: it sends the data to another topic and returns null from the function.)

import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;
import org.slf4j.Logger;

// Routes each invoice price to a per-year topic instead of the default output topic.
public class RoutingFunction implements Function<String, Void> {
   @Override
   public Void process(String input, Context context) throws Exception {
       Logger log = context.getLogger();
       log.info("Got this input: {}", input);
       // Price is a helper class that is not shown in this article
       Price inputPrice = new Price(input);
       String topic = String.format("year-%s", inputPrice.getYear());
       context.newOutputMessage(topic, Schema.STRING).value(inputPrice.getPrice()).send();
       // We could also return some object here and it would be sent to the
       // output topic set during function submission
       return null;
   }
}
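
RoutingFunction relies on a Price class that the article does not show. The sketch below is one hypothetical way such a helper could look, under the assumption that each input message has the form "<year>,<price>" (for example "2021,19.99"). The input format, the field types, and the PriceDemo class are all assumptions made for illustration, not the author's implementation.

```java
// Hypothetical Price helper: the article does not show this class.
// ASSUMPTION: each input is "<year>,<price>", e.g. "2021,19.99".
final class Price {
    private final String year;
    private final String price;

    Price(String input) {
        String[] parts = input.split(",", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "Expected \"<year>,<price>\" but got: " + input);
        }
        this.year = parts[0].trim();
        this.price = parts[1].trim();
    }

    String getYear() { return year; }

    // Returned as a String so it matches Schema.STRING in RoutingFunction.
    String getPrice() { return price; }
}

class PriceDemo {
    public static void main(String[] args) {
        Price p = new Price("2021,19.99");
        // Same topic naming as RoutingFunction: "year-" plus the invoice year.
        System.out.println(String.format("year-%s", p.getYear())); // prints year-2021
    }
}
```

Keeping the parsing in a small class like this means the routing rule (which topic a message goes to) can be tested without a running Pulsar instance.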

Lower cost than AWS Lambda

Why did we decide to switch to Pulsar Functions, given that AWS Lambda can do what we need? Compared to AWS Lambda, Pulsar Functions has a number of advantages, including easier debugging and the removal of two-way authentication between Pulsar and Lambda.

Let's compare the cost of AWS Lambda and Pulsar Functions in a common scenario. Assume a real-time bidding system for online auctions that receives 10,000 bids per second (about 26 billion requests per month). Counting only the per-request charge and ignoring compute time, the cost is about $5,000 per month. Assuming 100 ms per request and 2048 MB of memory per function, the compute cost is about $86,000 per month. And that does not include AWS data transfer costs!
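
The figures above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes AWS Lambda list prices of $0.20 per million requests and $0.0000166667 per GB-second, a 30-day month, and a memory size of 2048 MB (2 GB), since that is the value that reproduces the $86,000 figure. None of these unit prices are stated in the article, so treat them as assumptions and check current pricing before relying on them.

```java
// Rough cost estimate for the bidding scenario described in the article.
// ASSUMPTION: both unit prices below are not from the article.
class LambdaCostEstimate {
    static final double PRICE_PER_MILLION_REQUESTS = 0.20;  // USD, assumed
    static final double PRICE_PER_GB_SECOND = 0.0000166667; // USD, assumed

    // Total requests in a month of the given length.
    static long requestsPerMonth(long perSecond, int days) {
        return perSecond * 60L * 60L * 24L * days;
    }

    // Per-request charge only, ignoring compute time.
    static double requestCost(long requests) {
        return requests / 1_000_000.0 * PRICE_PER_MILLION_REQUESTS;
    }

    // Compute charge: requests x duration x memory, billed per GB-second.
    static double computeCost(long requests, double secondsPerRequest, double memoryGb) {
        return requests * secondsPerRequest * memoryGb * PRICE_PER_GB_SECOND;
    }

    public static void main(String[] args) {
        long requests = requestsPerMonth(10_000, 30);
        System.out.println("requests per month: " + requests);
        System.out.println("request cost (USD): " + Math.round(requestCost(requests)));
        System.out.println("compute cost (USD): " + Math.round(computeCost(requests, 0.1, 2.0)));
    }
}
```

Under these assumptions the calculation gives roughly 25.9 billion requests, about $5,200 in request charges, and about $86,400 in compute charges per month, in line with the rounded figures in the text.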

AWS Lambda is an excellent choice for serverless functions, but only for small-scale use cases. Data pipelines that process billions of transactions with Lambda are expensive.

Pulsar Functions can deliver significant cost savings. When I first joined JAMPP, the team used only Lambda, and a small part of the data pipeline cost over $30,000 per month. After we migrated from AWS Lambda to Pulsar Functions, the cost dropped to a few hundred dollars per month, most of it for hosting Pulsar on Amazon EC2 instances.

Migrating to Pulsar Functions

First look at the architecture using Pulsar Functions. In the example usage scenario, we wrote a Java function in AWS Lambda to process data between topics. Pulsar Functions replaces Lambda in this architecture, simplifying development and deployment.

[Figure: architecture using Pulsar Functions in place of AWS Lambda]

After deploying Pulsar Functions, we still need to get data into and out of Pulsar, which would normally mean writing import and export scripts. With Pulsar IO, users can easily define external data sources and sinks within Pulsar, which simplifies this process. Pulsar IO sources and sinks are themselves implemented as Pulsar Functions, so users can create custom sources and sinks in Pulsar and debug them in the same way.

Migrating to Pulsar Functions requires three steps:

  • Migrate all processing logic to one or more Pulsar Functions
  • Convert IO logic (using Pulsar IO source/sink)
  • Process log data using log topics

Complete migration to a serverless application running in Pulsar takes only three steps.

What if the user's current messaging system is Kafka? Don't worry: Kafka-on-Pulsar allows a smooth migration from Kafka to Pulsar without any coding.

Epilogue

This article briefly describes how to use Pulsar Functions. Beyond the features discussed here, more exciting features are being added to Pulsar Functions. For example, StreamNative recently announced the release of Pulsar Function Mesh, which supports the coordinated deployment of clusters of Pulsar Functions.

This article focuses on how to develop Pulsar Functions, how to migrate applications to serverless applications running on Pulsar, and the ease of use and flexibility of Pulsar Functions.

Good luck with the migration!
