Flink on Zeppelin series: Yarn Application mode support

An introduction to how Zeppelin implements and uses Yarn Application mode.

Author: Zhang Jianfeng (Jian Feng)

At Flink Forward last year, when we discussed the future of the Flink on Zeppelin project, we mentioned support for Application mode. The good news is that the community has now implemented this feature. Everyone is welcome to join the Flink on Zeppelin DingTalk group (32803524) and download the latest version to try it.

GitHub address

https://github.com/apache/flink

Everyone is welcome to like Flink and give it a star.

Application mode is a deployment mode introduced in Flink 1.11. It reduces the load on the client by running the user's main function in the JobManager instead of on the client. This mode suits Flink on Zeppelin well, because the client of Flink on Zeppelin is the Flink interpreter process, which is itself a long-running main function that continuously accepts commands from the front end and performs the corresponding operations (such as submitting or stopping a job). Below we describe in detail how Zeppelin implements Yarn Application mode and how to use it.

1. Architecture

Before describing the Yarn Application mode architecture, let's briefly review how the architecture of Flink on Zeppelin has evolved.

Normal Flink on Yarn operating mode

In this mode, the client (the Flink interpreter process) runs on the Zeppelin server machine, and each client corresponds to one Flink cluster on Yarn. If there are many Flink interpreter processes, they put a lot of pressure on the Zeppelin machine.

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/wt1g3h
Reference video:
https://www.bilibili.com/video/BV1Te411W73b?p=6

Yarn Interpreter mode

Yarn Interpreter mode moves the client (the Flink interpreter) into the Yarn cluster. This shifts the resource pressure to the Yarn cluster and solves part of the problem with the normal Flink on Yarn mode above. However, this mode requires an additional Yarn container per Flink cluster to run the Flink interpreter, so it is not very efficient in terms of resource utilization.

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/gcah8t
Reference video:
https://www.bilibili.com/video/BV1Te411W73b?p=24

Yarn Application mode

Yarn Application mode solves the problems of both previous modes. The Flink interpreter runs inside the JobManager, so it neither adds resource pressure on the Zeppelin server machine nor wastes a container in the Yarn cluster.
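
The trade-offs between the three modes described above can be summarized in a small sketch. The Python below is only an illustrative model: the class, field, and function names (DeploymentMode, interpreter_location, modes_without_overhead, and so on) are hypothetical and are not part of Zeppelin or Flink. The values simply restate the descriptions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentMode:
    """Illustrative model only; these names are not Zeppelin or Flink APIs."""
    name: str
    interpreter_location: str      # where the Flink interpreter (the client) runs
    loads_zeppelin_server: bool    # does the interpreter use Zeppelin host resources?
    extra_yarn_container: bool     # does the interpreter need its own Yarn container?


MODES = [
    DeploymentMode("Normal Flink on Yarn", "Zeppelin server machine", True, False),
    DeploymentMode("Yarn Interpreter", "dedicated Yarn container", False, True),
    DeploymentMode("Yarn Application", "JobManager", False, False),
]


def modes_without_overhead(modes):
    """Return names of modes that neither load the Zeppelin server nor need an extra container."""
    return [m.name for m in modes
            if not m.loads_zeppelin_server and not m.extra_yarn_container]


print(modes_without_overhead(MODES))
```

Only the Yarn Application entry has both overhead flags set to False, which is the point the comparison is meant to make.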

2. How to use Yarn Application mode

Configuring Yarn Application mode is simple: set flink.execution.mode to yarn-application. All other configuration is the same as in the other modes. All of the Flink on Zeppelin features below can be used as usual in Yarn Application mode, so we also take this opportunity to review them.
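
As a rough sketch of what this setting looks like in a note, the Python below renders a configuration paragraph containing the single property named above. It assumes the property is supplied through a Zeppelin inline configuration paragraph that starts with %flink.conf and lists one "key value" pair per line. The helper name render_conf_paragraph is hypothetical, and the same property could equally be set in the interpreter settings.

```python
def render_conf_paragraph(properties):
    """Render properties as the text of an inline configuration paragraph.

    Assumption: the paragraph starts with %flink.conf and holds one
    whitespace-separated "key value" pair per line.
    """
    lines = ["%flink.conf"]
    for key, value in properties.items():
        lines.append(f"{key} {value}")
    return "\n".join(lines)


# The only property taken from the article; everything else keeps its usual value.
settings = {"flink.execution.mode": "yarn-application"}
paragraph = render_conf_paragraph(settings)
print(paragraph)
```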

Multi-language support

The following three languages are supported in the same Flink cluster, and they interoperate with one another (they share the same Catalog and the same ExecutionEnvironment):

Scala (%flink)
PyFlink (%flink.pyflink)
SQL (%flink.ssql, %flink.bsql)

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/pg5s82
https://www.yuque.com/jeffzhangjianfeng/gldg8w/ggxz76
https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c
Reference video:
https://www.bilibili.com/video/BV1Te411W73b?p=4

Hive integration

Hive integration can be enabled with a few simple configuration settings.

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/agf94n
Reference video:
https://www.bilibili.com/video/BV1Te411W73b?p=10

UDF support

Four ways of defining and using Flink UDFs are supported:

Write a Scala UDF directly in Zeppelin;
Write a PyFlink UDF directly in Zeppelin;
Create a UDF with SQL;
Use flink.udf.jars to specify a jar that contains UDFs.

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/dthfu2
Reference video:
https://www.bilibili.com/video/BV1Te411W73b?p=17
https://www.bilibili.com/video/BV1Te411W73b?p=18
https://www.bilibili.com/video/BV1Te411W73b?p=19

Third-party dependencies

There are two ways to specify third-party dependencies in Zeppelin:

flink.execution.packages
flink.execution.jars (Note that in Yarn Application mode you need to specify an HDFS path here. The Flink interpreter runs in the JobManager, the JobManager runs in a Yarn container, and the jar you specify may not exist on the NodeManager machine that hosts that container.)
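
The HDFS requirement above is easy to overlook, so a small pre-flight check can help. The Python below is a minimal sketch, not part of Zeppelin: the function name non_hdfs_jars is hypothetical, the example paths are made up, and it assumes that flink.execution.jars holds a comma-separated list of paths and that HDFS paths are written with the hdfs:// scheme.

```python
def non_hdfs_jars(properties):
    """Return the flink.execution.jars entries that are not hdfs:// paths.

    Only yarn-application mode is checked; any other mode returns an empty list.
    Assumption: the property value is a comma-separated list of paths.
    """
    if properties.get("flink.execution.mode") != "yarn-application":
        return []
    jars = properties.get("flink.execution.jars", "")
    paths = [p.strip() for p in jars.split(",") if p.strip()]
    return [p for p in paths if not p.startswith("hdfs://")]


# Example paths are invented for illustration.
settings = {
    "flink.execution.mode": "yarn-application",
    "flink.execution.jars": "hdfs:///libs/connector.jar,/home/user/local-udf.jar",
}
print(non_hdfs_jars(settings))
```

Any path the check reports would need to be uploaded to HDFS and referenced by its HDFS location before the interpreter is started in Yarn Application mode.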

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/rn6g1s
Reference video:
https://www.bilibili.com/video/BV1Te411W73b?p=15

Checkpoint & Savepoint

Checkpoints and savepoints work as usual.

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/mlnswx

SQL advanced features

Zeppelin has made a series of enhancements to Flink SQL, all of which can be used as usual, for example:

Support both Batch SQL and Streaming SQL
Multi-statement support
Comment support
Job parallelism support
Multiple insert support
JobName settings
Streaming data visualization for Streaming SQL

Reference documents:
https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c

In addition, the Alibaba Cloud open platform team has long been recruiting outstanding big data talent (both interns and experienced hires). Our main responsibility is to provide basic big data and AI services to small and medium-sized enterprise customers on Alibaba Cloud. Your job will be to build an easy-to-use, enterprise-level big data and AI open platform around Spark, Flink, Hadoop, Tensorflow, PyTorch, and other open source components. The work offers both technical challenges and the satisfaction of building products. We use a large number of open source technologies (Hadoop, Flink, Spark, Zeppelin, Kubernetes, Tensorflow, PyTorch, etc.) and are committed to giving back to the open source community.

If you are interested in open source, big data, or AI, this is an ideal place to grow. The team includes committers and PMC members from many open source projects, such as Apache Flink, Apache Kafka, Apache Zeppelin, Apache Beam, Apache Druid, and Apache HBase. If you are interested, please send your resume to: jeffzhang.zjf@alibaba-inc.com

