Author: Zhang Jianfeng (Jian Feng)

At last year's Flink Forward, when we discussed the future of the Flink on Zeppelin project, we mentioned planned support for Application mode. The good news is that the community has now implemented this feature, and everyone is welcome to download the latest version and try it.

Application mode is a new deployment mode introduced in Flink 1.11. It aims to reduce the load on the client by running the user's main function in the JobManager rather than on the client. This mode suits Flink on Zeppelin well, because the client of Flink on Zeppelin is the Flink interpreter process, which is essentially a long-running main function that continuously accepts commands from the front end and performs the corresponding operations (such as submitting or stopping a job). Below, we explain in detail how Zeppelin implements Yarn Application mode and how to use it.

Architecture

Before describing the Yarn Application mode architecture, let's briefly review how the architecture of Flink on Zeppelin has evolved.

Normal Flink on Yarn operating mode

In this mode, the client (the Flink interpreter) runs on the Zeppelin server machine, and each client corresponds to one Flink cluster on Yarn. If there are many Flink interpreter processes, they put a lot of pressure on the Zeppelin server machine.

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/wt1g3h

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=6

Yarn Interpreter mode

Yarn Interpreter mode moves the client (the Flink interpreter) into the Yarn cluster, shifting the resource pressure to Yarn and solving part of the problem of the normal Flink on Yarn mode above. However, this mode requires an additional Yarn container per Flink cluster just to run the Flink interpreter, so it is not very efficient in terms of resource utilization.

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/gcah8t

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=24

Yarn Application mode

Yarn Application mode solves the problems of both previous modes by running the Flink interpreter inside the JobManager. It neither adds resource pressure to the Zeppelin server machine nor wastes Yarn cluster resources on an extra container.

How to use Yarn Application mode

Configuring Yarn Application mode is simple: just set flink.execution.mode to yarn_application. All other configuration is the same as in the other modes, and all of the Flink on Zeppelin features below work as usual in Yarn Application mode. We take this opportunity to review them.
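As a rough illustration of the setting above, the following Python sketch builds the text of a Zeppelin inline configuration paragraph. It assumes that your Zeppelin version supports the `%flink.conf` paragraph type with one "key value" pair per line, and that the paragraph is run before the Flink interpreter process starts. The helper name `render_flink_conf` is hypothetical; only the property and its value come from this article.

```python
def render_flink_conf(props):
    """Return the text of a %flink.conf paragraph, one 'key value' pair per line."""
    lines = ["%flink.conf", ""]
    lines += [f"{key} {value}" for key, value in props.items()]
    return "\n".join(lines)


# Only this property is taken from the article; every other interpreter
# setting is assumed to stay the same as in the other execution modes.
props = {"flink.execution.mode": "yarn_application"}

paragraph = render_flink_conf(props)
print(paragraph)
```

The same property can also be set on the interpreter settings page; the inline paragraph is just one way to keep the setting next to the note that uses it.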

Multi-language support

The following 3 languages are supported in the same Flink cluster, and they interoperate with one another (shared Catalog, shared ExecutionEnvironment):

  • Scala (%flink)
  • PyFlink (%flink.pyflink)
  • SQL (%flink.ssql, %flink.bsql)

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/pg5s82

https://www.yuque.com/jeffzhangjianfeng/gldg8w/ggxz76

https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=4

Hive integration

Hive integration can be enabled with a few simple configuration settings.

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/agf94n

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=10
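To make the Hive configuration more concrete, here is a small Python sketch of the settings involved. The property names (`zeppelin.flink.enableHive`, `zeppelin.flink.hive.version`, `HIVE_CONF_DIR`) are given as I recall them from the Zeppelin Flink interpreter documentation, so check them against the reference document above for your version. The values and the helper `missing_hive_settings` are hypothetical, and the list of expected keys is an assumption of this sketch, not a rule enforced by Zeppelin.

```python
# Settings this sketch expects to be set explicitly once Hive is enabled
# (an assumption of the sketch, not a Zeppelin requirement).
EXPECTED_WITH_HIVE = ("zeppelin.flink.hive.version", "HIVE_CONF_DIR")


def missing_hive_settings(props):
    """Return the expected Hive settings that are absent or empty."""
    if props.get("zeppelin.flink.enableHive", "false").lower() != "true":
        return []  # Hive is not enabled, so nothing else is expected
    return [key for key in EXPECTED_WITH_HIVE if not props.get(key)]


hive_props = {
    "zeppelin.flink.enableHive": "true",
    "zeppelin.flink.hive.version": "2.3.4",  # hypothetical: use your Hive version
    "HIVE_CONF_DIR": "/etc/hive/conf",       # hypothetical: directory with hive-site.xml
}

print(missing_hive_settings(hive_props))
```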

UDF support

There are 4 ways to define and use a Flink UDF:

  • Write Scala UDF directly in Zeppelin
  • Write PyFlink UDF directly in Zeppelin
  • Create UDF with SQL
  • Use flink.udf.jars to specify the jar containing the UDFs

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/dthfu2

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=17

https://www.bilibili.com/video/BV1Te411W73b?p=18

https://www.bilibili.com/video/BV1Te411W73b?p=19

Third-party dependencies

In Zeppelin, you can specify third-party dependencies in the following 2 ways:

  • flink.execution.packages
  • flink.execution.jars (note that in Yarn Application mode you need to specify an HDFS path here, because the Flink interpreter runs in the JobManager, the JobManager runs in a Yarn container, and the jar you specify may not exist on the NodeManager machine that hosts that container)

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/rn6g1s

Reference video: https://www.bilibili.com/video/BV1Te411W73b?p=15
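The HDFS requirement for flink.execution.jars in Yarn Application mode can be checked before a note is run. The sketch below is a minimal pre-flight check in Python, assuming the property holds a comma-separated list of jar locations and that HDFS paths are written with an explicit `hdfs://` scheme. The function name `non_hdfs_jars` and the jar paths are hypothetical; a cluster that reaches HDFS through another scheme would need a looser check.

```python
from urllib.parse import urlparse


def non_hdfs_jars(jars_value):
    """Return the entries of a comma-separated jar list that are not hdfs:// URIs."""
    entries = [jar.strip() for jar in jars_value.split(",") if jar.strip()]
    return [jar for jar in entries if urlparse(jar).scheme != "hdfs"]


# Hypothetical jar locations, used only for illustration.
jars = "hdfs:///libs/flink-connector.jar, /home/user/libs/my-udf.jar"

# Entries reported here would likely be missing on the NodeManager machine.
print(non_hdfs_jars(jars))
```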

Checkpoint & Savepoint

Checkpoints and savepoints can be used as usual.

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/mlnswx

SQL advanced features

Zeppelin has made a series of enhancements to Flink SQL, and these enhancements can be used as usual.

Reference document: https://www.yuque.com/jeffzhangjianfeng/gldg8w/te2l1c
