
This article was compiled by community volunteer Chen Zhengyu. The Apache Flink community released version 1.13 in May 2021, bringing many new changes. The article is based on "In-Depth Interpretation of Flink SQL 1.13," shared by Xu Bangjiang (Xue Jin) at the Flink Meetup in Beijing on May 22. It covers:

  1. Flink SQL 1.13 overview
  2. Interpretation of core features
  3. Interpretation of important improvements
  4. Flink SQL 1.14 future planning
  5. Summary

1. Overview of Flink SQL 1.13

[Figure: distribution of resolved issues in Flink 1.13 by module]

Flink 1.13 is a large community release, with more than 1,000 issues resolved. As the figure above shows, most of the resolved issues relate to the Table/SQL module: more than 400 issues in total, about 37% of all resolved issues. These issues mainly revolve around 5 FLIPs, which this article introduces in turn. They are:

  • FLIP-145: Support Window TVF
  • FLIP-162: Time zone and time function corrections
  • FLIP-152: Improved Hive syntax compatibility
  • FLIP-163: Improved SQL Client
  • FLIP-136: Enhanced conversion between DataStream and Table

Let's interpret these FLIPs in detail below.

2. Interpretation of core features

1. FLIP-145: Support Window TVF

Community members may know that basic versions of this feature had already been developed on internal branches at companies such as Tencent, Alibaba, and ByteDance. In Flink 1.13, the community brought Window TVF support and related optimizations upstream. The following analyzes this new feature from four angles: Window TVF syntax, near-real-time cumulative computation, window performance optimizations, and multi-dimensional data analysis.


1.1 Window TVF Syntax

Before version 1.13, windows were implemented through a special SqlGroupedWindowFunction:

SELECT
    TUMBLE_START(bidtime, INTERVAL '10' MINUTE),
    TUMBLE_END(bidtime, INTERVAL '10' MINUTE),
    TUMBLE_ROWTIME(bidtime, INTERVAL '10' MINUTE),
    SUM(price)
FROM MyTable
GROUP BY TUMBLE(bidtime, INTERVAL '10' MINUTE)

In version 1.13, the syntax was standardized as a table-valued function:

SELECT window_start, window_end, window_time, SUM(price)
FROM TABLE(TUMBLE(TABLE MyTable, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end

Comparing the two, the TVF syntax is more flexible: the window TVF does not have to be immediately followed by a GROUP BY aggregation. Window TVF is also based on relational algebra, which makes it more standard. When you only need to assign rows to windows, you can use the TVF alone without a GROUP BY; this makes TVFs more extensible and expressive, and opens the door to custom TVFs (for example, a TVF implementing Top-N).

[Figure: assigning rows to tumble windows with a TVF, without aggregation]

The example in the figure above uses a TVF to assign rows to tumble windows: the data only needs to be divided into windows, with no aggregation; if aggregation is needed later, a GROUP BY can be applied on top (a minimal sketch follows). For users familiar with batch SQL this is very natural; unlike before version 1.13, we no longer need the special SqlGroupedWindowFunction that binds window division and aggregation together.
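
The following is a minimal sketch of this pattern, reusing the MyTable/bidtime names from the syntax example above (the price column is assumed):

-- Assign each row to a 10-minute tumble window without aggregating;
-- the TVF appends window_start, window_end, and window_time columns.
SELECT bidtime, price, window_start, window_end, window_time
FROM TABLE(TUMBLE(TABLE MyTable, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));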

Currently, Window TVF supports the tumble window, hop window, and the new cumulate window; session window support is expected in version 1.14.

1.2 Cumulate Window

[Figure: cumulate window slices on the time axis]

Cumulate window is the cumulative window. In simple terms, each segment on the time axis in the figure above is one window step.

  • The first window aggregates the data of the first interval;
  • The second window aggregates the data of the first and second intervals;
  • The third window aggregates the data of the first, second, and third intervals.

Cumulative computation is very common in business scenarios, for example cumulative UV: for a UV dashboard curve, we report the day's accumulated UV once every 10 minutes.


Before version 1.13, such a computation was usually written in SQL as follows:

INSERT INTO cumulative_UV
SELECT date_str, MAX(time_str), COUNT(DISTINCT user_id) AS UV
FROM (
    SELECT
        DATE_FORMAT(ts, 'yyyy-MM-dd') AS date_str,
        SUBSTR(DATE_FORMAT(ts, 'HH:mm'), 1, 4) || '0' AS time_str,
        user_id
    FROM user_behavior
)
GROUP BY date_str

Each record is first tagged with the time bucket it belongs to (the spliced date and 10-minute string), and all records are then aggregated by the date field via GROUP BY, achieving an approximation of cumulative computation.

  • This pre-1.13 approach has several shortcomings. First, the aggregation is recomputed for every incoming record. Second, when catching up on backlogged data, the UV dashboard curve jumps around;
  • Version 1.13 supports the TVF syntax. Based on the cumulate window, the query can be rewritten as below: every record is precisely assigned to its windows by event time, and each window's computation is triggered by the watermark, so the curve does not jump even when catching up on data.
INSERT INTO cumulative_UV
SELECT window_end, COUNT(DISTINCT user_id) AS UV
FROM TABLE(
    CUMULATE(TABLE user_behavior, DESCRIPTOR(ts), INTERVAL '10' MINUTES, INTERVAL '1' DAY)
)
GROUP BY window_start, window_end

The resulting UV dashboard curve is shown in the figure below:

[Figure: cumulative UV dashboard curve produced with the cumulate window]

1.3 Window performance optimization

In Flink 1.13, community developers carried out a series of performance optimizations for Window TVF, including:

  • Memory optimization: memory is pre-allocated and window data is buffered; computation is triggered by the window watermark, and memory buffers are used to avoid high-frequency state access;
  • Slice optimization: windows are sliced and computed results are reused wherever possible, e.g. for hop and cumulate windows; slices that have already been computed are not recomputed, only their results are reused;
  • Operator optimization: the window operator supports local-global optimization, as well as automatic hot-spot mitigation for count(distinct);
  • Late data: late records can be merged into subsequent slices to preserve data accuracy.

img

Based on these optimizations, performance tests with the open-source Nexmark benchmark show that general window performance improved by about 2x, with even larger gains in count(distinct) scenarios.


1.4 Multidimensional data analysis

The standardized syntax brings more flexibility and extensibility, and users can perform multi-dimensional analysis directly on window functions. As shown in the figure below, GROUPING SETS, ROLLUP, and CUBE can be computed directly. Before 1.13, we would have had to run a separate SQL aggregation for each group and then UNION the results to achieve a similar effect. Now such multi-dimensional analysis is supported directly on window TVFs.

[Figure: GROUPING SETS / ROLLUP / CUBE on a window TVF]
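
As a minimal sketch (the Bid table and its supplier_id, item_id, price, and bidtime columns are illustrative), a multi-dimensional window aggregation might look like this:

-- One query computes per-window totals per supplier, per item, and overall.
SELECT window_start, window_end, supplier_id, item_id, SUM(price) AS total_price
FROM TABLE(TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end, GROUPING SETS ((supplier_id), (item_id), ())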

Support for Window Top-N

In addition to multi-dimensional analysis, Window TVF also supports Top-N syntax, making it easier to write a Top-N over a window (a sketch follows).

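A minimal sketch of the Window Top-N pattern (the Bid table and its columns are illustrative): rank a window aggregation result per window with ROW_NUMBER() and filter.

-- Top 3 suppliers by sales within each 10-minute tumble window.
SELECT *
FROM (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY window_start, window_end
            ORDER BY price DESC) AS rownum
    FROM (
        SELECT window_start, window_end, supplier_id, SUM(price) AS price
        FROM TABLE(TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
        GROUP BY window_start, window_end, supplier_id
    )
)
WHERE rownum <= 3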

2. FLIP-162: Time zone and time function

2.1 Analysis of time zone issues

Users have reported many time-zone-related issues when using Flink SQL. The causes fall into three categories:

  • The PROCTIME() function should take the local time zone into account, but did not;
  • The CURRENT_TIMESTAMP/CURRENT_TIME/CURRENT_DATE/NOW() functions did not take the time zone into account;
  • Flink's time attribute could only be defined on the TIMESTAMP data type, which carries no time zone; TIMESTAMP does not consider the time zone, yet users want times in their local time zone.


For the time-zone-unaware TIMESTAMP type, the community proposed the TIMESTAMP_LTZ type (TIMESTAMP_LTZ is short for TIMESTAMP WITH LOCAL TIME ZONE). The following table compares it with TIMESTAMP:

[Table: comparison of TIMESTAMP and TIMESTAMP_LTZ]

TIMESTAMP_LTZ differs from the TIMESTAMP we used before in that it represents absolute time. Comparing the two:

  • With TIMESTAMP, the value can come from a string; whether observed from the UK or the China time zone, the value reads the same;
  • TIMESTAMP_LTZ, by contrast, originates from a Long value representing the time elapsed since the epoch. At a given instant, the elapsed time since the epoch is the same in every time zone, so this Long value denotes absolute time. When observed in different time zones, the value is interpreted in the local time zone into a readable "year-month-day hour:minute:second" form. This is the TIMESTAMP_LTZ type, and it better matches users' habits across different time zones.

The following example shows the difference between TIMESTAMP and TIMESTAMP_LTZ.

[Figure: example output of TIMESTAMP vs. TIMESTAMP_LTZ]
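
A minimal sketch of the difference in the SQL Client (TO_TIMESTAMP_LTZ and the table.local-time-zone option are part of Flink 1.13; the epoch value 0 is illustrative):

SET table.local-time-zone=UTC;
SELECT TO_TIMESTAMP_LTZ(0, 3);  -- prints 1970-01-01 00:00:00.000

SET table.local-time-zone=Asia/Shanghai;
SELECT TO_TIMESTAMP_LTZ(0, 3);  -- same instant, prints 1970-01-01 08:00:00.000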

2.2 Time function correction

Corrected the PROCTIME() function


With the TIMESTAMP_LTZ type available, the PROCTIME() function was corrected: before version 1.13 it always returned a TIMESTAMP in UTC; now its return type is TIMESTAMP_LTZ. Besides being used as a function, PROCTIME can still be used to declare a time attribute.

Revised the CURRENT_TIMESTAMP/CURRENT_TIME/CURRENT_DATE/NOW() functions

The values of these functions now depend on the local time zone. For example, when it is 2 AM in the UTC time zone, setting the time zone to UTC+8 makes the same instant read as 10 AM. The effect in different time zones is shown below:

[Figure: values of the time functions under different time zone settings]

Solved the processing-time window time zone problem

As we know, proctime can be declared as a time attribute, and windows can be defined over it:

  • Before version 1.13, day-level window operations over proctime required manually compensating for the time zone, e.g. applying an 8-hour offset and then shifting it back;
  • FLIP-162 solves this. Usage is now very simple: just declare the proctime attribute. Because the PROCTIME() function returns TIMESTAMP_LTZ, the local time zone is taken into account in the result. The example in the figure below shows that windows over the proctime attribute aggregate according to the local time zone.

[Figure: proctime window aggregation following the local time zone]
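
A minimal sketch (table name and connector are illustrative): declare a proctime attribute and window over it; the daily boundary follows the session's local time zone because PROCTIME() now returns TIMESTAMP_LTZ.

CREATE TABLE clicks (
    user_id BIGINT,
    proc_time AS PROCTIME()    -- processing-time attribute, now TIMESTAMP_LTZ
) WITH (
    'connector' = 'datagen'
);

-- Daily page views; the day boundary is interpreted in the local time zone.
SELECT window_start, window_end, COUNT(user_id) AS pv
FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(proc_time), INTERVAL '1' DAY))
GROUP BY window_start, window_end;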

Time function evaluation in streaming and batch mode

Time functions are evaluated differently in streaming and in batch mode. This revision mainly brings them in line with users' actual habits. For the functions listed below:

  • In streaming mode, they are evaluated per record, i.e. once for each row;
  • In batch mode, they are evaluated at query start, i.e. once before the job starts. Commonly used batch engines such as Hive behave the same way, evaluating once before each batch starts.

[Table: time functions and their evaluation in streaming vs. batch mode]

2.3 Use of time types

Version 1.13 also supports defining Event time on a TIMESTAMP_LTZ column, so Event time can now be defined on both TIMESTAMP and TIMESTAMP_LTZ columns. As a user, which type should you use in which scenario?

  • When the upstream source data contains string timestamps (e.g. 2021-04-15 14:00:00), declare the column as TIMESTAMP and define Event time on it. Windows will be divided based on the string time, and the computation will produce the result that matches your actual expectation;


  • When the timestamp in the upstream data source is a Long value, it represents absolute time. In version 1.13 you can define Event time on a TIMESTAMP_LTZ column. All WINDOW aggregations defined on the TIMESTAMP_LTZ type then automatically avoid the 8-hour time zone offset problem, without any of the time zone adjustments needed in the previous SQL style (see the sketch below).

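A minimal sketch (table and field names assumed): derive a TIMESTAMP_LTZ column from an epoch-millisecond BIGINT field and define Event time on it.

CREATE TABLE user_actions (
    user_id BIGINT,
    ts BIGINT,                              -- epoch milliseconds, absolute time
    time_ltz AS TO_TIMESTAMP_LTZ(ts, 3),    -- computed TIMESTAMP_LTZ(3) column
    WATERMARK FOR time_ltz AS time_ltz - INTERVAL '5' SECOND
) WITH (
    'connector' = 'datagen'
);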

Tips: these time function and time zone enhancements in Flink SQL are not compatible across versions. When upgrading, users should check whether the job logic uses such functions, to avoid business impact after the upgrade.

2.4 Daylight saving time support


Before Flink 1.13, window-related computations were very difficult for users in regions that observe daylight saving time, because of the transitions between summer time and winter time.

Flink 1.13 supports defining time attributes on TIMESTAMP_LTZ columns, and Flink SQL cleverly combines the TIMESTAMP and TIMESTAMP_LTZ types in WINDOW processing, elegantly supporting daylight saving time. This is especially useful for users in daylight-saving time zones and for companies with overseas business scenarios.

3. Interpretation of important improvements

1. FLIP-152: Improve Hive syntax compatibility

FLIP-152 mainly improves Hive syntax compatibility, supporting common Hive DML and DQL syntax, including:

[Figure: Hive DML and DQL syntax supported via the Hive dialect]

Common Hive syntax is supported through the Hive dialect. Hive has many built-in functions, and the Hive dialect needs to be used together with HiveCatalog and the Hive module: the Hive module provides all of Hive's built-in functions, which become directly accessible once it is loaded (a sketch follows).

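A minimal sketch of enabling Hive compatibility (option keys follow the Flink 1.13 documentation; the catalog name, config path, and Hive version are illustrative):

-- Register and use a HiveCatalog.
CREATE CATALOG myhive WITH ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
USE CATALOG myhive;

-- Switch the parser to the Hive dialect.
SET table.sql-dialect=hive;

-- Load the Hive module so Hive's built-in functions become available.
LOAD MODULE hive WITH ('hive-version' = '2.3.6');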

At the same time, we can also create and drop catalog functions and custom functions through the Hive dialect, which greatly improves the compatibility of Flink SQL with Hive and is more convenient for users familiar with Hive.


2. FLIP-163: Improve SQL Client

Before version 1.13, the Flink SQL Client was widely regarded as a peripheral gadget. FLIP-163 brought important improvements in version 1.13:


  1. The -i parameter pre-loads and initializes DDL, making it convenient to initialize multiple table DDL statements at once, without executing a command per table and without using YAML files to define tables;
  2. The -f parameter is supported, and the SQL file may contain DML statements (INSERT INTO);
  3. Support more practical configurations:


    • SET sql-client.verbose = true turns on verbose mode, printing the full message instead of a single line as before, which makes error messages much easier to trace;
    • SET execution.runtime-mode = streaming / batch sets the job to streaming or batch mode;
    • SET pipeline.name = my_Flink_job sets the job name;
    • SET execution.savepoint.path = /tmp/Flink-savepoints/savepoint-bb0dab sets the job's savepoint path;
    • For jobs with dependencies between them, SET table.dml-sync = true makes DML execute synchronously; for example, in offline pipelines where job B can only run after job A finishes, setting it to true enables dependency-ordered pipeline scheduling.
  4. The STATEMENT SET syntax is also supported (see the sketch after this list):


    A query may need to write its results not to a single sink but to multiple sinks, for example one sink in JDBC and one sink in HBase.

    • Before version 1.13, two queries had to be started to complete this job;
    • In version 1.13, these can be placed in one statement set and executed as a single job, which enables operator reuse and saves resources.
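
A minimal sketch (file names, sink tables, and the query are illustrative): launch the client with an initialization file and a job file, where the job file uses STATEMENT SET to fan one query out to two sinks as a single job.

-- Launched via: ./bin/sql-client.sh -i init.sql -f job.sql
-- job.sql: one job writing the same query results to two sinks.
BEGIN STATEMENT SET;

INSERT INTO jdbc_sink
SELECT user_id, COUNT(*) AS cnt FROM user_actions GROUP BY user_id;

INSERT INTO hbase_sink
SELECT user_id, COUNT(*) AS cnt FROM user_actions GROUP BY user_id;

END;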

3. FLIP-136: Enhance the conversion between DataStream and Table

Although Flink SQL greatly lowers the bar for using real-time computing, the high-level encapsulation of Table/SQL hides some low-level primitives, such as timers and state. Many advanced users want to operate directly on DataStream for extra flexibility, which requires converting between Table and DataStream. FLIP-136 enhances this conversion and makes it easier for users to switch between the two.

  • Support carrying EVENT TIME and WATERMARK through conversions between DataStream and Table:
Table table = tableEnv.fromDataStream(
    dataStream,
    Schema.newBuilder()
        .columnByMetadata("rowtime", "TIMESTAMP(3)")
        .watermark("rowtime", "SOURCE_WATERMARK()")
        .build());
  • Support converting changelog data streams between Table and DataStream:
// DataStream -> Table
StreamTableEnvironment.fromChangelogStream(DataStream<Row>): Table
StreamTableEnvironment.fromChangelogStream(DataStream<Row>, Schema): Table
// Table -> DataStream
StreamTableEnvironment.toChangelogStream(Table): DataStream<Row>
StreamTableEnvironment.toChangelogStream(Table, Schema): DataStream<Row>

4. Flink SQL 1.14 future planning

Version 1.14 mainly includes the following plans:

  • Remove the legacy planner: since Flink 1.9, after Alibaba contributed the Blink planner, many new features have been developed on top of it; the old legacy planner will be removed entirely;
  • Improve Window TVF: support session windows, support allow-lateness for window TVFs, etc.;
  • Improve schema handling: end-to-end schema processing capabilities and improved key validation;
  • Enhance Flink CDC support: strengthen integration with upstream CDC systems, and let more Flink SQL operators support CDC data streams.

5. Summary

This article explains in detail the core functions and important improvements of Flink SQL 1.13.

  • Support Window TVF;
  • Solve the problem of time zone and time function systematically;
  • Improve the compatibility of Hive and Flink;
  • Improve SQL Client;
  • Enhance the conversion of DataStream and Table.

I also shared the community's plans for Flink SQL 1.14. I believe readers of this article now have a better understanding of how Flink SQL changed in this release. In practice, you can pay closer attention to these new capabilities and experience the convenience they bring at the business level.

