Preface: What is Flink CDC?
Flink CDC is an open source project licensed under the Apache License 2.0. It reads both the existing historical data and the incremental changes of databases such as MySQL, MariaDB, RDS MySQL, Aurora MySQL, PolarDB MySQL, PostgreSQL, Oracle, MongoDB, SQL Server, TiDB, and OceanBase in real time, and provides exactly-once semantics throughout the whole process. Flink CDC also offers two APIs, a SQL API and a DataStream API, to meet the needs of different developers.
As a new-generation data integration framework, Flink CDC can replace traditional tools such as DataX and Canal for real-time data synchronization, moving both the full and the incremental data of a database into message queues and data warehouses. It can also be used for real-time data integration, loading database data into data lakes and warehouses in real time. In addition, it offers powerful data processing capabilities: with SQL, users can join, widen, and aggregate database data in real time and write the materialized results to various storage systems. Compared with other data integration frameworks, Flink CDC has technical advantages such as unified full and incremental reading, lock-free reading, parallel reading, and a distributed architecture, which have made it very popular in the open source community.
Flink CDC project address:
https://github.com/ververica/flink-cdc-connectors
1. GitHub stars exceed 2,000
Since it was open sourced in July 2020, the Flink CDC community has grown rapidly, and attention on GitHub has continued to rise. Looking back, the project passed 1,000 GitHub stars for the first time at the beginning of September 2021. It was also around this time that Flink CDC released version 2.0 and became ready for large-scale production use. Since then, the community has developed as if it had been fitted with an accelerator.
More and more people are learning about and using Flink CDC, and many developers are contributing to it. In the past six months alone, the number of GitHub stars for the project has doubled. As of press time, the project has 2,015 GitHub stars, 660 forks, and 582 issues. That's the power of open source!
The growth of the community is inseparable from the contributions of its developers and the support of its users. The project now has 34 contributors, who come from companies in China and abroad, including Cloudera, Red Hat, Vinted, Alibaba, Ant, NetEase, and XTransfer. The Flink CDC user community has also grown very quickly: the Chinese user group reached more than 3,800 members a little over half a year after it was created.
<img src="https://img.alicdn.com/imgextra/i4/O1CN0169PKeU25TPvj4EYsR_!!6000000007527-2-tps-828-1068.png" alt="img" style="zoom:33%;" />
According to the community user groups and publicly available information, organizations currently using Flink CDC include Cloudera, Vinted, Alibaba, Ant, NetEase, Tencent, Bilibili, XTransfer, 37 Mobile Games, Agricultural Bank, Minsheng Bank, Shenzhen Lingxing Network, Dajian cloud warehouse, and other cloud vendors and well-known enterprises in China and abroad. From the stream computing services offered by these cloud vendors and the practice of many enterprises, we see that more and more users are adopting Flink CDC to quickly build real-time data integration pipelines and real-time data lakes.
2. New Maintainer
The rapid growth of the Flink CDC community is inseparable from the efforts of its contributors, and a group of active, high-quality contributors has emerged along the way. After discussion among the Flink CDC Maintainers, the community has invited Jiabao-Sun (Sun Jiabao) to join the Flink CDC Maintainer list.
Mr. Sun Jiabao is a senior Java development engineer in the XTransfer Infrastructure Department, responsible for building XTransfer's infrastructure and big data platform. He has been active in the Flink CDC community for a long time. As a core contributor, he has contributed several PRs, including the MongoDB CDC Connector. He is also very active in the issue list and the Flink CDC community group, where he has answered many questions from developers and users and made great contributions to the community's development.
We look forward to Mr. Sun Jiabao, as a Maintainer of the Flink CDC project, bringing more diverse perspectives to the development of Flink CDC and helping more community contributors and users. We also hope that more contributors will join the Maintainer list in the future and keep driving the community forward.
3. Outlook for Flink CDC 2.2
After three months of community development and 47 merged commits, Flink CDC 2.2 is about to be released, with many long-awaited features.
- Version 2.2 will add three connectors: SqlServer CDC, TiDB CDC, and OceanBase CDC. Each supports reading both full and incremental CDC data from its database.
- MySQL CDC supports adding tables dynamically. Suppose a CDC pipeline monitors 4 tables and one day you are asked to add a few more. You would not want to start another job, which wastes resources. This feature lets you add the new tables to the existing pipeline without re-reading the tables that are already synchronized.
- All CDC connectors are compatible with both Flink 1.13 and Flink 1.14, so the same connector can run on clusters of either version.
- The incremental snapshot reading algorithm has been abstracted into a general framework that other connectors can plug into. With only a small amount of additional code, a new connector can support lock-free reading, parallel reading, and resumable reading throughout the whole process.
- MongoDB CDC supports filtering collections with regular expressions. In version 2.1, MongoDB CDC can capture only a single collection or all collections in a database. Version 2.2 will support matching collections by regular expression.
- MySQL CDC will support MySQL 5.6, which is good news for users of earlier MySQL versions.
- In addition, version 2.2 fixes many bugs reported by users and includes a number of small improvements.
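The idea behind dynamic table addition can be illustrated with a small sketch. It is a conceptual model only, not Flink CDC's implementation: the names `PipelineState`, `plan_restart`, and the table names are hypothetical, and the assumption is that the job remembers which tables have already finished their full snapshot so that only newly added tables need one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PipelineState:
    """Hypothetical saved state: tables whose full snapshot already finished."""
    snapshotted: Set[str] = field(default_factory=set)


def plan_restart(state: PipelineState, configured_tables: List[str]) -> Dict[str, List[str]]:
    """Decide what each table needs when the job restarts with a new table list.

    Assumption for this sketch: a table already in the saved state only
    resumes incremental reading, while a new table is read in full first.
    """
    new_tables = [t for t in configured_tables if t not in state.snapshotted]
    kept_tables = [t for t in configured_tables if t in state.snapshotted]
    return {
        "snapshot_then_stream": new_tables,  # full read, then incremental changes
        "stream_only": kept_tables,          # incremental changes only, no re-read
    }


# Four tables are already synchronized; a fifth is added to the same pipeline.
state = PipelineState(
    snapshotted={"db.orders", "db.users", "db.items", "db.payments"}
)
plan = plan_restart(
    state,
    ["db.orders", "db.users", "db.items", "db.payments", "db.refunds"],
)
print(plan["snapshot_then_stream"])
print(plan["stream_only"])
```

In this model only `db.refunds` is scheduled for a full read, while the four existing tables continue from where they left off, which is the behavior the feature description above promises.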
Community contributors are busy preparing the 2.2 release, which is currently expected in mid-to-late March. Anyone interested is welcome to become a Flink CDC contributor and take part in design, development, and testing to move the community forward together!