Abstract: CDL is a simple and efficient real-time data integration service. It captures data change events from various OLTP databases and pushes them to Kafka. A Sink Connector then consumes the data from the topic and imports it into big-data ecosystem software, enabling real-time data ingestion into the data lake.
This article is shared from the Huawei Cloud Community article "Huawei FusionInsight MRS CDL User Guide" by Jin Hongqing.
Overview
CDL is a simple and efficient real-time data integration service. It captures data change events from various OLTP databases and pushes them to Kafka. The data in the topic is then consumed by a Sink Connector and imported into big-data ecosystem software, realizing real-time data ingestion into the data lake.
The CDL service contains two important roles: CDLConnector and CDLService. CDLConnector is the instance that actually executes data capture tasks, while CDLService is the instance responsible for managing and creating tasks.
This walkthrough demonstrates data capture with MySQL as the data source.
Prerequisites
- The CDL service has been installed in the MRS cluster.
- Binary logging (bin log) must be enabled on the MySQL database (it is enabled by default).
Check whether MySQL has bin log enabled:
Connect to the MySQL database with a tool or the command line (this example uses the Navicat tool) and run the show variables like 'log_%' command.
For example, in Navicat, choose "File > New Query" to create a new query, enter the following SQL, and click "Run". If "log_bin" shows "ON" in the result, bin log is enabled.
show variables like 'log_%'
Tool preparation
Currently, CDL commands can only be submitted through its REST API, so a tool for debugging needs to be installed in advance. This article uses VS Code.
After installing VS Code, install the REST Client extension:
Then create a cdl.http file to edit the requests in:
Create CDL task
The flowchart of CDL task creation is as follows:
Note: Create the MySQL link and the Kafka link first, then create the CDL synchronization job and start it.
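The flow above can be sketched as a sequence of REST calls. The following Python sketch only builds the requests (host, port, and names are taken from this article; authentication and TLS setup for a secure-mode cluster are omitted and must be added in practice):

```python
# Sketch of the CDL task-creation flow: create the MySQL link, create the
# Kafka link, create the CDL job, then start it.
import json
import urllib.request

CDL_BASE = "https://172.16.9.113:21495/api/v1/cdl"  # CDLService address used in this article

def cdl_request(method, path, body=None):
    """Build (but do not send) a request against the CDL REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(CDL_BASE + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    return req

# The four steps of the flowchart, in order (bodies abbreviated; the full
# JSON bodies are listed in the sections below):
steps = [
    cdl_request("POST", "/link", {"name": "MySQL_link", "link-type": "mysql"}),
    cdl_request("POST", "/link", {"name": "kafka_link", "link-type": "kafka"}),
    cdl_request("POST", "/job", {"name": "mysql_to_kafka", "job_type": "CDL_JOB"}),
    cdl_request("PUT", "/job/mysql_to_kafka/start"),
]
# To execute against the cluster: for req in steps: urllib.request.urlopen(req)
```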
REST request code for the MySQL link
@hostname = 172.16.9.113
@port = 21495
@host = {{hostname}}:{{port}}
@bootstrap = "172.16.9.113:21007"
@bootstrap_normal = "172.16.9.113:21005"
@mysql_host = "172.16.2.118"
@mysql_port = "3306"
@mysql_database = "hudi"
@mysql_user = "root"
@mysql_password = "Huawei@123"
### get links
get https://{{host}}/api/v1/cdl/link
### mysql link validate
post https://{{host}}/api/v1/cdl/link?validate=true
content-type: application/json
{
"name": "MySQL_link", //link name; globally unique
"description":"MySQL connection", //link description
"link-type":"mysql", //link type
"enabled":"true",
"link-config-values": {
"inputs": [
{ "name": "host", "value": {{mysql_host}} }, //IP of the node where the database is installed
{ "name": "port", "value": {{mysql_port}} }, //port the database listens on
{ "name": "database.name", "value": {{mysql_database}} }, //name of the database to connect to
{ "name": "user", "value": {{mysql_user}} }, //username
{ "name": "password", "value": {{mysql_password}} }, //password
{ "name": "schema", "value": {{mysql_database}} } //same as the database name
]
}
}
### mysql link create
post https://{{host}}/api/v1/cdl/link
content-type: application/json
{
"name": "MySQL_link", //link name; globally unique
"description":"MySQL connection", //link description
"link-type":"mysql", //link type
"enabled":"true",
"link-config-values": {
"inputs": [
{ "name": "host", "value": {{mysql_host}} }, //IP of the node where the database is installed
{ "name": "port", "value": {{mysql_port}} }, //port the database listens on
{ "name": "database.name", "value": {{mysql_database}} }, //name of the database to connect to
{ "name": "user", "value": {{mysql_user}} }, //username
{ "name": "password", "value": {{mysql_password}} }, //password
{ "name": "schema", "value": {{mysql_database}} } //same as the database name
]
}
}
### mysql link update
put https://{{host}}/api/v1/cdl/link/MySQL_link
content-type: application/json
{
"name": "MySQL_link", //link name; globally unique
"description":"MySQL connection", //link description
"link-type":"mysql", //link type
"enabled":"true",
"link-config-values": {
"inputs": [
{ "name": "host", "value": {{mysql_host}} }, //IP of the node where the database is installed
{ "name": "port", "value": {{mysql_port}} }, //port the database listens on
{ "name": "database.name", "value": {{mysql_database}} }, //name of the database to connect to
{ "name": "user", "value": {{mysql_user}} }, //username
{ "name": "password", "value": {{mysql_password}} }, //password
{ "name": "schema", "value": {{mysql_database}} } //same as the database name
]
}
}
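The validate, create, and update requests above share the same JSON body. As a convenience (a sketch, not part of CDL itself; the password placeholder is illustrative), the body can be built once in Python and reused:

```python
import json

def mysql_link_body(name, host, port, database, user, password):
    """Build the shared request body for link validate/create/update."""
    return {
        "name": name,  # link name; globally unique
        "description": "MySQL connection",
        "link-type": "mysql",
        "enabled": "true",
        "link-config-values": {
            "inputs": [
                {"name": "host", "value": host},
                {"name": "port", "value": port},
                {"name": "database.name", "value": database},
                {"name": "user", "value": user},
                {"name": "password", "value": password},
                {"name": "schema", "value": database},  # same as the database name
            ]
        },
    }

body = mysql_link_body("MySQL_link", "172.16.2.118", "3306", "hudi", "root", "***")
payload = json.dumps(body)  # ready to send as the request body
```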
REST request code for the Kafka link
### get links
get https://{{host}}/api/v1/cdl/link
### kafka link validate
post https://{{host}}/api/v1/cdl/link?validate=true
content-type: application/json
{
"name": "kafka_link",
"description":"test kafka link",
"link-type":"kafka",
"enabled":"true",
"link-config-values": {
"inputs": [
{ "name": "bootstrap.servers", "value": "172.16.9.113:21007" },
{ "name": "sasl.kerberos.service.name", "value": "kafka" },
{ "name": "security.protocol","value": "SASL_PLAINTEXT" } //SASL_PLAINTEXT in security mode, PLAINTEXT in normal mode
]
}
}
### kafka link create
post https://{{host}}/api/v1/cdl/link
content-type: application/json
{
"name": "kafka_link",
"description":"test kafka link",
"link-type":"kafka",
"enabled":"true",
"link-config-values": {
"inputs": [
{ "name": "bootstrap.servers", "value": "172.16.9.113:21007" },
{ "name": "sasl.kerberos.service.name", "value": "kafka" },
{ "name": "security.protocol","value": "SASL_PLAINTEXT" } //SASL_PLAINTEXT in security mode, PLAINTEXT in normal mode
]
}
}
### kafka link update
put https://{{host}}/api/v1/cdl/link/kafka_link
content-type: application/json
{
"name": "kafka_link",
"description":"test kafka link",
"link-type":"kafka",
"enabled":"true",
"link-config-values": {
"inputs": [
{ "name": "bootstrap.servers", "value": "172.16.9.113:21007" },
{ "name": "sasl.kerberos.service.name", "value": "kafka" },
{ "name": "security.protocol","value": "SASL_PLAINTEXT" } //SASL_PLAINTEXT in security mode, PLAINTEXT in normal mode
]
}
}
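As the comments above note, the Kafka settings depend on the cluster mode. A small sketch capturing that rule, with the ports taken from the @bootstrap and @bootstrap_normal variables in this article:

```python
def kafka_connect_settings(host, security_mode):
    """Return (bootstrap.servers, security.protocol) for the cluster mode."""
    # 21007 is the secure (SASL) Kafka port, 21005 the normal-mode port,
    # matching the @bootstrap / @bootstrap_normal variables above.
    port = 21007 if security_mode else 21005
    protocol = "SASL_PLAINTEXT" if security_mode else "PLAINTEXT"
    return f"{host}:{port}", protocol

servers, protocol = kafka_connect_settings("172.16.9.113", security_mode=True)
```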
REST request code for CDL job commands
@hostname = 172.16.9.113
@port = 21495
@host = {{hostname}}:{{port}}
@bootstrap = "172.16.9.113:21007"
@bootstrap_normal = "172.16.9.113:21005"
@mysql_host = "172.16.2.118"
@mysql_port = "3306"
@mysql_database = "hudi"
@mysql_user = "root"
@mysql_password = "Huawei@123"
### create job
post https://{{host}}/api/v1/cdl/job
content-type: application/json
{
"job_type": "CDL_JOB", //job type; currently only CDL_JOB is supported
"name": "mysql_to_kafka", //job name
"description":"mysql_to_kafka", //job description
"from-link-name": "MySQL_link", //source link
"to-link-name": "kafka_link", //target link
"from-config-values": {
"inputs": [
{"name" : "connector.class", "value" : "com.huawei.cdc.connect.mysql.MysqlSourceConnector"},
{"name" : "schema", "value" : "hudi"},
{"name" : "db.name.alias", "value" : "hudi"},
{"name" : "whitelist", "value" : "hudisource"},
{"name" : "tables", "value" : "hudisource"},
{"name" : "tasks.max", "value" : "10"},
{"name" : "mode", "value" : "insert,update,delete"},
{"name" : "parse.dml.data", "value" : "true"},
{"name" : "schema.auto.creation", "value" : "false"},
{"name" : "errors.tolerance", "value" : "all"},
{"name" : "multiple.topic.partitions.enable", "value" : "false"},
{"name" : "topic.table.mapping", "value" : "[
{\"topicName\":\"huditableout\", \"tableName\":\"hudisource\"}
]"
},
{"name" : "producer.override.security.protocol", "value" : "SASL_PLAINTEXT"}, //SASL_PLAINTEXT in security mode, PLAINTEXT in normal mode
{"name" : "consumer.override.security.protocol", "value" : "SASL_PLAINTEXT"} //SASL_PLAINTEXT in security mode, PLAINTEXT in normal mode
]
},
"to-config-values": {"inputs": []},
"job-config-values": {
"inputs": [
{"name" : "global.topic", "value" : "demo"}
]
}
}
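Note that topic.table.mapping is itself a JSON array serialized into a string, which is why its quotes appear escaped in the body above. A safe way to produce that value (a sketch) is to let json.dumps do the escaping:

```python
import json

# The table-to-topic mapping used in the job above.
mapping = [{"topicName": "huditableout", "tableName": "hudisource"}]
entry = {
    "name": "topic.table.mapping",
    "value": json.dumps(mapping),  # the array serialized into a string
}
# When `entry` is embedded in the job body and the whole body is serialized
# with json.dumps, the inner quotes are escaped automatically.
```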
### get all job
get https://{{host}}/api/v1/cdl/job
### submit job
put https://{{host}}/api/v1/cdl/job/mysql_to_kafka/start
### get job status
get https://{{host}}/api/v1/cdl/submissions?jobName=mysql_to_kafka
### stop job
put https://{{host}}/api/v1/cdl/job/mysql_to_kafka/submissions/13/stop
### delete job
DELETE https://{{host}}/api/v1/cdl/job/mysql_to_kafka
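The job lifecycle calls above follow a simple URL pattern. A sketch of building them in Python (host and port from this article; the submission id, 13 in the stop example, comes from the job status response):

```python
CDL_BASE = "https://172.16.9.113:21495/api/v1/cdl"

def start_url(job):  # PUT
    return f"{CDL_BASE}/job/{job}/start"

def status_url(job):  # GET
    return f"{CDL_BASE}/submissions?jobName={job}"

def stop_url(job, submission_id):  # PUT
    return f"{CDL_BASE}/job/{job}/submissions/{submission_id}/stop"

def delete_url(job):  # DELETE
    return f"{CDL_BASE}/job/{job}"
```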
Scenario verification
The raw data in the source MySQL production database is as follows:
After submitting the CDL job:
Insert operation: insert into hudi.hudisource values (11,"蒋语堂",38,"女","图","PLAYER",28732);
The corresponding Kafka message body:
Update operation: UPDATE hudi.hudisource SET uname='Anne Marie333' WHERE uid=11;
The corresponding Kafka message body:
Delete operation: delete from hudi.hudisource where uid=11;
The corresponding Kafka message body:
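The Kafka message bodies shown in the screenshots are not reproduced here. As a purely hypothetical sketch — assuming each change event carries an operation type plus before/after row images, which the actual CDL message schema may not match — the three operations could be told apart like this:

```python
import json

def describe_event(raw):
    """Classify a CDC-style event (hypothetical envelope, not the real CDL schema)."""
    event = json.loads(raw)
    op = event.get("op")
    if op == "insert":
        return f"row added: {event['after']}"
    if op == "update":
        return f"row changed: {event['before']} -> {event['after']}"
    if op == "delete":
        return f"row removed: {event['before']}"
    return "unknown operation"

print(describe_event('{"op": "delete", "before": {"uid": 11}}'))
```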