I previously wrote an article about canal, "mysql incremental synchronization - canal". Its data synchronization part mainly demonstrated synchronization in code via canal.client. canal adapter was also mentioned at the time, but not in detail. Recently I have been working more and more with elasticsearch, so I put together a demo over the holiday and took notes along the way.
1. Introduction to canal
1.1. Introduction to the Three Brothers
For downloading and installing the canal packages, go straight to canal's official github. The release currently ships three brothers:
- canal deployer : also known as canal server, the server that actually monitors the mysql binlog.
- canal adapter : as the name implies, an "adapter"; paired with canal server, it can currently sync mysql data incrementally to hbase, rdb, and es, making it a proper ETL tool.
- canal admin : it serves canal server as well, providing overall configuration management, node operation and maintenance, and other ops-oriented functions, with a relatively friendly WebUI. If canal server is to run as a cluster, a professional ops tool like canal admin is indispensable.
For those who don't visit github often, the official wiki documents each of these components in more detail.
1.2. canal adapter
Since it is an adapter, both ends it connects, the "source" and the "target", deserve an introduction:
- Source : (1) the canal adapter can connect directly to the canal server and consume an instance's data; (2) alternatively, the canal server can deliver the data to an MQ and the canal adapter consumes it from the MQ.
- Target : currently supports hbase, rdb, and es; mongodb, redis, etc. will be supported in the future.
The setup in this article is relatively simple, and the data flows: mysql -> canal server -> canal adapter -> es.
2. Data Preparation
2.1. mysql table creation
Enabling binlog logging is covered in the previous article. Prepare two tables:
-- Employee table
CREATE TABLE `hr_user` (
  `id` char(32) NOT NULL COMMENT 'Primary key',
  `username` varchar(50) DEFAULT NULL COMMENT 'Account',
  `fullname` varchar(50) DEFAULT NULL COMMENT 'Full name',
  `sex` tinyint DEFAULT NULL COMMENT 'Sex 0-male/1-female',
  `birthday` date DEFAULT NULL COMMENT 'Birthday',
  `dept_id` char(32) DEFAULT NULL COMMENT 'Department ID',
  `deleted` tinyint DEFAULT NULL COMMENT 'Deleted 0-no/1-yes',
  `created_by` char(32) DEFAULT NULL COMMENT 'Creator ID',
  `create_time` datetime DEFAULT NULL COMMENT 'Creation time',
  `updated_by` char(32) DEFAULT NULL COMMENT 'Updater ID',
  `updated_time` datetime DEFAULT NULL COMMENT 'Update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- Department table
CREATE TABLE `hr_dept` (
  `id` char(32) NOT NULL COMMENT 'Primary key',
  `dept_name` varchar(50) DEFAULT NULL COMMENT 'Department name',
  `manager_name` varchar(50) DEFAULT NULL COMMENT 'Department manager name',
  `parent_id` char(32) DEFAULT NULL COMMENT 'Parent department ID',
  `dept_path` varchar(1000) DEFAULT NULL COMMENT 'Department path',
  `deleted` tinyint DEFAULT NULL COMMENT 'Deleted 0-no/1-yes',
  `created_by` char(32) DEFAULT NULL COMMENT 'Creator ID',
  `create_time` datetime DEFAULT NULL COMMENT 'Creation time',
  `updated_by` char(32) DEFAULT NULL COMMENT 'Updater ID',
  `updated_time` datetime DEFAULT NULL COMMENT 'Update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
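As the previous article explains, canal needs binlog enabled in ROW mode. A quick sanity check from the shell (assuming a local root account; adjust the credentials to your environment):
mysql -uroot -p -e "SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format';"
# Expect log_bin = ON and binlog_format = ROW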
2.2. Install es and kibana
The es docker command:
docker run -d \
--name elasticsearch \
--restart=on-failure:3 \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-v /Volumes/elasticsearch/data/:/usr/share/elasticsearch/data/ \
-v /Volumes/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /Volumes/elasticsearch/plugins/:/usr/share/elasticsearch/plugins/ \
elasticsearch:7.9.3
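Once the container is up, a quick check that es is reachable on the mapped port:
curl http://127.0.0.1:9200
# Should return the basic cluster info JSON (the cluster name is reused in the adapter config later)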
The kibana docker command:
docker run -d \
--name kibana \
--link elasticsearch:es \
-p 5601:5601 \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
kibana:7.9.3
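kibana takes a little longer to start; it should answer on port 5601 once ready:
curl -I http://127.0.0.1:5601
# Any HTTP response (rather than connection refused) means kibana is up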
2.3. Create an index
Create the user index in kibana -> Management -> Dev Tools:
PUT user
{
  "mappings": {
    "properties": {
      "birthday": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "dept_id": {
        "type": "keyword"
      },
      "dept_name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "dept_updated_time": {
        "type": "date"
      },
      "fullname": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "sex": {
        "type": "byte"
      },
      "user_id": {
        "type": "keyword"
      },
      "user_updated_time": {
        "type": "date"
      },
      "username": {
        "type": "text"
      }
    }
  }
}
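The mapping can also be checked from outside kibana with a plain curl (note that the ik_max_word analyzer assumes the ik plugin is present in the mounted plugins directory):
curl "http://127.0.0.1:9200/user/_mapping?pretty"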
3. canal configuration
The latest release at the time of writing is 1.1.6-alpha-1. Download only the canal-deployer and canal-adapter packages of this version and extract each into its own local directory. canal-admin will not be installed; the first two are enough for now.
3.1. canal server
The installation and configuration of canal server was already covered in the article "mysql incremental synchronization - canal", so only the key points are listed here.
Since canal server does not deliver data to an MQ in this setup, the parameters to pay attention to in conf/example/instance.properties are:
canal.instance.master.address = 127.0.0.1:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
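The canal/canal account above has to exist in mysql with replication privileges. If it does not yet, a minimal setup sketch (run with a privileged account; the grants follow the canal quick start):
mysql -uroot -p -e "
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;"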
The configuration starts the instance named example by default, so the startup script and log location look like this:
# Start
sh bin/startup.sh
# Stop
sh bin/stop.sh
# Tail the log of the example instance
tail -500f logs/example/example.log
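If the instance log looks clean, the server should also be listening on its default TCP port 11111, which is the port the adapter connects to below (assuming netcat is available):
nc -vz 127.0.0.1 11111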
3.2. canal adapter
For the full adapter configuration, refer to the official wiki page on synchronizing to es with the adapter.
1. Modify conf/application.yml
server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/es?useUnicode=true
      username: canal
      password: canal
  canalAdapters:
  - instance: example
    groups:
    - groupId: g1
      outerAdapters:
      - name: es7
        hosts: http://127.0.0.1:9200
        properties:
          mode: rest
          cluster.name: docker-cluster
Since the adapter connects directly to the canal server, mode is set to tcp and no MQ is used. The rest is the connection information for canal server, mysql, and es.
2. Add conf/es7/user.yml
Since the user index in es needs to be synchronized from mysql, add a user.yml file under conf/es7. Because the adapter load path es7 is configured in conf/application.yml above, all yml files under that directory are loaded by default.
user.yml
dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
  _index: user
  _id: user_id
  sql: "SELECT
          u.id AS user_id,
          u.username,
          u.fullname,
          u.sex,
          u.birthday,
          u.dept_id,
          d.dept_name,
          u.updated_time as user_updated_time,
          d.updated_time as dept_updated_time
        FROM
          hr_user u
        LEFT JOIN
          hr_dept d ON u.dept_id = d.id"
  etlCondition: "where u.deleted = 0 AND d.deleted = 0"
  commitBatch: 3000
It is fairly clear at a glance: this configures the source sql used to synchronize the user index in es. Note that the sql has to follow certain rules; for the specifics, refer to the official wiki document mentioned above.
3. Start
# Start
sh bin/startup.sh
# Stop
sh bin/stop.sh
# Tail the adapter log
tail -500f logs/adapter/adapter.log
If neither the canal server log nor the canal adapter log reports any errors, everything is in place.
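Incremental sync only picks up changes made from this point on. For rows that already exist in the tables, the adapter wiki also documents a REST endpoint that triggers a full import of a single mapping file (this is where the etlCondition above is applied); a sketch, assuming the adapter port 8081 configured earlier:
curl -X POST http://127.0.0.1:8081/etl/es7/user.yml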
4. Verification
To view the data in es conveniently, add the user index to Discover in kibana: under Kibana -> Management -> Stack Management -> Kibana -> Index patterns -> Create index pattern, add the user index. Then go back to Discover to see the data in the corresponding index.
Insert a new row into each of the corresponding mysql tables:
-- hr_dept
INSERT INTO hr_dept (id,dept_name,manager_name,parent_id,dept_path,deleted,created_by,create_time,updated_by,updated_time)
VALUES ('9ef57211ca3311ec8fe00242ac110004','中台研发部','罗永浩','66ab59dbcabf11ec8fe00242ac110004','研发中心>平台架构部>TPaaS研发部',0,NULL,now(),NULL,now());
-- hr_user
INSERT INTO hr_user (id,username,fullname,sex,birthday,dept_id,deleted,created_by,create_time,updated_by,updated_time)
VALUES ('b7205315cac811ec8fe00242ac110004','zhangsan','张三',0,'1995-02-18','9ef57211ca3311ec8fe00242ac110004',0,NULL,now(),NULL,now());
In kibana you can see that a new document has appeared in the corresponding es index, and the matching log lines show up in the canal adapter's adapter.log.
From then on, whether we modify the hr_user table alone or only change the dept_name field in the hr_dept table, the corresponding document in es is updated accordingly.
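Besides Discover, the synced document can also be checked directly against es, for example with a quick search by username:
curl "http://127.0.0.1:9200/user/_search?q=username:zhangsan&pretty"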