
I previously wrote an article about canal, "mysql incremental synchronization - canal". Its data synchronization part mainly demonstrated consuming changes in code via canal.client. canal adapter was also mentioned at the time, but not in detail. Recently I have been working more and more with elasticsearch, so I put together a demo over the holiday and took notes along the way.

1. Introduction to canal

1.1. Introduction to the Three Brothers

For download and installation instructions for each canal package, see canal's official GitHub. The release currently ships three brothers:

  • canal deployer : Also known as canal server, it is the server that actually monitors the MySQL binlog.
  • canal adapter : As the name implies, an "adapter". Paired with canal server, it can currently sync incremental mysql data to hbase, rdb, and es; a proper ETL tool.
  • canal admin : It serves canal server, providing overall configuration management, node operation and maintenance, and other ops-oriented functions, along with a relatively friendly WebUI. If canal server is to run as a cluster, an ops tool like canal admin is indispensable.

For those who don't visit GitHub often, the official wiki documents each of these in detail.

1.2. canal adapter

Since it is an adapter, both ends need introducing: the "source" and the "target":

  • Source : (1) the canal adapter can connect directly to the canal server and consume an instance's data; (2) alternatively, the canal server can deliver data to an MQ, and the canal adapter then consumes the data from the MQ.
  • Target : hbase, rdb, and es are currently supported; mongodb, redis, etc. may be supported in the future.

The setup in this article is relatively simple; the data flow is: mysql -> canal server -> canal adapter -> es .

2. Data Preparation

2.1. mysql table creation

For enabling binlog, see the previous article (a quick check is shown after the table definitions below). Prepare two tables:

-- employee table
CREATE TABLE `hr_user` (
  `id` char(32) NOT NULL COMMENT 'primary key',
  `username` varchar(50) DEFAULT NULL COMMENT 'account',
  `fullname` varchar(50) DEFAULT NULL COMMENT 'full name',
  `sex` tinyint DEFAULT NULL COMMENT 'sex 0-male/1-female',
  `birthday` date DEFAULT NULL COMMENT 'birthday',
  `dept_id` char(32) DEFAULT NULL COMMENT 'department ID',
  `deleted` tinyint DEFAULT NULL COMMENT 'deleted flag 0-no/1-yes',
  `created_by` char(32) DEFAULT NULL COMMENT 'creator ID',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  `updated_by` char(32) DEFAULT NULL COMMENT 'updater ID',
  `updated_time` datetime DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- department table
CREATE TABLE `hr_dept` (
  `id` char(32) NOT NULL COMMENT 'primary key',
  `dept_name` varchar(50) DEFAULT NULL COMMENT 'department name',
  `manager_name` varchar(50) DEFAULT NULL COMMENT 'department manager name',
  `parent_id` char(32) DEFAULT NULL COMMENT 'parent department ID',
  `dept_path` varchar(1000) DEFAULT NULL COMMENT 'department path',
  `deleted` tinyint DEFAULT NULL COMMENT 'deleted flag 0-no/1-yes',
  `created_by` char(32) DEFAULT NULL COMMENT 'creator ID',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  `updated_by` char(32) DEFAULT NULL COMMENT 'updater ID',
  `updated_time` datetime DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
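
As a reminder from the previous article, canal requires binlog to be enabled in ROW format. A quick sanity check from the shell (assuming a local root login):

# verify that binlog is enabled and row-based (canal requires ROW format)
mysql -uroot -p -e "SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format';"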

2.2. Install es, kibana

es docker shell

 docker run -d \
 --name elasticsearch \
 --restart=on-failure:3 \
 -p 9200:9200 \
 -p 9300:9300 \
 -e "discovery.type=single-node" \
 -v /Volumes/elasticsearch/data/:/usr/share/elasticsearch/data/ \
 -v /Volumes/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
 -v /Volumes/elasticsearch/plugins/:/usr/share/elasticsearch/plugins/ \
 elasticsearch:7.9.3
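
Note that the index mapping later in this article uses the ik_max_word analyzer, which comes from the elasticsearch-analysis-ik plugin and must exist under the mounted plugins directory. One way to install it, assuming the container has network access:

# install the IK analyzer plugin matching the es version, then restart
docker exec -it elasticsearch \
  bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.9.3/elasticsearch-analysis-ik-7.9.3.zip
docker restart elasticsearch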

kibana docker shell

 docker run -d \
 --name kibana \
 --link elasticsearch:es \
 -p 5601:5601 \
 -e ELASTICSEARCH_HOSTS=http://es:9200 \
 kibana:7.9.3
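
Before moving on, a quick check that both containers are answering:

# es should report green/yellow status; kibana should return an HTTP code on 5601
curl -s http://127.0.0.1:9200/_cluster/health?pretty
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:5601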

2.3. Create an index

Create an index named user in Kibana -> Management -> Dev Tools:

PUT user

 {
    "mappings":{
        "properties":{
            "birthday":{
                "type":"date",
                "format":"yyyy-MM-dd"
            },
            "dept_id":{
                "type":"keyword"
            },
            "dept_name":{
                "type":"text",
                "analyzer":"ik_max_word"
            },
            "dept_updated_time":{
                "type":"date"
            },
            "fullname":{
                "type":"text",
                "analyzer":"ik_max_word"
            },
            "sex":{
                "type":"byte"
            },
            "user_id":{
                "type":"keyword"
            },
            "user_updated_time":{
                "type":"date"
            },
            "username":{
                "type":"text"
            }
        }
    }
}
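
To confirm the mapping took effect, query it back (shown with curl here; the equivalent GET user/_mapping works in Dev Tools too):

# should echo the field definitions created above
curl -s http://127.0.0.1:9200/user/_mapping?pretty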

3. canal configuration

At the time of writing, the latest release is 1.1.6-alpha-1. Download only the canal-deployer and canal-adapter archives of this version and extract each to a local directory. canal-admin is not installed; the first two are enough for now.

3.1. canal server

The installation and configuration of canal server were already covered in the article "mysql incremental synchronization - canal", so here we just list the essentials.

Since canal server is not delivering data to an MQ here, only the following parameters in conf/example/instance.properties need attention:

canal.instance.master.address = 127.0.0.1:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
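
If the canal account does not exist yet, the grants from the official quick start are roughly as follows (run as a privileged user):

mysql -uroot -p -e "CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;"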

The default configuration starts the instance named example, so the start/stop scripts and log file are as follows:

 # start
sh bin/startup.sh
# stop
sh bin/stop.sh
# view a specific instance's log
tail -500f logs/example/example.log
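
The adapter will connect to port 11111 by default, so it is worth confirming the server is actually listening before wiring up the adapter (lsof assumed available):

# canal server listens on 11111 by default
lsof -iTCP:11111 -sTCP:LISTEN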

3.2. canal adapter

For the full adapter configuration, refer to the official wiki page on synchronizing es via the adapter (Sync ES).

1. Modify conf/application.yml
 server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp #tcp kafka rocketMQ rabbitMQ
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/es?useUnicode=true
      username: canal
      password: canal
  canalAdapters:
  - instance: example
    groups:
    - groupId: g1
      outerAdapters:
      - name: es7
        hosts: http://127.0.0.1:9200
        properties:
          mode: rest
          cluster.name: docker-cluster

Because the adapter connects directly to the canal server, mode: tcp is used rather than any MQ. The rest is the connection information for canal server, mysql, and es.

2. Add conf/es7/user.yml

Because mysql data needs to be synchronized to the user index in es, add a user.yml file under conf/es7. The es7 adapter was configured in conf/application.yml above, so all yml files under that directory are loaded by default.

user.yml

 dataSourceKey: defaultDS
destination: example
groupId: g1
esMapping:
  _index: user
  _id: user_id
  sql: "SELECT
          u.id AS user_id,
          u.username,
          u.fullname,
          u.sex,
          u.birthday,
          u.dept_id,
          d.dept_name,
          u.updated_time AS user_updated_time,
          d.updated_time AS dept_updated_time
        FROM
          hr_user u
        LEFT JOIN
          hr_dept d ON u.dept_id = d.id"
  etlCondition: "where u.deleted = 0 AND d.deleted = 0"
  commitBatch: 3000

At a glance it is fairly clear: this configures the source sql whose results are synchronized into the user index in es. There are, however, certain formatting requirements for the sql; for the specifics, refer to the official wiki document mentioned earlier.
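
Note that incremental sync only captures changes made while the pipeline is running. For rows that already exist, the adapter's management REST API (documented in the ClientAdapter wiki) can trigger a one-off full ETL of this mapping, roughly:

# full import for the es7/user.yml mapping
curl -X POST http://127.0.0.1:8081/etl/es7/user.yml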

3. Start
 # start
sh bin/startup.sh
# stop
sh bin/stop.sh
# view the adapter log
tail -500f logs/adapter/adapter.log

If the canal server and canal adapter logs show no errors, everything is in order.
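
Besides the logs, the adapter's management API can confirm the subscription; per the wiki, something like:

# lists the subscribed canal instances / MQ topics and their sync status
curl http://127.0.0.1:8081/destinations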

4. Verification

To view the data in es conveniently, add the user index to Discover in Kibana: go to Kibana -> Management -> Stack Management -> Kibana -> Index patterns -> Create index pattern and add the user index. Then return to Discover to see the data in that index.

Insert a new row into each of the corresponding tables in mysql:

 -- hr_dept
INSERT INTO hr_dept (id,dept_name,manager_name,parent_id,dept_path,deleted,created_by,create_time,updated_by,updated_time) 
VALUES ('9ef57211ca3311ec8fe00242ac110004','中台研发部','罗永浩','66ab59dbcabf11ec8fe00242ac110004','研发中心>平台架构部>TPaaS研发部',0,NULL,now(),NULL,now());

-- hr_user
INSERT INTO hr_user (id,username,fullname,sex,birthday,dept_id,deleted,created_by,create_time,updated_by,updated_time) 
VALUES ('b7205315cac811ec8fe00242ac110004','zhangsan','张三',0,'1995-02-18','9ef57211ca3311ec8fe00242ac110004',0,NULL,now(),NULL,now());
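
To verify from the command line rather than Kibana, fetch the document directly by its _id (which user.yml maps to user_id):

# the document _id equals the user_id selected in user.yml
curl -s http://127.0.0.1:9200/user/_doc/b7205315cac811ec8fe00242ac110004?pretty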

In Kibana you can then see that a new document has appeared in the corresponding es index, and the matching log entries appear in the canal adapter's adapter.log.

Whether we modify the hr_user table alone or just the dept_name field in the hr_dept table, the corresponding document in es is updated accordingly.
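
For example, renaming the department ('Platform R&D' is just a made-up new name) should update dept_name in every affected user document:

# change dept_name; the adapter re-syncs the joined documents
mysql -uroot -p -e "UPDATE hr_dept SET dept_name = 'Platform R&D', updated_time = now() WHERE id = '9ef57211ca3311ec8fe00242ac110004';"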


