主要实现思路

  • 1、在clickhouse中创建MySQL引擎表。
  • 2、根据MySQL引擎表的信息创建目标表。
  • 3、实现canal实时增量同步MySQL数据到clickhouse。

    MySQL 的准备

    修改配置文件开启 Binlog

    [root@hadoop100 module]$ sudo vim /etc/my.cnf
    server-id=1
    log-bin=mysql-bin
    binlog_format=row
    binlog-do-db=test
  • 注意:binlog-do-db 根据自己的情况进行修改,指定具体要同步的数据库,如果不配置则表示所有数据库均开启 Binlog。

    MySQL中需要同步的库表

    MySQL [testck]> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | innodb             |
    | mysql              |
    | performance_schema |
    | sys                |
    | testck             |
    | tmp                |
  • rows in set (0.00 sec)
    MySQL [testck]> use testck;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    Database changed

    MySQL [testck]> show tables;
    Tables_in_testck
    t_organization
    t_user
  • rows in set (0.00 sec)
    MySQL [testck]>

  • testck库下的所有表

    clickhouse 的准备

  • canal实时同步MySQL的数据需要在clickhouse中提前建好库表

    在clickhouse建MySQL引擎的库

  • 把MySQL某个库中的所有表结构信息映射到clickhouse中,这样在clickhouse中就可以远程操作MySQL。

    cnnxpredn02 :) CREATE DATABASE t_tmp ENGINE = MySQL('ip:3306', 'testck', 'user', 'pass');
    CREATE DATABASE t_tmp
    ENGINE = MySQL('ip:3306', 'testck', 'user', 'pass')
    Query id: 9f6d7179-3c97-47c2-93ea-abc0c6ca873b
  • rows in set. Elapsed: 0.013 sec.
    cnnxpredn02 :) show databases;
    SHOW DATABASES
    Query id: 5b5baaf5-86f5-46c1-ac34-2618c34462f1
    ┌─name────┐
    │ default │
    │ system │
    │ t_tmp │
    └─────────┘
  • rows in set. Elapsed: 0.009 sec.
    cnnxpredn02 :)

    ## 创建clickhouse中的库

    cnnxpredn02 :) create database testck;
    CREATE DATABASE testck
    Query id: 397261c0-a8f0-48c6-b4f6-71d121b975b8
    Ok.

  • rows in set. Elapsed: 0.004 sec.
    cnnxvopredn02 :) show databases;
    SHOW DATABASES
    Query id: f467f4bb-99fa-4322-a3c1-e33be74b6e81
    ┌─name────┐
    │ default │
    │ system │
    │ t_tmp │
    │ testck │
    └─────────┘
  • rows in set. Elapsed: 0.002 sec.
    cnnxvpredn02 :)

    ## 根据创建的MySQL引擎表创建clickhouse目标表结构

    cnnxvpredn02 :) create table testck.t_organization as t_tmp.t_organization;
    :-] create table testck.t_user as t_tmp.t_user;
    CREATE TABLE testck.t_organization AS t_tmp.t_organization
    Query id: f942cf5d-701f-4dbd-9ffd-de2206eec851
    Ok.

  • rows in set. Elapsed: 0.006 sec.
    CREATE TABLE testck.t_user AS t_tmp.t_user
    Query id: ef7dddd7-64b2-4dbc-88be-a105b49aff57
    Ok.
  • rows in set. Elapsed: 0.004 sec.
    cnnxvopredn02 :)

    # Canal 的下载和安
  • 下载canal:https://github.com/alibaba/canal/releases
  • 注意:canal 解压后是分散的,我们在指定解压目录的时候需要将 canal 指定上

     mkdir /opt/module/canal
     tar -zxvf canal.deployer-1.1.2.tar.gz -C /opt/module/canal

    修改 canal.properties 的配置

    #################################################
    #########               common argument         #############
    #################################################
    # tcp bind ip
    canal.ip =
    # register ip to zookeeper
    canal.register.ip =
    canal.port = 11111
    canal.metrics.pull.port = 11112
    # canal instance user/passwd
    # canal.user = canal
    # canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458
    # canal admin config
    #canal.admin.manager = 127.0.0.1:8089
    canal.admin.port = 11110
    canal.admin.user = admin
    canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
    # admin auto register
    #canal.admin.register.auto = true
    #canal.admin.register.cluster =
    #canal.admin.register.name =
    canal.zkServers =
    # flush data to zk
    canal.zookeeper.flush.period = 1000
    canal.withoutNetty = false
    # tcp, kafka, rocketMQ, rabbitMQ
    canal.serverMode = tcp
    # flush meta cursor/parse position to file
    canal.file.data.dir = ${canal.conf.dir}
    canal.file.flush.period = 1000
    ## memory store RingBuffer size, should be Math.pow(2,n)
    ......
  • 说明:这个文件是canal的基本通用配置,canal端口号默认就是 11111,修改canal的输出model,默认tcp,改为输出到 kafka多实例配置如果创建多个实例,通过前面canal架构,我们可以知道,一个canal服务中可以有多个instance,conf/下的每一个example即是一个实例,每个实例下面都有独立的配置文件。默认只有一个实例example,如果需要多个实例处理不同的 MySQL 数据的话,直接拷贝出多个 example,并对其重新命名,命名和配置文件中指定的名称一致,然后修改
    canal.properties 中的 canal.destinations=实例 1,实例 2,实例 3。

    #################################################
    ######### destinations #############
    #################################################
    canal.destinations = example

    修改 instance.properties

    配置 MySQL 服务器地址

    #################################################
    ## mysql serverId , v1.0.26+ will autoGen 
    canal.instance.mysql.slaveId=20
    # enable gtid use true/false
    canal.instance.gtidon=false
    # position info
    canal.instance.master.address=hadoop100:3306

    配置连接 MySQL 的用户名和密码,默认就是我们前面授权的 canal

    # username/password
    canal.instance.dbUsername=canal
    canal.instance.dbPassword=canal
    canal.instance.connectionCharset = UTF-8
    canal.instance.defaultDatabaseName =test
    # enable druid Decrypt database password
    canal.instance.enableDruid=false

    启动服务

    bin/startup.sh
  • 如果看到如下日志则启动成功

    [root@cnnxpredn02 canal]# cat canal.log
    2022-03-07 08:47:21.349 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
    2022-03-07 08:47:21.398 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
    2022-03-07 08:47:21.415 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
    2022-03-07 08:47:21.475 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server):11111]
    2022-03-07 08:47:23.137 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......

    canal-adapter 的下载和安装

    下载并解

  • 下载canal-adapter: https://github.com/alibaba/canal/releases
  • 注意:canal-adapter解压后是分散的,我们在指定解压目录的时候需要将 canal-adapter 指定上
    mkdir canal-adapter
    tar -zxvf canal.adapter-1.1.5.tar.gz -C /../canal-adapter

    在application.yml中配置MySQL和clickhouse的连接信息

    ...
    #  srcDataSources:
    defaultDS:
      url: jdbc:mysql://ip:3306/database?useUnicode=true
      username: user
      password: pass
      canalAdapters:
      - instance: test # canal instance Name or mq topic name
    groups:
    - groupId: g1
      outerAdapters:
      - name: logger
      - name: rdb
        key: mysql1
        properties:
          jdbc.driverClassName: ru.yandex.clickhouse.ClickHouseDriver
          jdbc.url: jdbc:clickhouse://ip:port/database
          jdbc.username: user
          jdbc.password: pass
    #      - name: rdb
    ...

    在/opt/soft/canal-adapter/conf/rdb/mytest_user.yml中配置需要同步的库表信息

  • canal可以同步整个库的所有表和单表的表,配置都在这里进行配置,如果需要同步多张表的数据,就在rdb下创建多个类似的配置文件,文件名可以用表名来命名。

        Mirror schema synchronize config
    dataSourceKey: defaultDS
    destination: test
    groupId: g1
    outerAdapterKey: mysql1
    concurrent: true
    dbMapping:
    mirrorDb: true
    database: testck
  • 注:我这里是同步testck库的所有表

    启动canal-adapter服务

    bin/startup.sh
  • 在/.../canal-adapter/logs/adapter下查看日志

编码读书
1 声望0 粉丝