Preface
In today's Internet industry, especially in distributed, microservice architectures, NoSQL databases such as Redis and Memcached and full-text search services such as Solr and Elasticsearch are used in large numbers to improve query efficiency and accuracy. This raises a problem we need to think about and solve: data synchronization. How do we synchronize the constantly changing data in the database to Redis/Memcached or Solr/Elasticsearch?
Data synchronization requirements in the context of the Internet
In today's Internet industry, especially in distributed, microservice architectures, NoSQL databases such as Redis and Memcached and full-text search services such as Solr and Elasticsearch are introduced in large numbers to improve query efficiency and accuracy. This immediately raises a problem that we need to think about and solve: data synchronization. How do we keep the constantly changing data in the database in sync with Redis/Memcached or Solr/Elasticsearch?
For example, in a distributed environment we continuously write data to the database while applications read data from Redis, Memcached, Elasticsearch, Solr, and other services. Keeping the data in the database and in each of these services synchronized in real time then becomes a problem we urgently need to solve.
Imagine that, driven by business needs, we introduce services such as Redis, Memcached, Elasticsearch, or Solr, so that our application may read data from several different services, as shown in the figure below.
Essentially, no matter which service or middleware we introduce, the data ultimately comes from our MySQL database. So the question is: how do we synchronize the data in MySQL to the other services or middleware in real time?
Note: To better explain the problem, the following content uses synchronizing data from a MySQL database to a Solr index as the example.
Data synchronization solution
1. Synchronize in business code
After adding, modifying, or deleting data, execute the logic that updates the Solr index directly in the business code, as in the following snippet.
public ResponseResult updateStatus(Long[] ids, String status){
    try{
        goodsService.updateStatus(ids, status);
        if("status_success".equals(status)){
            List<TbItem> itemList = goodsService.getItemList(ids, status);
            itemSearchService.importList(itemList);
        }
        return new ResponseResult(true, "Status updated successfully");
    }catch(Exception e){
        return new ResponseResult(false, "Failed to update status");
    }
}
Advantages:
Simple to implement.
Disadvantages:
Tightly coupled to the business code.
Reduces execution efficiency, because the index update runs synchronously inside the business operation.
2. Timing task synchronization
After add, modify, and delete operations are performed in the database, a scheduled task periodically synchronizes the data in the database to the Solr index.
Common timed-task technologies include Spring Task and Quartz.
Haha, and also my open-source mykit-delay framework: https://github.com/sunshinelyz/mykit-delay.
Here is a trick worth noting when implementing the timed task: on its first run, query the data from the MySQL table ordered by the time field in descending order, and record the maximum time value seen. On every subsequent run, only query rows whose time field is greater than the previously recorded value, again in descending time order, and record the new maximum from that run. This way, the task never needs to query all the data in the table again.
Note: the time field mentioned here refers to a field that records when a row was last updated. In other words, when using a timed task to synchronize data, it is best to add an update-time field to the table so that each task run avoids a full table scan.
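The incremental query described above can be sketched in plain Java. This is a minimal sketch, not the article's actual implementation: the table is modeled as an in-memory list of [id, updateTime] rows standing in for a SQL query, and names such as IncrementalSync and fetchChanged are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of the incremental timed-task query. Each long[] row is
// [id, updateTime], standing in for a SQL query like:
//   SELECT ... FROM tb_item WHERE update_time > ? ORDER BY update_time DESC
public class IncrementalSync {

    private long lastSyncTime = 0L; // maximum update_time recorded after the previous run

    public List<long[]> fetchChanged(List<long[]> table) {
        List<long[]> changed = new ArrayList<>();
        for (long[] row : table) {
            if (row[1] > lastSyncTime) { // only rows updated since the last run
                changed.add(row);
            }
        }
        // reverse order on the time field, as described above
        changed.sort(Comparator.comparingLong((long[] r) -> r[1]).reversed());
        if (!changed.isEmpty()) {
            lastSyncTime = changed.get(0)[1]; // record the new maximum for the next run
        }
        return changed;
    }
}
```

The first call returns every row; later calls return only rows updated since the previously recorded maximum, which is exactly how the full table scan is avoided.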
Advantages:
The operation of synchronizing the Solr index library is completely decoupled from the business code.
Disadvantages:
Data is not synchronized in real time; it is only as fresh as the task interval.
3. Synchronize through MQ
After add, modify, and delete operations are performed in the database, a message is sent to MQ. A synchronization program, acting as an MQ consumer, takes the message from the queue and then executes the logic that updates the Solr index.
The figure below gives a simple illustration of the process of synchronizing data through MQ.
We can use the following code to achieve this process.
public ResponseResult updateStatus(Long[] ids, String status){
    try{
        goodsService.updateStatus(ids, status);
        if("status_success".equals(status)){
            List<TbItem> itemList = goodsService.getItemList(ids, status);
            final String jsonString = JSON.toJSONString(itemList);
            jmsTemplate.send(queueSolr, new MessageCreator(){
                @Override
                public Message createMessage(Session session) throws JMSException{
                    return session.createTextMessage(jsonString);
                }
            });
        }
        return new ResponseResult(true, "Status updated successfully");
    }catch(Exception e){
        return new ResponseResult(false, "Failed to update status");
    }
}
Advantages:
The business code is decoupled, and synchronization is near-real-time.
Disadvantages:
Code for sending messages to MQ must still be added to the business code, so the data interfaces remain coupled.
4. Real-time synchronization through Canal
Canal is an open-source component from Alibaba for incremental parsing of database logs. Canal parses the database's log to detect changes to table structure and data, and then updates the Solr index accordingly.
With Canal, the business code and APIs are completely decoupled, and near-real-time synchronization can be achieved.
Introduction to Canal
Canal is Alibaba's MySQL binlog incremental subscription and consumption component. Based on incremental parsing of the database log, it provides incremental data subscription and consumption; currently it mainly supports MySQL.
Canal open source address: https://github.com/alibaba/canal.
How Canal Works
How MySQL master-slave replication works
As can be seen from the figure above, master-slave replication is mainly divided into three steps:
- The master node records data changes to the binary log (these records are called binary log events and can be viewed with show binlog events).
- The slave node copies the binary log events of the master node to its relay log.
- Slave nodes redo events in the relay log and reflect the changes to their own database.
Canal internal principle
First, let's look at the schematic diagram of Canal, as shown below.
The principle is roughly described as follows:
- Canal emulates the MySQL slave interaction protocol: it pretends to be a MySQL slave and sends the dump protocol to the MySQL master.
- The MySQL master receives the dump request and starts pushing the binary log to the slave (that is, to Canal).
- Canal parses the binary log objects (originally a byte stream).
Canal internal structure
The Canal Server structure is described as follows:
- Server: Represents a Canal running instance, corresponding to a JVM process.
- Instance: corresponds to a data queue (1 Server corresponds to one or more Instances).
Next, let's look at the sub-modules under Instance, as shown below.
- EventParser: Data source access, simulation of Slave protocol and Master node interaction, protocol analysis.
- EventSink: The connector of EventParser and EventStore, which processes data such as filtering, processing, merging and distributing.
- EventStore: data storage.
- MetaManager: Incremental subscription and consumption information management.
Canal environment preparation
Set up MySQL remote access
grant all privileges on *.* to 'root'@'%' identified by '123456';
flush privileges;
MySQL configuration
Note: The MySQL description here is based on version 5.7.
Canal is based on MySQL's binlog. Therefore, to use Canal, you must enable MySQL's binlog writing, and it is recommended to set the binlog format to ROW.
You can enter the following command on the MySQL command line to view the binlog mode.
SHOW VARIABLES LIKE 'binlog_format';
The execution effect is as follows.
As you can see, the default binlog format in MySQL is STATEMENT; here we need to change it to ROW by modifying the /etc/my.cnf file.
vim /etc/my.cnf
Add the following three configurations under [mysqld].
log-bin=mysql-bin # enable the MySQL binary log
binlog_format=ROW # set the binary log format to ROW
server_id=1 # server_id must be unique and must not duplicate Canal's slaveId
After modifying the my.cnf file, you need to restart the MySQL service.
service mysqld restart
Next, we look at the binlog mode again.
SHOW VARIABLES LIKE 'binlog_format';
As you can see, at this time, MySQL's binlog mode has been set to ROW.
Create and authorize a MySQL user
Canal works by emulating a MySQL slave, so the relevant MySQL slave permissions must be granted. Here, we need to create a replication account and give this account the related permissions.
CREATE USER canal@'localhost' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'localhost';
FLUSH PRIVILEGES;
Canal deployment and installation
Download Canal
Here, we use Canal version 1.1.1. You can download it from https://github.com/alibaba/canal/releases/tag/canal-1.1.1.
Upload and unzip
Upload the downloaded Canal package to the server, then run the following commands to decompress it.
mkdir -p /usr/local/canal
tar -zxvf canal.deployer-1.1.1.tar.gz -C /usr/local/canal/
The decompressed directory is shown below.
The description of each directory is as follows:
- bin: Store executable scripts.
- conf: store configuration files.
- lib: Store other dependencies or third-party libraries.
- logs: Stores log files.
Modify the configuration file
There is a canal.properties file in Canal's conf directory that holds the Canal Server configuration. It contains the following line.
canal.destinations=example
The example here corresponds to a Canal Instance. Multiple Instances can be configured here, separated by commas. Each Instance also corresponds to a folder under Canal's conf directory; in other words, every Canal Instance has its own subdirectory under conf.
Next, we need to modify the instance.properties configuration file in the example directory under Canal's conf directory.
vim instance.properties
Modify the following configuration items.
#################################################################
## canal slaveId; note: must not duplicate MySQL's server_id
canal.instance.mysql.slaveId = 1234
#position info; change to your own database information
canal.instance.master.address = 127.0.0.1:3306
canal.instance.master.journal.name =
canal.instance.master.position =
canal.instance.master.timestamp =
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#username/password; change to your own database credentials
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
canal.instance.defaultDatabaseName = canaldb
canal.instance.connectionCharset = UTF-8
#table regex
canal.instance.filter.regex = canaldb\\..*
#################################################################
Option meaning:
- canal.instance.mysql.slaveId: the serverId concept from MySQL replication configuration; must be unique among the ids in the current MySQL cluster;
- canal.instance.master.address: connection address of the MySQL primary;
- canal.instance.dbUsername: MySQL database username;
- canal.instance.dbPassword: MySQL database password;
- canal.instance.defaultDatabaseName: default database used when connecting to MySQL;
- canal.instance.connectionCharset: character set used when parsing MySQL data;
- canal.instance.filter.regex: Perl-style regular expression selecting which tables Canal parses.
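As a quick illustration of the table filter, the sketch below shows how an expression like canaldb\..* selects tables, using plain java.util.regex. This is a simplification: Canal's actual filter supports a comma-separated list of such expressions and matches them against schema.table names internally.

```java
import java.util.regex.Pattern;

// Sketch: how a filter expression such as "canaldb\\..*" selects tables.
// The pattern is matched against names of the form "schema.table".
public class FilterRegexDemo {
    public static boolean accepted(String regex, String schemaDotTable) {
        return Pattern.matches(regex, schemaDotTable);
    }
}
```

With the configuration above, a table such as canaldb.book would be parsed, while a table in any other database would be ignored.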
Start Canal
After configuring Canal, you can start Canal. Enter the bin directory of Canal and enter the following command to start Canal.
./startup.sh
Test Canal
Import and modify the source code
Here, we use Canal's own source code for testing. After downloading the Canal source, import it into IDEA.
Next, we find the SimpleCanalClientTest class under example for testing. The source code of this class is shown below.
package com.alibaba.otter.canal.example;
import java.net.InetSocketAddress;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.common.utils.AddressUtils;
/**
 * Test example for standalone mode
 *
 * @author jianghang 2013-4-15 04:19:20 PM
 * @version 1.0.4
 */
public class SimpleCanalClientTest extends AbstractCanalClientTest {

    public SimpleCanalClientTest(String destination){
        super(destination);
    }

    public static void main(String args[]) {
        // create the connection directly from the ip; no HA support
        String destination = "example";
        String ip = AddressUtils.getHostIp();
        CanalConnector connector = CanalConnectors.newSingleConnector(
            new InetSocketAddress(ip, 11111),
            destination,
            "canal",
            "canal");
        final SimpleCanalClientTest clientTest = new SimpleCanalClientTest(destination);
        clientTest.setConnector(connector);
        clientTest.start();
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                try {
                    logger.info("## stop the canal client");
                    clientTest.stop();
                } catch (Throwable e) {
                    logger.warn("##something goes wrong when stopping canal:", e);
                } finally {
                    logger.info("## canal client is down.");
                }
            }
        });
    }
}
As you can see, the destination used in this class is example. In this class, we only need to change the IP address to the IP of the Canal Server.
Specifically, change the following line of code.
String ip = AddressUtils.getHostIp();
to:
String ip = "192.168.175.100";
Since we did not configure a username and password for the Canal Server, we also need to change the following code.
CanalConnector connector = CanalConnectors.newSingleConnector(
new InetSocketAddress(ip, 11111),
destination,
"canal",
"canal");
to:
CanalConnector connector = CanalConnectors.newSingleConnector(
new InetSocketAddress(ip, 11111),
destination,
"",
"");
After the modification is completed, run the main method to start the program.
Test data changes
Next, create a canaldb database in MySQL.
create database canaldb;
At this time, the relevant log information will be output on the IDEA command line.
****************************************************
* Batch Id: [7] ,count : [3] , memsize : [149] , Time : 2020-08-05 23:25:35
* Start : [mysql-bin.000007:6180:1540286735000(2020-08-05 23:25:35)]
* End : [mysql-bin.000007:6356:1540286735000(2020-08-05 23:25:35)]
****************************************************
Next, I create a table in the canaldb database and insert, delete, update, and query data in it. The log output of the program is as follows.
# After data changes are made in MySQL, the MySQL binlog output is shown here.
****************************************************
* Batch Id: [7] ,count : [3] , memsize : [149] , Time : 2020-08-05 23:25:35
* Start : [mysql-bin.000007:6180:1540286735000(2020-08-05 23:25:35)]
* End : [mysql-bin.000007:6356:1540286735000(2020-08-05 23:25:35)]
****************************************************
================> binlog[mysql-bin.000007:6180] , executeTime : 1540286735000(2020-08-05 23:25:35) , gtid : () , delay : 393ms
BEGIN ----> Thread id: 43
----------------> binlog[mysql-bin.000007:6311] , name[canal,canal_table] , eventType : DELETE , executeTime : 1540286735000(2020-08-05 23:25:35) , gtid : () , delay : 393 ms
id : 8 type=int(10) unsigned
name : 512 type=varchar(255)
----------------
END ----> transaction id: 249
================> binlog[mysql-bin.000007:6356] , executeTime : 1540286735000(2020-08-05 23:25:35) , gtid : () , delay : 394ms
****************************************************
* Batch Id: [8] ,count : [3] , memsize : [149] , Time : 2020-08-05 23:25:35
* Start : [mysql-bin.000007:6387:1540286869000(2020-08-05 23:25:49)]
* End : [mysql-bin.000007:6563:1540286869000(2020-08-05 23:25:49)]
****************************************************
================> binlog[mysql-bin.000007:6387] , executeTime : 1540286869000(2020-08-05 23:25:49) , gtid : () , delay : 976ms
BEGIN ----> Thread id: 43
----------------> binlog[mysql-bin.000007:6518] , name[canal,canal_table] , eventType : INSERT , executeTime : 1540286869000(2020-08-05 23:25:49) , gtid : () , delay : 976 ms
id : 21 type=int(10) unsigned update=true
name : aaa type=varchar(255) update=true
----------------
END ----> transaction id: 250
================> binlog[mysql-bin.000007:6563] , executeTime : 1540286869000(2020-08-05 23:25:49) , gtid : () , delay : 977ms
****************************************************
* Batch Id: [9] ,count : [3] , memsize : [161] , Time : 2020-08-05 23:26:22
* Start : [mysql-bin.000007:6594:1540286902000(2020-08-05 23:26:22)]
* End : [mysql-bin.000007:6782:1540286902000(2020-08-05 23:26:22)]
****************************************************
================> binlog[mysql-bin.000007:6594] , executeTime : 1540286902000(2020-08-05 23:26:22) , gtid : () , delay : 712ms
BEGIN ----> Thread id: 43
----------------> binlog[mysql-bin.000007:6725] , name[canal,canal_table] , eventType : UPDATE , executeTime : 1540286902000(2020-08-05 23:26:22) , gtid : () , delay : 712 ms
id : 21 type=int(10) unsigned
name : aaac type=varchar(255) update=true
----------------
END ----> transaction id: 252
================> binlog[mysql-bin.000007:6782] , executeTime : 1540286902000(2020-08-05 23:26:22) , gtid : () , delay : 713ms
Data synchronization realization
Requirement
Use Canal to parse the binlog and synchronize changes in the database data to the Solr index in real time.
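At its core, the synchronization logic maps binlog event types to index operations. The sketch below illustrates that mapping in plain Java; SyncDispatcher and its string return values are hypothetical names for illustration, not part of the actual project, which performs the corresponding Solr add/delete calls instead.

```java
import java.util.Map;

// Sketch: map a parsed binlog event type to the corresponding index operation.
// "index" stands for adding/overwriting the Solr document; "delete" removes it by id.
public class SyncDispatcher {
    public static String dispatch(String eventType, Map<String, String> row) {
        switch (eventType) {
            case "INSERT":
            case "UPDATE":
                return "index:" + row.get("id");
            case "DELETE":
                return "delete:" + row.get("id");
            default:
                return "skip"; // DDL and other event types are ignored
        }
    }
}
```

Note that UPDATE is handled the same way as INSERT: re-indexing the changed document by its id overwrites the stale copy in Solr.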
Implementation
Create project
Create the Maven project mykit-canal-demo, and add the following configuration in the pom.xml file.
<dependencies>
<dependency>
<groupId>com.alibaba.otter</groupId>
<artifactId>canal.client</artifactId>
<version>1.0.24</version>
</dependency>
<dependency>
<groupId>com.alibaba.otter</groupId>
<artifactId>canal.protocol</artifactId>
<version>1.0.24</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
<version>1.8.9</version>
</dependency>
<dependency>
<groupId>org.apache.solr</groupId>
<artifactId>solr-solrj</artifactId>
<version>4.10.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.9</version>
<scope>test</scope>
</dependency>
</dependencies>
Create the log4j configuration file
Create a log4j.properties file in the src/main/resources directory of the project, and the content is as follows.
log4j.rootCategory=debug, CONSOLE
# CONSOLE is set to be a ConsoleAppender using a PatternLayout.
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} %-6r [%15.15t] %-5p %30.30c %x - %m\n
# LOGFILE is set to be a File appender using a PatternLayout.
# log4j.appender.LOGFILE=org.apache.log4j.FileAppender
# log4j.appender.LOGFILE.File=d:\axis.log
# log4j.appender.LOGFILE.Append=true
# log4j.appender.LOGFILE.layout=org.apache.log4j.PatternLayout
# log4j.appender.LOGFILE.layout.ConversionPattern=%d{ISO8601} %-6r [%15.15t] %-5p %30.30c %x - %m\n
Create entity class
Create a Book entity class under the io.mykit.canal.demo.bean package to test the data transmission of Canal, as shown below.
package io.mykit.canal.demo.bean;
import org.apache.solr.client.solrj.beans.Field;
import java.io.Serializable;
import java.util.Date;
public class Book implements Serializable {
private static final long serialVersionUID = -6350345408771427834L;
@Field("id")
private Integer id;
@Field("book_name")
private String name;
@Field("book_author")
private String author;
@Field("book_publishtime")
private Date publishtime;
@Field("book_price")
private Double price;
@Field("book_publishgroup")
private String publishgroup;
public Integer getId() {
return id;
}
public void setId(Integer id) {
this.id = id;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getAuthor() {
return author;
}
public void setAuthor(String author) {
this.author = author;
}
public Date getPublishtime() {
return publishtime;
}
public void setPublishtime(Date publishtime) {
this.publishtime = publishtime;
}
public Double getPrice() {
return price;
}
public void setPrice(Double price) {
this.price = price;
}
public String getPublishgroup() {
return publishgroup;
}
public void setPublishgroup(String publishgroup) {
this.publishgroup = publishgroup;
}
@Override
public String toString() {
return "Book{" +
"id=" + id +
", name='" + name + '\'' +
", author='" + author + '\'' +
", publishtime=" + publishtime +
", price=" + price +
", publishgroup='" + publishgroup + '\'' +
'}';
}
}
In the Book entity class, we use Solr's @Field annotation to map the entity fields to fields (domains) in the Solr index.
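For these annotations to work, the corresponding fields must exist in the Solr index's schema. The fragment below is a hypothetical Solr 4.x schema.xml sketch matching the field names in the annotations above; the field types (string, date, double) are assumptions and depend on your actual schema.

```xml
<!-- Hypothetical schema.xml fields matching the @Field annotations above -->
<field name="book_name" type="string" indexed="true" stored="true"/>
<field name="book_author" type="string" indexed="true" stored="true"/>
<field name="book_publishtime" type="date" indexed="true" stored="true"/>
<field name="book_price" type="double" indexed="true" stored="true"/>
<field name="book_publishgroup" type="string" indexed="true" stored="true"/>
```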
Utility classes
Next, we create several utility classes under the io.mykit.canal.demo.utils package.
- BinlogValue
Stores the value of each row and column parsed from the binlog; the code is shown below.
package io.mykit.canal.demo.utils;
import java.io.Serializable;
/**
 *
 * ClassName: BinlogValue <br/>
 *
 * The value of each row/column parsed from the binlog.<br>
 * Insert: beforeValue and value are both the current value.<br>
 * Update: beforeValue is the value before the change; value is the value after it.<br>
 * Delete: beforeValue and value are both the value before deletion; this special case makes it easy to obtain the pre-delete value.<br>
 */
public class BinlogValue implements Serializable {
private static final long serialVersionUID = -6350345408773943086L;
private String value;
private String beforeValue;
/**
 * The value of this row/column parsed from the binlog.<br>
 * Insert: value is the current value.<br>
 * Update: value is the value after the change.<br>
 * Delete: value is the value before deletion; this makes it easy to obtain the pre-delete value.<br>
 */
public String getValue() {
return value;
}
public void setValue(String value) {
this.value = value;
}
/**
 * The beforeValue of this row/column parsed from the binlog.<br>
 * Insert: beforeValue is the current value.<br>
 * Update: beforeValue is the value before the change.<br>
 * Delete: beforeValue is the value before deletion.<br>
 */
public String getBeforeValue() {
return beforeValue;
}
public void setBeforeValue(String beforeValue) {
this.beforeValue = beforeValue;
}
}
- CanalDataParser
Parses the data received from Canal; the code is shown below.
package io.mykit.canal.demo.utils;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.commons.lang.SystemUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.CollectionUtils;
import com.alibaba.otter.canal.protocol.Message;
import com.alibaba.otter.canal.protocol.CanalEntry.Column;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EntryType;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange;
import com.alibaba.otter.canal.protocol.CanalEntry.RowData;
import com.alibaba.otter.canal.protocol.CanalEntry.TransactionBegin;
import com.alibaba.otter.canal.protocol.CanalEntry.TransactionEnd;
import com.google.protobuf.InvalidProtocolBufferException;
/**
 * Parses Canal messages into binlog entries
 */
public class CanalDataParser {
protected static final String DATE_FORMAT = "yyyy-MM-dd HH:mm:ss";
protected static final String yyyyMMddHHmmss = "yyyyMMddHHmmss";
protected static final String yyyyMMdd = "yyyyMMdd";
protected static final String SEP = SystemUtils.LINE_SEPARATOR;
protected static String context_format = null;
protected static String row_format = null;
protected static String transaction_format = null;
protected static String row_log = null;
private static Logger logger = LoggerFactory.getLogger(CanalDataParser.class);
static {
context_format = SEP + "****************************************************" + SEP;
context_format += "* Batch Id: [{}] ,count : [{}] , memsize : [{}] , Time : {}" + SEP;
context_format += "* Start : [{}] " + SEP;
context_format += "* End : [{}] " + SEP;
context_format += "****************************************************" + SEP;
row_format = SEP
+ "----------------> binlog[{}:{}] , name[{},{}] , eventType : {} , executeTime : {} , delay : {}ms"
+ SEP;
transaction_format = SEP + "================> binlog[{}:{}] , executeTime : {} , delay : {}ms" + SEP;
row_log = "schema[{}], table[{}]";
}
public static List<InnerBinlogEntry> convertToInnerBinlogEntry(Message message) {
List<InnerBinlogEntry> innerBinlogEntryList = new ArrayList<InnerBinlogEntry>();
if(message == null) {
logger.info("Received null message; ignoring");
return innerBinlogEntryList;
}
long batchId = message.getId();
int size = message.getEntries().size();
if (batchId == -1 || size == 0) {
logger.info("Received empty message [size=" + size + "]; ignoring");
return innerBinlogEntryList;
}
printLog(message, batchId, size);
List<Entry> entrys = message.getEntries();
// log output
for (Entry entry : entrys) {
long executeTime = entry.getHeader().getExecuteTime();
long delayTime = new Date().getTime() - executeTime;
if (entry.getEntryType() == EntryType.TRANSACTIONBEGIN || entry.getEntryType() == EntryType.TRANSACTIONEND) {
if (entry.getEntryType() == EntryType.TRANSACTIONBEGIN) {
TransactionBegin begin = null;
try {
begin = TransactionBegin.parseFrom(entry.getStoreValue());
} catch (InvalidProtocolBufferException e) {
throw new RuntimeException("parse event has an error , data:" + entry.toString(), e);
}
// print transaction header info: executing thread id and delay
logger.info("BEGIN ----> Thread id: {}", begin.getThreadId());
logger.info(transaction_format, new Object[] {entry.getHeader().getLogfileName(),
String.valueOf(entry.getHeader().getLogfileOffset()), String.valueOf(entry.getHeader().getExecuteTime()), String.valueOf(delayTime) });
} else if (entry.getEntryType() == EntryType.TRANSACTIONEND) {
TransactionEnd end = null;
try {
end = TransactionEnd.parseFrom(entry.getStoreValue());
} catch (InvalidProtocolBufferException e) {
throw new RuntimeException("parse event has an error , data:" + entry.toString(), e);
}
// print transaction commit info: transaction id
logger.info("END ----> transaction id: {}", end.getTransactionId());
logger.info(transaction_format,
new Object[] {entry.getHeader().getLogfileName(), String.valueOf(entry.getHeader().getLogfileOffset()),
String.valueOf(entry.getHeader().getExecuteTime()), String.valueOf(delayTime) });
}
continue;
}
// parse row data
if (entry.getEntryType() == EntryType.ROWDATA) {
RowChange rowChage = null;
try {
rowChage = RowChange.parseFrom(entry.getStoreValue());
} catch (Exception e) {
throw new RuntimeException("parse event has an error , data:" + entry.toString(), e);
}
EventType eventType = rowChage.getEventType();
logger.info(row_format, new Object[] { entry.getHeader().getLogfileName(),
String.valueOf(entry.getHeader().getLogfileOffset()), entry.getHeader().getSchemaName(),
entry.getHeader().getTableName(), eventType, String.valueOf(entry.getHeader().getExecuteTime()), String.valueOf(delayTime) });
// assemble the parsed result
if (eventType == EventType.INSERT || eventType == EventType.DELETE || eventType == EventType.UPDATE) {
String schemaName = entry.getHeader().getSchemaName();
String tableName = entry.getHeader().getTableName();
List<Map<String, BinlogValue>> rows = parseEntry(entry);
InnerBinlogEntry innerBinlogEntry = new InnerBinlogEntry();
innerBinlogEntry.setEntry(entry);
innerBinlogEntry.setEventType(eventType);
innerBinlogEntry.setSchemaName(schemaName);
innerBinlogEntry.setTableName(tableName.toLowerCase());
innerBinlogEntry.setRows(rows);
innerBinlogEntryList.add(innerBinlogEntry);
} else {
logger.info("Encountered SQL other than INSERT/DELETE/UPDATE [" + eventType.toString() + "]");
}
continue;
}
}
return innerBinlogEntryList;
}
private static List<Map<String, BinlogValue>> parseEntry(Entry entry) {
List<Map<String, BinlogValue>> rows = new ArrayList<Map<String, BinlogValue>>();
try {
String schemaName = entry.getHeader().getSchemaName();
String tableName = entry.getHeader().getTableName();
RowChange rowChage = RowChange.parseFrom(entry.getStoreValue());
EventType eventType = rowChage.getEventType();
// process each row of data in the Entry
for (RowData rowData : rowChage.getRowDatasList()) {
StringBuilder rowlog = new StringBuilder("rowlog schema[" + schemaName + "], table[" + tableName + "], event[" + eventType.toString() + "]");
Map<String, BinlogValue> row = new HashMap<String, BinlogValue>();
List<Column> beforeColumns = rowData.getBeforeColumnsList();
List<Column> afterColumns = rowData.getAfterColumnsList();
if (eventType == EventType.DELETE) {//delete
for(Column column : beforeColumns) {
BinlogValue binlogValue = new BinlogValue();
binlogValue.setValue(column.getValue());
binlogValue.setBeforeValue(column.getValue());
row.put(column.getName(), binlogValue);
}
} else if(eventType == EventType.UPDATE) {//update
for(Column column : beforeColumns) {
BinlogValue binlogValue = new BinlogValue();
binlogValue.setBeforeValue(column.getValue());
row.put(column.getName(), binlogValue);
}
for(Column column : afterColumns) {
BinlogValue binlogValue = row.get(column.getName());
if(binlogValue == null) {
binlogValue = new BinlogValue();
}
binlogValue.setValue(column.getValue());
row.put(column.getName(), binlogValue);
}
} else { // insert
for(Column column : afterColumns) {
BinlogValue binlogValue = new BinlogValue();
binlogValue.setValue(column.getValue());
binlogValue.setBeforeValue(column.getValue());
row.put(column.getName(), binlogValue);
}
}
rows.add(row);
String rowjson = JacksonUtil.obj2str(row);
logger.info("#################################### Data Parse Result ####################################");
logger.info(rowlog + " , " + rowjson);
logger.info("#################################### Data Parse Result ####################################");
logger.info("");
}
} catch (InvalidProtocolBufferException e) {
throw new RuntimeException("parseEntry has an error , data:" + entry.toString(), e);
}
return rows;
}
private static void printLog(Message message, long batchId, int size) {
long memsize = 0;
for (Entry entry : message.getEntries()) {
memsize += entry.getHeader().getEventLength();
}
String startPosition = null;
String endPosition = null;
if (!CollectionUtils.isEmpty(message.getEntries())) {
startPosition = buildPositionForDump(message.getEntries().get(0));
endPosition = buildPositionForDump(message.getEntries().get(message.getEntries().size() - 1));
}
SimpleDateFormat format = new SimpleDateFormat(DATE_FORMAT);
logger.info(context_format, new Object[] {batchId, size, memsize, format.format(new Date()), startPosition, endPosition });
}
private static String buildPositionForDump(Entry entry) {
long time = entry.getHeader().getExecuteTime();
Date date = new Date(time);
SimpleDateFormat format = new SimpleDateFormat(DATE_FORMAT);
return entry.getHeader().getLogfileName() + ":" + entry.getHeader().getLogfileOffset() + ":" + entry.getHeader().getExecuteTime() + "(" + format.format(date) + ")";
}
}
- DateUtils
A date/time utility class; the code is shown below.
package io.mykit.canal.demo.utils;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
public class DateUtils {
private static final String FORMAT_PATTERN = "yyyy-MM-dd HH:mm:ss";
// Note: SimpleDateFormat is not thread-safe; synchronize access or use a per-thread instance if this class is used concurrently.
private static SimpleDateFormat sdf = new SimpleDateFormat(FORMAT_PATTERN);
public static Date parseDate(String datetime) throws ParseException{
if(datetime != null && !"".equals(datetime)){
return sdf.parse(datetime);
}
return null;
}
public static String formatDate(Date datetime) throws ParseException{
if(datetime != null ){
return sdf.format(datetime);
}
return null;
}
public static Long formatStringDateToLong(String datetime) throws ParseException{
if(datetime != null && !"".equals(datetime)){
Date d = sdf.parse(datetime);
return d.getTime();
}
return null;
}
public static Long formatDateToLong(Date datetime) throws ParseException{
if(datetime != null){
return datetime.getTime();
}
return null;
}
}
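One caveat about the `DateUtils` class above: it shares a single static `SimpleDateFormat`, which is not thread-safe, so concurrent calls to `parseDate`/`formatDate` can silently produce corrupted results. If you ever run the sync program with multiple threads, a variant built on `java.time` avoids this. The sketch below is my own illustration (the class name `SafeDateUtils` is not part of the demo project) showing the same conversions with an immutable, thread-safe `DateTimeFormatter`:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SafeDateUtils {
    // DateTimeFormatter is immutable and thread-safe, unlike SimpleDateFormat
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parse "yyyy-MM-dd HH:mm:ss" into epoch milliseconds (system time zone)
    public static Long parseToMillis(String datetime) {
        if (datetime == null || datetime.isEmpty()) {
            return null;
        }
        LocalDateTime ldt = LocalDateTime.parse(datetime, FMT);
        return ldt.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
    }

    // Format epoch milliseconds back into "yyyy-MM-dd HH:mm:ss"
    public static String formatMillis(long epochMillis) {
        return LocalDateTime
                .ofInstant(Instant.ofEpochMilli(epochMillis), ZoneId.systemDefault())
                .format(FMT);
    }

    public static void main(String[] args) {
        String s = "2021-01-15 10:20:30";
        Long millis = parseToMillis(s);
        // The round trip should reproduce the original string
        System.out.println(formatMillis(millis).equals(s));
    }
}
```

Because the formatter is immutable, the static field can be shared freely across threads without synchronization.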
- InnerBinlogEntry
Binlog entity class, the code is shown below.
package io.mykit.canal.demo.utils;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
public class InnerBinlogEntry {
/**
* The native canal Entry
*/
private Entry entry;
/**
* Name of the table this Entry belongs to
*/
private String tableName;
/**
* Name of the database (schema) this Entry belongs to
*/
private String schemaName;
/**
* The operation type of this Entry, corresponding to canal's native enum: EventType.INSERT, EventType.UPDATE, EventType.DELETE
*/
private EventType eventType;
private List<Map<String, BinlogValue>> rows = new ArrayList<Map<String, BinlogValue>>();
public Entry getEntry() {
return entry;
}
public void setEntry(Entry entry) {
this.entry = entry;
}
public String getTableName() {
return tableName;
}
public void setTableName(String tableName) {
this.tableName = tableName;
}
public EventType getEventType() {
return eventType;
}
public void setEventType(EventType eventType) {
this.eventType = eventType;
}
public String getSchemaName() {
return schemaName;
}
public void setSchemaName(String schemaName) {
this.schemaName = schemaName;
}
public List<Map<String, BinlogValue>> getRows() {
return rows;
}
public void setRows(List<Map<String, BinlogValue>> rows) {
this.rows = rows;
}
}
- JacksonUtil
A JSON utility class; the code is shown below.
package io.mykit.canal.demo.utils;
import java.io.IOException;
import org.codehaus.jackson.JsonGenerationException;
import org.codehaus.jackson.JsonParseException;
import org.codehaus.jackson.map.JsonMappingException;
import org.codehaus.jackson.map.ObjectMapper;
public class JacksonUtil {
private static ObjectMapper mapper = new ObjectMapper();
public static String obj2str(Object obj) {
String json = null;
try {
json = mapper.writeValueAsString(obj);
} catch (JsonGenerationException e) {
e.printStackTrace();
} catch (JsonMappingException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return json;
}
public static <T> T str2obj(String content, Class<T> valueType) {
try {
return mapper.readValue(content, valueType);
} catch (JsonParseException e) {
e.printStackTrace();
} catch (JsonMappingException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
Synchronization program
After preparing the entity and utility classes, we can write the synchronization program that syncs the data in the MySQL database to the Solr index library in real time. We create the SyncDataBootStart class in the io.mykit.canal.demo.main package, as shown below.
package io.mykit.canal.demo.main;
import io.mykit.canal.demo.bean.Book;
import io.mykit.canal.demo.utils.BinlogValue;
import io.mykit.canal.demo.utils.CanalDataParser;
import io.mykit.canal.demo.utils.DateUtils;
import io.mykit.canal.demo.utils.InnerBinlogEntry;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.net.InetSocketAddress;
import java.text.ParseException;
import java.util.List;
import java.util.Map;
public class SyncDataBootStart {
private static Logger logger = LoggerFactory.getLogger(SyncDataBootStart.class);
public static void main(String[] args) throws Exception {
String hostname = "192.168.175.100";
Integer port = 11111;
String destination = "example";
// Obtain a connection to the Canal Server
CanalConnector canalConnector = CanalConnectors.newSingleConnector(new InetSocketAddress(hostname, port), destination, "", "");
// Connect to the Canal Server
canalConnector.connect();
// Subscribe to the destination
canalConnector.subscribe();
// Poll for data
Integer batchSize = 5*1024;
while (true){
Message message = canalConnector.getWithoutAck(batchSize);
long messageId = message.getId();
int size = message.getEntries().size();
if(messageId == -1 || size == 0){
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}else{
// Perform the data synchronization
// 1. Parse the Message object
List<InnerBinlogEntry> innerBinlogEntries = CanalDataParser.convertToInnerBinlogEntry(message);
// 2. Sync the parsed data to the Solr index library
syncDataToSolr(innerBinlogEntries);
}
// Acknowledge the batch
canalConnector.ack(messageId);
}
}
private static void syncDataToSolr(List<InnerBinlogEntry> innerBinlogEntries) throws Exception {
// Obtain a Solr connection
SolrServer solrServer = new HttpSolrServer("http://192.168.175.101:8080/solr");
// Iterate over the data and decide whether to insert, update, or delete based on each entry
if(innerBinlogEntries != null){
for (InnerBinlogEntry innerBinlogEntry : innerBinlogEntries) {
CanalEntry.EventType eventType = innerBinlogEntry.getEventType();
// For INSERT and UPDATE events, sync the data to the Solr index library
if(eventType == CanalEntry.EventType.INSERT || eventType == CanalEntry.EventType.UPDATE){
List<Map<String, BinlogValue>> rows = innerBinlogEntry.getRows();
if(rows != null){
for (Map<String, BinlogValue> row : rows) {
BinlogValue id = row.get("id");
BinlogValue name = row.get("name");
BinlogValue author = row.get("author");
BinlogValue publishtime = row.get("publishtime");
BinlogValue price = row.get("price");
BinlogValue publishgroup = row.get("publishgroup");
Book book = new Book();
book.setId(Integer.parseInt(id.getValue()));
book.setName(name.getValue());
book.setAuthor(author.getValue());
book.setPrice(Double.parseDouble(price.getValue()));
book.setPublishgroup(publishgroup.getValue());
book.setPublishtime(DateUtils.parseDate(publishtime.getValue()));
// Import the data into the Solr index library
solrServer.addBean(book);
solrServer.commit();
}
}
}else if(eventType == CanalEntry.EventType.DELETE){
// For DELETE events, remove the corresponding data from the Solr index library
List<Map<String, BinlogValue>> rows = innerBinlogEntry.getRows();
if(rows != null){
for (Map<String, BinlogValue> row : rows) {
BinlogValue id = row.get("id");
// Delete from the Solr index by ID
solrServer.deleteById(id.getValue());
solrServer.commit();
}
}
}
}
}
}
}
Next, start the main method of the SyncDataBootStart class to listen to the Canal Server, which in turn monitors changes to the MySQL binlog. As soon as the binlog changes, SyncDataBootStart immediately receives the change information, parses it into Book objects, and updates the Solr index library in real time. Likewise, if data is deleted from the MySQL database, the corresponding data is deleted from the Solr index library in real time.
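One weakness of the polling loop above is error handling: if syncing a batch to Solr fails, the batch is simply lost. Canal's client API also provides `rollback(batchId)`, which tells the server to re-deliver an unconfirmed batch, so a more robust loop acks only after the sync step succeeds and rolls back otherwise. The sketch below illustrates that pattern against a minimal stand-in interface (`BatchSource` is hypothetical, defined here only so the example runs without a Canal server):

```java
public class AckRollbackDemo {
    // Hypothetical stand-in for the relevant slice of Canal's CanalConnector API.
    interface BatchSource {
        long getBatchId();            // -1 means "no data available"
        void ack(long batchId);       // confirm the batch was fully processed
        void rollback(long batchId);  // ask for the batch to be re-delivered
    }

    // Process one batch: ack only after the sync step succeeds; otherwise
    // roll back so the batch is re-delivered instead of silently dropped.
    static String processOnce(BatchSource source, Runnable syncStep) {
        long batchId = source.getBatchId();
        if (batchId == -1) {
            return "idle";
        }
        try {
            syncStep.run();
            source.ack(batchId);
            return "acked";
        } catch (RuntimeException e) {
            source.rollback(batchId);
            return "rolledback";
        }
    }

    public static void main(String[] args) {
        BatchSource src = new BatchSource() {
            public long getBatchId() { return 42L; }
            public void ack(long id) { System.out.println("ack " + id); }
            public void rollback(long id) { System.out.println("rollback " + id); }
        };
        // Successful sync step -> the batch is acknowledged
        System.out.println(processOnce(src, () -> {}));
        // Failing sync step -> the batch is rolled back for re-delivery
        System.out.println(processOnce(src, () -> { throw new RuntimeException("solr down"); }));
    }
}
```

In the real program, `getBatchId`/`ack`/`rollback` would be backed by the `CanalConnector` instance, and the sync step would be the call to `syncDataToSolr`.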
For more details, see the official Canal documentation: https://github.com/alibaba/canal .
Alright, that's all for today. I'm Glacier, see you in the next issue~~