Author: Yang Taotao

Senior database expert with more than ten years of MySQL experience. Skilled in backup and recovery for open source databases such as MySQL, PostgreSQL, and MongoDB, as well as SQL tuning, monitoring and operations, and high-availability architecture design. Currently works at Aikesheng, providing MySQL technical support and MySQL course training for major telecom operators and banking/financial companies.

Source of this article: original submission

* Produced by the Aikesheng open source community. Original content may not be used without authorization; to reprint, please contact the editor and cite the source.


MySQL and TiDB are roughly 80% compatible in SQL syntax and can be mixed in most scenarios: MySQL can serve as the upstream of TiDB, and TiDB can likewise serve as the upstream of MySQL. Today I will share how to logically export and import data between the two databases.

Generally speaking, logical export and import come in two formats: delimited text such as CSV or TSV, and SQL statements.
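For intuition, one and the same row in the two formats might look like this (values are illustrative only, not taken from the actual tables):

CSV:  "1","100","2021-05-12"
SQL:  INSERT INTO `t1` VALUES (1, 100, '2021-05-12');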

Here I use two tables as the import/export example: table t1 and table t1_csv, each holding 2,000,000 (200W) records. The TiDB version is 3.1.2 and the MySQL version is 5.7.31.
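As a reference point, here is a minimal sketch of how such test tables could be created and populated; the schema is assumed (it is not part of the original setup), and the doubling-insert trick is just one way to reach roughly 200W rows:

# Assumed schema; build ~200W rows by repeated self-insertion
mysql -uroot -h 127.0.0.1 -P 4001 ytt <<'EOF'
CREATE TABLE t1 (id INT AUTO_INCREMENT PRIMARY KEY, r1 INT, log_date DATE);
CREATE TABLE t1_csv LIKE t1;
INSERT INTO t1 (r1, log_date) VALUES (FLOOR(RAND()*1000), CURRENT_DATE());
EOF
for i in $(seq 1 21); do    # 2^21 = 2,097,152 rows, slightly over 200W
    mysql -uroot -h 127.0.0.1 -P 4001 ytt \
      -e "INSERT INTO t1 (r1, log_date) SELECT r1, log_date FROM t1"
done
mysql -uroot -h 127.0.0.1 -P 4001 ytt -e "INSERT INTO t1_csv SELECT * FROM t1"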

In the first part, TiDB exports data as the upstream, and MySQL imports it as the downstream.

TiDB itself cannot export table records directly to CSV files, but it ships separate export tools: version 3.0 provides mydumper, which exports SQL files, and version 4.0 adds dumpling, which exports both SQL and CSV.

Use mydumper to export table t1, producing SQL files, and dumpling to export table t1_csv, producing CSV files. Like mysqldump, both export programs must connect to a running database.

The mydumper command line for exporting SQL files (each file is about 256M):

[root@ytt-pc data_sql]# mydumper  -u root -h 127.0.0.1 -P 4001 -B ytt -T t1  -F 256 -o /tmp/data_sql/
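A quick breakdown of the flags used above (standard mydumper options):

  -u root / -h 127.0.0.1 / -P 4001    connection: user, host, and TiDB port
  -B ytt -T t1                        dump only table t1 of database ytt
  -F 256                              split data files at about 256 MB each
  -o /tmp/data_sql/                   output directory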

The list of exported files (similar to the file list produced by the MySQL Shell util component, which I shared previously):

[root@ytt-pc data_sql]# ls -sihl
total 1.1G
201327040 4.0K -rw-r--r-- 1 root root  146 May 12 18:21 metadata
201327041 4.0K -rw-r--r-- 1 root root   65 May 12 18:21 ytt-schema-create.sql
201327077 246M -rw-r--r-- 1 root root 246M May 12 18:21 ytt.t1.000000002.sql
201327078 246M -rw-r--r-- 1 root root 246M May 12 18:21 ytt.t1.000000003.sql
201327079 245M -rw-r--r-- 1 root root 245M May 12 18:21 ytt.t1.000000004.sql
201327080 122M -rw-r--r-- 1 root root 122M May 12 18:21 ytt.t1.000000005.sql
201327075 245M -rw-r--r-- 1 root root 245M May 12 18:21 ytt.t1.00001.sql
201327076 4.0K -rw-r--r-- 1 root root  327 May 12 18:21 ytt.t1-schema.sql

The dumpling command line for exporting CSV files (likewise, each CSV file is capped at 256M):

[root@ytt-pc data_csv]# dumpling  -B ytt -T ytt.t1_csv -uroot  -P4001 -h 127.0.0.1 --filetype csv --filesize 256M -o /tmp/data_csv/
Release version: v4.0.8
Git commit hash: b84f64ff362cedcb795aa23fa1188ba7b7c9a7d7
Git branch:      heads/refs/tags/v4.0.8
Build timestamp: 2020-10-30 08:14:27Z
Go version:      go version go1.13 linux/amd64

[2021/05/12 18:22:05.686 +08:00] [INFO] [config.go:180] ["detect server type"] [type=TiDB]
[2021/05/12 18:22:05.686 +08:00] [INFO] [config.go:198] ["detect server version"] [version=3.1.2]
...
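Note dumpling's flag differences from mydumper: the table passed to -T must be schema-qualified (ytt.t1_csv), --filetype chooses csv or sql output, and --filesize takes an explicit unit (256M) rather than a bare number of megabytes.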

List of exported files:

[root@ytt-pc data_csv]# ls -sihl
total 1.1G
555999 4.0K -rw-r--r-- 1 root root  146 May 12 18:22 metadata
127975 4.0K -rw-r--r-- 1 root root   94 May 12 18:22 ytt-schema-create.sql
132203 257M -rw-r--r-- 1 root root 257M May 12 18:22 ytt.t1_csv.0.csv
555974 257M -rw-r--r-- 1 root root 257M May 12 18:22 ytt.t1_csv.1.csv
555996 257M -rw-r--r-- 1 root root 257M May 12 18:22 ytt.t1_csv.2.csv
555997 257M -rw-r--r-- 1 root root 257M May 12 18:22 ytt.t1_csv.3.csv
555998  71M -rw-r--r-- 1 root root  71M May 12 18:22 ytt.t1_csv.4.csv
127980 4.0K -rw-r--r-- 1 root root  324 May 12 18:22 ytt.t1_csv-schema.sql

After exporting, I wrote a simple script to import these two tables into MySQL:

#!/bin/sh
# Import the files dumped from TiDB into MySQL.
# Usage: ./source_tidb_to_mysql csv|sql
usage()
{
        echo ""
        echo "Usage: ./source_tidb_to_mysql csv|sql"
        echo ""
}
file_format=$1
file_path_csv=/tmp/data_csv/
file_path_sql=/tmp/data_sql/
if [ "$file_format" = "csv" ];then
    # CSV files: load each one with LOAD DATA INFILE, skipping the
    # header row that dumpling writes ("ignore 1 lines").
    for i in `ls "$file_path_csv"ytt*.csv`
    do 
    {
         load_options="load data infile '$i' into table t1_csv fields terminated by ',' enclosed by '\"' ignore 1 lines"
          mysql -udumper -S /tmp/mysql_sandbox5731.sock -D ytt -e "$load_options"
    }
    done

elif [ "$file_format" = "sql" ];then
    # SQL files: simply pipe each dump file into the mysql client.
    for i in `ls "$file_path_sql"ytt.t1.*.sql`
    do
    {
          mysql -udumper -S /tmp/mysql_sandbox5731.sock -D ytt<$i
    }
    done
else
  usage;
fi

Call the script to import tables t1 and t1_csv into MySQL respectively:

Import table t1

[root@ytt-pc scripts]# ./source_tidb_to_mysql sql

Import table t1_csv

[root@ytt-pc scripts]# ./source_tidb_to_mysql csv

A simple check that each table holds 2,000,000 (200W) records:

mysql [localhost:mysql_sandbox5731.sock] {root} (ytt) > select (select count(*) from t1) 't1_count', (select count(*) from t1_csv) 't1_csv_count';
+----------+--------------+
| t1_count | t1_csv_count |
+----------+--------------+
|  2000000 |      2000000 |
+----------+--------------+
1 row in set (1.86 sec)

In the second part, MySQL exports data as the upstream, and TiDB imports it as the downstream.

To avoid making manual text corrections after exporting with MySQL's free tools, I use dumpling, the export tool provided by TiDB, directly against the MySQL tables t1 and t1_csv; dumpling automatically detects that the data source is MySQL.

[root@ytt-pc data_csv]# dumpling  -B ytt -T ytt.t1_csv -udumper -P5731  -h 127.0.0.1 --filetype csv --filesize 256M -o /tmp/data_csv/
Release version: v4.0.8
Git commit hash: b84f64ff362cedcb795aa23fa1188ba7b7c9a7d7
Git branch:      heads/refs/tags/v4.0.8
Build timestamp: 2020-10-30 08:14:27Z
Go version:      go version go1.13 linux/amd64

[2021/05/12 17:57:24.035 +08:00] [INFO] [config.go:180] ["detect server type"] [type=MySQL]
[2021/05/12 17:57:24.035 +08:00] [INFO] [config.go:198] ["detect server version"] [version=5.7.31]
...

Similarly, use dumpling to export files in SQL format:

[root@ytt-pc data_sql]# dumpling  -B ytt -T ytt.t1 -udumper -P5731  -h 127.0.0.1 --filetype sql --filesize 256M -o /tmp/data_sql/
Release version: v4.0.8
Git commit hash: b84f64ff362cedcb795aa23fa1188ba7b7c9a7d7
Git branch:      heads/refs/tags/v4.0.8
Build timestamp: 2020-10-30 08:14:27Z
Go version:      go version go1.13 linux/amd64

[2021/05/12 18:01:56.984 +08:00] [INFO] [config.go:180] ["detect server type"] [type=MySQL]
[2021/05/12 18:01:56.984 +08:00] [INFO] [config.go:198] ["detect server version"] [version=5.7.31]
...

After the MySQL source data is exported, use tidb-lightning, the full-volume data import tool provided by TiDB, for fast import. It supports CSV data sources as well as the SQL data sources exported by mydumper/dumpling.

tidb-lightning requires its back-end program tikv-importer to be running first; tikv-importer applies the key-value pairs converted by tidb-lightning to the database. Then start tidb-lightning itself, which reads the data source, converts it into key-value pairs, and hands them to the back-end tikv-importer for import.

  1. Start the tikv-importer resident process (default port 8287):
    [root@ytt-pc data_sql]# tikv-importer -A 127.0.0.1:8287
  2. Next, start the tidb-lightning task to begin the import (default status port 8289).

Import table t1 and table t1_csv separately:

   Import table t1
   [root@ytt-pc data_sql]# tidb-lightning   --importer 127.0.0.1:8287  --status-addr 127.0.0.1:8289 --tidb-host 127.0.0.1 --tidb-port 4001 --tidb-status 10081 --tidb-user root -d /tmp/data_sql 
   
   Import table t1_csv
   [root@ytt-pc data_sql]# tidb-lightning   --importer 127.0.0.1:8287  --status-addr 127.0.0.1:8289 --tidb-host 127.0.0.1 --tidb-port 4001 --tidb-status 10081 --tidb-user root -d /tmp/data_csv 
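All of the command-line flags above can equally be placed in a tidb-lightning config file. Below is a minimal sketch for the CSV import, assuming v4-era key names and dumpling's default CSV layout (verify the exact keys against your version's documentation):

# Write a minimal tidb-lightning.toml (key names assumed from the v4 docs)
cat > tidb-lightning.toml <<'EOF'
[tikv-importer]
backend = "importer"            # hand converted key-value pairs to tikv-importer
addr = "127.0.0.1:8287"         # the resident tikv-importer started earlier

[mydumper]
data-source-dir = "/tmp/data_csv"

[mydumper.csv]                  # matches dumpling's default CSV output
separator = ","
delimiter = '"'
header = true                   # dumpling writes a header row

[tidb]
host = "127.0.0.1"
port = 4001
status-port = 10081
user = "root"
EOF
tidb-lightning --config tidb-lightning.toml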

The same simple verification:

mysql [127.0.0.1:4001] {root} (ytt) > select (select count(*) from t1) t1_count, (select count(*) from t1_csv) t1_csv_count;
+----------+--------------+
| t1_count | t1_csv_count |
+----------+--------------+
|  2000000 |      2000000 |
+----------+--------------+
1 row in set (1.04 sec)

If the table holds only a small amount of data, you can instead export CSV by running select ... into outfile directly on the MySQL side, and then import it directly on the TiDB side.

For example, take the small table t1_small with 10,000 (1W) rows and export the CSV on the MySQL side:

mysql [localhost:mysql_sandbox5731.sock] {root} (ytt) > select * from t1_small into outfile '/tmp/data_csv/t1_small.csv' fields terminated by ',' enclosed by '"';
Query OK, 10000 rows affected (0.03 sec)

Then import it directly on the TiDB side with a SQL command:

mysql [127.0.0.1:4001] {root} (ytt) > load data local infile '/tmp/data_csv/t1_small.csv' into table t1_small fields terminated by ',' enclosed by '"';
Query OK, 10000 rows affected (1.55 sec)
Records: 10000  Deleted: 0  Skipped: 0  Warnings: 0
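A practical caveat: the client may reject LOAD DATA LOCAL if local-infile is disabled on the client side; the stock mysql client can enable it explicitly:

# Enable LOAD DATA LOCAL on the client if it is rejected by default
mysql --local-infile=1 -uroot -h 127.0.0.1 -P 4001 ytt \
  -e "load data local infile '/tmp/data_csv/t1_small.csv' into table t1_small fields terminated by ',' enclosed by '\"'"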
