1. System requirements

Linux
JDK (1.8 or later; 1.8 recommended)
Python (2 or 3)
Apache Maven 3.x (to compile DataX)
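
Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch and not part of DataX: the function name `find_tools` and the executable names `java`, `python3` and `mvn` are assumptions chosen to match the requirements above, and may differ on your system.

```python
import shutil

# Executable names assumed to correspond to the requirements listed above.
REQUIRED_TOOLS = ("java", "python3", "mvn")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Print one line per tool so a missing prerequisite is easy to spot.
for tool, path in find_tools().items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```

This only checks that the executables can be found; it does not verify their versions.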

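2. Building from source

1. Download the code. The GitHub repository is mirrored on Gitee:
git clone https://gitee.com/qzw2015/DataX.git

2. Check out the latest release tag:
git checkout datax_v202309

3. Edit DataX/hdfsreader/pom.xml to include the following dependency: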
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-format</artifactId>
    <version>2.4.0</version>
</dependency>
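
To confirm that the edit took effect, the POM can be inspected programmatically. This is a rough sketch using only the Python standard library; the helper name `dependency_version` and the inline `SAMPLE_POM` are made up for illustration, and in practice you would read the text of DataX/hdfsreader/pom.xml instead.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def dependency_version(pom_text, group_id, artifact_id):
    """Return the declared version of a dependency.

    Returns None if the dependency is absent or declares no version.
    """
    root = ET.fromstring(pom_text)
    for dep in root.iterfind(".//m:dependency", POM_NS):
        same_group = dep.findtext("m:groupId", namespaces=POM_NS) == group_id
        same_artifact = dep.findtext("m:artifactId", namespaces=POM_NS) == artifact_id
        if same_group and same_artifact:
            return dep.findtext("m:version", namespaces=POM_NS)
    return None


# Hypothetical, heavily trimmed POM used only to exercise the helper.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-format</artifactId>
      <version>2.4.0</version>
    </dependency>
  </dependencies>
</project>"""

print(dependency_version(SAMPLE_POM, "org.apache.parquet", "parquet-format"))
```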

4. Download shade-ob-partition-calculator-1.0-SNAPSHOT.jar from https://github.com/alibaba/DataX/tree/master/oceanbasev10writer/src/main/libs and place it in DataX/oceanbasev10writer/src/main/libs.

5. Build the package:
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The output looks like this:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] datax-all 0.0.1-SNAPSHOT ........................... SUCCESS [01:51 min]
[INFO] datax-common 0.0.1-SNAPSHOT ........................ SUCCESS [  0.772 s]
[INFO] datax-transformer 0.0.1-SNAPSHOT ................... SUCCESS [  0.579 s]
[INFO] datax-core 0.0.1-SNAPSHOT .......................... SUCCESS [  1.407 s]
[INFO] plugin-rdbms-util 0.0.1-SNAPSHOT ................... SUCCESS [  0.452 s]
[INFO] mysqlreader 0.0.1-SNAPSHOT ......................... SUCCESS [  0.568 s]
[INFO] drdsreader 0.0.1-SNAPSHOT .......................... SUCCESS [  0.566 s]
[INFO] sqlserverreader 0.0.1-SNAPSHOT ..................... SUCCESS [  0.644 s]
[INFO] postgresqlreader 0.0.1-SNAPSHOT .................... SUCCESS [  0.585 s]
[INFO] kingbaseesreader 0.0.1-SNAPSHOT .................... SUCCESS [  0.568 s]
[INFO] oraclereader 0.0.1-SNAPSHOT ........................ SUCCESS [  0.582 s]
[INFO] cassandrareader 0.0.1-SNAPSHOT ..................... SUCCESS [  1.134 s]
[INFO] oceanbasev10reader 0.0.1-SNAPSHOT .................. SUCCESS [  0.969 s]
[INFO] rdbmsreader 0.0.1-SNAPSHOT ......................... SUCCESS [  0.694 s]
[INFO] odpsreader 0.0.1-SNAPSHOT .......................... SUCCESS [  1.614 s]
[INFO] otsreader 0.0.1-SNAPSHOT ........................... SUCCESS [  1.390 s]
[INFO] otsstreamreader 0.0.1-SNAPSHOT ..................... SUCCESS [  1.172 s]
[INFO] hbase11xreader 0.0.1-SNAPSHOT ...................... SUCCESS [  3.303 s]
[INFO] hbase094xreader 0.0.1-SNAPSHOT ..................... SUCCESS [  2.317 s]
[INFO] hbase11xsqlreader 0.0.1-SNAPSHOT ................... SUCCESS [  4.120 s]
[INFO] hbase20xsqlreader 0.0.1-SNAPSHOT ................... SUCCESS [  0.616 s]
[INFO] plugin-unstructured-storage-util 0.0.1-SNAPSHOT .... SUCCESS [  0.480 s]
[INFO] hdfsreader 0.0.1-SNAPSHOT .......................... SUCCESS [  4.232 s]
[INFO] ossreader 0.0.1-SNAPSHOT ........................... SUCCESS [  4.550 s]
[INFO] ftpreader 0.0.1-SNAPSHOT ........................... SUCCESS [  2.044 s]
[INFO] txtfilereader 0.0.1-SNAPSHOT ....................... SUCCESS [  1.887 s]
[INFO] streamreader 0.0.1-SNAPSHOT ........................ SUCCESS [  0.481 s]
[INFO] clickhousereader 0.0.1-SNAPSHOT .................... SUCCESS [  0.952 s]
[INFO] mongodbreader 0.0.1-SNAPSHOT ....................... SUCCESS [  1.983 s]
[INFO] tdenginewriter 0.0.1-SNAPSHOT ...................... SUCCESS [  1.949 s]
[INFO] tdenginereader 0.0.1-SNAPSHOT ...................... SUCCESS [  0.881 s]
[INFO] gdbreader 0.0.1-SNAPSHOT ........................... SUCCESS [  1.641 s]
[INFO] tsdbreader 0.0.1-SNAPSHOT .......................... SUCCESS [  0.797 s]
[INFO] opentsdbreader 0.0.1-SNAPSHOT ...................... SUCCESS [  1.155 s]
[INFO] loghubreader 0.0.1-SNAPSHOT ........................ SUCCESS [  0.856 s]
[INFO] datahubreader 0.0.1-SNAPSHOT ....................... SUCCESS [  1.044 s]
[INFO] starrocksreader 0.0.1-SNAPSHOT ..................... SUCCESS [  0.537 s]
[INFO] mysqlwriter 0.0.1-SNAPSHOT ......................... SUCCESS [  0.557 s]
[INFO] starrockswriter 1.1.0 .............................. SUCCESS [  2.075 s]
[INFO] drdswriter 0.0.1-SNAPSHOT .......................... SUCCESS [  0.509 s]
[INFO] databendwriter 0.0.1-SNAPSHOT ...................... SUCCESS [  0.947 s]
[INFO] oraclewriter 0.0.1-SNAPSHOT ........................ SUCCESS [  0.557 s]
[INFO] sqlserverwriter 0.0.1-SNAPSHOT ..................... SUCCESS [  0.565 s]
[INFO] postgresqlwriter 0.0.1-SNAPSHOT .................... SUCCESS [  0.661 s]
[INFO] kingbaseeswriter 0.0.1-SNAPSHOT .................... SUCCESS [  0.567 s]
[INFO] odpswriter 0.0.1-SNAPSHOT .......................... SUCCESS [  1.590 s]
[INFO] adswriter 0.0.1-SNAPSHOT ........................... SUCCESS [  1.804 s]
[INFO] oceanbasev10writer 0.0.1-SNAPSHOT .................. SUCCESS [  0.891 s]
[INFO] adbpgwriter 0.0.1-SNAPSHOT ......................... SUCCESS [  1.058 s]
[INFO] hologresjdbcwriter 0.0.1-SNAPSHOT .................. SUCCESS [  1.014 s]
[INFO] rdbmswriter 0.0.1-SNAPSHOT ......................... SUCCESS [  0.639 s]
[INFO] hdfswriter 0.0.1-SNAPSHOT .......................... SUCCESS [  4.426 s]
[INFO] osswriter 0.0.1-SNAPSHOT ........................... SUCCESS [  4.573 s]
[INFO] otswriter 0.0.1-SNAPSHOT ........................... SUCCESS [  1.490 s]
[INFO] hbase11xwriter 0.0.1-SNAPSHOT ...................... SUCCESS [  2.284 s]
[INFO] hbase094xwriter 0.0.1-SNAPSHOT ..................... SUCCESS [  1.864 s]
[INFO] hbase11xsqlwriter 0.0.1-SNAPSHOT ................... SUCCESS [  3.953 s]
[INFO] hbase20xsqlwriter 0.0.1-SNAPSHOT ................... SUCCESS [  0.623 s]
[INFO] kuduwriter 0.0.1-SNAPSHOT .......................... SUCCESS [  0.697 s]
[INFO] ftpwriter 0.0.1-SNAPSHOT ........................... SUCCESS [  1.880 s]
[INFO] txtfilewriter 0.0.1-SNAPSHOT ....................... SUCCESS [  1.756 s]
[INFO] streamwriter 0.0.1-SNAPSHOT ........................ SUCCESS [  0.508 s]
[INFO] elasticsearchwriter 0.0.1-SNAPSHOT ................. SUCCESS [  1.047 s]
[INFO] mongodbwriter 0.0.1-SNAPSHOT ....................... SUCCESS [  1.808 s]
[INFO] ocswriter 0.0.1-SNAPSHOT ........................... SUCCESS [  0.962 s]
[INFO] tsdbwriter 0.0.1-SNAPSHOT .......................... SUCCESS [  0.800 s]
[INFO] gdbwriter 0.0.1-SNAPSHOT ........................... SUCCESS [  2.024 s]
[INFO] oscarwriter 0.0.1-SNAPSHOT ......................... SUCCESS [  0.562 s]
[INFO] loghubwriter 0.0.1-SNAPSHOT ........................ SUCCESS [  0.741 s]
[INFO] datahubwriter 0.0.1-SNAPSHOT ....................... SUCCESS [  0.998 s]
[INFO] cassandrawriter 0.0.1-SNAPSHOT ..................... SUCCESS [  1.028 s]
[INFO] clickhousewriter 0.0.1-SNAPSHOT .................... SUCCESS [  1.045 s]
[INFO] doriswriter 0.0.1-SNAPSHOT ......................... SUCCESS [  0.921 s]
[INFO] selectdbwriter 0.0.1-SNAPSHOT ...................... SUCCESS [  1.013 s]
[INFO] adbmysqlwriter 0.0.1-SNAPSHOT ...................... SUCCESS [  0.515 s]
[INFO] neo4jwriter 0.0.1-SNAPSHOT ......................... SUCCESS [  1.229 s]
[INFO] gaussdbreader 0.0.1-SNAPSHOT ....................... SUCCESS [  0.549 s]
[INFO] gaussdbwriter 0.0.1-SNAPSHOT ....................... SUCCESS [  0.589 s]
[INFO] datax-example 0.0.1-SNAPSHOT ....................... SUCCESS [  0.002 s]
[INFO] datax-example-core 0.0.1-SNAPSHOT .................. SUCCESS [  0.255 s]
[INFO] datax-example-streamreader 0.0.1-SNAPSHOT .......... SUCCESS [  0.006 s]
[INFO] datax-example-neo4j 0.0.1-SNAPSHOT ................. SUCCESS [  0.006 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  03:34 min
[INFO] Finished at: 2024-07-17T16:47:23+08:00
[INFO] ------------------------------------------------------------------------
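
With this many modules, scanning the Reactor Summary by eye is tedious. The sketch below parses a saved build log and lists any module whose status is not SUCCESS. The function names and the regular expression are illustrative assumptions based on the line format shown above, not a Maven API.

```python
import re

# Matches Reactor Summary lines such as:
# [INFO] datax-core 0.0.1-SNAPSHOT .......................... SUCCESS [  1.407 s]
MODULE_LINE = re.compile(r"^\[INFO\] (\S+) \S+ \.+ ([A-Z]+)\b")


def module_statuses(log_text):
    """Return {module name: status} parsed from a Maven Reactor Summary."""
    statuses = {}
    for line in log_text.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match:
            statuses[match.group(1)] = match.group(2)
    return statuses


def unsuccessful(log_text):
    """List modules whose status is anything other than SUCCESS."""
    return [name for name, status in module_statuses(log_text).items()
            if status != "SUCCESS"]


# Two lines copied from the log above, plus non-module lines that must be ignored.
excerpt = """[INFO] Reactor Summary:
[INFO] datax-all 0.0.1-SNAPSHOT ........................... SUCCESS [01:51 min]
[INFO] starrockswriter 1.1.0 .............................. SUCCESS [  2.075 s]
[INFO] BUILD SUCCESS"""

print(unsuccessful(excerpt))
```

To use it on a real build, redirect the mvn output to a file and pass the file's contents to `unsuccessful`.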

After a successful build, the DataX package is located at {DataX_source_code_home}/target/datax/datax/, with the following structure:

$ cd {DataX_source_code_home}
$ ls ./target/datax/datax/
bin        conf        job        lib        log        log_perf    plugin
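
The same layout check can be scripted, for example as a sanity step after an automated build. A minimal sketch, assuming the directory names from the listing above; the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path

# Directory names taken from the listing above.
EXPECTED_DIRS = ("bin", "conf", "job", "lib", "log", "log_perf", "plugin")


def missing_dirs(datax_home, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that are absent under datax_home."""
    home = Path(datax_home)
    return [name for name in expected if not (home / name).is_dir()]
```

Calling `missing_dirs("target/datax/datax")` from the source root returns an empty list when every directory is present.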

3. Example (MySQL → MySQL)

3.1 Preparation (MySQL target table)

create database journey;
use journey;
create table t_ds_user like dolphinscheduler.t_ds_user;

3.2 Preparing the JSON

An annotated example of a complete DataX JSON configuration file (here reading from MySQL and writing to PostgreSQL):

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "your_username",  // source database username
            "password": "your_password",  // source database password
            "column": ["id", "name", "age"],  // columns to read
            "connection": [
              {
                "table": ["source_table"],  // source table name
                "jdbcUrl": ["jdbc:mysql://localhost:3306/source_db"]  // source database JDBC URL
              }
            ]
          }
        },
        "writer": {
          "name": "postgresqlwriter",
          "parameter": {
            "username": "your_username",  // target database username
            "password": "your_password",  // target database password
            "column": ["id", "name", "age"],  // target table columns
            "connection": [
              {
                "table": ["target_table"],  // target table name
                "jdbcUrl": "jdbc:postgresql://localhost:5432/target_db"  // target database JDBC URL
              }
            ]
          }
        },
        "transformer": [
          {
            "name": "dx_substr",
            "parameter": {
              "columnIndex": 1,  // data transformation, e.g. truncate the name column (index 1)
              "paras": ["0", "10"]  // start position and length
            }
          }
        ]
      }
    ],
    "setting": {
      "speed": {
        "channel": 5  // number of concurrent channels
      },
      "errorLimit": {
        "record": 1000,  // maximum number of error records
        "percentage": 0.02  // maximum fraction of error records (0.02 = 2%)
      }
    }
  }
}
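
The annotated configuration above uses // comments, which strict JSON parsers such as Python's json module reject. If you want to load such a file in a script, the comments have to be removed first without damaging the `//` inside JDBC URLs. The sketch below is one way to do that; `strip_line_comments` is a hypothetical helper, not part of DataX.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside of JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped; keep going
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
            i += 1
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            newline = text.find("\n", i)
            if newline == -1:
                break                    # comment runs to end of input
            i = newline                  # resume at the newline itself
        else:
            out.append(ch)
            i += 1
    return "".join(out)


annotated = """{
  "jdbcUrl": ["jdbc:mysql://localhost:3306/source_db"],  // source database JDBC URL
  "channel": 5  // number of concurrent channels
}"""

config = json.loads(strip_line_comments(annotated))
print(config["jdbcUrl"][0])
```

Tracking whether the scanner is inside a string is what keeps `jdbc:mysql://...` intact; a plain regular expression on `//` would truncate the URL.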

mysql2mysql.json is as follows:

{
  "job": {
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "root",
          "password": "root@123",
          "connection": [{
            "querySql": ["select * from t_ds_user;"],
            "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useSSL=false"]
          }]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "username": "root",
          "password": "root@123",
          "column": ["`id`", "`user_name`", "`user_password`", "`user_type`", "`email`", "`phone`", "`tenant_id`", "`create_time`", "`update_time`", "`queue`", "`state`", "`time_zone`"],
          "connection": [{
            "table": ["t_ds_user"],
            "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/journey?useSSL=false"
          }]
        }
      }
    }],
    "setting": {
      "speed": {
        "channel": 1,
        "record": 1000
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0
      }
    }
  },
  "core": {
    "transport": {
      "channel": {
        "speed": {
          "channel": 1,
          "record": 1000
        }
      }
    }
  }
}
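
When several tables need the same kind of job, the file can be generated rather than written by hand. The following is a sketch under assumptions: `mysql_to_mysql_job` is a hypothetical helper that mirrors the reader and writer layout of mysql2mysql.json above, and it leaves out the record speed limit and the core block for brevity.

```python
import json


def mysql_to_mysql_job(source_url, target_url, username, password,
                       query, table, columns, channel=1):
    """Build a job dict shaped like the reader/writer part of mysql2mysql.json."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": username,
            "password": password,
            # The reader takes a list of queries and a list of JDBC URLs.
            "connection": [{"querySql": [query], "jdbcUrl": [source_url]}],
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": username,
            "password": password,
            "column": list(columns),
            # The writer takes a single JDBC URL string.
            "connection": [{"table": [table], "jdbcUrl": target_url}],
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {
                "speed": {"channel": channel},
                "errorLimit": {"record": 0, "percentage": 0},
            },
        }
    }


columns = ["`id`", "`user_name`", "`user_password`", "`user_type`", "`email`",
           "`phone`", "`tenant_id`", "`create_time`", "`update_time`",
           "`queue`", "`state`", "`time_zone`"]

job = mysql_to_mysql_job(
    "jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useSSL=false",
    "jdbc:mysql://127.0.0.1:3306/journey?useSSL=false",
    "root", "root@123",
    "select * from t_ds_user;", "t_ds_user", columns,
)
print(json.dumps(job, indent=2))
```

The printed JSON can be saved to a file and passed to datax.py in the same way as the hand-written one.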

3.3 Running the job

python3 /Users/qiaozhanwei/IdeaProjects/DataX/target/datax/datax/bin/datax.py /Users/qiaozhanwei/IdeaProjects/DataX/target/mysql2mysql.json
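
If the job is launched from another script (a scheduler task, for instance), the same command can be wrapped with subprocess. This is a minimal sketch: the helper names are hypothetical, and treating a non-zero exit code as failure is an assumption about datax.py rather than something shown in this article.

```python
import subprocess
import sys
from pathlib import Path


def datax_command(datax_home, job_file, python=sys.executable):
    """Build the argument list for launching datax.py with one job file."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [python, str(launcher), str(job_file)]


def run_job(datax_home, job_file):
    """Run the job and return the process exit code.

    Assumption: a non-zero exit code means the job did not succeed.
    """
    completed = subprocess.run(datax_command(datax_home, job_file))
    return completed.returncode
```

For the paths used above, `run_job("/Users/qiaozhanwei/IdeaProjects/DataX/target/datax/datax", "/Users/qiaozhanwei/IdeaProjects/DataX/target/mysql2mysql.json")` runs the same job as the command line.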

3.4 Result

mysql> select * from journey.t_ds_user;
+----+-----------+----------------------------------+-----------+------------+-------+-----------+---------------------+---------------------+-------+-------+-----------+
| id | user_name | user_password                    | user_type | email      | phone | tenant_id | create_time         | update_time         | queue | state | time_zone |
+----+-----------+----------------------------------+-----------+------------+-------+-----------+---------------------+---------------------+-------+-------+-----------+
|  1 | admin     | a0e29abb026840908b372ecdb1231766 |         0 | xxx@qq.com |       |        -1 | 2024-06-19 16:56:24 | 2024-06-19 16:56:24 | NULL  |     1 | NULL      |
+----+-----------+----------------------------------+-----------+------------+-------+-----------+---------------------+---------------------+-------+-------+-----------+
1 row in set (0.00 sec)

mysql> 


