1. SeaTunnel deployment

1.1 Download the package

https://archive.apache.org/dist/seatunnel/2.3.5/apache-seatunnel-2.3.5-bin.tar.gz

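1.2 Download the plugins

Note: modify bin/install-plugin.sh so that it downloads from the Aliyun repository, which is faster. This assumes your own mvn is already configured to go through the Aliyun mirror.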

SEATUNNEL_HOME=$(cd $(dirname $0);cd ../;pwd)

# connector default version is 2.3.5, you can also choose a custom version. eg: 2.1.2:  sh install-plugin.sh 2.1.2
version=2.3.5

if [ -n "$1" ]; then
    version="$1"
fi

echo "Install SeaTunnel connectors plugins, usage version is ${version}"

# create the connectors directory
if [ ! -d ${SEATUNNEL_HOME}/connectors ];
  then
      mkdir ${SEATUNNEL_HOME}/connectors
      echo "create connectors directory"
fi

while read line; do
    first_char=$(echo "$line" | cut -c 1)

    if [ "$first_char" != "-" ] && [ "$first_char" != "#" ] && [ ! -z $first_char ]
        then
                echo "install connector : " $line
                # modified here
                mvn dependency:get -DgroupId=org.apache.seatunnel -DartifactId=${line} -Dversion=${version} -Ddest=${SEATUNNEL_HOME}/connectors
    fi

done < ${SEATUNNEL_HOME}/config/plugin_config
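
The note above assumes Maven already resolves artifacts through the Aliyun mirror. As a minimal sketch of one way to set that up, the snippet below writes a settings file with a single mirror entry. The output path and the mirror id `aliyun-public` are assumptions for illustration; the URL is Aliyun's public Maven repository.

```shell
# Hypothetical output path: a scratch file, so an existing ~/.m2/settings.xml
# is not overwritten.
SETTINGS_FILE="${SETTINGS_FILE:-${TMPDIR:-/tmp}/settings-aliyun.xml}"

# Minimal Maven settings that route requests for "central" to the Aliyun mirror.
cat > "$SETTINGS_FILE" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>aliyun-public</id>
      <mirrorOf>central</mirrorOf>
      <name>Aliyun public repository</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF

echo "wrote $SETTINGS_FILE"
```

Because the script calls mvn without a settings argument, the mirror entry would need to be merged into ~/.m2/settings.xml for install-plugin.sh to pick it up.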

sh bin/install-plugin.sh 2.3.5
This starts downloading the plugins, which are placed under connectors.

For example:

/home/seatunnel/connectors

total 1247340
-rw-r--r--. 1 root root  11532131 Jul 18 14:00 connector-amazondynamodb-2.3.5.jar
-rw-r--r--. 1 root root   9013741 Jul 18 14:54 connector-amazonsqs-2.3.5.jar
-rw-r--r--. 1 root root    202593 Jul 18 14:00 connector-assert-2.3.5.jar
-rw-r--r--. 1 root root  13951283 Jul 18 14:04 connector-cassandra-2.3.5.jar
-rw-r--r--. 1 root root  30257530 Jul 18 14:37 connector-cdc-mongodb-2.3.5.jar
-rw-r--r--. 1 root root  30540348 Jul 18 14:10 connector-cdc-mysql-2.3.5.jar
-rw-r--r--. 1 root root  26903195 Jul 18 14:38 connector-cdc-sqlserver-2.3.5.jar
-rw-r--r--. 1 root root  30830325 Jul 18 14:38 connector-clickhouse-2.3.5.jar
-rw-r--r--. 1 root root     77830 Nov  9  2023 connector-console-2.3.5.jar
-rw-r--r--. 1 root root   7103021 Jul 18 14:38 connector-datahub-2.3.5.jar
-rw-r--r--. 1 root root   5600548 Jul 18 14:38 connector-dingtalk-2.3.5.jar
-rw-r--r--. 1 root root  11785663 Jul 18 14:38 connector-doris-2.3.5.jar
-rw-r--r--. 1 root root  19791253 Jul 18 14:54 connector-easysearch-2.3.5.jar
-rw-r--r--. 1 root root   5529299 Jul 18 14:38 connector-elasticsearch-2.3.5.jar
-rw-r--r--. 1 root root    754655 Jul 18 14:38 connector-email-2.3.5.jar
-rw-r--r--. 1 root root    199577 Nov  9  2023 connector-fake-2.3.5.jar
-rw-r--r--. 1 root root  42307844 Jul 18 14:39 connector-file-ftp-2.3.5.jar
-rw-r--r--. 1 root root  42296458 Jul 18 14:40 connector-file-hadoop-2.3.5.jar
-rw-r--r--. 1 root root  41556515 Jul 18 14:41 connector-file-jindo-oss-2.3.5.jar
-rw-r--r--. 1 root root  42291150 Jul 18 14:40 connector-file-local-2.3.5.jar
-rw-r--r--. 1 root root  42293429 Jul 18 14:41 connector-file-oss-2.3.5.jar
-rw-r--r--. 1 root root  45302759 Jul 18 14:42 connector-file-s3-2.3.5.jar
-rw-r--r--. 1 root root  42596484 Jul 18 14:43 connector-file-sftp-2.3.5.jar
-rw-r--r--. 1 root root  46947425 Jul 18 14:44 connector-google-firestore-2.3.5.jar
-rw-r--r--. 1 root root   6891940 Jul 18 14:43 connector-google-sheets-2.3.5.jar
-rw-r--r--. 1 root root  50800910 Jul 18 14:54 connector-hbase-2.3.5.jar
-rw-r--r--. 1 root root  42318873 Jul 18 14:44 connector-hive-2.3.5.jar
-rw-r--r--. 1 root root   5222439 Jul 18 14:44 connector-http-base-2.3.5.jar
-rw-r--r--. 1 root root   5226214 Jul 18 14:44 connector-http-feishu-2.3.5.jar
-rw-r--r--. 1 root root   5231180 Jul 18 14:44 connector-http-github-2.3.5.jar
-rw-r--r--. 1 root root   5230658 Jul 18 14:44 connector-http-gitlab-2.3.5.jar
-rw-r--r--. 1 root root   5229668 Jul 18 14:45 connector-http-jira-2.3.5.jar
-rw-r--r--. 1 root root   5230849 Jul 18 14:45 connector-http-klaviyo-2.3.5.jar
-rw-r--r--. 1 root root   5230472 Jul 18 14:45 connector-http-lemlist-2.3.5.jar
-rw-r--r--. 1 root root   5233337 Jul 18 14:45 connector-http-myhours-2.3.5.jar
-rw-r--r--. 1 root root   5230675 Jul 18 14:45 connector-http-notion-2.3.5.jar
-rw-r--r--. 1 root root   5230728 Jul 18 14:45 connector-http-onesignal-2.3.5.jar
-rw-r--r--. 1 root root   5230081 Jul 18 14:45 connector-http-wechat-2.3.5.jar
-rw-r--r--. 1 root root 157677173 Jul 18 14:47 connector-hudi-2.3.5.jar
-rw-r--r--. 1 root root  30625934 Jul 18 14:48 connector-iceberg-2.3.5.jar
-rw-r--r--. 1 root root   3468674 Jul 18 14:48 connector-influxdb-2.3.5.jar
-rw-r--r--. 1 root root   5804542 Jul 18 14:48 connector-iotdb-2.3.5.jar
-rw-r--r--. 1 root root    776369 Jul 18 14:48 connector-jdbc-2.3.5.jar
-rw-r--r--. 1 root root  17276586 Jul 18 14:48 connector-kafka-2.3.5.jar
-rw-r--r--. 1 root root  28536457 Jul 18 14:48 connector-kudu-2.3.5.jar
-rw-r--r--. 1 root root  23546499 Jul 18 14:49 connector-maxcompute-2.3.5.jar
-rw-r--r--. 1 root root   2480453 Jul 18 14:49 connector-mongodb-2.3.5.jar
-rw-r--r--. 1 root root   5100892 Jul 18 14:49 connector-neo4j-2.3.5.jar
-rw-r--r--. 1 root root 148822493 Jul 18 14:51 connector-openmldb-2.3.5.jar
-rw-r--r--. 1 root root  44265772 Jul 18 14:52 connector-pulsar-2.3.5.jar
-rw-r--r--. 1 root root    830795 Jul 18 14:52 connector-rabbitmq-2.3.5.jar
-rw-r--r--. 1 root root   1372145 Jul 18 14:52 connector-redis-2.3.5.jar
-rw-r--r--. 1 root root  54323057 Jul 18 14:52 connector-s3-redshift-2.3.5.jar
-rw-r--r--. 1 root root   1668609 Jul 18 14:53 connector-selectdb-cloud-2.3.5.jar
-rw-r--r--. 1 root root    649935 Jul 18 14:52 connector-sentry-2.3.5.jar
-rw-r--r--. 1 root root   5955025 Jul 18 14:53 connector-slack-2.3.5.jar
-rw-r--r--. 1 root root    174796 Jul 18 14:53 connector-socket-2.3.5.jar
-rw-r--r--. 1 root root  23322414 Jul 18 14:53 connector-starrocks-2.3.5.jar
-rw-r--r--. 1 root root  10782289 Jul 18 14:53 connector-tablestore-2.3.5.jar
-rw-r--r--. 1 root root   2481560 Jul 18 15:39 mysql-connector-j-8.0.33.jar
-rw-r--r--. 1 root root      5803 Nov  9  2023 plugin-mapping.properties
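
With this many connectors, it can be handy to check that every entry in config/plugin_config actually produced a jar. The helper below is a sketch, not part of SeaTunnel: the function name `missing_connectors` is hypothetical, and it assumes jars are named `<artifactId>-<version>.jar`, as in the listing above.

```shell
# List connectors named in plugin_config whose jar is missing from connectors/.
# $1: SeaTunnel home directory, $2: connector version
missing_connectors() {
    home="$1"
    version="$2"
    while read -r line; do
        case "$line" in
            ''|-*|'#'*) continue ;;   # skip blank, "--section--" and comment lines
        esac
        [ -f "$home/connectors/$line-$version.jar" ] || echo "$line"
    done < "$home/config/plugin_config"
}

# Example: missing_connectors /home/seatunnel 2.3.5
```

An empty result means every listed connector is present. Any name printed can be retried by running the install script again.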

1.3 Add the MySQL driver

Place the driver jar under /home/seatunnel/lib:

[root@xxx lib]# pwd
/home/seatunnel/lib
[root@xxx lib]# ls -l
total 46148
-rw-r--r--. 1 root root  2481560 Jul 18 15:57 mysql-connector-j-8.0.33.jar
-rw-r--r--. 1 root root 43046761 Nov  9  2023 seatunnel-hadoop3-3.1.4-uber.jar
-rw-r--r--. 1 root root  1723052 Nov  9  2023 seatunnel-transforms-v2.jar

Also place it under /home/seatunnel/plugins/jdbc/lib; see /home/seatunnel/plugins/README.md:

[root@xxx lib]# pwd
/home/seatunnel/plugins/jdbc/lib
[root@xxx lib]# ls -l
total 2424
-rw-r--r--. 1 root root 2481560 Jul 18 16:00 mysql-connector-j-8.0.33.jar
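
The two copies above can be done in one step. The function below is only a convenience sketch. The name `install_jdbc_driver` and the location of the downloaded jar are assumptions, while the two target directories are the ones used in this section.

```shell
# Copy a JDBC driver jar into both directories used in this section.
# $1: path to the driver jar, $2: SeaTunnel home directory
install_jdbc_driver() {
    jar="$1"
    home="$2"
    [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
    mkdir -p "$home/lib" "$home/plugins/jdbc/lib"
    cp "$jar" "$home/lib/" && cp "$jar" "$home/plugins/jdbc/lib/"
}

# Example (the jar's current location is an assumption):
# install_jdbc_driver ./mysql-connector-j-8.0.33.jar /home/seatunnel
```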

2. Prepare the tables

I use Sysbench directly to generate the table data.

2.1 Installation

yum install sysbench

2.2 Prepare the test data

Source database and table:

1. Create the source database
create database journey;

2. Generate the data (1,000,000 rows)
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=mysql1_address --mysql-port=3306 --mysql-user=root --mysql-password=passwd1 --mysql-db=journey --tables=1 --table-size=1000000 prepare

3. Check the row count:
MariaDB [journey]> select count(*) from journey.sbtest1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.130 sec)

Target database and table:

1. Create the database
create database journey;

2. Create the table
use journey;
CREATE TABLE `sbtest1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT 0,
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

Note: the bandwidth is set to 100 Mbps.

3. Prepare the template file

Set the JVM memory (2 GB, because that is what DataX uses):
JAVA_OPTS="-Xms2G -Xmx2G"
Set the parallelism to 10:

env {
  parallelism = 10
  job.mode = "BATCH"
}

source {
    Jdbc {
        url = "jdbc:mysql://xx.xx.xx.xx:3306/journey?serverTimezone=GMT%2b8&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
        driver = "com.mysql.cj.jdbc.Driver"
        connection_check_timeout_sec = 100
        user = "user"
        password = "password"
        table_path = "journey.sbtest1"
        query = "select * from journey.sbtest1"
        partition_column = "id"
        parallelism = 10
    }
}

transform {

}

sink {
    jdbc {
        url = "jdbc:mysql://xx.xx.xx.xx:3331/journey?useSSL=false"
        driver = "com.mysql.cj.jdbc.Driver"
        user = "user"
        password = "password"
        query = "insert into sbtest1(id,k,c,pad) values(?,?,?,?)"
    }
}

4. Run

[root@xxx seatunnel]# pwd
/home/seatunnel
[root@xxx seatunnel]# ./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local

5. Results

......
2024-07-18 17:14:12,820 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-30] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=11, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,820 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-41] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=10, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-30] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=4, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-40] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=5, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-39] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=6, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-37] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=2, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-38] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=3, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-29] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=1, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-32] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=7, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-26] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=9, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-35] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=8, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,822 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-6] - Job SeaTunnel_Job (866245144175706113), Pipeline: [(1/1)] state process is stop
2024-07-18 17:14:12,822 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-5] - Job SeaTunnel_Job (866245144175706113), Pipeline: [(1/1)] future complete with state FINISHED
2024-07-18 17:14:12,822 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-5] - Job SeaTunnel_Job (866245144175706113) turned from state RUNNING to FINISHED.
2024-07-18 17:14:12,822 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-5] - Job SeaTunnel_Job (866245144175706113) state process is stop
2024-07-18 17:14:12,841 INFO  [o.a.s.e.c.j.ClientJobProxy    ] [main] - Job (866245144175706113) end with state FINISHED
2024-07-18 17:14:12,845 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2024-07-18 17:08:56
End Time                  : 2024-07-18 17:14:12
Total Time(s)             :                 316
Total Read Count          :             1000000
Total Write Count         :             1000000
Total Failed Count        :                   0
***********************************************
......

Transferring 1,000,000 rows takes 316 s.
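
When repeating the run with different settings, the figures in the statistics block can be pulled out of a saved log instead of being read by eye. This is a sketch that assumes the output was saved to a file (the name job.log below is hypothetical) and that the block keeps the `label : value` layout shown above. `job_stat` is an illustrative helper, not a SeaTunnel command.

```shell
# Print the value of one field from the "Job Statistic Information" block.
# $1: field label, e.g. "Total Time(s)"; the log text is read from stdin.
job_stat() {
    awk -F' : ' -v key="$1" '
        { label = $1; sub(/[ \t]+$/, "", label) }          # trim padding after the label
        label == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
    '
}

# Example: job_stat "Total Time(s)" < job.log
```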

6、DataX


The JSON template is as follows:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 10
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "xxx",
            "column": ["*"],
            "splitPk": "id",
            "connection": [
              {
                "table": ["sbtest1"],
                "jdbcUrl": ["jdbc:mysql://xxx.xxx.xxx.xxx:3306/journey?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "qiaozhanwei",
            "password": "xxx",
            "column": ["*"],
            "connection": [
              {
                "table": ["sbtest1"],
                "jdbcUrl": "jdbc:mysql://xxx.xxx.xxx.xxx:3331/journey?useSSL=false"
              }
            ]
          }
        }
      }
    ]
  }
}

Run result:

2024-07-18 11:17:59.178 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-07-18 11:17:59.178 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1000000 records, 189888896 bytes | Speed 9.05MB/s, 50000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 130.547s |  All Task WaitReaderTime 6.585s | Percentage 100.00%
2024-07-18 11:17:59.179 [job-0] INFO  JobContainer - 
Job start time                  : 2024-07-18 11:17:38
Job end time                    : 2024-07-18 11:17:59
Total elapsed time              :                 20s
Average throughput              :            9.05MB/s
Record write speed              :          50000rec/s
Total records read              :             1000000
Total read/write failures       :                   0

Transferring 1,000,000 rows takes only 20 s.
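
To put the two runs side by side, the elapsed times can be converted to rows per second. The arithmetic below uses only the row count and durations reported in the two result blocks above. `rows_per_sec` is a helper name introduced here for illustration.

```shell
# Rows per second for one run: $1 = rows transferred, $2 = elapsed seconds.
rows_per_sec() {
    awk -v r="$1" -v s="$2" 'BEGIN { printf "%.0f\n", r / s }'
}

echo "SeaTunnel: $(rows_per_sec 1000000 316) rows/s"   # 316 s, from section 5
echo "DataX:     $(rows_per_sec 1000000 20) rows/s"    # 20 s, from section 6
awk 'BEGIN { printf "Elapsed-time ratio: %.1fx\n", 316 / 20 }'
```

This gives about 3,165 rows/s for SeaTunnel against 50,000 rows/s for DataX, so the SeaTunnel run took roughly 15.8 times as long in this test.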

If you found this interesting, please like and follow. Thanks!

