1. SeaTunnel deployment
1.1 Download the package
https://archive.apache.org/dist/seatunnel/2.3.5/apache-seatunnel-2.3.5-bin.tar.gz
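1.2 Download the plugins
Note: edit bin/install-plugin.sh so that it downloads from the Aliyun repository, which is faster. This assumes your local mvn is already configured to use the Aliyun mirror.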
SEATUNNEL_HOME=$(cd "$(dirname "$0")"; cd ../; pwd)
# connector default version is 2.3.5, you can also choose a custom version. eg: 2.1.2: sh install-plugin.sh 2.1.2
version=2.3.5
if [ -n "$1" ]; then
    version="$1"
fi
echo "Install SeaTunnel connectors plugins, usage version is ${version}"
# create the connectors directory
if [ ! -d "${SEATUNNEL_HOME}/connectors" ]; then
    mkdir "${SEATUNNEL_HOME}/connectors"
    echo "create connectors directory"
fi
# read plugin_config line by line; skip blank lines and lines starting with "-" or "#"
while read -r line; do
    first_char=$(echo "$line" | cut -c 1)
    if [ "$first_char" != "-" ] && [ "$first_char" != "#" ] && [ -n "$first_char" ]; then
        echo "install connector : " $line
        # Modified here: use the local mvn so the download goes through the Aliyun mirror
        mvn dependency:get -DgroupId=org.apache.seatunnel -DartifactId=${line} -Dversion=${version} -Ddest=${SEATUNNEL_HOME}/connectors
    fi
done < "${SEATUNNEL_HOME}/config/plugin_config"
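The filter inside the loop is easy to misread, so here is a minimal Python sketch of the same rule: a line counts as a connector name only if it is non-empty and does not start with `-` or `#`. The function name `select_connectors` and the sample lines are hypothetical and only illustrate the rule; the real list lives in config/plugin_config.

```python
def select_connectors(lines):
    """Return the entries that the shell loop would pass to mvn."""
    selected = []
    for raw in lines:
        # `read` in the shell script trims surrounding whitespace, so do the same here.
        line = raw.strip()
        first_char = line[:1]
        if first_char and first_char not in ("-", "#"):
            selected.append(line)
    return selected


# Hypothetical sample content, not copied from a real plugin_config.
sample = [
    "--connectors-v2--",
    "connector-jdbc",
    "# connector-kafka",
    "",
    "connector-fake",
    "--end--",
]

print(select_connectors(sample))
```

Prefixing a connector with `#` in plugin_config is therefore enough to skip its download.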
sh bin/install-plugin.sh 2.3.5
The script then downloads the plugins and places them under connectors.
For example:
/home/seatunnel/connectors
total 1247340
-rw-r--r--. 1 root root 11532131 Jul 18 14:00 connector-amazondynamodb-2.3.5.jar
-rw-r--r--. 1 root root 9013741 Jul 18 14:54 connector-amazonsqs-2.3.5.jar
-rw-r--r--. 1 root root 202593 Jul 18 14:00 connector-assert-2.3.5.jar
-rw-r--r--. 1 root root 13951283 Jul 18 14:04 connector-cassandra-2.3.5.jar
-rw-r--r--. 1 root root 30257530 Jul 18 14:37 connector-cdc-mongodb-2.3.5.jar
-rw-r--r--. 1 root root 30540348 Jul 18 14:10 connector-cdc-mysql-2.3.5.jar
-rw-r--r--. 1 root root 26903195 Jul 18 14:38 connector-cdc-sqlserver-2.3.5.jar
-rw-r--r--. 1 root root 30830325 Jul 18 14:38 connector-clickhouse-2.3.5.jar
-rw-r--r--. 1 root root 77830 Nov 9 2023 connector-console-2.3.5.jar
-rw-r--r--. 1 root root 7103021 Jul 18 14:38 connector-datahub-2.3.5.jar
-rw-r--r--. 1 root root 5600548 Jul 18 14:38 connector-dingtalk-2.3.5.jar
-rw-r--r--. 1 root root 11785663 Jul 18 14:38 connector-doris-2.3.5.jar
-rw-r--r--. 1 root root 19791253 Jul 18 14:54 connector-easysearch-2.3.5.jar
-rw-r--r--. 1 root root 5529299 Jul 18 14:38 connector-elasticsearch-2.3.5.jar
-rw-r--r--. 1 root root 754655 Jul 18 14:38 connector-email-2.3.5.jar
-rw-r--r--. 1 root root 199577 Nov 9 2023 connector-fake-2.3.5.jar
-rw-r--r--. 1 root root 42307844 Jul 18 14:39 connector-file-ftp-2.3.5.jar
-rw-r--r--. 1 root root 42296458 Jul 18 14:40 connector-file-hadoop-2.3.5.jar
-rw-r--r--. 1 root root 41556515 Jul 18 14:41 connector-file-jindo-oss-2.3.5.jar
-rw-r--r--. 1 root root 42291150 Jul 18 14:40 connector-file-local-2.3.5.jar
-rw-r--r--. 1 root root 42293429 Jul 18 14:41 connector-file-oss-2.3.5.jar
-rw-r--r--. 1 root root 45302759 Jul 18 14:42 connector-file-s3-2.3.5.jar
-rw-r--r--. 1 root root 42596484 Jul 18 14:43 connector-file-sftp-2.3.5.jar
-rw-r--r--. 1 root root 46947425 Jul 18 14:44 connector-google-firestore-2.3.5.jar
-rw-r--r--. 1 root root 6891940 Jul 18 14:43 connector-google-sheets-2.3.5.jar
-rw-r--r--. 1 root root 50800910 Jul 18 14:54 connector-hbase-2.3.5.jar
-rw-r--r--. 1 root root 42318873 Jul 18 14:44 connector-hive-2.3.5.jar
-rw-r--r--. 1 root root 5222439 Jul 18 14:44 connector-http-base-2.3.5.jar
-rw-r--r--. 1 root root 5226214 Jul 18 14:44 connector-http-feishu-2.3.5.jar
-rw-r--r--. 1 root root 5231180 Jul 18 14:44 connector-http-github-2.3.5.jar
-rw-r--r--. 1 root root 5230658 Jul 18 14:44 connector-http-gitlab-2.3.5.jar
-rw-r--r--. 1 root root 5229668 Jul 18 14:45 connector-http-jira-2.3.5.jar
-rw-r--r--. 1 root root 5230849 Jul 18 14:45 connector-http-klaviyo-2.3.5.jar
-rw-r--r--. 1 root root 5230472 Jul 18 14:45 connector-http-lemlist-2.3.5.jar
-rw-r--r--. 1 root root 5233337 Jul 18 14:45 connector-http-myhours-2.3.5.jar
-rw-r--r--. 1 root root 5230675 Jul 18 14:45 connector-http-notion-2.3.5.jar
-rw-r--r--. 1 root root 5230728 Jul 18 14:45 connector-http-onesignal-2.3.5.jar
-rw-r--r--. 1 root root 5230081 Jul 18 14:45 connector-http-wechat-2.3.5.jar
-rw-r--r--. 1 root root 157677173 Jul 18 14:47 connector-hudi-2.3.5.jar
-rw-r--r--. 1 root root 30625934 Jul 18 14:48 connector-iceberg-2.3.5.jar
-rw-r--r--. 1 root root 3468674 Jul 18 14:48 connector-influxdb-2.3.5.jar
-rw-r--r--. 1 root root 5804542 Jul 18 14:48 connector-iotdb-2.3.5.jar
-rw-r--r--. 1 root root 776369 Jul 18 14:48 connector-jdbc-2.3.5.jar
-rw-r--r--. 1 root root 17276586 Jul 18 14:48 connector-kafka-2.3.5.jar
-rw-r--r--. 1 root root 28536457 Jul 18 14:48 connector-kudu-2.3.5.jar
-rw-r--r--. 1 root root 23546499 Jul 18 14:49 connector-maxcompute-2.3.5.jar
-rw-r--r--. 1 root root 2480453 Jul 18 14:49 connector-mongodb-2.3.5.jar
-rw-r--r--. 1 root root 5100892 Jul 18 14:49 connector-neo4j-2.3.5.jar
-rw-r--r--. 1 root root 148822493 Jul 18 14:51 connector-openmldb-2.3.5.jar
-rw-r--r--. 1 root root 44265772 Jul 18 14:52 connector-pulsar-2.3.5.jar
-rw-r--r--. 1 root root 830795 Jul 18 14:52 connector-rabbitmq-2.3.5.jar
-rw-r--r--. 1 root root 1372145 Jul 18 14:52 connector-redis-2.3.5.jar
-rw-r--r--. 1 root root 54323057 Jul 18 14:52 connector-s3-redshift-2.3.5.jar
-rw-r--r--. 1 root root 1668609 Jul 18 14:53 connector-selectdb-cloud-2.3.5.jar
-rw-r--r--. 1 root root 649935 Jul 18 14:52 connector-sentry-2.3.5.jar
-rw-r--r--. 1 root root 5955025 Jul 18 14:53 connector-slack-2.3.5.jar
-rw-r--r--. 1 root root 174796 Jul 18 14:53 connector-socket-2.3.5.jar
-rw-r--r--. 1 root root 23322414 Jul 18 14:53 connector-starrocks-2.3.5.jar
-rw-r--r--. 1 root root 10782289 Jul 18 14:53 connector-tablestore-2.3.5.jar
-rw-r--r--. 1 root root 2481560 Jul 18 15:39 mysql-connector-j-8.0.33.jar
-rw-r--r--. 1 root root 5803 Nov 9 2023 plugin-mapping.properties
1.3 Add the MySQL driver
Place the driver in /home/seatunnel/lib:
[root@xxx lib]# pwd
/home/seatunnel/lib
[root@xxx lib]# ls -l
total 46148
-rw-r--r--. 1 root root 2481560 Jul 18 15:57 mysql-connector-j-8.0.33.jar
-rw-r--r--. 1 root root 43046761 Nov 9 2023 seatunnel-hadoop3-3.1.4-uber.jar
-rw-r--r--. 1 root root 1723052 Nov 9 2023 seatunnel-transforms-v2.jar
Also place it in /home/seatunnel/plugins/jdbc/lib (see /home/seatunnel/plugins/README.md):
[root@xxx lib]# pwd
/home/seatunnel/plugins/jdbc/lib
[root@xxx lib]# ls -l
total 2424
-rw-r--r--. 1 root root 2481560 Jul 18 16:00 mysql-connector-j-8.0.33.jar
2. Prepare the tables
I use Sysbench to generate the table data directly.
2.1 Install
yum install sysbench
2.2 Prepare the test data
Source database and table:
1. Create the source database
create database journey;
2. Generate the data (1,000,000 rows)
sysbench /usr/share/sysbench/oltp_read_write.lua --mysql-host=<mysql1-address> --mysql-port=3306 --mysql-user=root --mysql-password=passwd1 --mysql-db=journey --tables=1 --table-size=1000000 prepare
3. Check the row count:
MariaDB [journey]> select count(*) from journey.sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.130 sec)
Target database and table:
1. Create the database
create database journey;
2. Create the table
use journey;
CREATE TABLE `sbtest1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`k` int(11) NOT NULL DEFAULT 0,
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
Note: the network bandwidth is set to 100 Mbps.
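The bandwidth cap puts a floor on the transfer time. As a rough sketch, the calculation below converts 100 Mbps to bytes per second and estimates the shortest possible transfer for the 189888896 bytes that the DataX log in section 6 reports. It assumes decimal megabits (1 Mbps = 1,000,000 bit/s) and ignores protocol overhead; the DataX byte total is its own record-size accounting, not a measurement of bytes on the wire, so treat the result as an order-of-magnitude check only.

```python
LINK_MBPS = 100                 # bandwidth cap stated above
PAYLOAD_BYTES = 189_888_896     # byte total taken from the DataX log in section 6


def link_bytes_per_second(mbps):
    """Convert megabits per second (decimal) to bytes per second."""
    return mbps * 1_000_000 / 8


def min_transfer_seconds(payload_bytes, mbps):
    """Lower bound on transfer time when the link is the only bottleneck."""
    return payload_bytes / link_bytes_per_second(mbps)


ceiling = link_bytes_per_second(LINK_MBPS)
floor_s = min_transfer_seconds(PAYLOAD_BYTES, LINK_MBPS)
print(f"link ceiling: {ceiling / 1_000_000:.1f} MB/s")
print(f"minimum transfer time: {floor_s:.1f} s")
```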
3. Prepare the template file
Set the JVM heap to 2 GB, because DataX also runs with 2 GB: JAVA_OPTS="-Xms2G -Xmx2G"
Set parallelism to 10.
env {
  parallelism = 10
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:mysql://xx.xx.xx.xx:3306/journey?serverTimezone=GMT%2b8&useUnicode=true&characterEncoding=UTF-8&rewriteBatchedStatements=true"
    driver = "com.mysql.cj.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "user"
    password = "passwrd"
    table_path = "journey.sbtest1"
    query = "select * from journey.sbtest1"
    partition_column = "id"
    parallelism = 10
  }
}
transform {
}
sink {
  jdbc {
    url = "jdbc:mysql://xx.xx.xx.xx:3331/journey?useSSL=false"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "user"
    password = "password"
    query = "insert into sbtest1(id,k,c,pad) values(?,?,?,?)"
  }
}
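With partition_column = "id" and parallelism = 10, the read can be divided into id ranges that are fetched in parallel. The sketch below shows one simple way to cut an inclusive id range into contiguous chunks. It is an illustration only: `split_range` is a hypothetical helper, not SeaTunnel's actual splitting code, and it assumes the ids run from 1 to 1000000 without gaps.

```python
def split_range(lo, hi, parts):
    """Split the inclusive range [lo, hi] into up to `parts` contiguous chunks."""
    total = hi - lo + 1
    base, extra = divmod(total, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` chunks take one additional id so nothing is left over.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Assumed id range for the Sysbench table; 10 matches the parallelism above.
for lo, hi in split_range(1, 1_000_000, 10):
    print(f"select * from journey.sbtest1 where id between {lo} and {hi}")
```

Each printed statement stands for the slice one reader would fetch under this simplified scheme.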
4. Run the job
[root@xxx seatunnel]# pwd
/home/seatunnel
[root@xxx seatunnel]# ./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
5. Result
......
2024-07-18 17:14:12,820 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-30] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=11, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,820 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-41] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=10, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-30] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=4, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-40] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=5, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-39] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=6, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-37] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=2, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-38] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=3, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-29] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=1, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-32] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=7, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-26] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=9, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,821 INFO [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-35] - received slot release request, jobID: 866245144175706113, slot: SlotProfile{worker=[localhost]:5801, slotID=8, ownerJobID=866245144175706113, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='2e0c070b-406c-4f74-a608-d17f29ada685'}
2024-07-18 17:14:12,822 INFO [o.a.s.e.s.d.p.SubPlan ] [seatunnel-coordinator-service-6] - Job SeaTunnel_Job (866245144175706113), Pipeline: [(1/1)] state process is stop
2024-07-18 17:14:12,822 INFO [o.a.s.e.s.d.p.PhysicalPlan ] [seatunnel-coordinator-service-5] - Job SeaTunnel_Job (866245144175706113), Pipeline: [(1/1)] future complete with state FINISHED
2024-07-18 17:14:12,822 INFO [o.a.s.e.s.d.p.PhysicalPlan ] [seatunnel-coordinator-service-5] - Job SeaTunnel_Job (866245144175706113) turned from state RUNNING to FINISHED.
2024-07-18 17:14:12,822 INFO [o.a.s.e.s.d.p.PhysicalPlan ] [seatunnel-coordinator-service-5] - Job SeaTunnel_Job (866245144175706113) state process is stop
2024-07-18 17:14:12,841 INFO [o.a.s.e.c.j.ClientJobProxy ] [main] - Job (866245144175706113) end with state FINISHED
2024-07-18 17:14:12,845 INFO [s.c.s.s.c.ClientExecuteCommand] [main] -
***********************************************
Job Statistic Information
***********************************************
Start Time : 2024-07-18 17:08:56
End Time : 2024-07-18 17:14:12
Total Time(s) : 316
Total Read Count : 1000000
Total Write Count : 1000000
Total Failed Count : 0
***********************************************
......
SeaTunnel takes 316 s to transfer the 1,000,000 rows.
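As a quick cross-check, the elapsed time and the average throughput can be recomputed from the start and end timestamps in the job statistics. The sketch below uses only values printed in that log; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2024-07-18 17:08:56", FMT)   # Start Time from the log
end = datetime.strptime("2024-07-18 17:14:12", FMT)     # End Time from the log
rows = 1_000_000                                         # Total Write Count

elapsed_s = (end - start).total_seconds()
rows_per_s = rows / elapsed_s
print(f"elapsed: {elapsed_s:.0f} s, throughput: {rows_per_s:.0f} rows/s")
```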
6. DataX
The JSON template is as follows:
{
  "job": {
    "setting": {
      "speed": {
        "channel": 10
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "xxx",
            "column": ["*"],
            "splitPk": "id",
            "connection": [
              {
                "table": ["sbtest1"],
                "jdbcUrl": ["jdbc:mysql://xxx.xxx.xxx.xxx:3306/journey?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "qiaozhanwei",
            "password": "xxx",
            "column": ["*"],
            "connection": [
              {
                "table": ["sbtest1"],
                "jdbcUrl": "jdbc:mysql://xxx.xxx.xxx.xxx:3331/journey?useSSL=false"
              }
            ]
          }
        }
      }
    ]
  }
}
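If the template has to be produced for several tables or channel counts, it can be built programmatically. The sketch below reproduces the structure of the template in Python; `build_job` and its parameters are hypothetical names, and the connection values are the same placeholders used in the template. As in the template, the reader's jdbcUrl is a list while the writer's is a single string.

```python
import json


def build_job(channel, table, reader, writer):
    """Assemble a DataX job dict with the same layout as the template."""
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [
                {
                    "reader": {
                        "name": "mysqlreader",
                        "parameter": {
                            "username": reader["username"],
                            "password": reader["password"],
                            "column": ["*"],
                            "splitPk": "id",
                            "connection": [
                                # As in the template: the reader gets a list of URLs.
                                {"table": [table], "jdbcUrl": [reader["jdbc_url"]]}
                            ],
                        },
                    },
                    "writer": {
                        "name": "mysqlwriter",
                        "parameter": {
                            "username": writer["username"],
                            "password": writer["password"],
                            "column": ["*"],
                            "connection": [
                                # As in the template: the writer gets one URL string.
                                {"table": [table], "jdbcUrl": writer["jdbc_url"]}
                            ],
                        },
                    },
                }
            ],
        }
    }


job = build_job(
    channel=10,
    table="sbtest1",
    reader={"username": "root", "password": "xxx",
            "jdbc_url": "jdbc:mysql://xxx.xxx.xxx.xxx:3306/journey?useSSL=false"},
    writer={"username": "qiaozhanwei", "password": "xxx",
            "jdbc_url": "jdbc:mysql://xxx.xxx.xxx.xxx:3331/journey?useSSL=false"},
)
print(json.dumps(job, indent=2))
```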
Run result:
2024-07-18 11:17:59.178 [job-0] INFO JobContainer - PerfTrace not enable!
2024-07-18 11:17:59.178 [job-0] INFO StandAloneJobContainerCommunicator - Total 1000000 records, 189888896 bytes | Speed 9.05MB/s, 50000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 130.547s | All Task WaitReaderTime 6.585s | Percentage 100.00%
2024-07-18 11:17:59.179 [job-0] INFO JobContainer -
Job start time : 2024-07-18 11:17:38
Job end time : 2024-07-18 11:17:59
Total elapsed time : 20s
Average throughput : 9.05MB/s
Record write speed : 50000rec/s
Total records read : 1000000
Total read/write failures : 0
DataX needs only 20 s for the same 1,000,000 rows.
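To put the two runs side by side, the sketch below derives the speed ratio and re-derives the rates DataX reports from the raw totals in its log. All inputs come from the two logs; the only assumption is that DataX's "MB" means 1024 × 1024 bytes, which is the interpretation under which the reported 9.05MB/s matches.

```python
seatunnel_s = 316               # Total Time(s) from the SeaTunnel job statistics
datax_s = 20                    # total elapsed time from the DataX summary
datax_bytes = 189_888_896       # byte total from the DataX log
rows = 1_000_000

speedup = seatunnel_s / datax_s
datax_rows_per_s = rows / datax_s
# Assumption: 1 MB = 1024 * 1024 bytes in the DataX report.
datax_mb_per_s = datax_bytes / datax_s / (1024 * 1024)

print(f"DataX finished {speedup:.1f}x faster than SeaTunnel in this test")
print(f"DataX: {datax_rows_per_s:.0f} rows/s, {datax_mb_per_s:.2f} MB/s")
```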