本文内容来自YashanDB官网,原文内容请见 https://www.yashandb.com/newsinfo/7396987.html?templateId=171...
概述
由于YMP不支持Latin1字符集MySQL数据库,导致MySQL的中文数据迁移到YashanDB,变成乱码。因此使用Kettle迁移可以规避该问题。
因此本文介绍了两种环境的Kettle使用方式进行数据迁移。这两种环境分别是Windows环境和Linux环境:Windows环境可以使用图形界面,便于调试;Linux环境一般处于源或者目标数据库直接相连的网络,所以网络性能最佳。请根据自己的环境和业务需要,使用最佳执行方式进行数据迁移。
环境
Kettle版本:8.3
JAVA版本:1.8
源MySQL:版本5.7,字符集Latin1
目标YashanDB:23.2.1.100
执行方式
Kettle既可以在Windows执行,也可以在Linux执行,请根据自己的环境和业务需要,使用最佳执行方式进行数据迁移。
Windows执行Kettle
1、确保Windows当前JAVA环境是JAVA 1.8。如果当前Windows环境存在多个JAVA版本,而默认JAVA环境不是JAVA 1.8,则可以通过Windows环境变量保证Kettle使用JAVA 1.8,例如示例:
PENTAHO\_JAVA设置为C:\Program Files\Java\jre-1.8\bin\java.exe
PENTAHO\_JAVA\_HOME设置为C:\Program Files\Java\jre-1.8
2、解压作业zip包 - job\_kettle\_MySQL2YashanDB.zip,放置在Kettle所在目录。
3、在Kettle所在目录运行Spoon.bat,启动图形操作界面。
4、打开syncData\_MySQL\_YashanDB.kjb,这个任务是总任务,它封装了多个子任务用于从mysql迁移数据到崖山。
5、打开getDatas\_MySQL\_YashanDB,这个任务是实际执行从mysql迁移数据到崖山的任务,封装了DB连接,需要根据实际环境进行调整和测试,确保后续配置DB连接MySQLInput和YashanOutput的步骤都测试成功。
6、修改getDatas\_MySQL\_YashanDB的DB连接MySQLInput,连接类型使用MySQL,然后输入MySQL的主机地址、数据库名字、端口号、用户名和密码,点击“测试”,测试通过后选择“确认”。
7、修改getDatas\_MySQL\_YashanDB的DB连接YashanOutput,连接类型使用Generic database,连接方式使用Native(JDBC),设置自定义JDBC URL(jdbc:yasdb://YashanDB\_IP:YashanDB\_PORT/YashanDB\_User)和JDBC驱动类名称,并输入用户名和密码,点击“测试”,测试通过后选择“确认”。
8、修改DB连接MySQLInput和YashanOutput后,getDatas\_MySQL\_YashanDB配置处于未保存状态,请务必对该任务进行保存。
9、修改tablelist\_MySQL\_YashanDB.csv,这个配置文件用于指定mysql需要迁移的库表名,注意这里需要和mysql的库表大小写保持一致,例如:
owner,table\_name
test,sys\_log
注意:1) owner这一列对应于mysql的database;2) tablelist\_MySQL\_YashanDB.csv最后一行不能为空行。
10、切换回syncData\_MySQL\_YashanDB.kjb,执行运行即可。
11、确保数据迁移作业成功完成,如果迁移失败,可查看运行日志,定位原因并修复问题,然后重试迁移即可(注意:每次同步之前会将目标库中目标表truncate,所以保证重试不会存在数据重复或者冲突的问题)。
Linux执行Kettle
1、确保Linux当前JAVA环境是JAVA 1.8。
[yashan@mysql57 ~]$ java -version
java version "1.8.0_381"
Java(TM) SE Runtime Environment (build 1.8.0_381-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.381-b09, mixed mode)
2、解压作业zip包 - job\_kettle\_MySQL2YashanDB.zip,放置在Kettle所在目录。
3、syncData\_MySQL\_YashanDB.kjb是总任务,它封装了多个子任务用于从mysql迁移数据到崖山。
4、getDatas\_MySQL\_YashanDB.ktr是实际执行从mysql迁移数据到崖山的任务,封装了DB连接,需要根据实际环境调整DB连接MySQLInput和YashanOutput。
5、修改getDatas\_MySQL\_YashanDB.ktr的DB连接MySQLInput配置。
例如,根据MySQL的连接mysql -h 127.0.0.1 -P 3306 -u root -pwelcome1 -D test
[yashan@mysql57 ~]$ mysql -h 127.0.0.1 -P 3306 -u root -pwelcome1 -D test
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 5.7.41 MySQL Community Server (GPL)
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> exit;
Bye
[yashan@mysql57 ~]$
调整MySQLInput的主机地址、数据库名字、端口号、用户名和密码:
XML
<connection>
<name>MySQLInput</name>
<server>127.0.0.1</server>
<type>MYSQL</type>
<access>Native</access>
<database>test</database>
<port>3306</port>
<username>root</username>
<password>welcome1</password>
<servername/>
<data_tablespace/>
<index_tablespace/>
<attributes>
<attribute>
<code>FORCE_IDENTIFIERS_TO_LOWERCASE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>FORCE_IDENTIFIERS_TO_UPPERCASE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>IS_CLUSTERED</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>PORT_NUMBER</code>
<attribute>3306</attribute>
</attribute>
<attribute>
<code>PRESERVE_RESERVED_WORD_CASE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>QUOTE_ALL_FIELDS</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>STREAM_RESULTS</code>
<attribute>Y</attribute>
</attribute>
<attribute>
<code>SUPPORTS_BOOLEAN_DATA_TYPE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>SUPPORTS_TIMESTAMP_DATA_TYPE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>USE_POOLING</code>
<attribute>N</attribute>
</attribute>
</attributes>
</connection>
6、修改getDatas\_MySQL\_YashanDB.ktr的DB连接YashanOutput配置。
例如,根据YashanDB的连接yasql test/yasdb\_123@127.0.0.1:1688
[yashan@mysql57 ~]$ yasql test/yasdb_123@127.0.0.1:1688
YashanDB SQL Enterprise Edition Release 23.2.1.100 x86_64
Connected to:
YashanDB Server Enterprise Edition Release 23.2.1.100 x86_64 - X86 64bit Linux
SQL> exit;
[yashan@mysql57 ~]$
调整YashanOutput的下列参数:
连接类型<type>GENERIC</type>
连接方式<access>Native</access>
端口号<port>1521</port>
用户名<username>test</username>
密码<password>yasdb\_123</password>
属性-自定义JDBC连接串<attribute>jdbc:yasdb://127.0.0.1:1688/test</attribute>
属性-自定义JDBC驱动类名称<attribute>com.yashandb.jdbc.Driver</attribute>
属性-数据库方言<attribute>Generic database</attribute>
属性-端口<attribute>1688</attribute>
注意:JDBC连接串和类型请参考yashanDB官方文档:YashanDB JDBC驱动使用介绍
XML
<connection>
<name>YashanOutput</name>
<server/>
<type>GENERIC</type>
<access>Native</access>
<database/>
<port>1521</port>
<username>test</username>
<password>yasdb_123</password>
<servername/>
<data_tablespace/>
<index_tablespace/>
<attributes>
<attribute>
<code>CUSTOM_DRIVER_CLASS</code>
<attribute>com.yashandb.jdbc.Driver</attribute>
</attribute>
<attribute>
<code>CUSTOM_URL</code>
<attribute>jdbc:yasdb://127.0.0.1:1688/test</attribute>
</attribute>
<attribute>
<code>DATABASE_DIALECT_ID</code>
<attribute>Generic database</attribute>
</attribute>
<attribute>
<code>FORCE_IDENTIFIERS_TO_LOWERCASE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>FORCE_IDENTIFIERS_TO_UPPERCASE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>INITIAL_POOL_SIZE</code>
<attribute>20</attribute>
</attribute>
<attribute>
<code>IS_CLUSTERED</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>MAXIMUM_POOL_SIZE</code>
<attribute>20</attribute>
</attribute>
<attribute>
<code>PORT_NUMBER</code>
<attribute>1688</attribute>
</attribute>
<attribute>
<code>PRESERVE_RESERVED_WORD_CASE</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>QUOTE_ALL_FIELDS</code>
<attribute>N</attribute>
</attribute>
<attribute>
<code>SUPPORTS_BOOLEAN_DATA_TYPE</code>
<attribute>Y</attribute>
</attribute>
<attribute>
<code>SUPPORTS_TIMESTAMP_DATA_TYPE</code>
<attribute>Y</attribute>
</attribute>
<attribute>
<code>USE_POOLING</code>
<attribute>Y</attribute>
</attribute>
</attributes>
</connection>
7、修改tablelist\_MySQL\_YashanDB.csv,这个配置文件用于指定mysql需要迁移的库表名,注意这里需要和mysql的库表大小写保持一致,例如:
owner,table\_name
test,sys\_log
注意:1) owner这一列对应于mysql的database;2) tablelist\_MySQL\_YashanDB.csv最后一行不能为空行。
8、运行以下命令即可。
Chmod 755 *.sh
./kitchen.sh -file=syncData\_MySQL\_YashanDB.kjb
9、确保数据迁移作业成功完成,如果迁移失败,可查看运行日志,定位原因并修复问题,然后重试迁移即可(注意:每次同步之前会将目标库中目标表truncate,所以保证重试不会存在数据重复或者冲突的问题)。
Bash
[yashan@mysql57 data-integration]$ ./kitchen.sh -file=syncData_MySQL_YashanDB.kjb
#######################################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
12:34:56,842 INFO [KarafBoot] Checking to see if org.pentaho.clean.karaf.cache is enabled
12:34:56,898 INFO [KarafInstance]
*******************************************************************************
*** Karaf Instance Number: 1 at /home/yashan/Kettle8/data-integration/./sys ***
*** tem/karaf/caches/kitchen/data-1 ***
*** FastBin Provider Port:52901 ***
*** Karaf Port:8802 ***
*** OSGI Service Port:9051 ***
*******************************************************************************
Jun 01, 2024 12:34:57 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
2024/06/01 12:34:57 - Kitchen - Start of run.
D:\poc\data-integration2\system\karaf/deploy does not exist, please create it.
Root path does not exist: /home/yashan/Kettle8/data-integration/D:\poc\data-integration2\system\karaf/deploy
*ERROR* [org.osgi.service.cm.ManagedService, id=255, bundle=52/mvn:org.apache.aries.transaction/org.apache.aries.transaction.manager/1.1.1]: Unexpected problem updating configuration org.apache.aries.transaction
java.lang.ExceptionInInitializerError
at org.apache.aries.transaction.internal.TransactionManagerService.<init>(TransactionManagerService.java:114)
at org.apache.aries.transaction.internal.Activator.updated(Activator.java:63)
at org.apache.felix.cm.impl.helper.ManagedServiceTracker.updateService(ManagedServiceTracker.java:148)
at org.apache.felix.cm.impl.helper.ManagedServiceTracker.provideConfiguration(ManagedServiceTracker.java:81)
at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.provide(ConfigurationManager.java:1448)
at org.apache.felix.cm.impl.ConfigurationManager$ManagedServiceUpdate.run(ConfigurationManager.java:1404)
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:103)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.MissingResourceException: Can't find bundle for base name org.apache.aries.transaction.txManager, locale en_US
at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1581)
at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1396)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:1091)
at org.apache.aries.util.nls.MessageUtil.createMessageUtil(MessageUtil.java:152)
at org.apache.aries.util.nls.MessageUtil.createMessageUtil(MessageUtil.java:107)
at org.apache.aries.transaction.internal.NLS.<clinit>(NLS.java:25)
... 8 more
2024-06-01 12:34:59.079:INFO:oejs.Server:jetty-8.1.15.v20140411
2024-06-01 12:34:59.111:INFO:oejs.AbstractConnector:Started NIOSocketConnectorWrapper@0.0.0.0:9051
Jun 01, 2024 12:34:59 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-management (182) [org.apache.cxf.management.InstrumentationManager]
Jun 01, 2024 12:34:59 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-transports-http (183) [org.apache.cxf.transport.http.HTTPTransportFactory, org.apache.cxf.transport.http.HTTPWSDLExtensionLoader, org.apache.cxf.transport.http.policy.HTTPClientAssertionBuilder, org.apache.cxf.transport.http.policy.HTTPServerAssertionBuilder, org.apache.cxf.transport.http.policy.NoOpPolicyInterceptorProvider]
Jun 01, 2024 12:34:59 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess
INFO: New Caching Service registered
Jun 01, 2024 12:34:59 PM org.pentaho.caching.impl.PentahoCacheManagerFactory$RegistrationHandler$1 onSuccess
INFO: New Caching Service registered
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Start of job execution
2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Starting entry [获取同步表列表]
2024/06/01 12:35:01 - 获取同步表列表 - Using run configuration [Pentaho local]
2024/06/01 12:35:01 - 获取同步表列表 - Using legacy execution engine
2024/06/01 12:35:01 - getTables_MySQL_YashanDB - Dispatching started for transformation [getTables_MySQL_YashanDB]
2024/06/01 12:35:01 - CSV文件输入.0 - Header row skipped in file 'file:///home/yashan/Kettle8/data-integration\tablelist_MySQL_YashanDB.csv'
2024/06/01 12:35:01 - CSV文件输入.0 - Finished processing (I=2, O=0, R=0, W=1, U=0, E=0)
2024/06/01 12:35:01 - 字段选择.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2024/06/01 12:35:01 - 复制记录到结果.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2024/06/01 12:35:01 - syncData_MySQL_YashanDB - Starting entry [执行同步]
2024/06/01 12:35:01 - 执行同步 - Using run configuration [Pentaho local]
2024/06/01 12:35:01 - jobDatas_MySQL_YashanDB - Starting entry [获取表变量]
2024/06/01 12:35:01 - 获取表变量 - Using run configuration [Pentaho local]
2024/06/01 12:35:01 - 获取表变量 - Using legacy execution engine
2024/06/01 12:35:01 - getResults - Dispatching started for transformation [getResults]
2024/06/01 12:35:01 - 从结果获取记录.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2024/06/01 12:35:01 - 设置变量.0 - Setting environment variables...
2024/06/01 12:35:01 - 设置变量.0 - Set variable OWNER to value [test]
2024/06/01 12:35:01 - 设置变量.0 - Set variable TABLE_NAME to value [sys_log]
2024/06/01 12:35:01 - 设置变量.0 - Finished after 1 rows.
2024/06/01 12:35:01 - 设置变量.0 - Finished processing (I=0, O=0, R=1, W=1, U=0, E=0)
2024/06/01 12:35:01 - jobDatas_MySQL_YashanDB - Starting entry [写入数据到目标表]
2024/06/01 12:35:02 - 写入数据到目标表 - Using run configuration [Pentaho local]
2024/06/01 12:35:02 - 写入数据到目标表 - Using legacy execution engine
2024/06/01 12:35:02 - getDatas_MySQL_YashanDB - Dispatching started for transformation [getDatas_MySQL_YashanDB]
2024/06/01 12:35:02 - YashanOutput - Creating database connection pool for 'YashanOutput'...
2024/06/01 12:35:02 - YashanOutput - Successfully created database connection pool for 'YashanOutput'
2024/06/01 12:35:02 - 表输出.0 - Connected to database [YashanOutput] (commit=20000)
2024/06/01 12:35:03 - 执行SQL脚本.0 - Finished reading query, closing connection.
2024/06/01 12:35:03 - 执行SQL脚本.0 - Finished processing (I=0, O=0, R=0, W=1, U=0, E=0)
2024/06/01 12:35:03 - 表输入.0 - Finished reading query, closing connection.
2024/06/01 12:35:03 - 表输入.0 - Finished processing (I=9, O=0, R=0, W=9, U=0, E=0)
2024/06/01 12:35:03 - 表输出.0 - Finished processing (I=0, O=9, R=9, W=9, U=0, E=0)
2024/06/01 12:35:03 - jobDatas_MySQL_YashanDB - Finished job entry [写入数据到目标表] (result=[true])
2024/06/01 12:35:03 - jobDatas_MySQL_YashanDB - Finished job entry [获取表变量] (result=[true])
2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Starting entry [成功]
2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [成功] (result=[true])
2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [执行同步] (result=[true])
2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Finished job entry [获取同步表列表] (result=[true])
2024/06/01 12:35:03 - syncData_MySQL_YashanDB - Job execution finished
2024/06/01 12:35:03 - Kitchen - Finished!
2024/06/01 12:35:03 - Kitchen - Start=2024/06/01 12:34:57.418, Stop=2024/06/01 12:35:03.414
2024/06/01 12:35:03 - Kitchen - Processing ended after 5 seconds.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。