1. DataX JSON configuration (sample)
```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "hadoopConfig": {
              "dfs.nameservices": "nameservice1",
              "dfs.ha.namenodes.nameservice1": "cdh201.qq.org,cdh202.qq.org",
              "dfs.namenode.rpc-address.nameservice1.cdh201.qq.org": "cdh201.qq.org:8020",
              "dfs.namenode.rpc-address.nameservice1.cdh202.qq.org": "cdh202.qq.org:8020",
              "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
            },
            "path": "/user/gsbdc/dbdatas/olsd/bns/gsods_rpt_qq/poi/p_data_day=2018-05-14/*",
            "haveKerberos": "true",
            "kerberosPrincipal": "bi@qq.ORG",
            "defaultFS": "hdfs://nameservice1",
            "kerberosKeytabFilePath": "/app/soft/datax/job/bi.keytab",
            "fileType": "text",
            "fieldDelimiter": "\u0001",
            "column": [
              { "index": "0", "type": "string" },
              { "index": "1", "type": "string" },
              { "index": "2", "type": "string" },
              { "index": "3", "type": "string" },
              { "index": "4", "type": "string" },
              { "value": "${p_data_day}", "type": "string" }
            ]
          }
        },
        "writer": {
          "name": "clickhousewriter",
          "parameter": {
            "username": "s",
            "password": "s",
            "column": [
              "id",
              "address",
              "p_name",
              "c_name",
              "d_name",
              "p_data_day"
            ],
            "connection": [
              {
                "table": [ "poi" ],
                "jdbcUrl": "jdbc:clickhouse://192.168.1.1:18123/test"
              }
            ]
          }
        }
      }
    ]
  }
}
```
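The reader's `column` list and the writer's `column` list are matched positionally, so a length mismatch only surfaces when the job runs. A minimal pre-flight check (illustrative only, not part of DataX; the config below is a trimmed excerpt of the job above):

```python
import json

# Trimmed excerpt of the job config, keeping only the fields we check.
job = json.loads("""
{
  "content": [{
    "reader": {"parameter": {"column": [
      {"index": "0", "type": "string"},
      {"index": "1", "type": "string"},
      {"index": "2", "type": "string"},
      {"index": "3", "type": "string"},
      {"index": "4", "type": "string"},
      {"value": "${p_data_day}", "type": "string"}
    ]}},
    "writer": {"parameter": {"column": [
      "id", "address", "p_name", "c_name", "d_name", "p_data_day"
    ]}}
  }]
}
""")

content = job["content"][0]
reader_cols = content["reader"]["parameter"]["column"]
writer_cols = content["writer"]["parameter"]["column"]

# Columns are mapped by position, so the two counts must be equal.
assert len(reader_cols) == len(writer_cols), "reader/writer column count mismatch"
print(len(reader_cols))  # 6
```

In the job above, five reader columns come from the file (indexes 0–4) and the sixth is the injected partition constant, lining up with the writer's `p_data_day` column.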
2. Configuring partition information in the reader
- DataX's hdfsreader cannot derive Hive partition values from the file path, so we supply them through a dynamic parameter. The partition column is configured in the reader as a constant-value column:
```json
{
  "value": "${p_data_day}",
  "type": "string"
}
```
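The effect of mixing `index` and `value` columns can be sketched as follows: `index` columns are taken from the delimiter-split line, while a `value` column emits the same constant for every record (an illustrative simulation with made-up sample data, not DataX code):

```python
# One line of the text file, fields separated by \u0001 (the configured fieldDelimiter).
line = "1\u0001some address\u0001Guangdong\u0001Shenzhen\u0001Nanshan"
fields = line.split("\u0001")

# Column spec mirroring the reader config: five index columns plus one constant,
# the constant standing in for the already-substituted ${p_data_day}.
columns = [{"index": i} for i in range(5)] + [{"value": "2018-05-14"}]

# Build the record: 'index' reads from the split line, 'value' injects the constant.
record = [fields[c["index"]] if "index" in c else c["value"] for c in columns]
print(record)
```

Every record read from that partition's files thus carries the same `p_data_day` value.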
3. Running the job with Python
```shell
python /app/soft/datax/bin/datax.py -p "-Dp_data_day=2020-06-20" /app/soft/datax/job/hive2clickhouse.json
```
- Note: the p_data_day name passed in the command must match the variable name used in the reader's value field (${p_data_day}).
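The `-p "-Dp_data_day=..."` option substitutes the given value into every `${p_data_day}` placeholder in the JSON before the job runs. The mechanism can be approximated with Python's `string.Template` (a sketch of the idea, not DataX's actual implementation):

```python
from string import Template

# The placeholder fragment from the reader config.
config_text = '{"value": "${p_data_day}", "type": "string"}'
params = {"p_data_day": "2020-06-20"}

# ${p_data_day} is replaced by the value supplied on the command line.
rendered = Template(config_text).substitute(params)
print(rendered)  # {"value": "2020-06-20", "type": "string"}
```

This is why the names must match: an unmatched placeholder would survive substitution and reach the reader as a literal string.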
4. Configuring dynamic parameters in DataX Web
- Mechanism: configure a scheduled task; when the task fires, the dynamic parameter value is computed from the current time plus or minus a user-selected number of days.
- Example:
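The day-offset calculation described above can be sketched like this (illustrative; DataX Web's own scheduler logic may differ):

```python
from datetime import date, timedelta

def partition_value(offset_days: int, today: date) -> str:
    """Return the partition date as yyyy-MM-dd, offset from 'today' by offset_days."""
    return (today + timedelta(days=offset_days)).strftime("%Y-%m-%d")

# E.g. a nightly job loading yesterday's partition uses an offset of -1.
print(partition_value(-1, date(2020, 6, 21)))  # 2020-06-20
```

The resulting string is then passed to the job exactly as in section 3, i.e. as `-Dp_data_day=<value>`.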