1. DataX JSON configuration (sample)


{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "hadoopConfig": {
              "dfs.nameservices": "nameservice1",
              "dfs.ha.namenodes.nameservice1": "cdh201.qq.org,cdh202.qq.org",
              "dfs.namenode.rpc-address.nameservice1.cdh201.qq.org": "cdh201.qq.org:8020",
              "dfs.namenode.rpc-address.nameservice1.cdh202.qq.org": "cdh202.qq.org:8020",
              "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
            },
            "path": "/user/gsbdc/dbdatas/olsd/bns/gsods_rpt_qq/poi/p_data_day=2018-05-14/*",
            "haveKerberos": "true",
            "kerberosPrincipal": "bi@qq.ORG",
            "defaultFS": "hdfs://nameservice1",
            "kerberosKeytabFilePath": "/app/soft/datax/job/bi.keytab",
            "fileType": "text",
            "fieldDelimiter": "\u0001",
            "column": [
              {
                "index": "0",
                "type": "string"
              },
              {
                "index": "1",
                "type": "string"
              },
              {
                "index": "2",
                "type": "string"
              },
              {
                "index": "3",
                "type": "string"
              },
              {
                "index": "4",
                "type": "string"
              },
              {
                "value": "${p_data_day}",
                "type": "string"
              }                  
            ]
          }
        },
        "writer": {
          "name": "clickhousewriter",
          "parameter": {
            "username": "s",
            "password": "s",
            "column": [
              "id",
              "address",
              "p_name",
              "c_name",
              "d_name",
              "p_data_day"
            ],
            "connection": [
              {
                "table": [
                  "poi"
                ],
                "jdbcUrl": "jdbc:clickhouse://192.168.1.1:18123/test"
              }
            ]
          }
        }
      }
    ]
  }
}
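
The reader's `column` entries are handed to the writer in order, so the two lists in the sample line up by position (five file fields plus the partition constant on the reader side, six column names on the writer side). A minimal sketch of a pre-flight check, assuming a job with the same layout as the sample above; `map_columns` is a hypothetical helper, not part of DataX:

```python
import json


def map_columns(job):
    """Pair each reader column with the writer column at the same position."""
    pairs = []
    for content in job["job"]["content"]:
        reader_cols = content["reader"]["parameter"]["column"]
        writer_cols = content["writer"]["parameter"]["column"]
        if len(reader_cols) != len(writer_cols):
            raise ValueError(
                f"reader has {len(reader_cols)} columns, "
                f"writer has {len(writer_cols)}"
            )
        for src, dst in zip(reader_cols, writer_cols):
            # A reader column is either a field of the file ("index")
            # or a constant ("value").
            if "index" in src:
                source = f"index {src['index']}"
            else:
                source = f"value {src['value']}"
            pairs.append((source, dst))
    return pairs


# Trimmed-down job with the same shape as the sample configuration.
sample = json.loads("""
{"job": {"content": [{
  "reader": {"parameter": {"column": [
    {"index": "0", "type": "string"},
    {"value": "${p_data_day}", "type": "string"}]}},
  "writer": {"parameter": {"column": ["id", "p_data_day"]}}
}]}}
""")

for source, target in map_columns(sample):
    print(source, "->", target)
```

To check a real job, load the file with `json.load` and pass the result to the same function.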

2. Configuring partition information in the reader

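  • DataX hdfsreader cannot obtain partition information: the partition value is part of the directory path (for example p_data_day=2018-05-14), not of the data files it reads. We can supply it through a dynamic parameter instead, by configuring the partition column in the reader as a constant column:
{
  "value": "${p_data_day}",
  "type": "string"
}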
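
The effect of the placeholder can be illustrated with a few lines of Python. This only simulates the substitution for the sake of the example; DataX resolves the variable itself when it loads the job, and `substitute` is a hypothetical helper:

```python
import re


def substitute(text, params):
    """Replace each ${name} in text with params[name].

    Names that have no value in params are left untouched.
    """
    def replace(match):
        return params.get(match.group(1), match.group(0))

    return re.sub(r"\$\{(\w+)\}", replace, text)


column = '{"value": "${p_data_day}", "type": "string"}'
print(substitute(column, {"p_data_day": "2020-06-20"}))
# {"value": "2020-06-20", "type": "string"}
```

After substitution the column is a plain constant, so every record read from the files carries the same partition value as its last field.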

3. Running the job with Python

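python /app/soft/datax/bin/datax.py -p "-Dp_data_day=2020-06-20" /app/soft/datax/job/hive2clickhouse.json
  • Note: the variable name passed with -D in the command (here p_data_day) must match the variable name used in the value field of the reader configuration.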
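
When the job is launched from a script rather than typed by hand, the same command can be assembled programmatically. A minimal sketch, assuming the paths from the command above; `build_command` is a hypothetical helper and parameter values are assumed to contain no spaces:

```python
import shlex

DATAX_PY = "/app/soft/datax/bin/datax.py"
JOB_FILE = "/app/soft/datax/job/hive2clickhouse.json"


def build_command(params):
    """Build the datax.py argument list, passing each parameter as -Dname=value."""
    # All -D definitions travel together as the single argument of -p.
    defines = " ".join(f"-D{name}={value}" for name, value in params.items())
    return ["python", DATAX_PY, "-p", defines, JOB_FILE]


cmd = build_command({"p_data_day": "2020-06-20"})
print(shlex.join(cmd))

# To actually launch the job:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the parameter names in one dictionary makes it harder for the name on the command line to drift away from the name used in the job file.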

4. Configuring dynamic parameters in DataX Web

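  • Mechanism: configure a scheduled task. Each time the task runs, the value of the dynamic parameter is computed from the current time and the offset chosen by the user (the current time plus or minus a number of days).
  • Example: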
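
The date arithmetic behind this mechanism can be sketched in Python. This is an illustration of the calculation only, not DataX Web's implementation; the function name, the run time, and the `%Y-%m-%d` format are assumptions chosen to match the p_data_day values used in this article:

```python
from datetime import datetime, timedelta


def partition_value(now, offset_days, fmt="%Y-%m-%d"):
    """Return the run time shifted by offset_days, formatted as a partition value."""
    return (now + timedelta(days=offset_days)).strftime(fmt)


# A task that fires at 01:30 on 2020-06-21 and loads the previous day's partition.
run_time = datetime(2020, 6, 21, 1, 30)
print(partition_value(run_time, -1))  # 2020-06-20
```

A negative offset selects an earlier day, a positive one a later day, and 0 selects the day the task runs.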

