Hadoop DistributedCache is deprecated - what is the preferred API?

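My map tasks need some configuration data, which I would like to distribute via the Distributed Cache.

The Hadoop MapReduce Tutorial shows the usage of the DistributedCache class, roughly as follows: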

// In the driver
JobConf conf = new JobConf(getConf(), WordCount.class);
...
DistributedCache.addCacheFile(new Path(filename).toUri(), conf);

// In the mapper
Path[] myCacheFiles = DistributedCache.getLocalCacheFiles(job);
...

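However, DistributedCache is marked as deprecated in Hadoop 2.2.0.

What is the new preferred way to achieve this? Is there an up-to-date example or tutorial covering this API?

Originally posted by DNA; translation licensed under CC BY-SA 4.0.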

To expand on @jtravaglini's answer, the preferred way to use the distributed cache with YARN/MapReduce 2 is as follows:

In your driver, use Job.addCacheFile():

public int run(String[] args) throws Exception {
    Configuration conf = getConf();

    Job job = Job.getInstance(conf, "MyJob");

    job.setMapperClass(MyMapper.class);

    // ...

    // Mind the # sign after the absolute file location.
    // You will be using the name after the # sign as your
    // file name in your Mapper/Reducer
    job.addCacheFile(new URI("/user/yourname/cache/some_file.json#some"));
    job.addCacheFile(new URI("/user/yourname/cache/other_file.json#other"));

    return job.waitForCompletion(true) ? 0 : 1;
}
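
To make the role of the # sign concrete, here is a minimal sketch that uses only java.net.URI (no Hadoop classes) to show how such a URI splits into a path and a name. The class CacheUriDemo and its linkName helper are hypothetical names made up for this illustration. The fallback to the last path segment when no # part is given is an assumption about the default name, not something stated in the answer above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustration only: shows how the "#name" suffix of a cache-file URI is parsed.
class CacheUriDemo {

    // Returns the part after '#', which the driver comments describe as the
    // file name to use in the Mapper/Reducer. When there is no '#', this falls
    // back to the last path segment (an assumption about the default name).
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("/user/yourname/cache/some_file.json#some");
        System.out.println("path      = " + uri.getPath());
        System.out.println("link name = " + linkName(uri));
    }
}
```

Choosing short, distinct names after the # keeps the task-side code independent of the actual file locations on HDFS.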

In your Mapper/Reducer, override the setup(Context context) method:

@Override
protected void setup(
        Mapper<LongWritable, Text, Text, Text>.Context context)
        throws IOException, InterruptedException {
    if (context.getCacheFiles() != null
            && context.getCacheFiles().length > 0) {

        File some_file = new File("./some");
        File other_file = new File("./other");

        // Do things to these two files, like read them
        // or parse as JSON or whatever.
    }
    super.setup(context);
}
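
The "do things to these two files" step could look roughly like the sketch below. It is plain Java with no Hadoop dependency, and it assumes the cached file shows up under its # name in the task's working directory, as the snippet above implies. CacheFileLoader and load are hypothetical names. In a real setup() you would pass Paths.get(".") and "some", then hand the returned text to whatever JSON parser you use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: reads a localized cache file into memory as UTF-8 text.
class CacheFileLoader {

    // workDir is the task's working directory; linkName is the name that was
    // given after '#' in the driver. Fails fast if the file is not readable.
    static String load(Path workDir, String linkName) throws IOException {
        Path local = workDir.resolve(linkName);
        if (!Files.isReadable(local)) {
            throw new IOException("Cache file not readable: " + local);
        }
        return new String(Files.readAllBytes(local), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the task's working directory with a temporary directory.
        Path workDir = Files.createTempDirectory("cache-demo");
        Files.write(workDir.resolve("some"),
                "{\"threshold\": 5}".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(workDir, "some"));
    }
}
```

Loading the file once in setup() and keeping the parsed result in a field avoids re-reading it for every call to map().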

Originally posted by tolgap; translation licensed under CC BY-SA 3.0.
