Submitting Jobs to a Hadoop Cluster from Eclipse on Mac

Cluster setup: VirtualBox + Ubuntu 14.04 + Hadoop 2.6.0

With the cluster running, install Eclipse on the Mac and connect it to the Hadoop cluster.

1. Accessing the Cluster

1.1 Modifying the Mac's hosts File

Add the Master's IP to the Mac's /etc/hosts:

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1    localhost
255.255.255.255    broadcasthost
::1             localhost

192.168.56.101  Master # add the Master's IP

1.2 Accessing the Cluster

On the Master, start the cluster.
On the Mac, open http://master:50070/.
If the page loads and shows the cluster's information, access is working.
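The startup and check might look like the following (a sketch; it assumes Hadoop 2.x's sbin scripts are on the Master's PATH):

```shell
# On the Master: start HDFS and YARN (Hadoop 2.x scripts)
start-dfs.sh
start-yarn.sh

# Verify the daemons are running (NameNode, ResourceManager, etc.)
jps

# From the Mac: check that the NameNode web UI responds
curl -sI http://master:50070/ | head -n 1
```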

2. Downloading and Installing Eclipse

Eclipse IDE for Java Developers

http://www.eclipse.org/downloads/package...

3. Configuring Eclipse

3.1 Configuring the Hadoop-Eclipse-Plugin

3.1.1 Downloading the Hadoop-Eclipse-Plugin

Download hadoop2x-eclipse-plugin from GitHub (mirror: http://pan.baidu.com/s/1i4ikIoP).

3.1.2 Installing the Hadoop-Eclipse-Plugin

In Applications, find Eclipse, right-click it, and choose Show Package Contents.

(screenshot)

Copy the plugin into the plugins directory, then restart Eclipse.

(screenshot)
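In shell terms the copy can be sketched as follows (all paths are assumptions: the jar name depends on how the plugin was built, and the plugins directory location varies across Eclipse versions):

```shell
# Hypothetical jar name and download location
cp ~/Downloads/hadoop-eclipse-plugin-2.6.0.jar \
  /Applications/Eclipse.app/Contents/Eclipse/plugins/

# Restart Eclipse with -clean so the new plugin is picked up
/Applications/Eclipse.app/Contents/MacOS/eclipse -clean &
```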

3.2 Connecting to the Hadoop Cluster

3.2.1 Configuring the Hadoop Installation Directory

Extract the Hadoop distribution to any directory (no configuration is needed), then point Eclipse at that directory.

(screenshot)

3.2.2 Configuring the Cluster Address

Click the plus sign in the upper-right corner.

(screenshot)

Add the Map/Reduce view.

(screenshot)

Select Map/Reduce Locations, right-click, and choose New Hadoop location.

(screenshot)

Set the Location name, Host, the Port under DFS Master, and the User name (the Master host name resolves through the Mac's hosts entry). When done, click Finish.

(screenshot)

3.2.3 Browsing HDFS

Check whether HDFS can now be accessed directly.

(screenshot)
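As a cross-check from the Mac's terminal, assuming a local Hadoop 2.6.0 distribution is on the PATH and the NameNode listens on port 9000 (both are assumptions; check core-site.xml for the actual fs.defaultFS):

```shell
# List the HDFS root, addressing the cluster's NameNode explicitly
hdfs dfs -ls hdfs://Master:9000/

# Create an input directory; it should then appear in Eclipse's DFS view
hdfs dfs -mkdir -p hdfs://Master:9000/user/hadoop/input
```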

4. Running WordCount on the Cluster

4.1 Creating the Project

File -> New -> Other -> Map/Reduce Project

Enter the project name WordCount, then click Finish.

4.2 Creating the Class

Create a class with the package name org.apache.hadoop.examples and the class name WordCount.

4.3 WordCount Code

Copy the following code into WordCount.java:

package org.apache.hadoop.examples;
 
import java.io.IOException;
import java.util.StringTokenizer;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
 
public class WordCount {
 
  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{
 
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
 
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
 
  public static class IntSumReducer 
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();
 
    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
 
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count"); // getInstance replaces the deprecated Job(conf, name) constructor
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

4.4 Configuring Hadoop Parameters

Copy all of the modified configuration files, plus log4j.properties, into the project's src directory.

Here I copied slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
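The copy step can be sketched as below (both paths are assumptions: a local copy of the cluster's configuration directory, and a hypothetical Eclipse workspace location):

```shell
HADOOP_CONF=~/hadoop-2.6.0/etc/hadoop   # assumed local copy of the cluster's config
PROJECT=~/workspace/WordCount           # hypothetical Eclipse project path

cp "$HADOOP_CONF"/{slaves,core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml} \
   "$PROJECT"/src/
cp "$HADOOP_CONF"/log4j.properties "$PROJECT"/src/
```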

4.5 Configuring the HDFS Input and Output Paths

Hover over WordCount.java, right-click, Run As, Java Application.

(screenshot)

At this point the program will not run correctly. Right-click again, Run As, and choose Run Configurations.

Enter the input and output paths, separated by a space.

(screenshot)
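The two paths in the Program arguments box might look like this (a sketch: the directory names and the NameNode port 9000 are assumptions; use whatever the cluster's core-site.xml defines):

```
hdfs://Master:9000/user/hadoop/input hdfs://Master:9000/user/hadoop/output
```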

When configured, click Run. At this point a Permission denied error appears.

5. Problems Encountered

5.1 Permission denied

The Mac user has no permission to access HDFS. Run the following on the Master:

# Assuming the Mac's user name is hadoop
groupadd supergroup # create the supergroup group
useradd -g supergroup hadoop # create a hadoop user in the supergroup group

# Loosen the permissions on the HDFS root so the job can read and write
# (note: 777 grants access to all users, not just supergroup)
hadoop fs -chmod 777 /
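An alternative that avoids opening up HDFS permissions is the standard HADOOP_USER_NAME environment variable, which tells the Hadoop client (under simple authentication) which user to act as. Set it in the run configuration's Environment tab, or in the shell that launches Eclipse; the user name hadoop here is an assumption:

```shell
# Act as the cluster-side user for HDFS access (client-side only, no real auth)
export HADOOP_USER_NAME=hadoop
```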

6. Browsing the Hadoop Source

6.1 Downloading the Source

http://apache.claz.org/hadoop/common/had...

6.2 Attaching the Source

In the search box at the upper right, search for Open Type.

(screenshot)

Type NameNode and select NameNode; the source is not viewable yet.

Click Attach Source -> External location -> External Folder.

(screenshot)

References

使用Eclipse编译运行MapReduce程序 Hadoop2.6.0_Ubuntu/CentOS
