[Foreword] The Phytium Developer Platform is built on Phytium's own solid technology foundation and open capabilities, bringing together outstanding resources from across the industry. It covers operating systems, algorithms, databases, security, platform tools, virtualization, storage, networking, firmware, and other frontier technology areas, and comprises four sections: application enablement suites, a software repository, software support, and software adaptation certification. Its aim is to share cutting-edge technology and provide developers with a multi-domain development platform and tool suite.


This article is shared from the Phytium Developer Platform document "Flume 1.8 Porting and Installation Manual for the Phytium Platform".

1 Introduction

  Flume is a distributed log collection system originally developed by Cloudera; it was donated to the Apache Software Foundation in 2009 and is one of the components of the Hadoop ecosystem.

  Flume is a distributed, reliable, and highly available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple, flexible architecture based on streaming data flows, with tunable reliability mechanisms and many failover and recovery mechanisms that make it robust and fault tolerant. It uses a simple, extensible data model that allows for online analytic applications.

  This document describes how to install and deploy the ported and adapted Flume 1.8 on the Phytium platform.

2 Environment Requirements

2.1 Hardware Requirements

  The hardware requirements are shown in the table below.

Item        Description
----        -----------
CPU         FT-2000+/64 server
Network     No specific requirement
Storage     No specific requirement
Memory      No specific requirement

2.2 Operating System Requirements

  The operating system requirements are shown in the table below.

Item        Description
----        -----------
OS          CentOS 8
Kernel      4.18.0-193.el8.aarch64

2.3 Software Requirements

  The software requirements are shown in the table below.

Item        Description
----        -----------
Java        1.8.0_281
Hadoop      3.3.0

3 Installation and Deployment

3.1 Deploying the Software

  Download apache-flume:

wget http://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin.tar.gz /opt
cd /opt/
tar -zxvf apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin flume-1.8

3.2 Configuring the Software

  1) Configure environment variables

  Edit the /etc/profile file and add the following:

export FLUME_HOME=/opt/flume-1.8
export PATH=$PATH:$FLUME_HOME/bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
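
  To load the new variables into the current shell and confirm that the launcher is on the PATH, a quick check along these lines should work; it is expected to print the 1.8.0 version banner:

source /etc/profile
# Should report something like "Flume 1.8.0"
flume-ng version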

  2) Configure startup settings

# vim /opt/flume-1.8/conf/flume-env.sh
# JDK path (this environment uses JDK 11)
export JAVA_HOME=/opt/jdk-11.0.11
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
export JAVA_OPTS="-Xms2000m -Xmx5000m -Dcom.sun.management.jmxremote"

  3) Go to the $FLUME_HOME directory, create a new conf/file-to-hdfs.conf file, and add the following configuration

Configuration file contents (read the specified file and write it to HDFS):
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /tmp/test.log

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master.hadoop:9000/flume/%y-%m-%d/%H-%M
# File name prefix and suffix for data files written to HDFS
a1.sinks.k1.hdfs.filePrefix = weichat_log
a1.sinks.k1.hdfs.fileSuffix = .dat
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

# Roll the file once it reaches this size on HDFS (bytes)
a1.sinks.k1.hdfs.rollSize = 262144
# Roll the file after this many events have been written
a1.sinks.k1.hdfs.rollCount = 10
# Roll the file after this many seconds
a1.sinks.k1.hdfs.rollInterval = 120

# Round the timestamp down so that a new directory is created every 1 minute
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
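
  As a side note, if you want to verify the source-to-channel wiring before involving HDFS, a throwaway variant of this file that swaps the HDFS sink for Flume's built-in logger sink can help. A minimal sketch follows; the file name conf/file-to-logger.conf is only illustrative:

# Hypothetical debug config: conf/file-to-logger.conf
# Same exec source and memory channel, but events are printed to the agent log
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /tmp/test.log
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1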

3.3 Starting the Service

  1) Create a test directory

[hadoop@master flume-1.8]$ hadoop fs -mkdir /flume

  2) Start the service

[hadoop@master flume-1.8]$ $FLUME_HOME/bin/flume-ng agent -c conf \
    -f $FLUME_HOME/conf/file-to-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /opt/flume-1.8/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/hadoop-3.3.0/bin/hadoop) for HDFS access

Info: Including Hive libraries found via (/opt/hive-3.1.2) for Hive access
+ exec /opt/jdk-11.0.11/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/opt/flume-1.8/conf:/opt/flume-1.8/lib/*:/opt/hadoop-3.3.0/etc/hadoop:/opt/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.0/share/hadoop/yarn/*:/opt/hive-3.1.2/lib/*' -Djava.library.path=:/opt/hadoop-3.3.0/lib/native org.apache.flume.node.Application -f /opt/flume-1.8/conf/file-to-hdfs.conf -n a1

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1502)
        at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1280)
        at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1243)
        at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:732)
        at java.base/java.util.zip.ZipFile$CleanableResource.get(ZipFile.java:841)
        at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:247)
        at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:177)
        at java.base/java.util.jar.JarFile.<init>(JarFile.java:348)
        at java.base/jdk.internal.loader.URLClassPath$JarLoader.getJarFile(URLClassPath.java:815)
        at java.base/jdk.internal.loader.URLClassPath$JarLoader$1.run(URLClassPath.java:760)
        at java.base/jdk.internal.loader.URLClassPath$JarLoader$1.run(URLClassPath.java:753)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/jdk.internal.loader.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:752)
        at java.base/jdk.internal.loader.URLClassPath$JarLoader.<init>(URLClassPath.java:727)
        at java.base/jdk.internal.loader.URLClassPath$3.run(URLClassPath.java:493)
        at java.base/jdk.internal.loader.URLClassPath$3.run(URLClassPath.java:476)
        at java.base/java.security.AccessController.doPrivileged(Native Method)

  Cause: the flume-ng launch script defaults the JVM heap to only 20 MB, which is too small for the agent to start once the Hadoop classpath is loaded.

  Fix: in the Flume launch script flume-ng, change JAVA_OPTS="-Xmx20m" to JAVA_OPTS="-Xmx2048m".
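
  If you prefer not to edit the script by hand, the same change can be applied with a one-line sed; this assumes the stock default line JAVA_OPTS="-Xmx20m" is still present in bin/flume-ng, as it is in an unmodified 1.8.0 distribution:

# Raise the default heap ceiling in the launcher script (assumes the stock -Xmx20m line)
sed -i 's/JAVA_OPTS="-Xmx20m"/JAVA_OPTS="-Xmx2048m"/' $FLUME_HOME/bin/flume-ng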

  3) Start Flume again to test

[hadoop@master conf]$ $FLUME_HOME/bin/flume-ng agent -c conf \
    -f $FLUME_HOME/conf/file-to-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/opt/hadoop-3.3.0/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/opt/hive-3.1.2) for Hive access
+ exec /opt/jdk-11.0.11/bin/java -Xmx2000m -Dflume.root.logger=INFO,console -cp 'conf:/opt/flume-1.8/lib/*:/opt/hadoop-3.3.0/etc/hadoop:/opt/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.0/share/hadoop/yarn/*:/opt/hive-3.1.2/lib/*' -Djava.library.path=:/opt/hadoop-3.3.0/lib/native org.apache.flume.node.Application -f /opt/flume-1.8/conf/file-to-hdfs.conf -n a1
2021-08-13 16:34:01,202 ERROR hdfs.HDFSEventSink: process failed

java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
        at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
        at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:226)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
        at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
        at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:226)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.base/java.lang.Thread.run(Thread.java:834)

  Analysis showed that the errors above are a Guava version problem: the Guava jar bundled with Flume conflicts with, and is incompatible with, the Guava version that Hadoop depends on.

  Solution:

[hadoop@master flume-1.8]$ cp /opt/hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/guava-11.0.2.jar
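
  This overwrites the contents of Flume's bundled guava-11.0.2.jar with Hadoop's guava-27.0-jre.jar while keeping the old file name. An equivalent and arguably clearer sequence is to remove the old jar and copy Hadoop's under its own name, as sketched here:

cd /opt/flume-1.8
# Drop Flume's bundled Guava and reuse the one shipped with Hadoop 3.3.0
rm lib/guava-11.0.2.jar
cp /opt/hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar lib/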

  The agent now starts successfully:

[hadoop@master ~]$ ps -elf|grep flume
0 S hadoop  2593281    1 0 80  0 -  64 do_wai Aug13 ?    00:00:00 
/bin/sh ./start_flume.sh
0 S hadoop  2593282 2593281 0 80  0 - 226106 futex_ Aug13 ?    00:01:29
/opt/jdk-11.0.11/bin/java -Xms2000m -Xmx5000m -Dcom.sun.management.jmxremote -Dflume.root.logger=INFO,console -cp
/opt/flume-1.8/conf:/opt/flume-1.8/lib/*:/opt/hadoop-3.3.0/etc/hadoop:/opt/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.0/share/hadoop/yarn/*:/opt/hive-3.1.2/lib/* -Djava.library.path=:/opt/hadoop-3.3.0/lib/native
org.apache.flume.node.Application -f /opt/flume-1.8/conf/file-to-hdfs.conf -n a1
0 S hadoop  2666015 2665971 0 80  0 -  58 pipe_w 17:13 pts/0  00:00:00 grep --color=auto flume

4 Functional Testing

4.1 Client Test

  1) Write test data to the monitored file

[hadoop@master ~]$ echo 'hello'>>/tmp/test.log

  While data is being written, Flume creates a temporary file on HDFS:

[hadoop@master ~]$ hadoop fs -ls /flume/21-08-13/16-41/
Found 1 items
-rw-r--r--  1 hadoop supergroup     30 2021-08-13 16:41
/flume/21-08-13/16-41/weichat_log.1628844076756.dat.tmp

  Once a roll condition is met, the temporary file on HDFS is automatically renamed to a data file ending in .dat:

[hadoop@master ~]$ hadoop fs -ls /flume/21-08-13/16-39/weichat_log.1628843971001.dat
-rw-r--r--  1 hadoop supergroup     21 2021-08-13 16:40
/flume/21-08-13/16-39/weichat_log.1628843971001.dat
[hadoop@master ~]$ hadoop fs -cat
/flume/21-08-13/16-39/weichat_log.1628843971001.dat
hello
hello
hello
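
  To exercise the roll settings further, a small loop like the following (the message text and count are arbitrary) appends events steadily and should produce additional .dat files as the rollCount, rollSize, and rollInterval thresholds are reached:

# Append 100 test lines, one per second, to the monitored file
[hadoop@master ~]$ for i in $(seq 1 100); do echo "hello $i" >> /tmp/test.log; sleep 1; done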

  The results show that the big-data component Flume 1.8 runs correctly on the Phytium platform; the output matches expectations and the functionality is normal.



Developers are welcome to visit the Phytium Developer Platform for more cutting-edge technical documents and resources.

If you encounter any problems while using Phytium products, you can reach us via an online support ticket.




