Chapter 1 Ganglia
References:
http://www.moheqionglin.com/site/serialize/02006014001/detail.html
https://blog.51cto.com/hsbxxl/2062477
http://www.sunrisenan.com/docs/zabbix/zabbix-1b0fllosvls6a
https://www.cnblogs.com/cheyunhua/p/9262863.html
http://blog.itpub.net/30089851/viewspace-2126299/
http://blog.itpub.net/133735/viewspace-2138975/
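1.1 How Ganglia Works
Ganglia is an open-source project started at UC Berkeley. It monitors system performance by collecting metric data (such as CPU speed and memory usage) from each node. Its core consists of three parts: gmetad, gmond, and the web front end. These parts exchange monitoring data in either XDR (External Data Representation, a compact binary encoding) or XML format.
The monitoring flow is roughly as follows: every node in the cluster runs gmond, which collects the node's state and exchanges it with the other nodes. gmetad periodically polls gmond for the collected data and stores it in RRD databases using RRDtool. Finally, RRDtool renders the stored data as graphs, which the web front end displays.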
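1.1.1 Ganglia Components
(1) gmetad
gmetad polls the data held by gmond nodes and writes it to the RRD databases. Each data source node is a gmond node. One gmetad can be configured with multiple data sources, and each data source can list several redundant nodes, so that if one node fails the data can still be fetched from another. gmetad can be thought of as the server.
gmetad uses TCP only. On one side it sends requests to each data source to fetch that source's XML; on the other it publishes the XML it has collected on a TCP port (8651 by default). As a result, gmetad can pull XML data from gmond and also from other gmetad nodes.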
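(2) gmond
Collecting: gmond normally collects the monitoring data of the local machine and sends it out through the udp_send_channel configured in gmond.conf.
Storing: not every gmond has to hold the cluster's data; one or a few nodes are enough. Which nodes receive and hold data is configured with udp_recv_channel in gmond.conf.
gmond nodes exchange data with each other mainly over UDP, encoded as XDR.
The data each gmond holds is read by gmetad: gmond listens on port 8649 by default and, when gmetad connects, sends it the data in XML format. gmond can be thought of as the client.
gmond can distribute data in two modes, multicast and unicast. gmond itself has UDP send and receive (recv) channels and a TCP receive channel. The UDP channels are used to send data to and receive data from other gmond nodes; the TCP channel mainly accepts requests from gmetad and returns the XML to it. In multicast mode, gmond nodes pass data to each other over UDP via the multicast address.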
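A gmond node consists of three main modules:
1) Collect and publish: periodically calls internal routines to obtain metric data, then publishes it to the other gmond nodes over the UDP channel.
2) Listen threads: listen for UDP data sent by other gmond nodes and keep it in memory.
3) XML export threads: publish the data in XML format, for example to gmetad.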
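The three modules can be pictured with a small toy model. This is only a sketch: the class ToyGmond, its method names, and the fixed load_one value are invented for illustration, peers are called directly instead of over UDP, and the XML is a simplified shape rather than gmond's real output.

```python
import xml.etree.ElementTree as ET


class ToyGmond:
    """Toy stand-in for one gmond node; not the real daemon or wire format."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # other ToyGmond nodes we would send UDP to
        self.cluster_state = {}  # host name -> {metric name: value}

    def collect(self):
        # 1) collect: a fixed value stands in for a real metric probe
        return {"load_one": 0.5}

    def publish(self):
        # 1) publish: hand our metrics to ourselves and to every peer
        metrics = self.collect()
        for node in [self] + self.peers:
            node.listen(self.name, metrics)

    def listen(self, sender, metrics):
        # 2) listen: keep the latest metrics of each sender in memory
        self.cluster_state[sender] = dict(metrics)

    def export_xml(self):
        # 3) export: serialize the in-memory state as XML
        root = ET.Element("CLUSTER")
        for host in sorted(self.cluster_state):
            host_el = ET.SubElement(root, "HOST", NAME=host)
            for metric, value in sorted(self.cluster_state[host].items()):
                ET.SubElement(host_el, "METRIC", NAME=metric, VAL=str(value))
        return ET.tostring(root, encoding="unicode")


a, b = ToyGmond("server1"), ToyGmond("server2")
a.peers, b.peers = [b], [a]
a.publish()
b.publish()
print(a.export_xml())
```

After both nodes publish, each one holds the state of the whole toy cluster, which is why a poller only needs to ask a single node.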
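In unicast mode, multiple gmond nodes send their data over UDP to the gmond on a single target host, and gmetad then requests the XML from the gmond on that host.
In unicast mode, gmond, gmetad, the RRD databases, and the web front end usually live on the same node in the cluster. That node collects, stores, and displays the state of all monitored nodes.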
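To see what gmetad receives when it polls, one can connect to the TCP port of a gmond and read the XML it sends. The following is a minimal sketch under assumptions: the function names fetch_xml and summarize are made up here, the SAMPLE document is hand-written in the general shape of a Ganglia dump (not captured output), and the host name "master" and the IP in the sample are placeholders.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read everything the daemon writes on its TCP port until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize(xml_data):
    """Return {cluster: {host: {metric: value}}} from a Ganglia XML dump."""
    root = ET.fromstring(xml_data)
    result = {}
    for cluster in root.iter("CLUSTER"):
        hosts = result.setdefault(cluster.get("NAME"), {})
        for host in cluster.iter("HOST"):
            hosts[host.get("NAME")] = {
                m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")
            }
    return result


# Hand-written sample (assumption), not real gmond output.
SAMPLE = """<GANGLIA_XML VERSION="3.7.2" SOURCE="gmond">
<CLUSTER NAME="hadoop_hbase_cluster" OWNER="root">
<HOST NAME="server1" IP="192.168.1.11">
<METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=""/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
    # Against a live node (placeholder host name):
    # print(summarize(fetch_xml("master")))
```

If the call against a live node returns an empty result, the usual suspects are a firewall on port 8649 or a gmond that receives nothing on its udp_recv_channel.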
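(3) Web front end
The web front end is usually installed on the same node as gmetad. It fetches data from gmetad, reads the RRD databases, and renders graphs for display.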
2.1 Installing and Using Ganglia
2.1.1 Installation on a server with Internet access
[epel]
baseurl=http://mirrors.yun-idc.com/ep...
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=http://mirrors.yun-idc.com/ep...
Then run yum clean all to clear the cache.
[root@master ~]# yum install epel-release
[root@master ~]# yum install ganglia-web ganglia-gmetad ganglia-gmond
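2.1.2 Offline installation from RPM packages (internal network)
When installing CentOS 7, choose the server environment and select all of the packages offered for it.
- Download site: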
http://rpmfind.net/linux/rpm2html/search.php?query=ganglia-web&submit=Search+...&system=&arch=
- On server1 and server2, install gmond by running:
rpm -ivh libconfuse-2.7-7.el7.x86_64.rpm
rpm -ivh ganglia-3.7.2-2.el7.x86_64.rpm
rpm -ivh ganglia-gmond-3.7.2-2.el7.x86_64.rpm
- On master, install gmetad by running the following, in this order:
rpm -ivhU t1lib-5.1.2-14.el7.x86_64.rpm
rpm -ivhU libmemcached-1.0.16-5.el7.x86_64.rpm
rpm -ivhU libzip-0.10.1-8.el7.x86_64.rpm
rpm -ivhU php-common-5.4.16-48.el7.x86_64.rpm
rpm -ivhU php-bcmath-5.4.16-48.el7.x86_64.rpm
rpm -ivhU php-cli-5.4.16-48.el7.x86_64.rpm
rpm -ivhU php-gd-5.4.16-48.el7.x86_64.rpm
rpm -ivhU php-process-5.4.16-48.el7.x86_64.rpm
rpm -ivhU php-xml-5.4.16-48.el7.x86_64.rpm
rpm -ivhU php-ZendFramework-1.12.20-1.el7.noarch.rpm
rpm -ivhU php-5.4.16-48.el7.x86_64.rpm
rpm -ivhU rrdtool-1.4.8-9.el7.x86_64.rpm
rpm -ivh libconfuse-2.7-7.el7.x86_64.rpm
rpm -ivh ganglia-3.7.2-2.el7.x86_64.rpm
rpm -ivh ganglia-gmond-3.7.2-2.el7.x86_64.rpm
rpm -ivhU ganglia-gmetad-3.7.2-2.el7.x86_64.rpm
rpm -ivhU ganglia-web-3.7.1-2.el7.x86_64.rpm
- Basic syntax: rpm -ivh <full path to the RPM package>
Options: -i = install, -v = verbose output, -h = hash marks (progress bar)
- Uninstall order:
rpm -e ganglia-gmond-3.7.2-2.el7.x86_64
rpm -e ganglia-web-3.7.1-2.el7.x86_64
rpm -e ganglia-gmetad-3.7.2-2.el7.x86_64
rpm -e ganglia-3.7.2-2.el7.x86_64
rpm -e libconfuse-2.7-7.el7.x86_64
rpm -e libmemcached-1.0.16-5.el7.x86_64
rpm -e php-gd-5.4.16-48.el7.x86_64
rpm -e php-ZendFramework-1.12.20-1.el7.noarch
rpm -e php-process-5.4.16-48.el7.x86_64
rpm -e php-xml-5.4.16-48.el7.x86_64
rpm -e php-bcmath-5.4.16-48.el7.x86_64
rpm -e php-5.4.16-48.el7.x86_64
rpm -e php-cli-5.4.16-48.el7.x86_64
rpm -e php-common-5.4.16-48.el7.x86_64
rpm -e libzip-0.10.1-8.el7.x86_64
rpm -e rrdtool-1.4.8-9.el7.x86_64
rpm -e t1lib-5.1.2-14.el7.x86_64
2.1.3 Configuring the monitoring server
[root@master ~]# vi /etc/ganglia/gmetad.conf
data_source "hadoop_hbase_cluster" 192.168.1.xx:8649 192.168.1.xx1:8649 192.168.1.xx2:8649
case_sensitive_hostnames 1
setuid_username "root"
gridname "MyGrid"
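A data_source line has the form: name, an optional polling interval in seconds, then one or more host[:port] entries that gmetad tries in order. The helper below is a sketch for sanity-checking such a line before restarting gmetad; parse_data_source is a hypothetical name, and the default port of 8649 is assumed for entries that omit it.

```python
import shlex


def parse_data_source(line):
    """Split a gmetad data_source line into (name, interval, hosts)."""
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = None
    # An all-digit token right after the name is the optional polling interval.
    if rest[0].isdigit():
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for item in rest:
        host, _, port = item.partition(":")
        hosts.append((host, int(port) if port else 8649))
    return name, interval, hosts


if __name__ == "__main__":
    line = 'data_source "hadoop_hbase_cluster" 192.168.1.xx:8649 192.168.1.xx1:8649'
    print(parse_data_source(line))
```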
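2.1.4 Hooking up Apache
The ganglia.conf that Ganglia generates is problematic, so first move it out of the way, then create a symbolic link under the Apache document root:
[root@master ~]# mv /etc/httpd/conf.d/ganglia.conf /etc/httpd/conf.d/ganglia.conf.copy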
[root@master ~]# ln -s /usr/share/ganglia /var/www/html/ganglia
2.1.5 Starting Apache and Ganglia and enabling them at boot
[root@master ~]# chown -R root:root /var/lib/ganglia
[root@master ~]# service httpd start
Starting httpd: [ OK ]
[root@master ~]# service gmetad start
Starting GANGLIA gmetad: [ OK ]
2.1.6 Installing and configuring the monitored nodes (same configuration on every node)
# yum install ganglia-gmond
# vi /etc/ganglia/gmond.conf
globals {
daemonize = yes
setuid = yes
user = root ##root
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
# By default gmond will use reverse DNS resolution when displaying your hostname
# Uncommenting following value will override that value.
# override_hostname = "mywebserver.domain.com"
# If you are not using multicast this value should be set to something other than 0.
# Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
send_metadata_interval = 0 /*secs */
}
cluster{
name = "hadoop_hbase_cluster" # cluster name, same as in data_source above
owner = "root" ##root
latlong = "unspecified"
url = "unspecified"
}
/* The host section describes attributes of the host, like the location */
host {
location = "unspecified"
}
/*Feel free to specify as many udp_send_channels as you like. Gmond used to only support having a single channel*/
udp_send_channel{
#bind_hostname = yes # Highly recommended, soon to be default.
# This option tells gmond to use a source address
# that resolves to the machine's hostname. Without
# this, the metrics may appear to come from any
# interface and the DNS names associated with
# those IPs will be used to create the RRDs.
#mcast_join = 239.2.11.71 # multicast disabled
host = 192.168.1.xx # IP/hostname to send to
port = 8649 # default port
ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel{
#mcast_join = 239.2.11.71
port = 8649
bind = 192.168.1.xx # IP/hostname of this node (receive address)
retry_bind = true
# Size of the UDP buffer. If you are handling lots of metrics you really
# should bump it up to e.g. 10MB or even higher.
# buffer = 10485760
}
……
- Sync to the other nodes:
[root@master ~]# scp /etc/ganglia/gmond.conf root@server1:/etc/ganglia/gmond.conf
[root@master ~]# scp /etc/ganglia/gmond.conf root@server2:/etc/ganglia/gmond.conf
- On each node, change bind in gmond.conf to that node's own IP.
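Editing bind by hand on every node is easy to get wrong, so the per-node change can be scripted. The sketch below rewrites the value of every active bind line in the text of a gmond.conf; set_bind is a hypothetical helper and the IP addresses in the example are placeholders. Note that it also touches a bind inside tcp_accept_channel if one is present, and it leaves commented-out lines and bind_hostname alone.

```python
import re

# Matches an uncommented "bind = <value>" at the start of a line.
BIND_LINE = re.compile(r"^([ \t]*bind[ \t]*=[ \t]*)\S+", flags=re.MULTILINE)


def set_bind(conf_text, node_ip):
    """Return conf_text with every active bind value replaced by node_ip."""
    return BIND_LINE.sub(lambda m: m.group(1) + node_ip, conf_text)


if __name__ == "__main__":
    sample = (
        "udp_recv_channel {\n"
        "  port = 8649\n"
        "  bind = 192.168.1.10 # receive address\n"
        "}\n"
    )
    print(set_bind(sample, "192.168.1.11"))
```

On a real node the function would be applied to the contents of /etc/ganglia/gmond.conf and the result written back before restarting gmond.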
2.1.7 Monitoring Hadoop with Ganglia (same configuration on every node)
[root@master ~]# vi /usr/hdp/2.6.5.0-292/etc/hadoop/hadoop-metrics2.properties
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
# When there are several Ganglia monitoring systems, separate the servers with commas
namenode.sink.ganglia.servers=192.168.1.xx:8649
# Point every daemon at the Ganglia server
datanode.sink.ganglia.servers=192.168.1.xx:8649
resourcemanager.sink.ganglia.servers=192.168.1.xx:8649
nodemanager.sink.ganglia.servers=192.168.1.xx:8649
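Since the four servers lines differ only in the daemon prefix, they can be generated rather than typed. This is a small sketch; ganglia_sink_lines is a hypothetical helper, and the second server in the usage example is a made-up placeholder to show the comma-separated form.

```python
DAEMONS = ("namenode", "datanode", "resourcemanager", "nodemanager")


def ganglia_sink_lines(servers, daemons=DAEMONS):
    """Build one '<daemon>.sink.ganglia.servers=' line per Hadoop daemon."""
    value = ",".join(servers)  # several Ganglia servers are comma-separated
    return ["%s.sink.ganglia.servers=%s" % (daemon, value) for daemon in daemons]


if __name__ == "__main__":
    for line in ganglia_sink_lines(["192.168.1.xx:8649", "192.168.1.yy:8649"]):
        print(line)
```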
- Sync to the other nodes:
[root@master ~]# scp /usr/hdp/2.6.5.0-292/etc/hadoop/hadoop-metrics2.properties root@server1:/usr/hdp/2.6.5.0-292/etc/hadoop/hadoop-metrics2.properties
[root@master ~]# scp /usr/hdp/2.6.5.0-292/etc/hadoop/hadoop-metrics2.properties root@server2:/usr/hdp/2.6.5.0-292/etc/hadoop/hadoop-metrics2.properties
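Note: pay attention to the parameters below. If container metrics are not filtered out, the data volume can grow very large and quickly fill the disk on the Ganglia server.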
# Switch off container metrics
*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
nodemanager.*.source.filter.exclude=*ContainerResource*
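To get a feel for what the glob *ContainerResource* catches, the pattern can be tried against a few source names. This sketch uses Python's fnmatch as a stand-in for Hadoop's GlobFilter, which is an approximation, and the source names in the example are illustrative rather than taken from a real NodeManager.

```python
from fnmatch import fnmatchcase

EXCLUDE = "*ContainerResource*"


def is_excluded(source_name, pattern=EXCLUDE):
    """True if the metrics source name matches the exclude glob."""
    return fnmatchcase(source_name, pattern)


if __name__ == "__main__":
    # Illustrative names only (assumption).
    for name in ("ContainerResource_container_1_0001_01_000002",
                 "NodeManagerMetrics",
                 "JvmMetrics"):
        print(name, "excluded" if is_excluded(name) else "kept")
```

Every container gets its own source, so without the exclude the number of RRD files grows with the number of containers ever launched.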
2.1.8 Monitoring HBase with Ganglia: add the following (same configuration on every node)
https://wenda.chinahadoop.cn/question/3932
[root@master ~]# vi /usr/hdp/2.6.5.0-292/hbase/conf/hadoop-metrics2-hbase.properties
- Without filtering:
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=192.168.1.xx:8649
- Filter option 1:
*.source.filter.class=org.apache.hadoop.metrics2.filter.RegexFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.metric.filter.exclude=^(.*table.*)|(\w+metric)|(\w+region\w+)|(\w+Assign\w+)|(\w+percentile)|(\w+max)|(\w+median)|(\w+min)|(MetaHlog\w+)$
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=192.168.1.xx:8649
Where:
hbase.sink.ganglia.metric.filter.exclude=^(.*table.*)|(\w+metric)|(\w+region\w+)|(Balancer\w+)|(\w+Assign\w+)|(\w+percentile)|(\w+max)|(\w+median)|(\w+min)|(MetaHlog\w+)|(\w+WAL\w+)$
(1) master: drop the Assignment, Balancer, and FileSystem.MetaHlog metrics, plus every percentile, max, median, and min; keep the mean.
(2) regionserver: drop the WAL-related metrics, plus every percentile, max, median, and min; keep the mean.
(3) region: there are too many and they are table-level, so drop them all.
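Before deploying an exclude expression it helps to try it on a few metric names. The sketch below assumes that each w+ in the expression stands for \w+ (the backslashes appear to have been lost), that the filter must match the whole metric name, and that Python's re behaves like Java's regex for this pattern. The metric names are illustrative examples, not a dump from a real cluster.

```python
import re

# Exclude expression with the backslashes restored (assumption).
EXCLUDE = re.compile(
    r"^(.*table.*)|(\w+metric)|(\w+region\w+)|(Balancer\w+)|(\w+Assign\w+)"
    r"|(\w+percentile)|(\w+max)|(\w+median)|(\w+min)|(MetaHlog\w+)|(\w+WAL\w+)$"
)


def is_excluded(metric_name):
    """True if the whole metric name matches one of the alternatives."""
    return EXCLUDE.fullmatch(metric_name) is not None


if __name__ == "__main__":
    for name in ("Append_99th_percentile", "Get_max", "Get_mean",
                 "MetaHlogSplitTime_num_ops"):
        print(name, "excluded" if is_excluded(name) else "kept")
```

If the filter turns out to use substring search instead of whole-name matching, more names would be excluded than this check reports.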
- Filter option 2: filter out all region metrics
*.period=10
*.sink.ganglia.period=10
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.source.filter.class=org.apache.hadoop.metrics2.filter.RegexFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}
hbase.sink.ganglia.metric.filter.exclude=.*_(max|min|mean|median|percentile)
hbase.sink.ganglia.record.filter.exclude=Regions
hbase.sink.ganglia.source.filter.exclude=.*Regions.*
hbase.sink.ganglia.period=10
hbase.sink.ganglia.servers=192.168.1.xx:8649
- Sync to the other nodes:
[root@master ~]# scp /usr/hdp/2.6.5.0-292/hbase/conf/hadoop-metrics2-hbase.properties root@server1:/usr/hdp/2.6.5.0-292/hbase/conf/hadoop-metrics2-hbase.properties
[root@master ~]# scp /usr/hdp/2.6.5.0-292/hbase/conf/hadoop-metrics2-hbase.properties root@server2:/usr/hdp/2.6.5.0-292/hbase/conf/hadoop-metrics2-hbase.properties
2.1.9 Deploying gweb on the server
- Install the dependencies (httpd and php are needed to serve the gweb web UI)
[root@s101 yinzhengjie]# yum -y install httpd php
- Download gweb and extract it into the "/data" directory
[root@master ~]$ wget http://jaist.dl.sourceforge.n...
[root@master ~]$ tar zxf ganglia-web-3.7.2.tar.gz -C /data/
- Check and, if needed, edit the build file (Makefile) so that it matches the httpd configuration
[root@master ~]$ more /data/ganglia-web-3.7.2/Makefile | grep ^GDESTDIR
GDESTDIR = /var/www/html
[root@master ~]$ more /etc/httpd/conf/httpd.conf | grep ^DocumentRoot
DocumentRoot "/var/www/html"
[root@master ~]$
[root@master ~]$
[root@master ~]$ more /data/ganglia-web-3.7.2/Makefile | grep ^APACHE_USER
APACHE_USER = apache
[root@master ~]$
[root@master ~]$ more /etc/httpd/conf/httpd.conf | grep ^User
User apache
[root@master ~]$
- Build and install
[root@master ganglia-web-3.7.2]# make install
rsync --exclude "rpmbuild" --exclude "*.gz" --exclude "Makefile" --exclude "*debian*" --exclude "ganglia-web-3.7.2" --exclude ".git*" --exclude "*.in" --exclude "*~" --exclude "#*#" --exclude "ganglia-web.spec" --exclude "apache.conf" -a . ganglia-web-3.7.2
mkdir -p /var/lib/ganglia-web/dwoo/compiled &&
mkdir -p /var/lib/ganglia-web/dwoo/cache &&
mkdir -p /var/lib/ganglia-web &&
rsync -a ganglia-web-3.7.2/conf //var/lib/ganglia-web &&
mkdir -p /var/www/html &&
rsync --exclude "conf" -a ganglia-web-3.7.2/* //var/www/html &&
chown -R apache:apache //var/lib/ganglia-web
[root@master ganglia-web-3.5.2]#
[root@s101 ganglia-web-3.5.2]# echo $?
0
- Start the services on the server
[root@centos01 ~]# /bin/systemctl restart httpd.service
[root@centos01 ~]# /bin/systemctl restart gmetad.service
[root@centos01 ~]# /bin/systemctl restart gmond.service
- Verify that the services started successfully
- Access Ganglia through the web UI
[root@centos01 ~]# /bin/systemctl stop httpd.service
[root@centos01 ~]# /bin/systemctl stop gmetad.service
[root@centos01 ~]# /bin/systemctl stop gmond.service
- Enter master.yunda.com/ganglia/ in a browser to reach the page.
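2.1.10 Changing the log storage path (did not succeed; recorded here for reference)
First stop the Ganglia services, change the relevant configuration file, and copy the data Ganglia has collected to the directory where you want it: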
# cat /etc/ganglia/gmetad.conf
rrd_rootdir "/home/ganglia/rrds"
# cp -r <rrd directory> /home/ganglia/rrds
# chown -R root:root /home/ganglia
After restarting the services, data is being written to files under /home/ganglia/rrds, but the web interface shows no graphs, so edit /etc/ganglia/conf.php and add:
#cat /etc/ganglia/conf.php
$conf['gmetad_root'] = "/home/ganglia";
$conf['rrds'] = "${conf['gmetad_root']}/rrds";
For why it is written this way, see /usr/share/ganglia/conf_default.php.
Then restart the relevant services (it is unclear whether Apache needs to be restarted).
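When the graphs stay empty after such a move, a first check is whether RRD files actually exist under the new root, since gmetad lays them out as cluster/host/metric.rrd. The following is a sketch only; rrd_files is a hypothetical helper and /home/ganglia/rrds is the path used above.

```python
from pathlib import Path


def rrd_files(rrd_root):
    """Return all .rrd files under rrd_root as sorted relative POSIX paths."""
    root = Path(rrd_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.rrd"))


if __name__ == "__main__":
    found = rrd_files("/home/ganglia/rrds")
    print("%d RRD files found" % len(found))
    for path in found[:10]:
        print(path)
```

An empty list points at gmetad (wrong rrd_rootdir or missing write permission); a populated list with empty graphs points at the web front end's conf.php instead.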