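On the Ambari management page, both HBase and the Ambari Metrics Collector are shown as not started.
HBase itself is definitely fine, because it is in active use. The Metrics Collector can be started, but it shows as started for only a few seconds and then goes back to the stopped state. In reality the Metrics Collector process keeps running the whole time and the data keeps being recorded, and I could not find anything wrong in the logs.
The only possibility I can think of now is that the Metrics Collector server also hosts a separate, self-deployed HBase (not the one installed through Ambari). Could that HBase be interfering with the Ambari Metrics Collector?
The problem is that I no longer know where to look, and since the cluster is a production environment I cannot start and stop things freely.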
ambari-server log:
05 Sep 2017 15:38:02,693 INFO [qtp-client-1076337] AmbariManagementControllerImpl:2025 - AmbariManagementControllerImpl.createHostAction: created ExecutionCommand for host hadoop30.com, role METRICS_COLLECTOR, roleCommand START, and command ID 205--1, with cluster-env tags version1
05 Sep 2017 15:38:02,708 INFO [ambari-action-scheduler] ServiceComponentHostImpl:949 - Host role transitioned to a new state, serviceComponentName=METRICS_COLLECTOR, hostName=hadoop30.com, oldState=INSTALLED, currentState=STARTING
05 Sep 2017 15:38:02,726 INFO [qtp-client-1076332] PersistKeyValueService:82 - Looking for keyName admin-settings-show-bg-admin
05 Sep 2017 15:38:12,697 INFO [qtp-ambari-agent-1076338] HeartBeatHandler:567 - Updating applied config on service AMBARI_METRICS, component METRICS_COLLECTOR, host hadoop30.com
05 Sep 2017 15:38:12,699 INFO [qtp-ambari-agent-1076338] ServiceComponentHostImpl:949 - Host role transitioned to a new state, serviceComponentName=METRICS_COLLECTOR, hostName=hadoop30.com, oldState=STARTING, currentState=STARTED
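To see exactly when the server flips the component back to a stopped state, it may help to pull only the state-transition lines out of the ambari-server log. The following is a minimal sketch, assuming the log lines keep the format shown above; the function name `state_transitions` and the regular expression are my own and not part of Ambari.

```python
import re

# Pattern for the "Host role transitioned to a new state" lines shown above.
# Assumption: the field order and separators match the pasted log excerpt.
TRANSITION = re.compile(
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"serviceComponentName=(?P<component>\w+), "
    r"hostName=(?P<host>[^,]+), "
    r"oldState=(?P<old>\w+), "
    r"currentState=(?P<new>\w+)"
)


def state_transitions(lines, component="METRICS_COLLECTOR"):
    """Return (timestamp, host, old_state, new_state) for one component."""
    result = []
    for line in lines:
        match = TRANSITION.search(line)
        if match and match.group("component") == component:
            result.append(
                (
                    match.group("ts"),
                    match.group("host"),
                    match.group("old"),
                    match.group("new"),
                )
            )
    return result


if __name__ == "__main__":
    # Two lines copied from the log excerpt above; in practice, pass an
    # open file handle for the ambari-server log instead.
    sample = [
        "05 Sep 2017 15:38:02,708 INFO [ambari-action-scheduler] "
        "ServiceComponentHostImpl:949 - Host role transitioned to a new state, "
        "serviceComponentName=METRICS_COLLECTOR, hostName=hadoop30.com, "
        "oldState=INSTALLED, currentState=STARTING",
        "05 Sep 2017 15:38:12,699 INFO [qtp-ambari-agent-1076338] "
        "ServiceComponentHostImpl:949 - Host role transitioned to a new state, "
        "serviceComponentName=METRICS_COLLECTOR, hostName=hadoop30.com, "
        "oldState=STARTING, currentState=STARTED",
    ]
    for ts, host, old, new in state_transitions(sample):
        print(ts, host, old, "->", new)
```

Running this over the full log and looking at the first transition after `STARTED` should show the timestamp at which the status changed, which can then be matched against the ambari-agent log for the same moment.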
ambari-agent log:
INFO 2017-09-05 15:38:11,137 ClusterConfiguration.py:123 - Updating cached configurations for cluster xxxx
INFO 2017-09-05 15:38:11,155 ActionQueue.py:112 - Adding EXECUTION_COMMAND for role METRICS_COLLECTOR for service AMBARI_METRICS of cluster xxxx to the queue.
INFO 2017-09-05 15:38:11,188 ActionQueue.py:232 - Executing command with id = 205-0 for role = METRICS_COLLECTOR of cluster xxxx.
INFO 2017-09-05 15:38:11,222 Heartbeat.py:78 - Building Heartbeat: {responseId = 2629593, timestamp = 1504597091221, commandsInProgress = True, componentsMapped = True