Based on:
spring-cloud-Greenwich.RELEASE
spring-boot-2.1.3.RELEASE
spring-boot-starter-actuator-2.1.3.RELEASE
spring-cloud-netflix-eureka-client-2.1.0.RELEASE
Background
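In production, requests to one of our service's APIs through spring-cloud-gateway returned 404. Investigation showed that the gateway could not obtain any valid registration information for the service from eureka-server. At the same time, a network problem had left the service unable to connect to its database. That network problem affected only the connection between the service and the database, not the connection between the service and eureka-server.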
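The logs showed the service repeatedly running its database health check and logging an exception each time the connection failed. They also contained Eureka's status change entry: Saw local status change event StatusChangeEvent [timestamp=1598410016601, current=DOWN, previous=UP]. Both kinds of entries were printed by the same thread, DiscoveryClient-InstanceInfoReplicator-0, which strongly suggested that the two were related.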
So why did eureka-server have no usable registration for the service? The answer starts with the Eureka-client health check.
Health check
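As usual, the way to understand the mechanism is to read the source. Almost all Eureka-client initialization happens in the DiscoveryClient class, including starting the scheduled health check task.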
public class DiscoveryClient implements EurekaClient {
    private void initScheduledTasks() {
        ……
        if (clientConfig.shouldRegisterWithEureka()) {
            ……
            // InstanceInfo replicator
            instanceInfoReplicator = new InstanceInfoReplicator(
                    this,
                    instanceInfo,
                    clientConfig.getInstanceInfoReplicationIntervalSeconds(),
                    2); // burstSize
            ……
            instanceInfoReplicator.start(clientConfig.getInitialInstanceInfoReplicationIntervalSeconds());
        } else {
            logger.info("Not registering with Eureka server per configuration");
        }
    }
}
InstanceInfoReplicator periodically checks the application's health and refreshes the status of the current Eureka-client instance.
class InstanceInfoReplicator implements Runnable {
    public void run() {
        try {
            discoveryClient.refreshInstanceInfo();
            ……
        } catch (Throwable t) {
            logger.warn("There was a problem with the instance info replicator", t);
        } finally {
            Future next = scheduler.schedule(this, replicationIntervalSeconds, TimeUnit.SECONDS);
            scheduledPeriodicRef.set(next);
        }
    }
}

public class DiscoveryClient implements EurekaClient {
    void refreshInstanceInfo() {
        ……
        InstanceStatus status;
        try {
            status = getHealthCheckHandler().getStatus(instanceInfo.getStatus());
        } catch (Exception e) {
            logger.warn("Exception from healthcheckHandler.getStatus, setting status to DOWN", e);
            status = InstanceStatus.DOWN;
        }
        if (null != status) {
            applicationInfoManager.setInstanceStatus(status);
        }
    }
}
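The two methods above boil down to a task that re-arms itself in a finally block and treats a throwing health check as DOWN. Here is a minimal, dependency-free sketch of that pattern. ReplicatorSketch, its Status enum, and refreshStatus are hypothetical names for illustration, not Eureka classes, and the interval is in milliseconds only to keep the demo short.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the replicator loop; not Eureka's real classes.
class ReplicatorSketch implements Runnable {
    enum Status { UP, DOWN }

    private final ScheduledExecutorService scheduler;
    private final Supplier<Status> healthCheck;
    private final long intervalMillis;
    private volatile Status current = Status.UP;

    ReplicatorSketch(ScheduledExecutorService scheduler, Supplier<Status> healthCheck, long intervalMillis) {
        this.scheduler = scheduler;
        this.healthCheck = healthCheck;
        this.intervalMillis = intervalMillis;
    }

    // Mirrors the try/catch in refreshInstanceInfo(): a throwing check counts as DOWN.
    static Status refreshStatus(Supplier<Status> healthCheck) {
        try {
            return healthCheck.get();
        } catch (Exception e) {
            return Status.DOWN;
        }
    }

    @Override
    public void run() {
        try {
            current = refreshStatus(healthCheck);
        } finally {
            // Re-arm from finally so a failed pass cannot stop the loop.
            scheduler.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    Status current() {
        return current;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Simulate the incident: the health check fails because the database is unreachable.
        ReplicatorSketch replicator = new ReplicatorSketch(
                scheduler, () -> { throw new IllegalStateException("database unreachable"); }, 100);
        scheduler.schedule(replicator, 0, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
        System.out.println("status = " + replicator.current());
    }
}
```

Because both the health check and the status update run inside the same scheduled task, their log entries come from the same thread, which matches what we saw in the logs.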
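Here the client obtains a status from HealthCheckHandler, updates the instance status, and publishes a status change event. If the status is DOWN, the event listener prints the log entry we saw at the beginning, and the status reported to Eureka-server is also DOWN. That is what ultimately caused the incident: the gateway could not obtain an instance with status UP from Eureka-server.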
public class ApplicationInfoManager {
    public synchronized void setInstanceStatus(InstanceStatus status) {
        InstanceStatus next = instanceStatusMapper.map(status);
        if (next == null) {
            return;
        }
        InstanceStatus prev = instanceInfo.setStatus(next);
        if (prev != null) {
            for (StatusChangeListener listener : listeners.values()) {
                try {
                    listener.notify(new StatusChangeEvent(prev, next));
                } catch (Exception e) {
                    logger.warn("failed to notify listener: {}", listener.getId(), e);
                }
            }
        }
    }
}

public class DiscoveryClient implements EurekaClient {
    private void initScheduledTasks() {
        ……
        if (clientConfig.shouldRegisterWithEureka()) {
            ……
            statusChangeListener = new ApplicationInfoManager.StatusChangeListener() {
                @Override
                public String getId() {
                    return "statusChangeListener";
                }

                @Override
                public void notify(StatusChangeEvent statusChangeEvent) {
                    if (InstanceStatus.DOWN == statusChangeEvent.getStatus() ||
                            InstanceStatus.DOWN == statusChangeEvent.getPreviousStatus()) {
                        // log at warn level if DOWN was involved
                        logger.warn("Saw local status change event {}", statusChangeEvent);
                    } else {
                        logger.info("Saw local status change event {}", statusChangeEvent);
                    }
                    instanceInfoReplicator.onDemandUpdate();
                }
            };
        } else {
            logger.info("Not registering with Eureka server per configuration");
        }
    }
}
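The notification flow above can be sketched without any Eureka dependency. This is only an illustration under two assumptions: that setStatus hands back the previous status only when the status actually changes (which the prev != null check suggests), and that a failing listener must not block the others. StatusNotifierSketch and its members are hypothetical names. The real listener also calls instanceInfoReplicator.onDemandUpdate(), which the sketch leaves out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for ApplicationInfoManager; none of these names are Eureka's.
class StatusNotifierSketch {
    enum Status { UP, DOWN }

    // Listener id -> callback receiving (previous, next).
    private final Map<String, BiConsumer<Status, Status>> listeners = new ConcurrentHashMap<>();
    private Status current = Status.UP;

    void register(String id, BiConsumer<Status, Status> listener) {
        listeners.put(id, listener);
    }

    synchronized void setStatus(Status next) {
        if (current == next) {
            return; // unchanged status: no event is published
        }
        Status prev = current;
        current = next;
        for (Map.Entry<String, BiConsumer<Status, Status>> entry : listeners.entrySet()) {
            try {
                entry.getValue().accept(prev, next);
            } catch (RuntimeException e) {
                // One failing listener must not block the others.
                System.err.println("failed to notify listener: " + entry.getKey());
            }
        }
    }

    public static void main(String[] args) {
        StatusNotifierSketch notifier = new StatusNotifierSketch();
        notifier.register("logger", (prev, next) ->
                System.out.println("Saw local status change: " + prev + " -> " + next));
        notifier.setStatus(Status.DOWN); // status changed: listener is called
        notifier.setStatus(Status.DOWN); // no change: listener is not called
    }
}
```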
The key question is how getHealthCheckHandler().getStatus(instanceInfo.getStatus()) in DiscoveryClient obtains its value. Here getHealthCheckHandler returns an EurekaHealthCheckHandler, so we continue into the EurekaHealthCheckHandler class.
public class EurekaHealthCheckHandler implements HealthCheckHandler, ApplicationContextAware, InitializingBean {
    private final CompositeHealthIndicator healthIndicator;

    @Override
    public void afterPropertiesSet() throws Exception {
        final Map<String, HealthIndicator> healthIndicators = applicationContext.getBeansOfType(HealthIndicator.class);
        for (Map.Entry<String, HealthIndicator> entry : healthIndicators.entrySet()) {
            //ignore EurekaHealthIndicator and flatten the rest of the composite
            //otherwise there is a never ending cycle of down. See gh-643
            if (entry.getValue() instanceof DiscoveryCompositeHealthIndicator) {
                DiscoveryCompositeHealthIndicator indicator = (DiscoveryCompositeHealthIndicator) entry.getValue();
                for (DiscoveryCompositeHealthIndicator.Holder holder : indicator.getHealthIndicators()) {
                    if (!(holder.getDelegate() instanceof EurekaHealthIndicator)) {
                        healthIndicator.addHealthIndicator(holder.getDelegate().getName(), holder);
                    }
                }
            }
            else {
                healthIndicator.addHealthIndicator(entry.getKey(), entry.getValue());
            }
        }
    }
}
In afterPropertiesSet, applicationContext.getBeansOfType collects every health check bean of type HealthIndicator.
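Note: applicationContext.getBeansOfType iterates over the BeanDefinitions to collect all matching bean names, then iterates over those names and creates the bean instance for any name that has not been instantiated yet. In other words, applicationContext.getBeansOfType makes sure that all beans of the specified type have been created.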
public class EurekaHealthCheckHandler implements HealthCheckHandler, ApplicationContextAware, InitializingBean {
    public InstanceStatus getStatus(InstanceStatus instanceStatus) {
        return getHealthStatus();
    }

    protected InstanceStatus getHealthStatus() {
        final Status status = getHealthIndicator().health().getStatus();
        return mapToInstanceStatus(status);
    }

    protected CompositeHealthIndicator getHealthIndicator() {
        return healthIndicator;
    }
}
getHealthStatus calls the health method of CompositeHealthIndicator to obtain the status. As the afterPropertiesSet method above shows, CompositeHealthIndicator is a collection of HealthIndicator instances.
public class CompositeHealthIndicator implements HealthIndicator {
    public void addHealthIndicator(String name, HealthIndicator indicator) {
        this.registry.register(name, indicator);
    }

    @Override
    public Health health() {
        Map<String, Health> healths = new LinkedHashMap<>();
        for (Map.Entry<String, HealthIndicator> entry : this.registry.getAll()
                .entrySet()) {
            healths.put(entry.getKey(), entry.getValue().health());
        }
        return this.aggregator.aggregate(healths);
    }
}

public class OrderedHealthAggregator extends AbstractHealthAggregator {
    public OrderedHealthAggregator() {
        setStatusOrder(Status.DOWN, Status.OUT_OF_SERVICE, Status.UP, Status.UNKNOWN);
    }

    public void setStatusOrder(Status... statusOrder) {
        String[] order = new String[statusOrder.length];
        for (int i = 0; i < statusOrder.length; i++) {
            order[i] = statusOrder[i].getCode();
        }
        setStatusOrder(Arrays.asList(order));
    }

    @Override
    public final Health aggregate(Map<String, Health> healths) {
        List<Status> statusCandidates = healths.values().stream().map(Health::getStatus)
                .collect(Collectors.toList());
        Status status = aggregateStatus(statusCandidates);
        Map<String, Object> details = aggregateDetails(healths);
        return new Health.Builder(status, details).build();
    }

    protected Status aggregateStatus(List<Status> candidates) {
        // Only sort those status instances that we know about
        List<Status> filteredCandidates = new ArrayList<>();
        for (Status candidate : candidates) {
            if (this.statusOrder.contains(candidate.getCode())) {
                filteredCandidates.add(candidate);
            }
        }
        // If no status is given return UNKNOWN
        if (filteredCandidates.isEmpty()) {
            return Status.UNKNOWN;
        }
        // Sort given Status instances by configured order
        filteredCandidates.sort(new StatusComparator(this.statusOrder));
        return filteredCandidates.get(0);
    }

    private class StatusComparator implements Comparator<Status> {
        private final List<String> statusOrder;

        StatusComparator(List<String> statusOrder) {
            this.statusOrder = statusOrder;
        }

        @Override
        public int compare(Status s1, Status s2) {
            int i1 = this.statusOrder.indexOf(s1.getCode());
            int i2 = this.statusOrder.indexOf(s2.getCode());
            return (i1 < i2) ? -1 : (i1 != i2) ? 1 : s1.getCode().compareTo(s2.getCode());
        }
    }
}
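CompositeHealthIndicator's health method iterates over all HealthIndicator instances and calls each one's health method to obtain its status. The statuses are then sorted in the order DOWN -> OUT_OF_SERVICE -> UP -> UNKNOWN and the first one is taken, so if any indicator reports DOWN, the result is DOWN.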
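To make the "worst status wins" rule concrete, here is a small standalone sketch of the same ordering logic. It uses plain status codes as strings instead of Spring Boot's Status type, and StatusOrderSketch and aggregate are hypothetical names. It is a simplified illustration of the rule, not the library implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, dependency-free illustration of the aggregation order.
class StatusOrderSketch {
    // Same order as in OrderedHealthAggregator's constructor: the earlier entry wins.
    static final List<String> ORDER = Arrays.asList("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    static String aggregate(List<String> statuses) {
        return statuses.stream()
                .filter(ORDER::contains)                                      // ignore codes we do not know
                .min(Comparator.comparingInt((String s) -> ORDER.indexOf(s))) // lowest index wins
                .orElse("UNKNOWN");                                           // nothing left: UNKNOWN
    }

    public static void main(String[] args) {
        // e.g. disk space and another check are UP, but the database check is DOWN
        System.out.println(aggregate(Arrays.asList("UP", "DOWN", "UP")));
    }
}
```

In the incident, this is the step where a single failing database indicator outweighed every other indicator that was still reporting UP.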
public class EurekaHealthCheckHandler implements HealthCheckHandler, ApplicationContextAware, InitializingBean {
    private static final Map<Status, InstanceInfo.InstanceStatus> STATUS_MAPPING =
            new HashMap<Status, InstanceInfo.InstanceStatus>() {{
                put(Status.UNKNOWN, InstanceStatus.UNKNOWN);
                put(Status.OUT_OF_SERVICE, InstanceStatus.OUT_OF_SERVICE);
                put(Status.DOWN, InstanceStatus.DOWN);
                put(Status.UP, InstanceStatus.UP);
            }};

    protected InstanceStatus mapToInstanceStatus(Status status) {
        if (!STATUS_MAPPING.containsKey(status)) {
            return InstanceStatus.UNKNOWN;
        }
        return STATUS_MAPPING.get(status);
    }
}
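Finally, the generic Status is mapped to Eureka's instance status, InstanceStatus, and the client updates its own status accordingly. Any Status without a mapping becomes UNKNOWN.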
Summary
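Eureka-client periodically calls the health method of every HealthIndicator to obtain its health status. If any HealthIndicator reports DOWN, Eureka-client treats the service as unavailable, sets its own status to DOWN, and reports that to Eureka-server. Eureka-server then marks the instance as DOWN, so other services can no longer obtain it from Eureka-server as an available instance.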
This incident happened because DataSourceHealthIndicator reported DOWN, which caused the Eureka-client status to change to DOWN as well.
Extensions
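- If the project has a critical feature and you want the current instance taken offline whenever that feature fails, add a custom HealthIndicator class and check in its health method whether the feature is working.
- You can combine an HTTP endpoint with a HealthIndicator to control whether the service is online or offline: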
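import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/healthIndicator")
public class MyHealthIndicator implements HealthIndicator {
    // volatile: written by request threads, read by the Eureka replicator thread
    private volatile boolean up = true;

    // Manually switch the reported health, e.g. GET /healthIndicator/setUpVal/false
    @GetMapping("setUpVal/{up}")
    public void setUpVal(@PathVariable("up") boolean up) {
        this.up = up;
    }

    // Called by the composite health check; DOWN here takes the instance offline
    @Override
    public Health health() {
        if (up) {
            return Health.up().build();
        }
        return Health.down().build();
    }

    public MyHealthIndicator setUp(boolean up) {
        this.up = up;
        return this;
    }
}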
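With this in place, calling /healthIndicator/setUpVal/false takes the current service instance offline manually, and calling /healthIndicator/setUpVal/true brings it back online. The change is picked up the next time the Eureka-client health check runs.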
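For the first bullet above, the core of a feature-specific check can be sketched as follows. To stay dependency-free, the sketch uses a hypothetical HealthCheck interface as a stand-in for Spring Boot's HealthIndicator and plain strings instead of Health. FeatureHealthSketch, forFeature, and the probe are all made-up names. In a real project the same logic would live in the health method of a HealthIndicator bean and return Health.up().build() or Health.down().build().

```java
import java.util.function.BooleanSupplier;

// Hypothetical, dependency-free illustration of a feature-specific health check.
class FeatureHealthSketch {
    // Stand-in for Spring Boot's HealthIndicator (assumption for this sketch only).
    interface HealthCheck {
        String health();
    }

    // Wraps a probe of the critical feature; a failing or throwing probe reports DOWN.
    static HealthCheck forFeature(BooleanSupplier probe) {
        return () -> {
            try {
                return probe.getAsBoolean() ? "UP" : "DOWN";
            } catch (RuntimeException e) {
                return "DOWN";
            }
        };
    }

    public static void main(String[] args) {
        HealthCheck healthy = forFeature(() -> true);
        HealthCheck broken = forFeature(() -> { throw new IllegalStateException("feature unavailable"); });
        System.out.println(healthy.health() + " / " + broken.health());
    }
}
```

Keep in mind what the incident showed: any indicator that reports DOWN takes the whole instance offline, so only features the service truly cannot work without should be wired in this way.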