问题

最近一次上线后 线上突然多了好多ES查询超时异常

2020-02-01 11:40:43.851 ERROR 2852 --- [o-8022-exec-438] c.c.p.common.dao.SearchDao               : Search person by id failed

java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-895 [ACTIVE]
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:789) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:225) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1403) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1373) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.get(RestHighLevelClient.java:699) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-895 [ACTIVE]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) ~[httpcore-nio-4.4.11.jar!/:4.4.11]        
        

觉得很奇怪 因为就是简单的query by id 怎么也会超时呢?

 curl 'http://localhost:9201/person/_doc/301024026?pretty'

原因

当超时异常发生的时候 执行hot_threads查询 得知原来底层执行了refresh操作

curl 'http://localhost:9201/_nodes/hot_threads?pretty'

101.4% (507.1ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][get][T#1]'
   8/10 snapshots sharing following 32 elements
     app//org.apache.lucene.index.SingletonSortedNumericDocValues.nextDoc(SingletonSortedNumericDocValues.java:53)
     app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.writeValues(Lucene80DocValuesConsumer.java:161)
     app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addNumericField(Lucene80DocValuesConsumer.java:111)
     app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:109)
     app//org.apache.lucene.index.ReadersAndUpdates.handleDVUpdates(ReadersAndUpdates.java:368)
     app//org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:570)
     app//org.apache.lucene.index.ReaderPool.writeAllDocValuesUpdates(ReaderPool.java:228)
     app//org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3308)
     app//org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:520)
     app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:294)
     app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:269)
     app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:259)
     app//org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)
     app//org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140)
     app//org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156)
     app//org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
     app//org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
     app//org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
     app//org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1548)
     app//org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:652)
     app//org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:916)
     app//org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:169)
     app//org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:93)
     app//org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:84)
     app//org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:106)
     app//org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:45)

查看官网文档 得知如果该文档有更新 且尚未执行refresh 此时查询该文档会显式执行refresh操作

By default, the get API is realtime, and is not affected by the refresh rate of the index (when data will become visible for search). If a document has been updated but is not yet refreshed, the get API will issue a refresh call in-place to make the document visible. This will also make other documents changed since the last refresh visible. In order to disable realtime GET, one can set therealtimeparameter tofalse.
https://www.elastic.co/guide/...

这就能解释的通了 因为进入人物详情页面后 会对人物索引的viewCount加一 即文档有更新 默认30秒后才会执行refresh操作("refresh_interval" : "30s") 故后续查询该文档的时候 会显式执行refresh操作 导致get by id 超时 尤其是有大量文档更新的情况下

image.png

解决

禁止relatime即可

GetRequest getRequest = new GetRequest(personIndex, id.toString());  
getRequest.realtime(false);

参考文档

https://qbox.io/blog/refresh-...


zhuguowei2
825 声望26 粉丝