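Preface
I have recently been learning how to query and paginate with Elasticsearch, and came across the following three pagination approaches: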
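from + size:
Pros: supports jumping to an arbitrary page.
Cons: suffers from the deep-pagination problem; by default from + size may not exceed 10,000 (the index.max_result_window setting).
Use case: search UIs with random page access, like Baidu, JD, Google, and Taobao.
search_after:
Pros: no cap on the total number of results (the size of a single request still must not exceed 10,000).
Cons: can only move forward page by page; random page access is not supported.
Use case: searches that do not need random page access, such as infinite scrolling on a phone.
scroll (no longer recommended):
Pros: no cap on the total number of results (the size of a single request still must not exceed 10,000).
Cons: consumes extra memory, and the results are a snapshot rather than real time.
Use case: fetching and migrating large volumes of data. It is no longer recommended for deep pagination since ES 7.x; the official documentation suggests search_after instead.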
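To make the from + size limit concrete, here is a minimal sketch in plain Java (no Elasticsearch client involved). `PageWindow` and its method names are hypothetical helpers invented for this illustration; the only Elasticsearch fact it relies on is that `index.max_result_window` defaults to 10,000 and that a request is rejected when from + size exceeds it.

```java
/** Hypothetical helper: maps a 1-based page number to an offset and checks the result window. */
final class PageWindow {
    // Default value of the index.max_result_window setting
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Offset ("from") of the first hit on the given 1-based page. */
    static int from(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        return Math.multiplyExact(pageNum - 1, pageSize);
    }

    /** True when from + size stays within the window, i.e. the request would be accepted. */
    static boolean fitsWindow(int pageNum, int pageSize, int maxResultWindow) {
        return (long) from(pageNum, pageSize) + pageSize <= maxResultWindow;
    }

    /** Last page number that can still be requested with the given page size. */
    static int lastReachablePage(int pageSize, int maxResultWindow) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        return maxResultWindow / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(3, 20));                                      // offset of page 3
        System.out.println(fitsWindow(500, 20, DEFAULT_MAX_RESULT_WINDOW));   // 9980 + 20 is still allowed
        System.out.println(fitsWindow(501, 20, DEFAULT_MAX_RESULT_WINDOW));   // 10000 + 20 is too deep
        System.out.println(lastReachablePage(20, DEFAULT_MAX_RESULT_WINDOW));
    }
}
```

A check like this could run before the search call, so that a request for a page beyond the window returns a friendly message instead of an Elasticsearch error.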
So I implemented the first approach in Java. Here are the steps I followed.
Using from + size
1. Add the dependencies
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/co.elastic.clients/elasticsearch-java -->
<dependency>
<groupId>co.elastic.clients</groupId>
<artifactId>elasticsearch-java</artifactId>
<version>8.7.1</version>
</dependency>
2. Configure the connection
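import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchClientConfig {
    private final Logger log = LoggerFactory.getLogger(this.getClass());
    @Autowired
    ServiceConfig serviceConfig; // the project's own configuration class

    @Bean
    public ElasticsearchClient esClient() {
        log.info("serviceConfig.getServerEnv()={}", serviceConfig.getServerEnv());
        // Replace with the address of your Elasticsearch node
        String host = "xxx.xxx.xxx.xxx";
        int port = 9200;
        try {
            RestClient restClient = RestClient.builder(new HttpHost(host, port)).build();
            ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
            return new ElasticsearchClient(transport);
        } catch (Exception e) {
            log.error("Failed to create esClient", e);
            // Fail fast instead of returning null, which would only break later where the client is injected
            throw new IllegalStateException("Failed to create esClient", e);
        }
    }
}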
3. Add the query method:
private SearchResponse<Map> getEsDataByPage(String fieldName, String fieldValue, int pageNum, int pageSize) {
    try {
        SearchResponse<Map> response = elasticsearchClient.search(s -> s
                        .index("url_index")
                        .query(q -> q.match(t -> t
                                .field(fieldName)
                                .query(fieldValue)
                        ))
                        // pageNum is 1-based, so page 1 starts at offset 0
                        .from((pageNum - 1) * pageSize)
                        .size(pageSize),
                Map.class
        );
        TotalHits total = response.hits().total();
        log.info("total={}", total);
        // The original snippet never returned the response, so it did not compile
        return response;
    } catch (Exception ex) {
        // Callers treat null as "no data"
        log.error("error={}", ex.getMessage(), ex);
        return null;
    }
}
4. Use the query method
public PageResults<UrlIndexDto> searchDataBy(String fieldName, String fieldValue, int pageNum, int pageSize) {
    // Start from an empty page so that every early exit returns a valid result
    PageResults<UrlIndexDto> pageResults = new PageResults<>();
    pageResults.setRows(List.of());
    pageResults.setTotal(0L);
    pageResults.setPageNum(pageNum);
    pageResults.setPageSize(pageSize);

    // Fetch the page from Elasticsearch
    SearchResponse<Map> response = getEsDataByPage(fieldName, fieldValue, pageNum, pageSize);
    if (response == null || response.hits().total() == null) {
        return pageResults;
    }
    TotalHits total = response.hits().total();
    // Eq means the count is exact; otherwise it is only a lower bound.
    // The original code returned an empty page in the second case, which hid results for large hit counts.
    if (total.relation() == TotalHitsRelation.Eq) {
        log.info("There are {} results", total.value());
    } else {
        log.info("There are more than {} results", total.value());
    }
    if (total.value() == 0) {
        return pageResults;
    }

    List<UrlIndexDto> urlIndexDtos = new ArrayList<>();
    List<String> urlIds = new ArrayList<>();
    // Index the DTOs by id so the MongoDB data can be matched up below
    Map<String, UrlIndexDto> urlIndexDtosById = new HashMap<>();
    ObjectMapper objectMapper = new ObjectMapper();
    for (Hit<Map> hit : response.hits().hits()) {
        UrlIndexDto urlIndexDto = objectMapper.convertValue(hit.source(), UrlIndexDto.class);
        if (urlIndexDto == null) {
            continue; // a hit without _source cannot be shown
        }
        urlIndexDtos.add(urlIndexDto);
        if (StringUtils.hasLength(urlIndexDto.getId())) {
            urlIds.add(urlIndexDto.getId());
            urlIndexDtosById.put(urlIndexDto.getId(), urlIndexDto);
        }
    }
    // Fetch the matching records from MongoDB
    List<UrlAnalysisDto> urlAnalysisData = getUrlAnalysisData(urlIds);
    urlAnalysisData.forEach(item -> {
        UrlIndexDto urlIndexDto = urlIndexDtosById.get(item.getId());
        if (urlIndexDto != null) {
            // The map holds the same object references as urlIndexDtos, so these setters update the rows too
            urlIndexDto.setClickUrlCount(item.getUserClickUrlCount());
            urlIndexDto.setCopiedUrlCount(item.getUserCopyUrlCount());
        }
    });
    pageResults.setRows(urlIndexDtos);
    pageResults.setTotal(total.value());
    return pageResults;
}
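That completes the Elasticsearch from + size query.
Highlighting
Search engines such as Google and Baidu usually highlight the matched keywords in the results.
Let's first try the query in Kibana Dev Tools: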
GET /dev_index_urls/_search
{
"query": {
"multi_match": {
"query": "ElasticSearch整合开发之",
"fields": ["title", "description"],
"fuzziness": "AUTO"
}
},
"highlight": {
"fields": {
"title": {},
"description": {}
},
"pre_tags": ["<strong>"],
"post_tags": ["</strong>"]
}
}
Query result:
{
"took": 19,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7260926,
"hits": [
{
"_index": "dev_index_urls",
"_id": "64d0f20e5249c123ad9fe97d",
"_score": 1.7260926,
"_source": {
"_class": "com.seaurl.searchservice.document.UrlIndex",
"id": "64d0f20e5249c123ad9fe97d",
"uid": "6461e6ac25d966329b7d7642",
"categoryId": "64b899a0af50e77e53ef89eb",
"categoryName": "语文",
"categoryParentId": "",
"categoryParentName": "",
"title": "ElasticSearch整合开发之 ElasticSearchOptions 客户端操作_elasticsearchoperations_Leon_Jinhai_Sun的博客-CSDN博客",
"suggest": {
"input": [
"ElasticSearch整合开发之 ElasticSearchOptions 客户端操作_elasticsearchoperations_Leon_Jinhai_Sun的博客-CSDN博客"
]
},
"url": "https://blog.csdn.net/Leon_Jinhai_Sun/article/details/126796380",
"domain": "blog.csdn.net",
"description": "ElasticSearch整合开发之 ElasticSearchOptions 客户端操作_elasticsearchoperations",
"favicon": "https://cdn.seaurl.com/space/url/blog.csdn.net/favicon.ico",
"createdDt": 1691415053920,
"updatedDt": 1691415053920
},
"highlight": {
"description": [
"<strong>ElasticSearch</strong><strong>整</strong><strong>合</strong><strong>开</strong><strong>发</strong><strong>之</strong> ElasticSearchOptions 客户端操作_elasticsearchoperations"
],
"title": [
"<strong>ElasticSearch</strong><strong>整</strong><strong>合</strong><strong>开</strong><strong>发</strong><strong>之</strong> ElasticSearchOptions 客户端操作_elasticsearchoperations_Leon_Jinhai_Sun的博客-CSDN博客"
]
}
}
]
}
}
As you can see, each hit now carries an extra highlight field. Next, let's do the same thing in Java by modifying the query code above:
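// index is the index name, fieldNames the List<String> of fields to search, and
// highlightFieldMap a Map<String, HighlightField> with one entry per field to highlight
SearchResponse<Map> response = elasticsearchClient.search(s -> s
                .index(index)
                .query(q -> q.multiMatch(t -> t.query(fieldValue).fields(fieldNames)))
                .highlight(h -> h
                        .preTags("<span style='color: red'>")
                        .postTags("</span>")
                        .fields(highlightFieldMap)
                )
                .from((pageNum - 1) * pageSize)
                .size(pageSize),
        Map.class);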
If you also want to add a filter, change the query to the following:
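// index is the index name, fieldNames the List<String> of fields to search, and
// highlightFieldMap a Map<String, HighlightField> with one entry per field to highlight
SearchResponse<Map> response = elasticsearchClient.search(s -> s
                .index(index)
                .query(q -> q
                        .bool(b -> b
                                // must: the full-text part, contributes to the relevance score
                                .must(m -> m
                                        .multiMatch(t -> t
                                                .query(fieldValue)
                                                .fields(fieldNames)
                                        )
                                )
                                // filter: exact match on env, does not affect the score
                                .filter(f -> f
                                        .term(t -> t
                                                .field("env")
                                                .value(serviceConfig.getServerEnv())
                                        )
                                )
                        )
                )
                .highlight(h -> h
                        .preTags("<span style='color: red'>")
                        .postTags("</span>")
                        .fields(highlightFieldMap)
                )
                .from((pageNum - 1) * pageSize)
                .size(pageSize),
        Map.class);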
That completes the highlighting.
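The queries above return the highlighted fragments next to `_source`, not inside it, so the caller still has to decide which version of each field to show. The sketch below is one possible way to do that in plain Java. `HighlightMerger` and `applyHighlights` are hypothetical names, and the `Map<String, List<String>>` parameter is assumed to mirror the highlight section of a hit (field name to list of fragments), as in the JSON response shown earlier. It keeps only the first fragment of each field; for long fields Elasticsearch may return partial fragments, so this is best suited to short fields such as title.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: overlays highlight fragments on a copy of a hit's source. */
final class HighlightMerger {

    /**
     * Returns a copy of source in which every field that has at least one
     * highlight fragment is replaced by its first fragment. The input map is not modified.
     */
    static Map<String, Object> applyHighlights(Map<String, Object> source,
                                               Map<String, List<String>> highlight) {
        Map<String, Object> merged = new LinkedHashMap<>(source);
        if (highlight == null) {
            return merged; // nothing matched in the highlighted fields
        }
        for (Map.Entry<String, List<String>> entry : highlight.entrySet()) {
            List<String> fragments = entry.getValue();
            // Only overwrite fields that actually exist in the source
            if (fragments != null && !fragments.isEmpty() && merged.containsKey(entry.getKey())) {
                merged.put(entry.getKey(), fragments.get(0));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "ElasticSearch paging");
        source.put("url", "https://example.com");
        Map<String, List<String>> highlight =
                Map.of("title", List.of("<strong>ElasticSearch</strong> paging"));
        System.out.println(applyHighlights(source, highlight));
    }
}
```

In the search method, the merged map could then be passed to `objectMapper.convertValue` in place of the raw source, so the DTO's title and description already contain the highlight tags.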
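Summary
1. After weighing the pros and cons of the three approaches, I first tried search_after pagination but could not get it to work. Since scroll is no longer recommended, I settled on from + size pagination. I will add search_after pagination later when I have time.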
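As a starting point for that follow-up, here is a rough in-memory simulation of the idea behind search_after, written in plain Java. It does not call Elasticsearch at all: `SearchAfterSketch`, `Doc`, and `page` are hypothetical names, and the list stands in for an index. What it is meant to show is the contract: results are sorted by a deterministic key (here createdDt plus the id as a tiebreaker), and each request passes the sort values of the last hit of the previous page instead of an offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** In-memory simulation of cursor-based paging; not an Elasticsearch client call. */
final class SearchAfterSketch {

    /** Minimal stand-in for an indexed document: a sort value plus a unique id. */
    static final class Doc {
        final long createdDt;
        final String id;

        Doc(long createdDt, String id) {
            this.createdDt = createdDt;
            this.id = id;
        }

        @Override
        public String toString() {
            return id + "@" + createdDt;
        }
    }

    // Deterministic order: createdDt first, id breaks ties so no document is skipped or repeated
    static final Comparator<Doc> ORDER =
            Comparator.<Doc>comparingLong(d -> d.createdDt).thenComparing(d -> d.id);

    /** Returns up to size docs that sort strictly after cursor; a null cursor means the first page. */
    static List<Doc> page(List<Doc> sortedDocs, Doc cursor, int size) {
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : sortedDocs) {
            if (cursor != null && ORDER.compare(doc, cursor) <= 0) {
                continue; // already returned on an earlier page
            }
            hits.add(doc);
            if (hits.size() == size) {
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 1; i <= 7; i++) {
            docs.add(new Doc(1000L + i / 2, "doc-" + i)); // some docs share a createdDt on purpose
        }
        docs.sort(ORDER);

        Doc cursor = null;
        List<Doc> hits;
        while (!(hits = page(docs, cursor, 3)).isEmpty()) {
            System.out.println(hits);
            cursor = hits.get(hits.size() - 1); // the last hit becomes the next cursor
        }
    }
}
```

Because the cursor only ever moves forward, this style has no from + size window to run into, which is also why it cannot jump straight to an arbitrary page.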