CloudSolrClient commit: fewer documents indexed than expected


User bPkPel

15921825

Posted on
2018-12-15

Updated on
2018-12-15

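I build the index with SolrJ. Each run handles roughly 129,776 records, indexed from multiple threads, with each thread processing at most 10,000 records. The core code for each thread is shown below.

However, once indexing finishes the index contains only 97,878 documents (the actual count differs on every run), and the logs show no errors. The index's unique key is the database primary key, so it should not repeat.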

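    String ip = PropertiesInit.getPropertiesValue("solrCluster.ip");
    // Connect through the ZooKeeper address. The Builder API shown here exists in
    // SolrJ 6.x/7.x; adjust the construction to the SolrJ version you actually use.
    CloudSolrClient solrServer = new CloudSolrClient.Builder().withZkHost(ip).build();
    // Step 3: set the default collection.
    solrServer.setDefaultCollection(coreName);
    // Step 4: create the collection of SolrInputDocument objects.
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    // Step 5: add fields to each document.
    for (int i = 0; i < list.size(); i++) {
        newSum++;
        // some business logic that builds `document`
        docs.add(document);
    }
    // Step 6: write the documents to the index.
    solrServer.add(docs);
    // Step 7: commit.
    UpdateResponse response = solrServer.commit();
    System.out.println("Documents submitted in this batch: " + newSum + ". Response: " + response.getResponse());
    solrServer.close();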
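
Since the count printed by each thread only reflects how many documents were submitted, one way to test the assumption that the keys never repeat is to count the key values across all batches before they are sent. The following is a minimal sketch in plain Java; `DuplicateIdCheck` and `findDuplicates` are hypothetical names, not part of SolrJ, and the ids are made-up sample values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: counts how often each unique-key value
// occurs in the data that is about to be indexed.
class DuplicateIdCheck {

    // Returns only the ids that occur more than once, with their occurrence count.
    static Map<String, Integer> findDuplicates(List<String> ids) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Drop every id that was seen exactly once.
        counts.values().removeIf(n -> n == 1);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample ids collected from all pages/threads of one run.
        List<String> ids = Arrays.asList("1001", "1002", "1003", "1002", "1004", "1001");
        Map<String, Integer> duplicates = findDuplicates(ids);
        System.out.println("ids fetched: " + ids.size());
        System.out.println("ids seen more than once: " + duplicates);
    }
}
```

For the check to be meaningful the ids have to be gathered over the whole run, not per thread, because the repeats may sit in different batches.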

Printed log output
[image]

Actual number of indexed documents
[image]

Configuration (solrconfig.xml)

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    </autoSoftCommit>

solrj solrcloud solr

java


1 Answer


User bPkPel

15921825

Posted on
2018-12-25

✓ Accepted

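I found the cause: there really were duplicate primary keys at indexing time.

Looking at the index statistics, I noticed that the Deleted Docs count was exactly the number of missing documents. That pointed to the data returned by the webservice interface. It turned out that the Oracle pagination SQL sorted by a time column; because some rows share the same timestamp, the contents of each page are not fixed, so the first and second pages can contain the same rows.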
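
Why the Deleted Docs figure lines up with the shortfall can be illustrated with a toy model: when a document is added with a unique key that already exists, Solr replaces the earlier document, and the replaced version is counted as deleted until segments are merged. The sketch below is plain Java and not Solr code; `OverwriteModel` and the sample ids are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model of unique-key overwrite semantics. This is NOT Solr code.
class OverwriteModel {
    private final Set<String> liveIds = new HashSet<>();
    private int added;
    private int superseded;

    // Adding an id that is already present replaces the earlier document.
    void add(String id) {
        added++;
        if (!liveIds.add(id)) {
            superseded++;
        }
    }

    int added() { return added; }
    int liveDocs() { return liveIds.size(); }
    int superseded() { return superseded; }

    public static void main(String[] args) {
        OverwriteModel index = new OverwriteModel();
        // Made-up ids; "2" and "1" come back on a later page.
        for (String id : Arrays.asList("1", "2", "3", "2", "1", "4")) {
            index.add(id);
        }
        System.out.println("submitted:  " + index.added());
        System.out.println("live docs:  " + index.liveDocs());
        System.out.println("superseded: " + index.superseded());
        // In this model, live docs + superseded always equals submitted.
    }
}
```

In a real index the Deleted Docs number can shrink again after merges, so the match is easiest to see right after an indexing run.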

The final fix was to sort by id, which prevents the paged data from repeating.
[image]
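
The effect of the sort key on pagination can be reproduced without a database. The sketch below is a simulation only: the shuffle before each page stands in for the database being free to return rows with equal sort keys in any order on each execution. `PagingSimulation`, `Row`, and the sample timestamps are hypothetical and not taken from the original application.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Simulation only: a paged query whose ORDER BY may or may not define a total order.
class PagingSimulation {

    static final class Row {
        final long id;
        final String createdAt;
        Row(long id, String createdAt) { this.id = id; this.createdAt = createdAt; }
    }

    // Sorting by time alone leaves rows with equal timestamps in arbitrary order.
    static final Comparator<Row> BY_TIME = Comparator.comparing((Row r) -> r.createdAt);
    // Sorting by the unique id gives every row a fixed position.
    static final Comparator<Row> BY_ID = Comparator.comparingLong((Row r) -> r.id);

    // Fetches one page; each call re-sorts a freshly shuffled copy of the table.
    static List<Row> fetchPage(List<Row> table, Comparator<Row> order,
                               int page, int pageSize, Random rnd) {
        List<Row> rows = new ArrayList<>(table);
        Collections.shuffle(rows, rnd); // ties may come back in any order
        rows.sort(order);               // stable sort: ties keep the shuffled order
        int from = Math.min(page * pageSize, rows.size());
        int to = Math.min(from + pageSize, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    // Reads every page and returns how many distinct ids were seen in total.
    static int distinctIdsAcrossPages(List<Row> table, Comparator<Row> order,
                                      int pageSize, long seed) {
        Random rnd = new Random(seed);
        Set<Long> seen = new HashSet<>();
        int pages = (table.size() + pageSize - 1) / pageSize;
        for (int p = 0; p < pages; p++) {
            for (Row r : fetchPage(table, order, p, pageSize, rnd)) {
                seen.add(r.id);
            }
        }
        return seen.size();
    }

    static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        for (long id = 1; id <= 20; id++) {
            table.add(new Row(id, "2018-12-15 10:00:00")); // all rows share one timestamp
        }
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        System.out.println("rows in table: " + table.size());
        System.out.println("distinct ids, ordered by time: "
                + distinctIdsAcrossPages(table, BY_TIME, 5, 42L));
        System.out.println("distinct ids, ordered by id:   "
                + distinctIdsAcrossPages(table, BY_ID, 5, 42L));
    }
}
```

Ordering by id always yields every row exactly once, while ordering by time alone can return some rows twice and miss others. If the time ordering still matters, appending the id as a tie-breaker (here `BY_TIME.thenComparing(BY_ID)`) should give the same guarantee.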