引言
在进行试题综合查询时,在和往常一样使用Predicate
拼接谓语时,遇到了棘手的问题。
需求是查询试题,除了其他的专业课、模型等通用条件外,需要查询没有被使用过的试题,以及当前试卷使用的试题。
试题列表中需要包含当前试卷可选择的所有试题,故如此设计。
初次实现
错误举例
这是最初的实现,构造了一个OR
条件,试题的subjectSpread
为空,或者其所在的subjectSpread
所属的试卷就是当前试卷。
return subjectRepository.findAll((Specification<Subject>) (root, query, builder) -> {
Predicate predicate = root.get("parent").isNull();
logger.debug("构造是否使用查询条件");
Predicate usedPredicate = root.get("subjectSpread").isNull();
logger.debug("根据试卷id构造查询条件");
if (paperId != null) {
Predicate belongPredicate = builder.equal(root.join("subjectSpread").join("part").join("paper").get("id").as(Long.class), paperId);
usedPredicate = builder.or(usedPredicate, belongPredicate);
}
logger.debug("连接谓语");
predicate = builder.and(predicate, usedPredicate);
return predicate;
}, pageable);
查询条件构造的逻辑看起来没问题,但经过测试,该接口只能查出来当前试卷的试题,无法查询出subjectSpread
为空的试题。
问题排查
项目中启用了show-sql
的选项,在控制台打印Hibernate
生成的SQL
语句。
spring:
jpa:
show-sql: true
Hibernate
自动生成的SQL
代码如下:
SELECT
subject0_.id AS id1_15_,
subject0_.analysis AS analysis2_15_,
subject0_.course_id AS course_i7_15_,
subject0_.create_time AS create_t3_15_,
subject0_.create_user_id AS create_u8_15_,
subject0_.difficult AS difficul4_15_,
subject0_.mark AS mark5_15_,
subject0_.model_id AS model_id9_15_,
subject0_.p_id AS p_id10_15_,
subject0_.stem AS stem6_15_,
subject0_.subject_spread_id AS subject11_15_
FROM SUBJECT subject0_
INNER JOIN subject_spread subjectspr1_ ON subject0_.subject_spread_id = subjectspr1_.id
INNER JOIN part part2_ ON subjectspr1_.part_id = part2_.id
INNER JOIN paper paper3_ ON part2_.paper_id = paper3_.id
INNER JOIN course course4_ ON subject0_.course_id = course4_.id
INNER JOIN model model5_ ON subject0_.model_id = model5_.id
WHERE (subject0_.p_id IS NULL)
AND (
subject0_.subject_spread_id IS NULL
OR paper3_.id = 2
)
AND course4_.id = 1
AND model5_.id = 2
ORDER BY subject0_.id DESC
问题就出现在这几行INNER JOIN
上:
SUBJECT subject0_
INNER JOIN subject_spread subjectspr1_ ON subject0_.subject_spread_id = subjectspr1_.id
INNER JOIN part part2_ ON subjectspr1_.part_id = part2_.id
INNER JOIN paper paper3_ ON part2_.paper_id = paper3_.id
INNER JOIN course course4_ ON subject0_.course_id = course4_.id
数据库连接
左连接、右连接、内连接区别,请看下图:
原因分析
再看如下SQL
:
subject INNER JOIN subject_spread ON subject.subject_spread_id = subject_spread.id
subject
与subject_spread
表进行内连接,条件subject.subject_spread_id = subject_spread.id
,所以subject_spread_id
为NULL
的记录就被连接排除,所以查不出来未使用的试题。
解决方案
1. 原生UNION
既然一次查不出来,就查两次,将两次的集合UNION
到一起。
可惜JPA
不支持UNION
,只能使用原生SQL
进行查询。
具体SQL
如下,在原SQL
基础上进行改动,再写一个查询未使用试题的语句,将两者的结果集进行UNION
,再Order
,再分页。
SELECT subject.*
FROM subject
INNER JOIN course ON subject.course_id = course.id
INNER JOIN model ON subject.model_id = model.id
WHERE subject.p_id IS NULL
AND subject.subject_spread_id IS NULL
AND course.id = ?
AND model.id = ?
UNION
SELECT subject.*
FROM subject
INNER JOIN subject_spread ON subject.subject_spread_id = subject_spread.id
INNER JOIN part ON subject_spread.part_id = part.id
INNER JOIN paper ON part.paper_id = paper.id
INNER JOIN course ON subject.course_id = course.id
INNER JOIN model ON subject.model_id = model.id
WHERE subject.p_id IS NULL
AND paper.id = ?
AND course.id = ?
AND model.id = ?
ORDER BY id DESC
LIMIT ?
2. 左连接(推荐)
请教潘老师后,发现其实并不需要这么麻烦,之前的查询错误是因为对JOIN
的理解不深刻,该应用场景下应该使用左连接方式,而非默认的内连接。
join
方法的第二个参数即是连接类型,之前没用过,一直使用默认的INNER
连接类型。
Predicate
查询修改为左连接:
Predicate belongPredicate = builder.equal(root.join("subjectSpread", JoinType.LEFT).join("part", JoinType.LEFT).join("paper", JoinType.LEFT).get("id").as(Long.class), paperId);
性能对比
两者都能实现功能,我们对比一下在大量数据的环境下各自查询的性能。
构造大量数据的方法
之前构造大量数据一直使用JPA
的saveAll
方法,觉得saveAll
一直是执行一条SQL
,比for
循环调用save
性能会有所提升。
直到上次与同学讨论时才推翻这个错误观点。
他向MySQL
中使用saveAll
插入一千条数据,耗费了大量时间,具体忘记了,好像是几十秒,最后使用MyBatis
拼SQL
去了。
经过研究后才发现,saveAll
还真就是for
循环,难怪这么慢。
以后大量数据的时候再也不用JPA
了,还不如自己写SQL
。
数据中初始化了几条测试数据:
写个存储过程,对数据进行翻倍,翻16
次,共计393,216
条数据。(幂真的是世界上最伟大的运算)
CREATE PROCEDURE BIG_DATA()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 16 DO
INSERT INTO subject(analysis, create_time, difficult, mark, stem, course_id, model_id, subject_spread_id)
SELECT analysis, create_time, difficult, mark, stem, course_id, model_id, subject_spread_id FROM subject;
SET i = i + 1;
END WHILE;
END
控制变量
因为Hibernate
生成SQL
会有一些性能损失,其与JDBCTemplate
执行的原生SQL
在性能上会有所差距,所以我们脱离Hibernate
,仅在数据库层面对比LEFT JOIN
与UNION
的性能。
LEFT JOIN
查询语句如下:
SELECT subject.*
FROM subject
LEFT JOIN subject_spread ON subject.subject_spread_id = subject_spread.id
LEFT JOIN part ON subject_spread.part_id = part.id
LEFT JOIN paper ON part.paper_id = paper.id
INNER JOIN course ON subject.course_id = course.id
INNER JOIN model ON subject.model_id = model.id
WHERE subject.p_id IS NULL
AND (
subject.subject_spread_id IS NULL
OR paper.id = 2
)
AND course.id = 1
AND model.id = 2
ORDER BY subject.id DESC
执行时间7.283
秒。
UNION
查询语句如下:
SELECT subject.*
FROM subject
INNER JOIN course ON subject.course_id = course.id
INNER JOIN model ON subject.model_id = model.id
WHERE subject.p_id IS NULL
AND subject.subject_spread_id IS NULL
AND course.id = 1
AND model.id = 2
UNION
SELECT subject.*
FROM subject
INNER JOIN subject_spread ON subject.subject_spread_id = subject_spread.id
INNER JOIN part ON subject_spread.part_id = part.id
INNER JOIN paper ON part.paper_id = paper.id
INNER JOIN course ON subject.course_id = course.id
INNER JOIN model ON subject.model_id = model.id
WHERE subject.p_id IS NULL
AND paper.id = 2
AND course.id = 1
AND model.id = 2
ORDER BY id DESC
执行时间16.450
秒。
性能对比
二者对比,UNION
花费的时间大约是LEFT JOIN
的两倍,数据库进行了两次条件查询。
总结
除非业务必要,否则,SQL
语句尽量不要采用UNION
等联合多语句查询结果的方式,多次查询意味着更多的时间花费。
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。