SQL complex queries refer to subqueries.
Why are subqueries called complex queries? Because subqueries are equivalent to query nested queries, the complexity can be amplified almost infinitely (infinite nesting) because of nesting, so it is called complex query. Here is an example of the simplest subquery:
SELECT pv FROM (
SELECT pv FROM test
)
The above example is equivalent to SELECT pv FROM test
, but because the table location is replaced with a new query, it turns into a complex query! So complex queries are not necessarily really complex, and it is even possible to write complex queries that are equivalent to ordinary queries. This kind of meaningless behavior should be avoided.
Let's also take this opportunity to understand why subqueries work this way.
understand the nature of the query
When we look up a table, what does the database think we're looking up?
This is important because both of the following statements are legal:
SELECT pv FROM test
SELECT pv FROM (
SELECT pv FROM test
)
Why can databases treat subqueries as tables? In order to understand these concepts uniformly, it is necessary to abstractly understand the query content: query location is one or more records .
For example, the table test
obviously has multiple records (of course, only one row is one record), and SELECT pv FROM test
also has multiple records. However, because FROM
can query any number of records, both syntaxes are supported.
Not only can FROM
be followed by a single or multiple records, but even SELECT
, GROUP BY
, WHERE
, HAVING
can be followed by multiple records, which will be discussed later.
Speaking of which, it's easy to understand the variants of subqueries. For example, we can use WHERE
or GROUP BY
, etc. in the subquery, because in any case, as long as the query result is multiple records:
SELECT sum(people) as allPeople, sum(gdp), city FROM (
SELECT people, gdp, city FROM test
GROUP BY city
HAVING sum(gdp) > 10000
)
This example has some business implications. The subquery is executed from the inside out, so let's look at the internal logic first: group by city, and filter out the population details of all regions with a total GDP of more than 10,000. The outer query adds up the population, so that you can compare the total population and total GDP of each area with a GDP of more than 10,000, which is convenient for comparing these key cities.
However, this example still looks unnatural, because we don't need to write complex queries, in fact, simple queries are equivalent:
SELECT sum(people) as allPeople, sum(gdp), city FROM test
GROUP BY city
HAVING sum(gdp) > 10000
So why go overboard? Because the real usage of complex queries is not here.
view
It is because of the existence of subqueries that we can extract subqueries in a similar way to extracting variables. The abstraction extracted is the view:
CREATE VIEW my_table(people, gdp, city)
AS
SELECT sum(people) as allPeople, sum(gdp), city FROM test
GROUP BY city
HAVING sum(gdp) > 10000
SELECT sum(people) as allPeople, sum(gdp), city FROM my_table
The advantage of this is that this view can be reused by multiple SQL statements, which not only improves the maintainability, but also only needs to be queried once during execution.
It should be noted that SELECT can use any view, but when INSERT, DELETE, and UPDATE are used for a view, the view needs to meet the following conditions:
- DISTINCT is not used for deduplication.
- FROM single table.
- GROUP BY and HAVING are not used.
Because the above modes will cause the view to become the aggregated data, it is inconvenient to do operations other than query.
Another point of knowledge is the materialized view, that is, the use of MATERIALIZED to describe the view:
CREATE MATERIALIZED VIEW my_table(people, gdp, city)
AS ...
This view will be dropped, why support this feature? Because ordinary views, as temporary tables, cannot use optimization methods such as indexes, and the query performance is low, materialized views are more common performance optimization methods.
When it comes to performance optimization methods, there are some more common concepts, that is, to amortize the complexity of reading to the time of writing, such as pre-aggregating new tables to disk or solidifying CASE statements into fields, etc., which will not be expanded here.
Scalar subquery
As mentioned above, WHERE can also be followed by subqueries, such as:
SELECT city FROM test
WHERE gdp > (
SELECT avg(gdp) from test
)
In this way, you can query the cities whose gdp is greater than the average.
So why can't you just write it like this?
SELECT city FROM test
WHERE gdp > avg(gdp) -- 报错,WHERE 无法使用聚合函数
It looks beautiful, but in fact, as we introduced in the first article, WHERE cannot be used with aggregate queries, because this will aggregate the entire parent query. So why subqueries work? Because the subquery aggregates the subquery, the parent query is not aggregated, so this is in line with our intention.
So the inappropriate part of the above example is that using avg(gdp)
directly in the current query will lead to aggregation, and we do not want to aggregate the current query, but we want to get the average GDP through aggregation, so we need to use subqueries!
Looking back, why is this section called a scalar subquery? A scalar is a single value, because there is only one value aggregated by avg(gdp)
, so WHERE can use it as a single value. Conversely, if the subquery does not use aggregate functions, or GROUP BY grouping, then you cannot use the WHERE >
syntax, but you can use WHERE IN
, which involves the thinking of single and multiple records, we will continue to see the next section.
Single and multiple records
When introducing the scalar subquery, it was mentioned that the value of WHERE >
must be a single value. But in fact, WHERE can also be followed by sub-query results that return multiple records, as long as a reasonable conditional statement is used, such as IN:
SELECT area FROM test
WHERE gdp IN (
SELECT max(gdp) from test
GROUP BY city
)
In the above example, the sub query is grouped by city and finds the record with the largest GDP in each group, so if the data granularity is region, then we find the records with the largest GDP in each city, and then the parent query finds it through WHERE IN The complex result of the gdp match, so at the end, the area with the largest gdp in each city is listed.
But in fact WHERE >
statement and the plural query results will not report errors, but they have no meaning, so we need to understand whether the query results are single or multiple, and choose the appropriate conditions when judging WHERE. WHERE syntax suitable for complex query results are: WHERE IN
, WHERE SOME
, WHERE ANY
.
correlated subqueries
The so-called correlated subquery means that there is an association between the parent and child queries. In this case, the subquery must not be executed in priority alone. After all, it is associated with the parent query, so the correlated subquery executes the outer query first, and then executes the inner query. It should be noted that for each row of the parent query, the subquery will be executed once, so the performance is not high (of course, SQL will cache the results of the subquery with the same parameters).
So what is this connection? The association is the condition that is executed on the subquery when each row of the parent query is executed. This may be a bit convoluted, for example:
SELECT * FROM test where gdp > (
select avg(gdp) from test
group by city
)
For this example, you want to find that the gdp is greater than the average gdp grouped by city, for example, the Beijing area is compared by Beijing, and the Shanghai area is compared by Shanghai. But unfortunately, this is not feasible, because the parent-child query is not related, and SQL does not know to compare according to the same city, so as long as a WHERE condition is added, it becomes a related subquery:
SELECT * FROM test as t1 where gdp > (
select avg(gdp) from test as t2 where t1.city = t2.city
group by city
)
That is, every time the WHERE gdp >
condition is judged, the sub-query result is recalculated, and the average value is limited to the same city, which meets the requirements.
Summarize
Learn to use parent-child queries flexibly and master complex queries.
The first citizen of SQL is a set, so the so-called parent-child query is a flexible combination of parent-child sets. These sets can appear in almost any position. According to the number of sets, whether aggregation, and association conditions, scalar queries and correlated subqueries are derived.
A deeper understanding requires a large number of actual combat cases, but the same is true. After mastering complex queries, you can understand most of the SQL cases.
The discussion address is: Intensive Reading "SQL Complex Queries" Issue #403 ascoders/weekly
If you would like to participate in the discussion, please click here , there are new topics every week, released on weekends or Mondays. Front-end intensive reading - help you filter reliable content.
Pay attention to front-end intensive reading WeChat public
<img width=200 src="https://img.alicdn.com/tfs/TB165W0MCzqK1RjSZFLXXcn2XXa-258-258.jpg">
Copyright Notice: Free to reprint - non-commercial - non-derivative - keep attribution ( Creative Commons 3.0 License )
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。