Dataset

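This article uses PostgreSQL's SQL syntax. The focus is on read operations of the form select...from...where, i.e. analytical queries.
The decathlon dataset ZehnkampfD can be used directly at https://hyper-db.de/interface... Note that this web page does not allow write operations, i.e. transactional queries such as insert, update and delete. create table and drop table are not allowed either.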

To load the dataset locally, see https://segmentfault.com/a/11...

English schema: $$ZehnkampfD = \{\underline{Name, Discipline}, points\}$$

German schema: $$ZehnkampfD = \{\underline{Name, Disziplin}, punkte\}$$

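The English schema is covered in another article of mine. The German schema can be run directly in the HyPer web interface, so I use the German schema here to let readers run the queries on the web page as they are.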
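For readers who prefer to experiment locally, the German schema can be sketched as a PostgreSQL table. The column types and all rows below are assumptions made for illustration, and the real ZehnkampfD data may differ. The rows are only chosen so that they agree with the intermediate result shown later in this article.

```sql
-- Hypothetical local copy of the German schema; types and rows are assumptions.
create table zehnkampfd (
    name      varchar(30) not null,
    disziplin varchar(30) not null,
    punkte    integer     not null,
    primary key (name, disziplin)  -- the underlined key of the schema
);

-- Invented sample rows, not the official dataset.
insert into zehnkampfd (name, disziplin, punkte) values
    ('Bolt',        '100m',       50),
    ('Bolt',        'Weitsprung', 50),
    ('Eaton',       '100m',       40),
    ('Eaton',       'Weitsprung', 60),
    ('Behrenbruch', '100m',       30),
    ('Behrenbruch', 'Weitsprung', 50),
    ('Suarez',      '100m',       60),
    ('Suarez',      'Weitsprung', 60);
```

The composite primary key mirrors the underlined attributes: each athlete has at most one score per discipline.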

The schema and most of the SQL statements come from the slides and the book of Prof. Alfons Kemper, Ph.D.

Slides:

Book: https://db.in.tum.de/teaching...

Intermediate SQL

  • Find the athletes who are better than Bolt in every Disziplin:
select distinct z.name
from zehnkampfd as z
where not exists(
    select *
    from zehnkampfd z2
    where z2.name = z.name and exists(
        select *
        from zehnkampfd z3
        where z2.punkte <= z3.punkte and z2.disziplin = z3.disziplin and z3.name = 'Bolt'
        )
    )

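In plain words: there is no discipline in which Bolt is at least as good as the current athlete. The comparison is <=, so a tie with Bolt also disqualifies the athlete.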

Let us look at the intermediate table in detail, i.e. the inner query without the condition z2.name = z.name:

select *
    from zehnkampfd z2
    where  exists(
        select *
        from zehnkampfd z3
        where z2.punkte <= z3.punkte and z2.disziplin = z3.disziplin and z3.name = 'Bolt'
        )

    name     | disziplin  | punkte 
-------------+------------+--------
 Bolt        | 100m       |     50
 Bolt        | Weitsprung |     50
 Eaton       | 100m       |     40
 Behrenbruch | 100m       |     30
 Behrenbruch | Weitsprung |     50
(5 rows)
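The rows of this intermediate table are the (athlete, discipline) pairs in which Bolt scores at least as many points. The outer not exists then discards every athlete whose name occurs in it. As a sketch of that step, the same filter can be written with not in. The values list is invented sample data that stands in for the real table, and the CTE name not_better is hypothetical.

```sql
-- Invented sample rows stand in for the real zehnkampfd table.
with zehnkampfd(name, disziplin, punkte) as (
    values ('Bolt',        '100m',       50),
           ('Bolt',        'Weitsprung', 50),
           ('Eaton',       '100m',       40),
           ('Eaton',       'Weitsprung', 60),
           ('Behrenbruch', '100m',       30),
           ('Behrenbruch', 'Weitsprung', 50),
           ('Suarez',      '100m',       60),
           ('Suarez',      'Weitsprung', 60)
), not_better as (
    -- intermediate table: results that Bolt matches or beats
    select z2.name
    from zehnkampfd z2
    where exists(
        select *
        from zehnkampfd z3
        where z2.punkte <= z3.punkte and z2.disziplin = z3.disziplin and z3.name = 'Bolt'
        )
)
-- keep only the athletes that never occur in the intermediate table
select distinct z.name
from zehnkampfd z
where z.name not in (select name from not_better);
```

With these sample rows the intermediate table contains Bolt, Eaton and Behrenbruch, so only Suarez remains.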


Alternatively, the "all" can be expressed with count:

with better_as_bolt as (
    select z.name, z.disziplin
    from zehnkampfd z, zehnkampfd b
    where z.punkte > b.punkte and z.disziplin = b.disziplin and b.name = 'Bolt'
), num_dis as (
    select count(distinct disziplin) as num
    from zehnkampfd
)

select name
from better_as_bolt
group by name
having count(*) = (select num from num_dis)
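To see what the having clause compares, the following sketch lists, for every athlete with at least one win against Bolt, the number of disciplines won next to the total number of disciplines. The values list is invented sample data, and the column names disciplines_won and disciplines_total are hypothetical.

```sql
-- Invented sample rows stand in for the real zehnkampfd table.
with zehnkampfd(name, disziplin, punkte) as (
    values ('Bolt',        '100m',       50),
           ('Bolt',        'Weitsprung', 50),
           ('Eaton',       '100m',       40),
           ('Eaton',       'Weitsprung', 60),
           ('Behrenbruch', '100m',       30),
           ('Behrenbruch', 'Weitsprung', 50),
           ('Suarez',      '100m',       60),
           ('Suarez',      'Weitsprung', 60)
), better_as_bolt as (
    select z.name, z.disziplin
    from zehnkampfd z, zehnkampfd b
    where z.punkte > b.punkte and z.disziplin = b.disziplin and b.name = 'Bolt'
), num_dis as (
    select count(distinct disziplin) as num
    from zehnkampfd
)
-- one row per athlete who beat Bolt at least once
select name,
       count(*) as disciplines_won,
       (select num from num_dis) as disciplines_total
from better_as_bolt
group by name;
```

With these sample rows Eaton has 1 of 2 and Suarez has 2 of 2, so only Suarez passes the having condition. Note that the count version requires a win in every discipline that occurs in the table, whereas the not exists version only inspects the disciplines in which the athlete has a result.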

This exercise is very similar to the one in Intermediate SQL (1) that searches for the students who attended all Vorlesungen with sws = 4.

  • Find the champion of the 100m (the champion is defined as: nobody is better than this athlete):
-- with correlated sub-query
select gold.name
from zehnkampfd gold
where gold.disziplin = '100m' and not exists(
    select *
    from zehnkampfd other
    where other.disziplin = gold.disziplin  and gold.punkte < other.punkte
    )

Or

-- with uncorrelated sub-query
select gold.name
from zehnkampfd gold
where gold.disziplin = '100m' and gold.name not in (
    -- athletes for whom some other athlete is better
    select z1.name
    from zehnkampfd z1, zehnkampfd z2
    where z1.disziplin = z2.disziplin and z1.disziplin = '100m' and z1.punkte < z2.punkte
    )

Or

-- compare with a number (the maximum score)
select gold.name
from zehnkampfd gold
where gold.disziplin = '100m' and gold.punkte = (select max(punkte) from zehnkampfd where disziplin = '100m')
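A fourth way to state "nobody is better" is the quantified comparison >= all, which PostgreSQL supports. This is only a sketch of an equivalent formulation and is not taken from the slides. The values list is invented sample data.

```sql
-- Invented sample rows stand in for the real zehnkampfd table.
with zehnkampfd(name, disziplin, punkte) as (
    values ('Bolt',        '100m',       50),
           ('Bolt',        'Weitsprung', 50),
           ('Eaton',       '100m',       40),
           ('Eaton',       'Weitsprung', 60),
           ('Behrenbruch', '100m',       30),
           ('Behrenbruch', 'Weitsprung', 50),
           ('Suarez',      '100m',       60),
           ('Suarez',      'Weitsprung', 60)
)
select gold.name
from zehnkampfd gold
where gold.disziplin = '100m'
  -- the champion's score is at least as high as every 100m score
  and gold.punkte >= all (
      select other.punkte
      from zehnkampfd other
      where other.disziplin = '100m'
  );
```

With these sample rows the query returns Suarez. Like the other variants, it returns several names if the top score is shared.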
  • Find the runner-up of the 100m (the runner-up is defined as: exactly one athlete is better than this athlete):
select silver.name
from zehnkampfd silver
where silver.disziplin = '100m' and 1 = (
    select count(*)
    from zehnkampfd gold -- the champion, who is better than the athlete we want to select; such an athlete may occur only once
    where gold.disziplin = '100m' and gold.punkte > silver.punkte
    )

Or

select silver.name
from zehnkampfd silver
where silver.disziplin = '100m' and exists(
    select *
    from zehnkampfd gold -- the champion, who is better than the athlete we want to select; such an athlete may occur only once
    where gold.disziplin = '100m' and gold.punkte > silver.punkte and not exists(
        select *
        from zehnkampfd nobody -- nobody other than the champion may be better than the athlete we want to select
        where nobody.disziplin = '100m' and gold.name != nobody.name and nobody.punkte > silver.punkte)
    )
  • Find the third-placed athlete of the 100m (third place is defined as: exactly two athletes are better than this athlete):
select bronze.name
from zehnkampfd bronze
where bronze.disziplin = '100m' and 2 = (
    select count(*)
    from zehnkampfd other
    where other.disziplin = '100m' and other.punkte > bronze.punkte
    )

Or

select bronze.name
from zehnkampfd bronze
where bronze.disziplin = '100m' and exists(
    select *
    from zehnkampfd gold, zehnkampfd silver
    where gold.disziplin = '100m' and gold.punkte > bronze.punkte and
          silver.disziplin = '100m' and silver.punkte > bronze.punkte and
          gold.name != silver.name and not exists(
        select *
        from zehnkampfd nobody -- nobody other than the champion and the runner-up may be better
        where nobody.disziplin = '100m' and gold.name != nobody.name and silver.name != nobody.name and nobody.punkte > bronze.punkte)
    )
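The three count-based queries differ only in the constant 0, 1 or 2. As a sketch of that pattern, the number of better athletes can be computed once per 100m athlete with a correlated sub-query. The values list is invented sample data, and the column name better_count is hypothetical.

```sql
-- Invented sample rows stand in for the real zehnkampfd table.
with zehnkampfd(name, disziplin, punkte) as (
    values ('Bolt',        '100m',       50),
           ('Bolt',        'Weitsprung', 50),
           ('Eaton',       '100m',       40),
           ('Eaton',       'Weitsprung', 60),
           ('Behrenbruch', '100m',       30),
           ('Behrenbruch', 'Weitsprung', 50),
           ('Suarez',      '100m',       60),
           ('Suarez',      'Weitsprung', 60)
)
select z.name,
       z.punkte,
       -- number of athletes with a strictly higher score in the same discipline
       (select count(*)
        from zehnkampfd other
        where other.disziplin = z.disziplin and other.punkte > z.punkte) as better_count
from zehnkampfd z
where z.disziplin = '100m'
order by better_count;
```

With these sample rows better_count is 0 for Suarez, 1 for Bolt, 2 for Eaton and 3 for Behrenbruch, which matches the definitions of champion, runner-up and third place used above.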

罗济高