How can I randomly fetch rows according to each category's share of the data?

Suppose there is a news table that stores basic information and a category ID:

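| ID | category_id | title |
|----|-------------|-------|
| 1 | 1 | One SQL statement that steals the show |
| 2 | 2 | ddd |
| 3 | 2 | cccc |
| 4 | 2 | |

……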

Now I need to use each category's share of the data. For example, category A articles make up 5% of all news, category B 3%, …

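I then want to randomly query rows according to those percentages. That is, in a result set of, say, 20 rows, category A contributes 1 article (5%), while category B (3%) contributes none, or 1 after rounding, …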

Can this be done with a single SQL statement?
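
The rounding in the example can be made precise. Below is a minimal Python sketch, not taken from either answer, that uses the largest-remainder method. This is an assumption, because the question does not fix a rounding rule. Each category first gets the floor of its exact share of the 20 rows. The leftover rows then go to the categories with the largest fractional parts, so the quotas always add up to exactly 20. `allocate_quotas` is a hypothetical helper name. The 5 / 3 / 92 row counts are made-up data matching the 5% and 3% example, with categories 1 and 2 standing in for A and B.

```python
import math
from typing import Dict


def allocate_quotas(counts: Dict[int, int], total: int) -> Dict[int, int]:
    """Split `total` rows across categories in proportion to their row counts.

    Largest-remainder method (an assumed rounding rule): floor each exact
    share, then give one extra row to each of the categories with the
    largest fractional parts until the quotas add up to `total`.
    Assumes total <= sum(counts.values()).
    """
    n = sum(counts.values())
    exact = {c: total * k / n for c, k in counts.items()}
    quotas = {c: math.floor(x) for c, x in exact.items()}
    leftover = total - sum(quotas.values())
    # Categories ordered by descending fractional part receive the leftovers.
    by_remainder = sorted(exact, key=lambda c: exact[c] - quotas[c], reverse=True)
    for c in by_remainder[:leftover]:
        quotas[c] += 1
    return quotas


# Hypothetical counts: category 1 holds 5% of 100 rows, category 2 holds 3%.
print(allocate_quotas({1: 5, 2: 3, 3: 92}, 20))  # → {1: 1, 2: 1, 3: 18}
```

With these counts, category 2 (0.6 of a row) is rounded up to 1 and category 3 (18.4) is rounded down to 18.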

2 answers
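-- Step 1: each category's share of all news rows, in percent
SELECT category_id, COUNT(*) * 100.0 / (SELECT COUNT(*) FROM news) AS percentage
FROM news
GROUP BY category_id;

-- Step 2: one query per category. The LIMIT is that category's share of the
-- 20 rows wanted, e.g. ROUND(percentage * 20 / 100). RAND() is MySQL syntax.
SELECT * FROM news WHERE category_id = <id of A> ORDER BY RAND() LIMIT <row count for A>;
SELECT * FROM news WHERE category_id = <id of B> ORDER BY RAND() LIMIT <row count for B>;
...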
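
The two-step approach above can be driven from application code. The following Python sketch uses the standard `sqlite3` module, so it swaps MySQL's `RAND()` for SQLite's `random()`. The in-memory `news` table, the 5 / 3 / 92 rows, and the helper names `make_demo_db` and `sample_by_share` are hypothetical. Each quota is the category's share of `total` rounded with Python's `round()`, so in general the quotas are not guaranteed to add up to exactly `total`.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_demo_db(sizes: Dict[int, int]) -> sqlite3.Connection:
    """Hypothetical in-memory news table with sizes[c] rows in category c."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE news (id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO news (category_id, title) VALUES (?, ?)",
        [(c, f"title {c}-{i}") for c, k in sizes.items() for i in range(k)],
    )
    return conn


def sample_by_share(conn: sqlite3.Connection, total: int) -> List[Tuple[int, int, str]]:
    """Step 1: count rows per category. Step 2: random rows from each category."""
    counts = conn.execute(
        "SELECT category_id, COUNT(*) FROM news GROUP BY category_id"
    ).fetchall()
    n = sum(k for _, k in counts)
    rows: List[Tuple[int, int, str]] = []
    for category_id, k in counts:
        # This category's share of `total`, rounded to the nearest integer.
        quota = round(k * total / n)
        rows += conn.execute(
            "SELECT id, category_id, title FROM news "
            "WHERE category_id = ? ORDER BY random() LIMIT ?",
            (category_id, quota),
        ).fetchall()
    return rows


conn = make_demo_db({1: 5, 2: 3, 3: 92})
picked = sample_by_share(conn, 20)
print(len(picked))  # → 20
```

With this data, the quotas come out to 1, 1 and 18, which is 20 rows in total. This version still issues one query per category, so it is not a single SQL statement.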

The following SQL has been tested and works in SQLite:

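WITH
  data AS (
    -- Shuffle each category with random(): in a category of n rows,
    -- the k-th row gets pct = cume_dist() = k / n.
    SELECT id, cume_dist() OVER win AS pct
    FROM 新闻表
    WINDOW win AS (PARTITION BY category_id ORDER BY random())
  ),
  pick_ids AS (
    -- Keep the 20 smallest pct values across all categories, so each
    -- category contributes roughly in proportion to its size.
    SELECT id
    FROM data
    ORDER BY pct
    LIMIT 20
  )
SELECT title
FROM pick_ids
JOIN 新闻表 USING(id);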
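
This query picks rows roughly in proportion to category size. Within a category of n rows shuffled by `random()`, the k-th row gets `cume_dist()` = k/n. Sorting every row by that value and keeping the first 20 therefore takes roughly the same fraction of each category. Leftover slots go to whichever categories have the next-smallest values rather than following a fixed rounding rule. The Python sketch below runs the query through the standard `sqlite3` module against a hypothetical in-memory table. It assumes SQLite 3.25 or later, the first release with window functions. It also renames the table to `news` instead of `新闻表` and selects `category_id` as well, so the per-category counts can be inspected.

```python
import sqlite3

# The answer's query, with the table renamed to `news` and category_id selected too.
QUERY = """
WITH
  data AS (
    SELECT id, cume_dist() OVER win AS pct
    FROM news
    WINDOW win AS (PARTITION BY category_id ORDER BY random())
  ),
  pick_ids AS (
    SELECT id
    FROM data
    ORDER BY pct
    LIMIT 20
  )
SELECT category_id, title
FROM pick_ids
JOIN news USING(id)
"""

# Hypothetical data: 5, 3 and 92 rows in categories 1, 2 and 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, category_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO news (category_id, title) VALUES (?, ?)",
    [(c, f"title {c}-{i}") for c, k in {1: 5, 2: 3, 3: 92}.items() for i in range(k)],
)

rows = conn.execute(QUERY).fetchall()
# Count how many of the picked rows came from each category.
per_category = {}
for category_id, _ in rows:
    per_category[category_id] = per_category.get(category_id, 0) + 1
print(dict(sorted(per_category.items())))  # → {1: 1, 3: 19}
```

With 5, 3 and 92 rows, the 20 slots are filled by category 3's values k/92 for k ≤ 19 and category 1's 1/5 = 0.2. Category 2's smallest value is 1/3, so it contributes nothing. These pct values depend only on the category sizes, so the counts are the same on every run. Only the choice of rows within each category changes.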