分层数据Hierarchical Data探索(例如:无限级分类、多级菜单、省份城市)

引言

第一篇 分层数据Hierarchical Data探索(1.递归) 已经介绍了分层数据以及使用递归算法实现了无限极分类,但是递归即浪费时间,又浪费空间(内存),尤其是在数据量大的情况下效率显著下降。
第二篇 分层数据Hierarchical Data探索(2.邻接表模型) 介绍了一种数据模型邻接表模型来实现,但在检索路径的过程中,除了本层外,每一层都会对应一个LEFT JOIN,那么如果层数不定怎么办?或者层数过多?

邻接表模型的局限性

用纯SQL编码实现邻接表模型有一定的难度。在我们检索某分类的路径之前,我们需要知道该分类所在的层次。在删除中间层的节点时,需要同时删除该节点下的所有节点,否则会出现孤立节点。

那么,在MySQL中如何更好的处理分层数据呢?下面我们来说一说嵌套集合模型

嵌套集合模型(Nested Set Model)

更多 嵌套集合模型(Nested Set Model)的介绍请见:wiki

在嵌套集合模型中,我们将以一种新的方式来理解我们的分层数据,不再是线与点了,而是嵌套容器。下图以嵌套容器的方式画出了electronics分类图:

通过集合的包含关系,嵌套结合模型可以表示分层结构,每一个分层可以用一个Set来表示(一个圈),父节点所在的圈包含所有子节点所在的圈。

为了用MySQL来表示集合关系,需要定义连个字段 lftrgt (表示一个集合的范围)。

# 为了模拟,我们创建一个表category包含三个字段:id,title,lft,rgt如下:
CREATE TABLE category (
  id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title varchar(255) NOT NULL,
  lft int(10) NOT NULL,
  rgt int(10) NOT NULL
);

# 插入模拟数据
INSERT INTO category(title,lft,rgt) VALUES('Electronics',1,28);

INSERT INTO category(title,lft,rgt) VALUES('Laptops & PC',2,7);
 
INSERT INTO category(title,lft,rgt) VALUES('Laptops',3,4);
INSERT INTO category(title,lft,rgt) VALUES('PC',5,6);
 
INSERT INTO category(title,lft,rgt) VALUES('Cameras & photo',8,11);
INSERT INTO category(title,lft,rgt) VALUES('Camera',9,10);
 
INSERT INTO category(title,lft,rgt) VALUES('Phones & Accessories',12,27);
INSERT INTO category(title,lft,rgt) VALUES('Smartphones',13,20);
 
INSERT INTO category(title,lft,rgt) VALUES('Android',14,15);
INSERT INTO category(title,lft,rgt) VALUES('iOS',16,17);
INSERT INTO category(title,lft,rgt) VALUES('Other Smartphones',18,19);
 
INSERT INTO category(title,lft,rgt) VALUES('Batteries',21,22);
INSERT INTO category(title,lft,rgt) VALUES('Headsets',23,24);
INSERT INTO category(title,lft,rgt) VALUES('Screen Protectors',25,26);

select * from category;
+----+----------------------+-----+-----+
| id | title                | lft | rgt |
+----+----------------------+-----+-----+
|  1 | Electronics          |   1 |  28 |
|  2 | Laptops & PC         |   2 |   7 |
|  3 | Laptops              |   3 |   4 |
|  4 | PC                   |   5 |   6 |
|  5 | Cameras & photo      |   8 |  11 |
|  6 | Camera               |   9 |  10 |
|  7 | Phones & Accessories |  12 |  27 |
|  8 | Smartphones          |  13 |  20 |
|  9 | Android              |  14 |  15 |
| 10 | iOS                  |  16 |  17 |
| 11 | Other Smartphones    |  18 |  19 |
| 12 | Batteries            |  21 |  22 |
| 13 | Headsets             |  23 |  24 |
| 14 | Screen Protectors    |  25 |  26 |
+----+----------------------+-----+-----+
14 rows in set (0.00 sec)
  • 检索分层路径

由于子节点的 lft 值总在父节点的 lft 和 rgt 值之间,所以可以通过父节点连接到子节点上来检索整棵树

SELECT node.id,node.title,node.lft,node.rgt
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND parent.title = 'Electronics'
ORDER BY node.lft;
+----+----------------------+-----+-----+
| id | title                | lft | rgt |
+----+----------------------+-----+-----+
|  1 | Electronics          |   1 |  28 |
|  2 | Laptops & PC         |   2 |   7 |
|  3 | Laptops              |   3 |   4 |
|  4 | PC                   |   5 |   6 |
|  5 | Cameras & photo      |   8 |  11 |
|  6 | Camera               |   9 |  10 |
|  7 | Phones & Accessories |  12 |  27 |
|  8 | Smartphones          |  13 |  20 |
|  9 | Android              |  14 |  15 |
| 10 | iOS                  |  16 |  17 |
| 11 | Other Smartphones    |  18 |  19 |
| 12 | Batteries            |  21 |  22 |
| 13 | Headsets             |  23 |  24 |
| 14 | Screen Protectors    |  25 |  26 |
+----+----------------------+-----+-----+
14 rows in set (0.05 sec)

不像之前邻接表模型的例子,这个查询语句不管树的层次有多深都能很好的工作。在BETWEEN的子句中我们没有去关心node的rgt值,是因为使用node的rgt值得出的父节点总是和使用lft值得出的是相同的。

  • 检索所有叶子节点

检索出所有的叶子节点,使用嵌套集合模型的方法比邻接表模型的LEFT JOIN方法简单多了。如果你仔细得看了category表,你可能已经注意到叶子节点的左右值是连续的。要检索出叶子节点,我们只要查找满足 rgt=lft+1 的节点:

SELECT id,title,lft,rgt
FROM category
WHERE rgt = lft + 1;
+----+-------------------+-----+-----+
| id | title             | lft | rgt |
+----+-------------------+-----+-----+
|  3 | Laptops           |   3 |   4 |
|  4 | PC                |   5 |   6 |
|  6 | Camera            |   9 |  10 |
|  9 | Android           |  14 |  15 |
| 10 | iOS               |  16 |  17 |
| 11 | Other Smartphones |  18 |  19 |
| 12 | Batteries         |  21 |  22 |
| 13 | Headsets          |  23 |  24 |
| 14 | Screen Protectors |  25 |  26 |
+----+-------------------+-----+-----+
9 rows in set (0.00 sec)

查询

  • 检索单一路径

在嵌套集合模型中,我们可以不用多个自连接就可以检索出单一路径:

SELECT parent.id,parent.title,parent.lft,parent.rgt
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'PC'
ORDER BY parent.lft;

+----+--------------+-----+-----+
| id | title        | lft | rgt |
+----+--------------+-----+-----+
|  1 | Electronics  |   1 |  28 |
|  2 | Laptops & PC |   2 |   7 |
|  4 | PC           |   5 |   6 |
+----+--------------+-----+-----+
3 rows in set (0.00 sec)
  • 检索节点的深度

我们已经知道怎样去呈现一棵整树,但是为了更好的标识出节点在树中所处层次,我们怎样才能检索出节点在树中的层级呢?我们可以在之前的查询语句上增加COUNT函数和GROUP BY子句来实现:

SELECT node.title,(COUNT(parent.title) - 1) AS lev
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.title
ORDER BY node.lft;

+----------------------+-----+
| title                | lev |
+----------------------+-----+
| Electronics          |   0 |
| Laptops & PC         |   1 |
| Laptops              |   2 |
| PC                   |   2 |
| Cameras & photo      |   1 |
| Camera               |   2 |
| Phones & Accessories |   1 |
| Smartphones          |   2 |
| Android              |   3 |
| iOS                  |   3 |
| Other Smartphones    |   3 |
| Batteries            |   2 |
| Headsets             |   2 |
| Screen Protectors    |   2 |
+----------------------+-----+
14 rows in set (0.01 sec)
如果当前MySQL版本是5.7或者以上可能会出现 1055 的报错,下面是是解决办法
报错:
ERROR 1055 (42000): Expression #1 of ORDER BY clause is not in GROUP BY clause and contains nonaggregated column 'test.node.lft' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

原因:In 5.7 the sqlmode is set by default to:
ONLY_FULL_GROUP_BY,NO_AUTO_CREATE_USER,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION

解决:To remove the clause ONLY_FULL_GROUP_BY you can do this:
SET sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

This supposed you need to make that GROUP BY with non aggregated columns.

我们可以根据 lev 值来缩进分类名字,使用 CONCAT 和 REPEAT 字符串函数:

SELECT CONCAT( REPEAT(' ', COUNT(parent.title) - 1), node.title) AS name,(COUNT(parent.title) - 1) AS lev
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.title
ORDER BY node.lft;
+-----------------------+-----+
| name                  | lev |
+-----------------------+-----+
| Electronics           |   0 |
|  Laptops & PC         |   1 |
|   Laptops             |   2 |
|   PC                  |   2 |
|  Cameras & photo      |   1 |
|   Camera              |   2 |
|  Phones & Accessories |   1 |
|   Smartphones         |   2 |
|    Android            |   3 |
|    iOS                |   3 |
|    Other Smartphones  |   3 |
|   Batteries           |   2 |
|   Headsets            |   2 |
|   Screen Protectors   |   2 |
+-----------------------+-----+
14 rows in set (0.01 sec)
  • 检索子树的深度
SELECT node.title, (COUNT(parent.title) - (sub_tree.lev + 1)) AS lev
FROM category AS node,
    category AS parent,
    category AS sub_parent,
    (
        SELECT node.title, (COUNT(parent.title) - 1) AS lev
        FROM category AS node,
        category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'Phones & Accessories'
        GROUP BY node.title
        ORDER BY node.lft
    ) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
    AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
    AND sub_parent.title = sub_tree.title
GROUP BY node.title
ORDER BY node.lft;

这个查询语句可以检索出任一节点子树的深度值,包括根节点。这里的深度值跟你指定的节点有关。

  • 检索节点的直接子节点

可以想象一下,你在零售网站上呈现电子产品的分类。当用户点击分类后,你将要呈现该分类下的产品,同时也需列出该分类下的直接子分类,而不是该分类下的全部分类。为此,我们只呈现该节点及其直接子节点,不再呈现更深层次的节点。
要实现它非常的简单,在先前的查询语句上添加 HAVING 子句:

SELECT node.title, (COUNT(parent.title) - (sub_tree.lev + 1)) AS lev
FROM category AS node,
    category AS parent,
    category AS sub_parent,
    (
        SELECT node.title, (COUNT(parent.title) - 1) AS lev
        FROM category AS node,
        category AS parent
        WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'Phones & Accessories'
        GROUP BY node.title
        ORDER BY node.lft
    ) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
    AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
    AND sub_parent.title = sub_tree.title
GROUP BY node.title
HAVING lev <= 1
ORDER BY node.lft;

如果你不希望呈现父节点,你可以更改 HAVING lev <= 1HAVING lev = 1

新增节点

  • 添加同一层次的节点

到现在,我们已经知道了如何去查询我们的树,是时候关注一下如何增加一个新节点来更新我们的树了。
当我们想要在 Laptops & PCCameras & photo节点之间新增一个节点,新节点的 lft 和 rgt 的 值为8和9,所有该节点的右边节点的lft和rgt值都将加2,之后我们再添加新节点并赋相应的lft和rgt值。我使用了锁表(LOCK TABLES)语句来隔离查询:

LOCK TABLE category WRITE;

SELECT @myRight := rgt FROM category WHERE title = 'Laptops & PC';

UPDATE category SET rgt = rgt + 2 WHERE rgt > @myRight;
UPDATE category SET lft = lft + 2 WHERE lft > @myRight;

INSERT INTO category(title, lft, rgt) VALUES('Game Consoles', @myRight + 1, @myRight + 2);

UNLOCK TABLES;

我们可以检验一下新节点插入的正确性:
SELECT CONCAT( REPEAT(' ', COUNT(parent.title) - 1), node.title) AS name,(COUNT(parent.title) - 1) AS lev
FROM category AS node,
     category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.title
ORDER BY node.lft;

+-----------------------+-----+
| name                  | lev |
+-----------------------+-----+
| Electronics           |   0 |
|  Laptops & PC         |   1 |
|   Laptops             |   2 |
|   PC                  |   2 |
|  Game Consoles        |   1 |
|  Cameras & photo      |   1 |
|   Camera              |   2 |
|  Phones & Accessories |   1 |
|   Smartphones         |   2 |
|    Android            |   3 |
|    iOS                |   3 |
|    Other Smartphones  |   3 |
|   Batteries           |   2 |
|   Headsets            |   2 |
|   Screen Protectors   |   2 |
+-----------------------+-----+
15 rows in set (0.00 sec)
  • 添加叶子节点

如果我们想要在叶子节点下增加节点,我们得稍微修改一下查询语句。让我们在 Camera 叶子节点下添加 SLR 节点:

LOCK TABLE category WRITE;

SELECT @myLeft := lft FROM category WHERE title = 'Camera';

UPDATE category SET rgt = rgt + 2 WHERE rgt > @myLeft;
UPDATE category SET lft = lft + 2 WHERE lft > @myLeft;

INSERT INTO category(title, lft, rgt) VALUES('SLR', @myLeft + 1, @myLeft + 2);

UNLOCK TABLES;

删除节点

最后删除节点。删除节点的处理过程跟节点在分层数据中所处的位置有关,删除一个叶子节点比删除一个子节点要简单得多,因为删除子节点的时候,我们需要去处理孤立节点。

  • 删除叶子节点

删除一个叶子节点的过程正好是新增一个叶子节点的逆过程,我们在删除节点的同时该节点右边所有节点的左右值和该父节点的右值都会减去该节点的宽度值:

LOCK TABLE category WRITE;


SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1 FROM category WHERE title = 'Game Consoles';


DELETE FROM category WHERE lft BETWEEN @myLeft AND @myRight;


UPDATE category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE category SET lft = lft - @myWidth WHERE lft > @myRight;

UNLOCK TABLES;
  • 删除子节点以及整颗子树
LOCK TABLE category WRITE;


SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1 FROM category WHERE title = 'Cameras & photo';


DELETE FROM category WHERE lft BETWEEN @myLeft AND @myRight;


UPDATE category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE category SET lft = lft - @myWidth WHERE lft > @myRight;

UNLOCK TABLES;
  • 删除该节点,而不删除该节点的子节点
LOCK TABLE category WRITE;


SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1 FROM category WHERE title = 'Cameras & photo';


DELETE FROM category WHERE lft = @myLeft;


UPDATE category SET rgt = rgt - 1, lft = lft - 1 WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE category SET rgt = rgt - 2 WHERE rgt > @myRight;
UPDATE category SET lft = lft - 2 WHERE lft > @myRight;

UNLOCK TABLES;

在这个例子中,我们对该节点所有右边节点的左右值都减去了2(因为不考虑其子节点,该节点的宽度为2),对该节点的子节点的左右值都减去了1(弥补由于失去父节点的左值造成的裂缝)

参考资源


arunfung
22 声望0 粉丝

努力做最好的自己