原文链接:http://www.pilishen.com/posts...
我们都曾在数据库中处理过层级数据-这种数据中的每项都有一个父项和(0或多个)子项,根项除外。比如:论坛和邮件列表中的分类、商业组织结构表、内容管理系统的分类和产品分类等等。
在关系型数据库中处理层级数据时我们总会觉得关系型数据库不是为处理层级数据设计的,因为关系型数据库的数据表不像XML具有层级,而是一个简单的扁平化的表。所以层级数据的这种父-子关系不能在数据表中自然的展现出来。
下面我们介绍两种在关系型数据库中处理层级数据的模型:
Adjacent List Model 邻接表模型
我们以下图的电子产品分类为例
通常上面的产品分类会像下面这样来设计表结构并被储存:
CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);
INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
(4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
(9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);
SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name | parent |
+-------------+----------------------+--------+
| 1 | ELECTRONICS | NULL |
| 2 | TELEVISIONS | 1 |
| 3 | TUBE | 2 |
| 4 | LCD | 2 |
| 5 | PLASMA | 2 |
| 6 | PORTABLE ELECTRONICS | 1 |
| 7 | MP3 PLAYERS | 6 |
| 8 | FLASH | 7 |
| 9 | CD PLAYERS | 6 |
| 10 | 2 WAY RADIOS | 6 |
+-------------+----------------------+--------+
10 rows in set (0.00 sec)
在邻接表模型中,表中的每条记录都包含一个指向父项id的字段。根项也就是这里的electronics的父项id为null,这种表结构非常的简单,而且我们很清楚的看到:flash的父项是MP3 , MP3的父项是protable electronics、protable electronics 的父项是electronics等等。虽然在客户端编码中邻接表模型处理起来也相当的简单,但是如果是纯SQL编码的话,该模型会有很多问题。
检索整个树
检索整个树是处理层级数据最常见的任务,为了完成这个最常用的方法是通过自连接(self-join):
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';
+-------------+----------------------+--------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS | TUBE | NULL |
| ELECTRONICS | TELEVISIONS | LCD | NULL |
| ELECTRONICS | TELEVISIONS | PLASMA | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS | NULL |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL |
+-------------+----------------------+--------------+-------+
6 rows in set (0.00 sec)
检索所有的叶子节点
我们可以通过左连接(left-join)检索出所有的叶子节点(没有子节点的节点):
SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
检索一条路径
自连接同样可以让我检索出层级结构中的一条完整的层级路径
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';
+-------------+----------------------+-------------+-------+
| lev1 | lev2 | lev3 | lev4 |
+-------------+----------------------+-------------+-------+
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH |
+-------------+----------------------+-------------+-------+
1 row in set (0.01 sec)
这个方法的主要局限性在于对于层级结构中的每一层都要自连接,并且随着层级的增加这种自连接将变的越来越复杂也就自然的影响着性能。
邻接表模型的局限性
在纯SQL中使用邻接表模型会很困难,如果想检索某个分类在层级结构中的路径就需要事先知道它在层级结构中所处的层级,此外当进行删除某项操作时需要非常小心,因为在删除的过程中可能会导致出现整个的孤立的子树(删除portable electronics 会使它下面所有的子项变孤立)。其中一些局限可以通过使用客户端编码或Stored Procedure(存储过程是一组为了完成特定功能的SQL语句集,经编译后存储在数据库中,用户通过指定存储过程的名字并给定参数来调用执行它)来解决。使用面向过程的语言,我们可以从树的底部开始,向上迭代返回完整的树或一条路径。我们也可以通过提升子项和对剩下的子项重新排序以使之指向新的父项来避免产生孤立的子树。
The Nested Set Model 嵌套集合模型
本篇文章将聚焦于嵌套集合模型,在这种模型中我们可以以一种新的方式来看待我们的层级数据,不是以节点和节点之间的线,而是以一种嵌套容器的方式,试着以下面的方式来展示我们的电器分类:
注意我们是怎么实现层级结构的,父项包含它们所有的子节点。我们通过使用左值和右值来在数据表上表示这种形式的层级结构,以此表示项目的嵌套关系
CREATE TABLE nested_category (
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL
);
INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4),
(4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19),(7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13),
(9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18);
SELECT * FROM nested_category ORDER BY category_id;
+-------------+----------------------+-----+-----+
| category_id | name | lft | rgt |
+-------------+----------------------+-----+-----+
| 1 | ELECTRONICS | 1 | 20 |
| 2 | TELEVISIONS | 2 | 9 |
| 3 | TUBE | 3 | 4 |
| 4 | LCD | 5 | 6 |
| 5 | PLASMA | 7 | 8 |
| 6 | PORTABLE ELECTRONICS | 10 | 19 |
| 7 | MP3 PLAYERS | 11 | 14 |
| 8 | FLASH | 12 | 13 |
| 9 | CD PLAYERS | 15 | 16 |
| 10 | 2 WAY RADIOS | 17 | 18 |
+-------------+----------------------+-----+-----+
我们使用lft和rgt来表示左值和右值,然而我们怎么确定这两个值呢?我们从外出节点最左边向右边编号:
这种设计也可以运用在经典树上:
检索整个树
由于一个节点的左值永远在其父节点的左值和右值之间,所以我们可以通过自连接来检索一整个树:
SELECT node.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND parent.name = 'ELECTRONICS'
ORDER BY node.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+----------------------+
不想上面邻接表模型中的例子,这种查询不需要知道树的深度,我们也不需要在意是否需要在BETWEEN语句中加上节点右值的判断语句,因为右值总是与左值处于相同的父项下。
检索所有的叶子节点
在嵌套集合模型中检索叶子节点也比在邻接表模型中使用左连接方法要简单得多。如果你仔细观察nested_category表,你会发现所有的叶子节点的左值与右值是两个连续的数字,所有为了检索叶子节点,我们只需要检索那些rgt = lft +1 的节点就行了:
SELECT name
FROM nested_category
WHERE rgt = lft + 1;
+--------------+
| name |
+--------------+
| TUBE |
| LCD |
| PLASMA |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+--------------+
检索一条路径
使用嵌套集合模型,我们可以检索一条路径而不需要之前那样复杂的多次自连接:
SELECT parent.name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'FLASH'
ORDER BY parent.lft;
+----------------------+
| name |
+----------------------+
| ELECTRONICS |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
+----------------------+
检索节点的深度
我们已经了解了怎么检索整个树,但是如果我们还需要知道树中的每个节点的深度,以便更好的确定层级结构中的各个节点到底处于什么位置怎么办呢? 我们可以通过增加COUNT函数和一条GROUP BY 语句到我们已存在的查询语句中来实现显示整个树:
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| ELECTRONICS | 0 |
| TELEVISIONS | 1 |
| TUBE | 2 |
| LCD | 2 |
| PLASMA | 2 |
| PORTABLE ELECTRONICS | 1 |
| MP3 PLAYERS | 2 |
| FLASH | 3 |
| CD PLAYERS | 2 |
| 2 WAY RADIOS | 2 |
+----------------------+-------+
我们可以使用CONCAT和REPEAT字符串函数来根据深度缩进分类名称:
SELECT CONCAT( REPEAT(' ', COUNT(parent.name) - 1), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name |
+-----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+-----------------------+
当然,你可以利用深度值直接在客户端应用程序上展现层级结构。Web开发者可以遍历树,并根据深度值的变化增加 <li></li>
和<ul></ul>
标签。
子树深度
当我们需要子树的深度信息时,我们不能在自连接中限制节点或父表,因为它会破坏我们的结果。相反,我们添加了一个第三自连接,以及一个子查询以确定这个深度将作为子树的新起点:
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'PORTABLE ELECTRONICS'
GROUP BY node.name
ORDER BY node.lft
)AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| MP3 PLAYERS | 1 |
| FLASH | 2 |
| CD PLAYERS | 1 |
| 2 WAY RADIOS | 1 |
+----------------------+-------+
这个函数可以被运用到任何节点上,包括根节点。查询的深度值总是相对应该指定节点。
查找节点的直接后代
假设你正在一个零售网站上展示电子产品分类。当一个用户点击一个分类后,你想要显示那个分类下面的产品并列出它们的子分类,而不是它们下面的完整分类树,为了达到目的,我们需要显示它的直接子节点。但是不向下更进一步检索。例如:我们显示PORTABLE ELECTRONICS 分类,我们想显示P3 PLAYERS, CD PLAYERS, 和2 WAY RADIOS,但是不想显示FLASH。
我们可以在之前的语句上添加一条HABING语句来轻松实现:
SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
nested_category AS parent,
nested_category AS sub_parent,
(
SELECT node.name, (COUNT(parent.name) - 1) AS depth
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.name = 'PORTABLE ELECTRONICS'
GROUP BY node.name
ORDER BY node.lft
)AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
AND sub_parent.name = sub_tree.name
GROUP BY node.name
HAVING depth <= 1
ORDER BY node.lft;
+----------------------+-------+
| name | depth |
+----------------------+-------+
| PORTABLE ELECTRONICS | 0 |
| MP3 PLAYERS | 1 |
| CD PLAYERS | 1 |
| 2 WAY RADIOS | 1 |
+----------------------+-------+
如果想显示父节点,将HAVING depth <= 1
改为HAVING depth = 1
。
嵌套集合中的聚合函数
让我们添加一个产品表来演示聚合函数:
CREATE TABLE product
(
product_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(40),
category_id INT NOT NULL
);
INSERT INTO product(name, category_id) VALUES('20" TV',3),('36" TV',3),
('Super-LCD 42"',4),('Ultra-Plasma 62"',5),('Value Plasma 38"',5),
('Power-MP3 5gb',7),('Super-Player 1gb',8),('Porta CD',9),('CD To go!',9),
('Family Talk 360',10);
SELECT * FROM product;
+------------+-------------------+-------------+
| product_id | name | category_id |
+------------+-------------------+-------------+
| 1 | 20" TV | 3 |
| 2 | 36" TV | 3 |
| 3 | Super-LCD 42" | 4 |
| 4 | Ultra-Plasma 62" | 5 |
| 5 | Value Plasma 38" | 5 |
| 6 | Power-MP3 128mb | 7 |
| 7 | Super-Shuffle 1gb | 8 |
| 8 | Porta CD | 9 |
| 9 | CD To go! | 9 |
| 10 | Family Talk 360 | 10 |
+------------+-------------------+-------------+
现在让我们来写一套查询语句来检索带有各分类产品数量的分类树:
SELECT parent.name, COUNT(product.name)
FROM nested_category AS node ,
nested_category AS parent,
product
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.category_id = product.category_id
GROUP BY parent.name
ORDER BY node.lft;
+----------------------+---------------------+
| name | COUNT(product.name) |
+----------------------+---------------------+
| ELECTRONICS | 10 |
| TELEVISIONS | 5 |
| TUBE | 2 |
| LCD | 1 |
| PLASMA | 2 |
| PORTABLE ELECTRONICS | 5 |
| MP3 PLAYERS | 2 |
| FLASH | 1 |
| CD PLAYERS | 2 |
| 2 WAY RADIOS | 1 |
+----------------------+---------------------+
这是我们典型的整树查询语句,包含一个COUNT函数和GROUP BY 函数,以及产品表的引用和在WHERE语句中对node和product表的关联。就像你看到的,每个分类都计数,子分类的产品数量在父分类上反应出来
添加新的节点
既然我们以及学会了怎么检索我们的树,接下来我们来了解下怎么添加新的节点到我们的树中,我们再来看看下面的图表:
如果我想在TELEVISIONS 和PORTABLE ELECTRONICS 中间添加新的节点,那么这个新的节点将会给予lft 和 rgt 分别为10和11(我们是从左向右编号的),它右边所有的节点的lft和rgt值都将加2,这些可以使用MySql储存过程解决:
LOCK TABLE nested_category WRITE;
SELECT @myRight := rgt FROM nested_category
WHERE name = 'TELEVISIONS';
UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft + 2 WHERE lft > @myRight;
INSERT INTO nested_category(name, lft, rgt) VALUES('GAME CONSOLES', @myRight + 1, @myRight + 2);
UNLOCK TABLES;
We can then check our nesting with our indented tree query:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name |
+-----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| GAME CONSOLES |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
+-----------------------+
如果我们想给一个没有子节点的节点添加一个子节点,我们需要稍微改改我们的语句。让我们在 2 WAY RADIOS节点下面添加新的FRS节点:
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft FROM nested_category
WHERE name = '2 WAY RADIOS';
UPDATE nested_category SET rgt = rgt + 2 WHERE rgt > @myLeft;
UPDATE nested_category SET lft = lft + 2 WHERE lft > @myLeft;
INSERT INTO nested_category(name, lft, rgt) VALUES('FRS', @myLeft + 1, @myLeft + 2);
UNLOCK TABLES;
在这个例子中我们将我们新的父节点的左值右边的值全部扩大。然后将改节点插入到父节点左值的右边:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name |
+-----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| GAME CONSOLES |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
| FRS |
+-----------------------+
删除节点
最后我们了解下移除节点。删除节点时所采取的操作过程取决于节点在层级结构中的位置;删除叶节点比删除带有子节点的节点要容易得多,因为我们必须处理孤立的节点
删除叶子节点的过程与添加新的节点正好相反,我们删除该节点并且删除父节点中它的宽度:
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1
FROM nested_category
WHERE name = 'GAME CONSOLES';
DELETE FROM nested_category WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE nested_category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft - @myWidth WHERE lft > @myRight;
UNLOCK TABLES;
我们再次执行我们上面的缩进树的查询语句来确认我们删除了节点且没有破坏层级结构:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name |
+-----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| MP3 PLAYERS |
| FLASH |
| CD PLAYERS |
| 2 WAY RADIOS |
| FRS |
+-----------------------+
这个方法对删除节点和它的子节点同样有效:
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1
FROM nested_category
WHERE name = 'MP3 PLAYERS';
DELETE FROM nested_category WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE nested_category SET rgt = rgt - @myWidth WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft - @myWidth WHERE lft > @myRight;
UNLOCK TABLES;
我们再次查询以验证我们成功的删除了整个子树:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+-----------------------+
| name |
+-----------------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| PORTABLE ELECTRONICS |
| CD PLAYERS |
| 2 WAY RADIOS |
| FRS |
+-----------------------+
另一个场景是我们需要删除父节点,但是需要保留它的子节点。有时你可能会仅仅是将它的名称改为一个占位符直到新的名称来替换它,有时这些子节点需要提升到被删除的父节点的层级:
LOCK TABLE nested_category WRITE;
SELECT @myLeft := lft, @myRight := rgt, @myWidth := rgt - lft + 1
FROM nested_category
WHERE name = 'PORTABLE ELECTRONICS';
DELETE FROM nested_category WHERE lft = @myLeft;
UPDATE nested_category SET rgt = rgt - 1, lft = lft - 1 WHERE lft BETWEEN @myLeft AND @myRight;
UPDATE nested_category SET rgt = rgt - 2 WHERE rgt > @myRight;
UPDATE nested_category SET lft = lft - 2 WHERE lft > @myRight;
UNLOCK TABLES;
这里我们改节点右侧的节点的左右值全部减去2,该节点的所有子节点的左右值全部减去1,我们再次来验证下这些元素有没有被提升:
SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) ), node.name) AS name
FROM nested_category AS node,
nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.name
ORDER BY node.lft;
+---------------+
| name |
+---------------+
| ELECTRONICS |
| TELEVISIONS |
| TUBE |
| LCD |
| PLASMA |
| CD PLAYERS |
| 2 WAY RADIOS |
| FRS |
+---------------+
总结
相对于邻接表模型,嵌套集合模型在层级数据的查询中具有极大的优势,而在数据的插入和删除操作时则根据插入或删除数据的位置的不同,可能需要更新很多节点甚至是整个树的左右值,从而影响数据库的性能。对于客户端需要频繁修改表的程序我们应该避免使用嵌套集合模型,而对于客户端需要频繁的查询表的程序我们应当使用它,像商城的产品分类表,表的数据只是在后台维护,而大量的用户会产生大量的查询。
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。