What is the most efficient/elegant way to parse a flat table into a tree?

Assume you have a flat table that stores an ordered tree hierarchy:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

Here's a diagram, where we have [id] Name . Root node 0 is fictional.

                       [0] ROOT
                          /     
              [1] Node 1          [3] Node 2
              /                          
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?

Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.

Pseudo code or plain English is okay, this is purely a conceptional question.

Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?


EDITS AND ADDITIONS

To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.

The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay.

The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.

Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.

I have posted my own solution so you guys can pull it to pieces.


Now that MySQL 8.0 is nearing release, all popular SQL databases will support recursive queries in standard syntax.

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

Below is my original answer from 2008:


There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).
  • Another solution is called Nested Sets , and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

    I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

    I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

    CREATE TABLE ClosureTable (
      ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
      descendant_id INT NOT NULL REFERENCES FlatTable(id),
      PRIMARY KEY (ancestor_id, descendant_id)
    );
    

    Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

    INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
      (1,1), (1,2), (1,4), (1,6),
      (2,2), (2,4),
      (3,3), (3,5),
      (4,4),
      (5,5),
      (6,6);
    

    Now you can get a tree starting at node 1 like this:

    SELECT f.* 
    FROM FlatTable f 
      JOIN ClosureTable a ON (f.id = a.descendant_id)
    WHERE a.ancestor_id = 1;
    

    The output (in MySQL client) looks like the following:

    +----+
    | id |
    +----+
    |  1 | 
    |  2 | 
    |  4 | 
    |  6 | 
    +----+
    

    In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.


    Re: comment from e-satis about immediate children (or immediate parent). You can add a " path_length " column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

    INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
      (1,1,0), (1,2,1), (1,4,2), (1,6,1),
      (2,2,0), (2,4,1),
      (3,3,0), (3,5,1),
      (4,4,0),
      (5,5,0),
      (6,6,0);
    

    Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.

    SELECT f.* 
    FROM FlatTable f 
      JOIN ClosureTable a ON (f.id = a.descendant_id)
    WHERE a.ancestor_id = 1
      AND path_length = 1;
    
    +----+
    | id |
    +----+
    |  2 | 
    |  6 | 
    +----+
    

    Re comment from @ashraf: "How about sorting the whole tree [by name]?"

    Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name , and sort by the name.

    SELECT f.name
    FROM FlatTable f 
    JOIN ClosureTable a ON (f.id = a.descendant_id)
    WHERE a.ancestor_id = 1
    ORDER BY f.name;
    

    Re comment from @Nate:

    SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
    FROM FlatTable f 
    JOIN ClosureTable a ON (f.id = a.descendant_id) 
    JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
    WHERE a.ancestor_id = 1 
    GROUP BY a.descendant_id 
    ORDER BY f.name
    
    +------------+-------------+
    | name       | breadcrumbs |
    +------------+-------------+
    | Node 1     | 1           |
    | Node 1.1   | 1,2         |
    | Node 1.1.1 | 1,2,4       |
    | Node 1.2   | 1,6         |
    +------------+-------------+
    

    A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

    The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name , presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

    If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.


    If you use nested sets (sometimes referred to as Modified Pre-order Tree Traversal) you can extract the entire tree structure or any subtree within it in tree order with a single query, at the cost of inserts being more expensive, as you need to manage columns which describe an in-order path through thee tree structure.

    For django-mptt, I used a structure like this:

    id  parent_id  tree_id  level  lft  rght
    --  ---------  -------  -----  ---  ----
     1       null        1      0    1    14
     2          1        1      1    2     7
     3          2        1      2    3     4
     4          2        1      2    5     6
     5          1        1      1    8    13
     6          5        1      2    9    10
     7          5        1      2    11   12
    

    Which describes a tree which looks like this (with id representing each item):

     1
     +-- 2
     |   +-- 3
     |   +-- 4
     |
     +-- 5
         +-- 6
         +-- 7
    

    Or, as a nested set diagram which makes it more obvious how the lft and rght values work:

     __________________________________________________________________________
    |  Root 1                                                                  |
    |   ________________________________    ________________________________   |
    |  |  Child 1.1                     |  |  Child 1.2                     |  |
    |  |   ___________    ___________   |  |   ___________    ___________   |  |
    |  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
    1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
    |  |________________________________|  |________________________________|  |
    |__________________________________________________________________________|
    

    As you can see, to get the entire subtree for a given node, in tree order, you simply have to select all rows which have lft and rght values between its lft and rght values. It's also simple to retrieve the tree of ancestors for a given node.

    The level column is a bit of denormalisation for convenience more than anything and the tree_id column allows you to restart the lft and rght numbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lft and rght columns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notes at the time when I was trying to wrap my head around the queries required for each operation.

    In terms of actually working with this data to display a tree, I created a tree_item_iterator utility function which, for each node, should give you sufficient information to generate whatever kind of display you want.

    More info about MPTT:

  • Trees in SQL
  • Storing Hierarchical Data in a Database
  • Managing Hierarchical Data in MySQL

  • As of Oracle 9i, you can use CONNECT BY.

    SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
    FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
    CONNECT BY PRIOR "Id" = "ParentId"
    START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)
    

    As of SQL Server 2005, you can use a recursive common table expression (CTE).

    WITH [NodeList] (
      [Id]
      , [ParentId]
      , [Level]
      , [Order]
    ) AS (
      SELECT [Node].[Id]
        , [Node].[ParentId]
        , 0 AS [Level]
        , CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
      FROM [Node]
      WHERE [Node].[ParentId] = 0
      UNION ALL
      SELECT [Node].[Id]
        , [Node].[ParentId]
        , [NodeList].[Level] + 1 AS [Level]
        , [NodeList].[Order] + '|'
          + CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
      FROM [Node]
        INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
    ) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
    FROM [Node]
      INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
    ORDER BY [NodeList].[Order]
    

    Both will output the following results.

    Name
    'Node 1'
    '    Node 1.1'
    '        Node 1.1.1'
    '    Node 1.2'
    'Node 2'
    '    Node 2.1'
    
    链接地址: http://www.djcxy.com/p/5160.html

    上一篇: 表命名困境:单数与复数名称

    下一篇: 什么是最有效率的/优雅的方式来解析一个平坦的表格到树?