What are the options for storing hierarchical data in a relational database?

Good Overviews

Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

  • One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found.
  • Models for hierarchical data: slides with good explanations of tradeoffs and example usage
  • Representing hierarchies in MySQL: very good overview of Nested Set in particular
  • Hierarchical data in RDBMSs: most comprehensive and well-organized set of links I've seen, but not much in the way of explanation
  • Options

    Ones I am aware of and general features:

  • Adjacency List:
  • Columns: ID, ParentID
  • Easy to implement.
  • Cheap node moves, inserts, and deletes.
  • Expensive to find the level (can store as a computed column), ancestry & descendants (Bridge Table combined with level column can solve), path (Lineage Column can solve).
  • Use Common Table Expressions in those databases that support them to traverse.
  • Nested Set (aka Modified Preorder Tree Traversal)
  • Popularized by Joe Celko in numerous articles and his book Trees and Hierarchies in SQL for Smarties
  • Columns: Left, Right
  • Cheap level, ancestry, descendants
  • Volatile encoding - moves, inserts, deletes more expensive.
  • Requires a specific sort order (eg created). So sorting all descendants in a different order requires additional work.
  • Nested Intervals
  • Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
  • Have to deal with real/float/decimal representation issues
  • A more complex matrix encoding variant adds the benefit of ancestor encoding, like materialized path for "free"
  • Bridge Table (aka Closure Table: some good ideas about how to use triggers for maintaining this approach)
  • Columns: ancestor, descendant
  • Stands apart from the table it describes.
  • Can include some nodes in more than one hierarchy.
  • Cheap ancestry and descendants (albeit not in what order)
  • For complete knowledge of a hierarchy needs to be combined with another option.
  • Flat Table
  • A modification of the Adjacency List that adds a Level and Rank (eg ordering) column to each record.
  • Expensive move and delete
  • Cheap ancestry and descendants
  • Good Use: threaded discussion - forums / blog comments
  • Lineage Column (aka Materialized Path, Path Enumeration)
  • Column: lineage (eg /parent/child/grandchild/etc...)
  • Limit to how deep the hierarchy can be.
  • Descendants cheap (eg LEFT(lineage, #) = '/enumerated/path' )
  • Ancestry tricky (database specific queries)
  • Multiple lineage columns
  • Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
  • Limit to how deep the hierarchy can be
  • Cheap ancestors, descendants, level
  • Cheap insert, delete, move of the leaves
  • Expensive insert, delete, move of the internal nodes
  • Database Specific Notes

    MySQL

  • Use session variables for Adjacency List
  • Oracle

  • Use CONNECT BY to traverse Adjacency Lists
  • PostgreSQL

  • ltree datatype for Materialized Path
  • SQL Server

  • General summary
  • 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.

  • My favorite answer is as what the first sentence in this thread suggested. Use an Adjacency List to maintain the hierarchy and use Nested Sets to query the hierarchy.

    The problem up until now has been that the coversion method from an Adjacecy List to Nested Sets has been frightfully slow because most people use the extreme RBAR method known as a "Push Stack" to do the conversion and has been considered to be way to expensive to reach the Nirvana of the simplicity of maintenance by the Adjacency List and the awesome performance of Nested Sets. As a result, most people end up having to settle for one or the other especially if there are more than, say, a lousy 100,000 nodes or so. Using the push stack method can take a whole day to do the conversion on what MLM'ers would consider to be a small million node hierarchy.

    I thought I'd give Celko a bit of competition by coming up with a method to convert an Adjacency List to Nested sets at speeds that just seem impossible. Here's the performance of the push stack method on my i5 laptop.

    Duration for     1,000 Nodes = 00:00:00:870 
    Duration for    10,000 Nodes = 00:01:01:783 (70 times slower instead of just 10)
    Duration for   100,000 Nodes = 00:49:59:730 (3,446 times slower instead of just 100) 
    Duration for 1,000,000 Nodes = 'Didn't even try this'
    

    And here's the duration for the new method (with the push stack method in parenthesis).

    Duration for     1,000 Nodes = 00:00:00:053 (compared to 00:00:00:870)
    Duration for    10,000 Nodes = 00:00:00:323 (compared to 00:01:01:783)
    Duration for   100,000 Nodes = 00:00:03:867 (compared to 00:49:59:730)
    Duration for 1,000,000 Nodes = 00:00:54:283 (compared to something like 2 days!!!)
    

    Yes, that's correct. 1 million nodes converted in less than a minute and 100,000 nodes in under 4 seconds.

    You can read about the new method and get a copy of the code at the following URL. http://www.sqlservercentral.com/articles/Hierarchy/94040/

    I also developed a "pre-aggregated" hierarchy using similar methods. MLM'ers and people making bills of materials will be particularly interested in this article. http://www.sqlservercentral.com/articles/T-SQL/94570/

    If you do stop by to take a look at either article, jump into the "Join the discussion" link and let me know what you think.


    This is a very partial answer to your question, but I hope still useful.

    Microsoft SQL Server 2008 implements two features that are extremely useful for managing hierarchical data:

  • the HierarchyId data type.
  • common table expressions, using the with keyword.
  • Have a look at "Model Your Data Hierarchies With SQL Server 2008" by Kent Tegels on MSDN for starts. See also my own question: Recursive same-table query in SQL Server 2008


    This design was not mentioned yet:

    Multiple lineage columns

    Though it has limitations, if you can bear them, it's very simple and very efficient. Features:

  • Columns: one for each lineage level, refers to all the parents up to the root, levels below the current items' level are set to NULL
  • Limit to how deep the hierarchy can be
  • Cheap ancestors, descendants, level
  • Cheap insert, delete, move of the leaves
  • Expensive insert, delete, move of the internal nodes
  • Here follows an example - taxonomic tree of birds so the hierarchy is Class/Order/Family/Genus/Species - species is the lowest level, 1 row = 1 taxon (which corresponds to species in the case of the leaf nodes):

    CREATE TABLE `taxons` (
      `TaxonId` smallint(6) NOT NULL default '0',
      `ClassId` smallint(6) default NULL,
      `OrderId` smallint(6) default NULL,
      `FamilyId` smallint(6) default NULL,
      `GenusId` smallint(6) default NULL,
      `Name` varchar(150) NOT NULL default ''
    );
    

    and the example of the data:

    +---------+---------+---------+----------+---------+-------------------------------+
    | TaxonId | ClassId | OrderId | FamilyId | GenusId | Name                          |
    +---------+---------+---------+----------+---------+-------------------------------+
    |     254 |       0 |       0 |        0 |       0 | Aves                          |
    |     255 |     254 |       0 |        0 |       0 | Gaviiformes                   |
    |     256 |     254 |     255 |        0 |       0 | Gaviidae                      |
    |     257 |     254 |     255 |      256 |       0 | Gavia                         |
    |     258 |     254 |     255 |      256 |     257 | Gavia stellata                |
    |     259 |     254 |     255 |      256 |     257 | Gavia arctica                 |
    |     260 |     254 |     255 |      256 |     257 | Gavia immer                   |
    |     261 |     254 |     255 |      256 |     257 | Gavia adamsii                 |
    |     262 |     254 |       0 |        0 |       0 | Podicipediformes              |
    |     263 |     254 |     262 |        0 |       0 | Podicipedidae                 |
    |     264 |     254 |     262 |      263 |       0 | Tachybaptus                   |
    

    This is great because this way you accomplish all the needed operations in a very easy way, as long as the internal categories don't change their level in the tree.

    链接地址: http://www.djcxy.com/p/2736.html

    上一篇: 如何防止SQL注入,PHP

    下一篇: 在关系数据库中存储分层数据有哪些选项?