Implementing a hierarchical data structure in a database

I know there are two approaches: adjacency list and nested tree. The adjacency list is said to become slow to traverse because of the numerous queries involved, but I don't know any realistic figures for this. The site I'm making will have in the region of 200 pages. Is traversal to generate (for example) a sitemap going to take longer than about 0.3 seconds?

Running on MySQL (InnoDB) with a LAMP stack.

I'd prefer to implement the adjacency list if possible because of its simpler design.

Thanks.


There are more options than just the two you mention. There are:

  • Adjacency List (the "parent_id" one almost everyone uses)
  • Nested Sets
  • Path Enumeration
  • Closure Table (aka Adjacency Relation)

See my answer to "What is the most efficient/elegant way to parse a flat table into a tree?"

Or a couple of books:

  • "Trees and Hierarchies in SQL for Smarties" by Joe Celko.
  • "SQL Design Patterns" by Vadim Tropashko.

  • The article "Managing Hierarchical Data in MySQL" goes into detail about this.

    I would recommend the "nested set" technique, as it allows you to get the whole tree, or any node together with all of its descendants, in one query. Basically, reads are cheap but writes are expensive, because inserting or moving a node means renumbering the left/right values of many other rows. But in cases where 99% of your operations are reads, it's totally justifiable.
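To make the "one query" claim concrete, here is a minimal sketch of a nested set read. It assumes a hypothetical `pages` table with `lft` and `rgt` columns, and it uses an in-memory SQLite database through PDO purely as a stand-in so the snippet runs on its own; the same SELECT should work unchanged on MySQL.

```php
<?php
// Sketch only: SQLite in memory stands in for MySQL.
// The table name, column names and sample rows are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, lft INTEGER, rgt INTEGER)');

// Each node's (lft, rgt) interval encloses the intervals of all its descendants.
$rows = [
    [1, 'Home',     1, 8],
    [2, 'Products', 2, 5],
    [3, 'Widget',   3, 4],
    [4, 'About',    6, 7],
];
$insert = $db->prepare('INSERT INTO pages (id, title, lft, rgt) VALUES (?, ?, ?, ?)');
foreach ($rows as $row) {
    $insert->execute($row);
}

// One query returns a node and all of its descendants, ordered depth-first by lft.
function fetchSubtreeTitles(PDO $db, int $rootId): array
{
    $stmt = $db->prepare(
        'SELECT child.title
           FROM pages AS parent
           JOIN pages AS child ON child.lft BETWEEN parent.lft AND parent.rgt
          WHERE parent.id = ?
          ORDER BY child.lft'
    );
    $stmt->execute([$rootId]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$wholeTree = fetchSubtreeTitles($db, 1);
$productsSubtree = fetchSubtreeTitles($db, 2);
```

The write side is where the cost shows up: adding a page under "Products" would mean shifting the `lft`/`rgt` values of every row to its right.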


    The naive approach to parsing an adjacency list requires a lot of queries, and for large lists it may take a significant amount of time to build the tree in memory. For reference, the naive approach I'm referring to could be summarized as: select all items with no parent, then for each item recursively get its children. This approach requires n+1 database queries.
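As a rough illustration of where the n+1 comes from, the sketch below simulates the naive traversal. The `$rows` array, the `title` field and the `fetchChildren()` helper are hypothetical stand-ins for a table and a per-parent SELECT; each call to the helper counts as one database round trip.

```php
<?php
// Hypothetical rows standing in for a `pages` table with a parent_id column.
$rows = [
    ['id' => 1, 'parent_id' => null, 'title' => 'Home'],
    ['id' => 2, 'parent_id' => 1,    'title' => 'Products'],
    ['id' => 3, 'parent_id' => 2,    'title' => 'Widget'],
    ['id' => 4, 'parent_id' => 1,    'title' => 'About'],
];
$queryCount = 0;

// Simulates one query: SELECT * FROM pages WHERE parent_id = ? (or IS NULL for roots).
function fetchChildren(array $rows, ?int $parentId, int &$queryCount): array
{
    $queryCount++;
    return array_values(array_filter($rows, function ($row) use ($parentId) {
        return $row['parent_id'] === $parentId;
    }));
}

// Naive traversal: one query for the roots, then one more query per item found.
function buildTreeNaively(array $rows, ?int $parentId, int &$queryCount): array
{
    $nodes = [];
    foreach (fetchChildren($rows, $parentId, $queryCount) as $row) {
        $row['children'] = buildTreeNaively($rows, $row['id'], $queryCount);
        $nodes[] = $row;
    }
    return $nodes;
}

$tree = buildTreeNaively($rows, null, $queryCount);
```

With these four rows, `$queryCount` ends at 5: one query for the root level plus one per item. For a site of about 200 pages that would be about 201 small queries.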

    I've used the following approach to build a tree from an adjacency list with 1 query. Select all items from the database. Transfer all items into an array indexed by their key. Traverse the array and assign a reference from the parent object to each of its children. Traverse the array a second time and remove all of the child objects from the top level, leaving behind only the root-level objects.

    Since you mentioned the LAMP stack, PHP code to do this is roughly as follows:

    <?php
    // Assumes $src is the array of items from the database (one SELECT),
    // each with an 'id' and a 'parent_id' (0 or NULL for root-level items).
    $tmp = array();

    // Traverse the array and index it by id, ensuring each item has an empty array of children.
    foreach ($src as $item) {
      $item['children'] = array();
      $tmp[$item['id']] = $item;
    }

    // Now traverse the array a second time and link children to their parents.
    // empty() is true for both 0 and NULL, so root-level items are skipped here.
    foreach ($tmp as $id => $item) {
      if (!empty($item['parent_id'])) {
        $tmp[$item['parent_id']]['children'][$id] = &$tmp[$id];
      }
    }

    // Finally create an array with just root-level items.
    $tree = array();
    foreach ($tmp as $id => $item) {
      if (empty($item['parent_id'])) {
        $tree[$id] = $item;
      }
    }

    // $tree now contains our adjacency list in tree form.
    ?>
    

    Please note this code is intended to illustrate a technique for building an adjacency list from a single database query. It could probably be optimized for less memory consumption, etc. It also hasn't been tested.
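Since the question is about generating a sitemap, here is a minimal sketch of how a tree in that shape could be rendered as nested lists. The `renderSitemap()` function and the `title` field are assumptions for illustration; the sample `$tree` is hand-written to mirror the structure produced above.

```php
<?php
// Renders a tree of ['title' => ..., 'children' => [...]] nodes as nested <ul> markup.
// The 'title' key is hypothetical; use whatever column holds the page name.
function renderSitemap(array $nodes): string
{
    if ($nodes === []) {
        return '';
    }
    $html = '<ul>';
    foreach ($nodes as $node) {
        $html .= '<li>' . htmlspecialchars($node['title'])
               . renderSitemap($node['children']) . '</li>';
    }
    return $html . '</ul>';
}

// Hand-built sample in the same shape as $tree: items keyed by id, each with 'children'.
$tree = [
    1 => ['id' => 1, 'parent_id' => null, 'title' => 'Home', 'children' => [
        2 => ['id' => 2, 'parent_id' => 1, 'title' => 'Products', 'children' => []],
        3 => ['id' => 3, 'parent_id' => 1, 'title' => 'About', 'children' => []],
    ]],
];
$html = renderSitemap($tree);
```

The rendering itself touches each node once and runs entirely in memory, so the database is only involved in the single SELECT that loads the rows.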

    Jim,
