How and when to create a suffix link in suffix tree?

Could anyone give me an example about how and when to create a suffix link in suffix tree?

If my string is ABABABC , but do use a different example if that is better.

Hope to give some pictures to illustrate every step.

very appreciate.


To understand this, first recall that there are three kinds of nodes in a suffix tree:

  • The root
  • Internal nodes
  • Leaf nodes
  • In the graph below, which is the suffix tree for ABABABC , the yellow circle is the root , the grey, blue and green ones are internal nodes , and the small black ones are leaves.

    There are two important things to notice:

  • Internal nodes always have more than 1 outgoing edge . That is, internal nodes mark those parts of the tree where branching occurs.

  • Branching occurs wherever a repeated string is involved, and only there. For any internal node X, the string leading from the root to X must have occurred in the input string at least as many times as there are outgoing edges from X.

  • Example: The string leading to the blue node is ABAB . Indeed, this string appears twice in the input string: At position 0 and at position 2. And that is why the blue node exists.

    Now about suffix links:

  • If the string s leading up to some internal node X is longer than 1 character, the same string minus the first character (call this s-1 ) must be in the tree, too (it's a suffix tree, after all, so the suffix of any of its strings must be in the tree, too).

    Example: Let s= ABAB , the string leading to the blue node. Then after removing the first character, s-1 is BAB . And indeed that string is found in the tree, too. It leads to the green node.

  • If some string s leads to an internal node, its shortened version s-1 must lead to an internal node (call it X-1) as well. Why? Because s must appear at least twice in the input string, so s-1 must appear at least as many times (because it is part of s : wherever s appears, s-1 must appear, too). But if s-1 appears multiple times in the input string, then there must be an internal node for it.

  • In any such situation, a special link connecting X to X-1 is a suffix link.

  • Note: Because of (1) and (2) above, every internal node X that has a label from root to X of more than 1 character must have a suffix link to exactly one other internal node.

    Example:

    This is the same suffix tree as before; the dotted lines indicate the suffix links. If you start at the blue node and follow the suffix links from there (from blue, to green, to first gray, to second gray), and look at the strings leading from the root to each node, you will see this:

     ABAB   ->    BAB    ->    AB    ->    B
    (blue)      (green)     (gray1)     (gray2)
    

    This is why they are called suffix links (the entire sequence is called suffix chain ).

    What are they good for?

    They are good for surprisingly many things. However, they play a particular role in Ukkonen's algorithm for suffix tree construction, specifically in Rule 3 described there: After inserting a the final character of some suffix s at some point X, the algorithm needs to insert the final character of suffix s-1 in O(1) time. In order to do that, it uses the suffix link to jump right to the place X-1 and makes the insert.

    But, note that there is no necessity to put suffix links in a suffix tree. They are not part of the definition of a suffix tree — they are just special links used by some algorithms that construct or use suffix trees.


    Regarding single-character nodes: What if there is an internal node X whose string (ie the string on the path from root to X) consists of only one character? By the definition above, X then does not have a suffix link. You can however assume that if it had a suffix link, it would point to the root node. Furthermore: If, by the definition above, an internal node does not have a suffix link, it must be a single-character node, so you can always assume that if no suffix link is present at an internal node it must be a single-character node, and therefore, the node that represents the s-1 suffix is the root node. (Some algorithms may actually add an explicit suffix link pointing to the root node in this case.) Thanks to j_random_hacker for the comment about this.

    链接地址: http://www.djcxy.com/p/40080.html

    上一篇: 为什么删除文件很重要

    下一篇: 如何以及何时在后缀树中创建后缀链接?