Ukkonen's Algorithm Generalized Suffix Trees

I understand the ukkonen's algorithm. I am only curious how to extend it to have more than one string in it (ending with a special character say "$").

I read somewhere that Given strings s1(say "abcddefx$") and s2(say "abddefgh$"), I should insert the s1 normally by ukkonen's algo. Then traverse down the tree with s2. That is I should search for s2 in the tree. Once I get to a node where the search ends ("ab", after 'b') I should resume the ukkonen's algorithm from there.

I understand the basic logic behind this. But what I am curious about is, what happens to the old suffix links. Are they still valid??? Also I am confused about my triple (active_node,active_length,remainder) should it be (node representing "ab",0,0) as I start the new pass???


For dealing with special characters you can use the Unicode Private Use Areas. These are a few special ranges of characters reserved for your own use, however the ranges are only around 4000 characters in size. Depending on the unicode support of the language you are using this can be really easy or difficult.

If that does not work, instead of inserting characters into your tree, wrap them in some other sort of variable (struct, object, dictionary) to 'extend' their meaning. That way you can provide the extra information needed (is this the end of a string? which string is this the end of?). Then you can provide custom operators for equality on this new wrapper instead of using characters directly.

链接地址: http://www.djcxy.com/p/6824.html

上一篇: Ukkonen的后缀树算法,有什么必要?

下一篇: Ukkonen的算法广义后缀树