Trouble implementing a "rope" data structure in C++

I'm trying to make a rope data structure. It's a type of binary tree, ie a recursive data structure.

The purpose of a rope is that splitting and concatenation should be fast, which means you avoid copying entire ropes.
So, for example, a user should be able to say rope1 + rope2 and expect a result in ~logarithmic time.

However, this presents a problem:
If a rope is modified, its parents are indirectly modified as well.
Because my goal is to make rope a drop-in replacement for string , this is not acceptable.

My solution to this is: Whenever there is any kind of change to a rope , I would create a new node, with a slight change, leaving the old ones unmodified.

In theory, I think this would work rather well.

In practice, though, it involves a heap allocation for (almost?) every modification of the strings.
Even a one-character change would result in a new heap object, which is not only slow by itself, but which also significantly decreases memory locality, very negatively impacting performance.

How should I go about solving this problem?


The traditional approach would be copy-on-write: for this, you need to refcount each allocated node.

If the modified node has a refcount of 1 (no-one else refers to it), you don't need to duplicate it.

The practical problem with this is usefully segregating mutating from non-mutating operations:

    char& rope::operator[] (std::string::pos)

may mutate the referenced char, but there's no trivial way to force selection of the const overload when it actually won't. This means you either have to assume mutation will happen, and possibly trigger an unnecessary copy, or return instead some proxy which overloads char conversion and assignment.

This approach was tried for early implementations of std::string (where a string is equivalent to a single-node rope) iirc, and fell out of favour; partly because of the mutation problem, and partly because COW and the required refcounting become increasingly expensive if you have to worry about multi-threading.


As you say, the rope has an additional problem if your node contains two independent types of state: its own string and references to its child nodes (since this causes refcount/child reference mutations to propogate up the tree).

If instead the characters are stored only at the leaf nodes, and you do a full copy of the non-leaf nodes (so each rope has its own "directory structure") you still avoid copying the characters, and the refcounted shared state is much simpler.

Does this get the logarithmic-time concatenation you want? Perhaps not:

  • you have to copy all the non-leaf nodes (and add a new root), and the number of these is log the number of leaves
  • you also have to increment the leaf refcounts though, which is linear
  • Whether it looks closer to linear or logarithmic time depends on the relative cost of incrementing a refcount versus copying the directory tree.

    Without this though, you get fast concatenations but arbitrary character accesses may (unpredictably) degenerate to logarithmic time if they have to propagate a COW operation up the tree.

    I guess, if it suits your use case, you could implement move concatenations: this would give possibly constant-time addition and you still avoid the extra COW complexity.


    In practice, though, it involves a heap allocation for (almost?) every modification of the strings.

    If you want to avoid frequent heap allocation performance issues, then I'd suggest using a memory pool for your class that allocates a chunk of memory, and only needs to request a new allocation from the OS when it's full, which should occur pretty infrequently. You can then optimize accesses to the memory pool for allocating small-block data-types like char , etc.

    Andrei Alexandrescu has a great example of a small-block memory allocator in his "Modern C++ Design" book.

    链接地址: http://www.djcxy.com/p/39802.html

    上一篇: 绳索数据结构

    下一篇: 无法在C ++中实现“绳索”数据结构