Finding difference between strings in Javascript

I'd like to compare two strings (a before and after) and detect exactly where and what changed between them.

For any change, I want to know:

  • Starting position of the change (inclusive, starting at 0)
  • Ending position of the change (inclusive, starting at 0) relative to the previous text
  • The "change"
  • Assume that strings will change in only one place at a time (for example, never " B il l " -> " K il n ").

    Additionally, I need the start and end positions to reflect the type of change:

  • If deletion, the start and end position should be the start and end positions of the deleted text, respectively
  • If replacement, the start and end position should be the start and end positions of the "deleted" text, respectively (the change will be the "added" text)
  • If insertion, the start and end positions should be the same; the entry point of the text
  • If no change, let start and end positions remain zero, with an empty change
  • For example:

    "0123456789" -> "03456789"  
    Start: 1, End: 2, Change: "" (deletion)
    
    "03456789" -> "0123456789"  
    Start: 1, End: 1, Change: "12" (insertion)
    
    "Hello World!" -> "Hello Aliens!"  
    Start: 6, End: 10, Change: "Aliens" (replacement)
    
    "Hi" -> "Hi"  
    Start: 0, End: 0, Change: "" (no change)
    

    I was able to somewhat detect the positions of the changed text, but it doesn't work in all cases because in order to do that accurately, I need to know what kind of change is made.

    var OldText = "My edited string!";
    var NewText = "My first string!";
    
    var ChangeStart = 0;
    var NewChangeEnd = 0;
    var OldChangeEnd = 0;
    console.log("Comparing start:");
    for (var i = 0; i < NewText.length; i++) {
        console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
        if (NewText[i] != OldText[i]) {
            ChangeStart = i;
            break;
        }
    }
    console.log("Comparing end:");
    // "Addition"?
    if (NewText.length > OldText.length) {
        for (var i = 1; i < NewText.length; i++) {
            console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
            if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
                NewChangeEnd = NewText.length - i;
                OldChangeEnd = OldText.length - i;
                break;
            }
        }
    // "Deletion"?
    } else if (NewText.length < OldText.length) {
        for (var i = 1; i < OldText.length; i++) {
            console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
            if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
                NewChangeEnd = NewText.length - i;
                OldChangeEnd = OldText.length - i;
                break;
            }
        }
    // Same length...
    } else {
        // Do something
    }
    console.log("Change start: " + ChangeStart);
    console.log("NChange end : " + NewChangeEnd);
    console.log("OChange end : " + OldChangeEnd);
    console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));
    

    How do I tell whether or not an insertion, deletion, or replacement took place?


    I've searched and came up with a few other similar questions, but they don't seem to help.


    I have gone through your code and your logic for matching string makes sense to me. It logs ChangeStart , NewChangeEnd and OldChangeEnd correctly and the algorithm flows alright. You just want to know if an insertion , deletion or replacement took place. Here's how I would go about it.

    First of all, you need to make sure that after you have got the first point of mis-match ie ChangeStart when you then traverse the strings from the end, the index shouldn't cross ChangeStart .

    I'll give you an example. Consider the following strings:

     var NewText = "Hello Worllolds!";
     var OldText = "Hello Worlds!";
    
     ChangeStart -> 10 //Makes sense
     OldChangeEnd -> 8
     NewChangeEnd -> 11
    
     console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1)); 
     //Ouputs "lo"
    

    The problem in this case is when it starts matching from the back, the flow is something like this:

     Comparing end: 
      1(N: 12 O: 12: ! -> !) 
      2(N: 11 O: 11: s -> s) 
      3(N: 10 O: 10: d -> d)  -> You need to stop here!
    
     //Although there is not a mismatch, but we have reached ChangeStart and 
     //we have already established that characters from 0 -> ChangeStart-1 match
     //That is why it outputs "lo" instead of "lol"
    

    Assuming, what I just said makes sense, you just need to modify your for loops like so:

     if (NewText.length > OldText.length) {
     for (var i = 1; i < NewText.length && ((OldText.length-i)>=ChangeStart); i++) {
      ...
    
        NewChangeEnd = NewText.length - i -1;
        OldChangeEnd = OldText.length - i -1;
      if(//Mismatch condition reached){
             //break..That code is fine.
        }
     }
    

    This condition -> (OldText.length-i)>=ChangeStart takes care of the anomaly that I mentioned and therefore the for loop automatically terminates if this condition is reached. However, just as I mentioned there might be situations where this condition is reached before a mis-match is encountered like I just demonstrated. So you need to update values of NewChangeEnd and OldChangeEnd as 1 less than the matched value. In case of a mis-match, you store the values appropriately.

    Instead of an else -if we could just wrap those two conditions in a situation where we know NewText.length > OldText.length is definitely not true ie it is either a replacement or a deletion . Again NewText.length > OldText.length also means it could be a replacement or an insertion as per your examples, which makes sense. So the else could be something like:

    else {
    for (var i = 1; i < OldText.length && ((OldText.length-i)>=ChangeStart); i++) { 
    
        ...
        NewChangeEnd = NewText.length - i -1;
        OldChangeEnd = OldText.length - i -1;
      if(//Mismatch condition reached){
             //break..That code is fine.
        }
     }
    

    If you have understood the minor changes thus far, identifying the specific cases is really simple:

  • Deletion - Condition -> ChangeStart > NewChangeEnd . Deleted string from ChangeStart -> OldChangeEnd .
  • Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);

  • Insertion - Condition -> ChangeStart > OldChangeEnd . Inserted string at ChangeStart .
  • Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

  • Replacement - If NewText != OldText and the above two conditions are not met, then it is a replacement.
  • Text in old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1);

    The replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1);

    Start and end positions in the OldText that got replaced -> ChangeStart -> OldChangeEnd

    I have created a jsfiddle incorporating the changes that I have mentioned in your code. You might want to check it out. Hope it gets you started in the right direction.


    I had a similar problem and solved it with the following:

    function diff(oldText, newText) {
    
      // Find the index at which the change began
      var s = 0;
      while(s < oldText.length && s < newText.length && oldText[s] == newText[s]) {
        s++;
      }
    
      // Find the index at which the change ended (relative to the end of the string)
      var e = 0;
      while(e < oldText.length &&
            e < newText.length &&
            oldText.length - e > s &&
            newText.length - e > s &&
            oldText[oldText.length - 1 - e] == newText[newText.length - 1 - e]) {
        e++;
      }
    
      // The change end of the new string (ne) and old string (oe)
      var ne = newText.length - e;
      var oe = oldText.length - e;
    
      // The number of chars removed and added
      var removed = oe - s;
      var added = ne - s;
    
      var type;
      switch(true) {
        case removed == 0 && added > 0:  // It's an 'add' if none were removed and at least 1 added
          type = 'add';
          break;
        case removed > 0 && added == 0:  // It's a 'remove' if none were added and at least one removed
          type = 'remove';
          break;
        case removed > 0 && added > 0:   // It's a replace if there were both added and removed characters
          type = 'replace';
          break;
        default:
          type = 'none';                 // Otherwise there was no change
          s = 0;
      }
    
      return { type: type, start: s, removed: removed, added: added };
    }
    

    Note, this didn't solve my actual problem though. My issue was that I had an editor with paragraphs, each modelled with text and a collection of markups defined with a start and end index eg bold from char 1 to char 5. I was using this to detect changes to the string so I could shift the markup indices accordingly. But consider the string:

    xx xxxx xx

    The diff function approach can't tell the difference between a character added outside the bold or within it.

    In the end, I took a completely different approach - I just parsed the HTML produced by the editor and used that to determine start and end indices of markups.


    制作我自己的稍微更高性能的版本,其灵感来自与上述相同的策略(寻找从前到后和从后到前的差异)

    function compareText(oldText, newText)
    {
        var difStart,difEndOld,difEndNew;
    
        //from left to right - look up the first index where characters are different
        for(let i=0;i<oldText.length;i++)
        {
            if(oldText.charAt(i) !== newText.charAt(i))
            {
                difStart = i;
                break;
            }
        }
    
        //from right to left - look up the first index where characters are different
        //first calc the last indices for both strings
        var oldMax = oldText.length - 1;
        var newMax = newText.length - 1;
        for(let i=0;i<oldText.length;i++)
        {
            if(oldText.charAt(oldMax-i) !== newText.charAt(newMax-i))
            {
                //with different string lengths, the index will differ for the old and the new text
                difEndOld = oldMax-i;
                difEndNew = newMax-i;
                break;
            }
        }
    
        var removed = oldText.substr(difStart,difEndOld-difStart+1);
        var added = newText.substr(difStart,difEndNew-difStart+1);
    
        return [difStart,added,removed];
    }
    
    链接地址: http://www.djcxy.com/p/82448.html

    上一篇: Python3 urllib.request不会立即关闭连接

    下一篇: 在Javascript中查找字符串之间的区别