WPF RichTextBox TextChanged event – how to find deleted or inserted text


While creating a customized editor with RichTextBox, I've face the problem of finding deleted/inserted text with the provided information with TextChanged event.

The instance of TextChangedEventArgs has some useful data, but I guess it does not cover all the needs. Suppose a scenario which multiple paragraphs are inserted, and at the same time, the selected text (which itself spanned multiple paragraphs) has been deleted.

With the instance of TextChangedEventArgs, you have a collection of text changes, and each change only provides you with the number of removed or added symbols and the position of it.

The only solution I have in mind is, to keep a copy of document, and apply the given list of changes on it. But as the instances of TextChange only give us the number of inserted/removed symbols (and not the symbols), so we need to put some special symbol (for example, '?') to denote unknown symbols while we transform our original copy of document.

After applying all changes to the original copy of document, we can then compare it with the richtextbox's updated document and find the mappings between unknown symbols and the real ones. And finally, get what we want !!!

Anybody has tried this before? I need your suggestions on the whole strategy, and what you think about this approach.


Best Solution

It primarily depends on your use of the text changes. When the sequence includes both inserts and deletes it is theoretically impossible to know the details of each insert, since some of the symbols inserted may have subsequently been deleted. Therefore you have to choose what results you really want:

  • For some purposes you must to know the exact sequence of changes even if some of the inserted symbols must be left as "?".
  • For other purposes you must know exactly how the new text differs from the old but not the exact sequence in which the changes were made.

I will techniques to achieve each of these results. I have used both techniques in the past, so I know they are effective.

To get the exact sequence

This is more appropriate if you are implementing a history or undo log or searching for specific actions.

For these uses, the process you describe is probably best, with one possible change: Instead of "finding the mappings between the unknown symbols and the real ones", simply run the scan forward to find the text of each "Delete" then run it backward to find the text of each "Insert".

In other words:

  1. Start with the initial text and process the changes in order. For each insert, insert '?' symbols. For each delete, remove the specified number of symbols and record them as the text deleted.

  2. Start with the final text and process the changes in reverse order. For each delete, insert '?' symbols. For each insert, remove the specified number of symbols and record them as the text inserted.

When this is complete, all of your "Insert" and "Delete" change entries will have the associated text to the best of our knowledge, and any text that was inserted and immediately deleted will be '?' symbols.

To get the difference

This is more appropriate for revision marking or version comparison.

For these uses, simply use the text change information to compute a set of integer ranges in which changes might be found, then use a standard diff algorithm to find the actual changes. This tends to be very efficient in processing incremental changes but still gives you the best updates.

This is particularly nice when you paste in a replacement paragraph that is almost identical to the original: Using the text change information will indicate the whole paragraph is new, but using diff (ie. this technique) will mark only those symbol runs that are actually different.

The code for computing the change range is simple: Represent the change as four integers (oldstart, oldend, newstart, newend). Run through each change:

  1. If changestart is before newstart, reduce newstart to changestart and reduce oldstart an equal amount
  2. If changeend is after newend, increase newend to changeend and increase oldend an equal amount

Once this is done, extract range [oldstart, oldend] from the old document and the range [newstart, newend] from the new document, then use the standard diff algorithm to compare them.

Related Question