How are text editors generally implemented?


This question is probably going to make me sound pretty clueless. That's because I am.

I'm just thinking, if I were hypothetically interested in designing my own text editor GUI control, widget, or whatever you want to call it (which I'm not), how would I even do it?

The temptation for a novice such as myself would be to store the content of the text editor as a single string, which seems quite costly. I'm not too familiar with how string implementations differ between one language or platform and the next, but I know that in .NET, for example, strings are immutable, so the frequent manipulation a text editor has to support would be magnificently wasteful, constructing one string instance after another in very rapid succession.

Presumably some mutable data structure containing text is used instead, but figuring out what this structure might look like strikes me as a bit of a challenge. Random access would be good (I would think, anyway; after all, don't you want the user to be able to jump around to anywhere in the text?), but then I wonder about the cost of, say, navigating to somewhere in the middle of a huge document and starting to type immediately. Again, the novice approach (say you store the text as a resizable array of characters) would lead to very poor performance, I'm thinking, because with every character typed by the user, everything after the insertion point would have to be "shifted" over.
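To put a number on that shifting cost, here is a minimal Python sketch. The function name `insert_with_shift` is made up for illustration, and the shift is written out by hand so the moves can be counted (a real resizable array does the same work internally):

```python
def insert_with_shift(chars, index, ch):
    """Insert ch at index in a list of characters, shifting the tail by hand.

    Returns the number of elements that had to move.
    """
    chars.append(ch)  # grow the array by one slot
    moved = 0
    # Walk backwards from the end, moving each element one slot to the right.
    for i in range(len(chars) - 1, index, -1):
        chars[i] = chars[i - 1]
        moved += 1
    chars[index] = ch
    return moved


text = list("hello world")
moved = insert_with_shift(text, 5, ",")
print("".join(text), moved)
```

The number of moves equals the number of characters after the insertion point, so typing near the start of a very large document pays that cost on every keystroke.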

So if I had to make a guess, I'd suppose that text editors employ some sort of structure that breaks the text down into smaller pieces (lines, maybe?), each stored as its own character array with random access, with the pieces themselves randomly accessible as discrete chunks. Even that seems like it must be a rather monstrous oversimplification, though, if it is even remotely close to begin with.
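As a rough illustration of that guess (not a claim about how any particular editor works), a line-chunked buffer might look like the Python sketch below. The class name `LineBuffer` and its methods are hypothetical:

```python
class LineBuffer:
    """Text stored as a list of lines, each line a list of characters."""

    def __init__(self, text=""):
        self.lines = [list(line) for line in text.split("\n")]

    def insert(self, row, col, ch):
        """Insert one character at (row, col)."""
        if ch == "\n":
            # Split the current line in two; only the list of lines shifts.
            line = self.lines[row]
            self.lines[row:row + 1] = [line[:col], line[col:]]
        else:
            # Only the characters of this one line have to shift.
            self.lines[row].insert(col, ch)

    def text(self):
        return "\n".join("".join(line) for line in self.lines)


buf = LineBuffer("abc\ndef")
buf.insert(1, 1, "X")
print(buf.text())
```

Typing an ordinary character now shifts at most one line's worth of data, although inserting a newline still shifts the list of lines itself.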

Of course I also realize that there may not be a "standard" way that text editors are implemented; maybe it varies dramatically from one editor to another. But I figured, since it's clearly a problem that's been tackled many, many times, perhaps a relatively common approach has surfaced over the years.

Anyway, I'm just interested to know if anyone out there has some knowledge on this topic. Like I said, I'm definitely not looking to write my own text editor; I'm just curious.

Best Solution

One technique that's common (especially in older editors) is called a split buffer, also widely known as a gap buffer. Basically, you "break" the text into everything before the cursor and everything after the cursor. Everything before goes at the beginning of the buffer. Everything after goes at the end of the buffer. The unused space in between is the gap.

When the user types in text, it goes into the empty space in between without moving any data. When the user moves the cursor, you move the appropriate amount of text from one side of the "break" to the other, so the cost is proportional to how far the cursor moved. Typically most editing happens around a single area, so you're usually only moving small amounts of text at a time. The biggest exception is if you have a "go to line xxx" kind of capability, which can force a large block of text across the break in one step.
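The idea described above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the code of any real editor: the class name `GapBuffer`, its method names, and the doubling growth policy are all invented for the example.

```python
class GapBuffer:
    """Text held in one array with a movable gap at the cursor."""

    def __init__(self, text="", capacity=16):
        size = max(capacity, len(text) * 2, 1)
        self.buf = list(text) + [None] * (size - len(text))
        self.gap_start = len(text)  # cursor position; first free slot
        self.gap_end = size         # first slot after the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _grow(self):
        """Double the array, keeping the text after the cursor at the very end."""
        tail = self.buf[self.gap_end:]
        new_size = len(self.buf) * 2
        filler = [None] * (new_size - self.gap_start - len(tail))
        self.buf = self.buf[:self.gap_start] + filler + tail
        self.gap_end = new_size - len(tail)

    def insert(self, ch):
        """Typing: drop the character into the gap; nothing else moves."""
        if self.gap_start == self.gap_end:
            self._grow()
        self.buf[self.gap_start] = ch
        self.gap_start += 1

    def delete_before_cursor(self):
        """Backspace: widen the gap by one slot."""
        if self.gap_start > 0:
            self.gap_start -= 1

    def move_cursor(self, pos):
        """Move the gap, copying only the characters the cursor passes over."""
        pos = max(0, min(pos, len(self)))
        while self.gap_start > pos:  # cursor moves left
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:  # cursor moves right
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer("hello world")
gb.move_cursor(5)
gb.insert(",")
print(gb.text())
```

Note that `move_cursor` copies one character per position moved, which is why a long jump such as "go to line xxx" is the expensive case, while typing and backspacing at the cursor only adjust `gap_start`.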

Charles Crowley has written a much more complete discussion of the topic in his paper "Data Structures for Text Sequences". You might also want to look at The Craft of Text Editing by Craig Finseth, which covers split buffers (and other possibilities) in much greater depth.