C# – How to convert Html to plain text

asp.netc++html

I have snippets of Html stored in a table. Not entire pages, no tags or the like, just basic formatting.

I would like to be able to display that Html as text only, no formatting, on a given page (actually just the first 30 – 50 characters but that's the easy bit).

How do I place the "text" within that Html into a string as straight text?

So this piece of code.

<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>

Becomes:

Hello World. Is there anyone out there?

Best Solution

The MIT licensed HtmlAgilityPack has in one of its samples a method that converts from HTML to plain text.

var plainText = HtmlUtilities.ConvertToPlainText(string html);

Feed it an HTML string like

<b>hello, <i>world!</i></b>

And you'll get a plain text result like:

hello world!