C# regex to find and replace reusing part of the matched text

c# regex

I need to do a search and replace on long text strings. I want to find all instances of broken links that look like this:

<a href="http://any.url.here/%7BlocalLink:1369%7D%7C%7CThank%20you%20for%20registering">broken link</a>

and fix it so that it looks like this:

<a href="/{localLink:1369}" title="Thank you for registering">link</a>

There may be any number of these broken links in a text field. My difficulty is working out how to reuse the matched ID (in this case 1369) in the replacement. The ID changes from link to link, as do the URL and the link text.

Thanks,

David

EDIT: To clarify, I am writing C# code to run through hundreds of long text fields and fix the broken links in them. Each text field contains HTML that can have any number of broken links; the regex needs to find them all and replace each one with the correct version of the link.

Best Answer

Take this with a grain of salt, since HTML and regex don't play well together:

(<a\s+[^>]*href=")[^"%]*%7B(localLink:\d+)%7D%7C%7C([^"]*)("[^>]*>[^<]*</a>)

When applied to your input and replaced with

$1/{$2}" title="$3$4

the following is produced:

<a href="/{localLink:1369}" title="Thank%20you%20for%20registering">broken link</a>
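For reference, here is a minimal sketch of how the pattern and replacement string above might be wired into C#. The class and method names (`LinkFixerSketch`, `FixKeepingEncoding`) are made up for illustration; only `Regex.Replace` and the pattern itself come from the answer. Note that double quotes are doubled inside the verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class LinkFixerSketch
{
    // Same pattern as above; "" is an escaped double quote in a verbatim string.
    private const string Pattern =
        @"(<a\s+[^>]*href="")[^""%]*%7B(localLink:\d+)%7D%7C%7C([^""]*)(""[^>]*>[^<]*</a>)";

    // $1 = everything up to and including href="
    // $2 = localLink:<id>
    // $3 = the (still URL-encoded) title text
    // $4 = closing quote, rest of the tag, link text and </a>
    private const string Replacement = "$1/{$2}\" title=\"$3$4";

    // Regex.Replace rewrites every match in the input, not just the first.
    public static string FixKeepingEncoding(string html)
    {
        return Regex.Replace(html, Pattern, Replacement);
    }
}
```

Calling `LinkFixerSketch.FixKeepingEncoding(text)` on a field that contains several broken links rewrites each of them, with each link keeping its own ID through `$2`.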

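This is as close as it gets with a plain replacement string, because a replacement string cannot decode %20 and similar escapes. To remove the URL encoding from the title, pass a MatchEvaluator delegate to Regex.Replace and decode the third group there.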
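A rough sketch of the MatchEvaluator approach follows. The names `BrokenLinkFixer` and `Fix` are hypothetical. Decoding with `Uri.UnescapeDataString` and then HTML-encoding the title with `WebUtility.HtmlEncode` (so a decoded quote cannot break the attribute) are my assumptions about what the cleanup should do, not something the question specifies.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class BrokenLinkFixer
{
    // Same pattern as in the answer; "" is an escaped double quote.
    private static readonly Regex BrokenLink = new Regex(
        @"(<a\s+[^>]*href="")[^""%]*%7B(localLink:\d+)%7D%7C%7C([^""]*)(""[^>]*>[^<]*</a>)");

    public static string Fix(string html)
    {
        // The lambda is converted to a MatchEvaluator and runs once per match.
        return BrokenLink.Replace(html, m =>
        {
            // Group 3 holds the URL-encoded title, e.g. Thank%20you%20for%20registering.
            string title = Uri.UnescapeDataString(m.Groups[3].Value);

            // Assumption: re-encode for HTML so characters like " stay valid in the attribute.
            title = WebUtility.HtmlEncode(title);

            return m.Groups[1].Value            // <a ... href="
                 + "/{" + m.Groups[2].Value + "}" // /{localLink:1369}
                 + "\" title=\"" + title
                 + m.Groups[4].Value;           // "> link text </a>
        });
    }
}
```

With the sample link from the question, `BrokenLinkFixer.Fix(text)` yields `<a href="/{localLink:1369}" title="Thank you for registering">broken link</a>`. The link text itself is left untouched, since the pattern only captures it as part of the fourth group.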