C# – How to get a regex match to only be added once to the matches collection


I have a string that contains several HTML comments. I need to count the unique matches of an expression.

For example, the string might be:

var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";

I currently use this to get the matches:

var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);

This returns 3 matches. However, I would like it to return only 2, since there are only two unique matches.

I know I can probably loop through the resulting MatchCollection and skip the duplicate matches, but I'm hoping there is a more elegant solution.
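For reference, the loop approach does not have to modify the MatchCollection (which is read-only anyway). A minimal sketch, assuming you want each distinct value once in first-seen order, is to walk the matches and track seen values in a HashSet<string>. The helper name UniqueMatchValues is hypothetical, and the pattern uses \d instead of . so that only a digit is accepted after the X.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of .NET): returns each distinct match
// value once, in the order it first appears in the input.
static List<string> UniqueMatchValues(string input, Regex regex)
{
    var seen = new HashSet<string>();
    var result = new List<string>();
    foreach (Match m in regex.Matches(input))
    {
        // HashSet.Add returns false when the value is already present,
        // so only the first occurrence of each value is kept.
        if (seen.Add(m.Value))
        {
            result.Add(m.Value);
        }
    }
    return result;
}

var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
var unique = UniqueMatchValues(teststring, new Regex(@"<!--X\d-->"));
Console.WriteLine(unique.Count); // 2
```

This is a single pass over the matches, so it stays cheap even with dozens of comments per marker.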

Clarification: The sample string is greatly simplified from what is actually being used. There can easily be an X8 or X9, and there are likely dozens of each in the string.

Best Solution

I would just use the Enumerable.Distinct method, for example like this:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
var regex = new Regex(@"<!--X\d-->");
var matches = regex.Matches(subjectString);
var uniqueMatches = matches
    .OfType<Match>()      // MatchCollection -> IEnumerable<Match>
    .Select(m => m.Value) // compare by matched text
    .Distinct();

uniqueMatches.ToList().ForEach(Console.WriteLine);

Outputs this:

<!--X1-->
<!--X2-->
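Since the question asks for a count, the distinct sequence can be counted directly. If it also helps to know how often each marker occurs (the clarification mentions dozens of each), GroupBy gives that in the same pass. A sketch using the same sample string as above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
var regex = new Regex(@"<!--X\d-->");
var values = regex.Matches(subjectString)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Number of distinct comment markers.
int uniqueCount = values.Distinct().Count();
Console.WriteLine(uniqueCount); // 2

// Occurrences per distinct marker; GroupBy yields groups in first-seen key order.
var counts = values
    .GroupBy(v => v)
    .Select(g => new { Marker = g.Key, Count = g.Count() })
    .ToList();
foreach (var c in counts)
{
    Console.WriteLine($"{c.Marker}: {c.Count}"); // <!--X1-->: 4, then <!--X2-->: 2
}
```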

If you want the regular expression itself to return only unique matches, you could maybe use this one:

(<!--X\d-->)(?!.*\1.*)

It seems to work on your test string in RegexBuddy, at least =)
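In .NET the lookahead pattern can be passed straight to Regex.Matches. Because the negative lookahead rejects any occurrence that appears again later in the string, each Match returned is the last occurrence of its value, so results come back in last-occurrence order. RegexOptions.Singleline is the .NET equivalent of RegexBuddy's "dot matches newline" option and only makes a difference when the input contains line breaks. A sketch, where the multi-line input string is a made-up example:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up input with line breaks between the comments.
string input = "<!--X1-->Hi\n<!--X2-->there\n<!--X1-->";
string pattern = @"(<!--X\d-->)(?!.*\1.*)";

// Singleline lets '.' match '\n', so the lookahead can see later lines.
var withSingleline = Regex.Matches(input, pattern, RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

// Without it, '.' stops at '\n' and the first <!--X1--> is not detected as a duplicate.
var withoutSingleline = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(", ", withSingleline)); // <!--X2-->, <!--X1-->
Console.WriteLine(withoutSingleline.Count);            // 3
```

Keep in mind that the lookahead rescans the rest of the string for every candidate match, so on long inputs the Distinct approach is likely to be cheaper.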

// (<!--X\d-->)(?!.*\1.*)
// 
// Options: dot matches newline
// 
// Match the regular expression below and capture its match into backreference number 1 «(<!--X\d-->)»
//    Match the characters “<!--X” literally «<!--X»
//    Match a single digit 0..9 «\d»
//    Match the characters “-->” literally «-->»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1.*)»
//    Match any single character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
//    Match the same text as most recently matched by capturing group number 1 «\1»
//    Match any single character «.*»
//       Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»