Better option... just strip all non-digit characters on input (except 'x' and leading '+' signs), taking care because of the British tendency to write numbers in the non-standard form
+44 (0) ... when asked to use the international prefix (in that specific case, you should discard the
Then, you end up with values like:
Then when you display, reformat to your hearts content. e.g.
1 (234) 567-8901
1 (234) 567-8901 x1234
The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.
One RFC 5322 compliant regex can be found at the top of the page at http://emailregex.com/ but uses the IP address pattern that is floating around the internet with a bug that allows
00 for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using
grep -Po, including cases domain names, IP addresses, bad ones, and account names with and without quotes.
00 bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for actual code.)
Here is diagram of finite state machine for above regexp which is more clear than regexp itself
The more sophisticated patterns in Perl and PCRE (regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it’s best to use a real parser.
It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.
Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down
email@example.com, and that will even parse as legal, but it isn't likely to be the person at the other end.
For PHP, you should not use the pattern given in Validate an E-Mail Address with PHP, the Right Way from which I quote:
There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard.
That is no better than all the other non-RFC patterns. It isn’t even smart enough to handle even RFC 822, let alone RFC 5322. This one, however, is.
If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective. A state engine for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid as it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like
The specified e-mail address 'myemail@address,com' is invalid. Did you mean 'firstname.lastname@example.org'?
See also Validating Email Addresses, including the comments. Or Comparing E-mail Address Validating Regular Expressions.
James is right that you want to use the Regex.Replace method that takes a MatchEvaluator argument. The match evaluator delegate is where you can take the numeric string you get in the match and convert it into a number that you can add 10,000 to. I used a lambda expression in place of the explicit delegate because its more compact and readable.