/
^ # start of string
( # first group start
(?:
(?:[^?+*{}()[\]\\|]+ # literals and ^, $
| \\. # escaped characters
| \[ (?: \^?\\. | \^[^\\] | [^\\^] ) # character classes
(?: [^\]\\]+ | \\. )* \]
| \( (?:\?[:=!]|\?<[=!]|\?>)? (?1)?? \) # parenthesis, with recursive content
| \(\? (?:R|[+-]?\d+) \) # recursive matching
)
(?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
| \| # alternative
)* # repeat content
) # end first group
$ # end of string
/
This is a recursive regex, and is not supported by many regex engines. PCRE based ones should support it.
Without whitespace and comments:
/^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/
.NET does not support recursion directly. (The (?1)
and (?R)
constructs.) The recursion would have to be converted to counting balanced groups:
^ # start of string
(?:
(?: [^?+*{}()[\]\\|]+ # literals and ^, $
| \\. # escaped characters
| \[ (?: \^?\\. | \^[^\\] | [^\\^] ) # character classes
(?: [^\]\\]+ | \\. )* \]
| \( (?:\?[:=!]
| \?<[=!]
| \?>
| \?<[^\W\d]\w*>
| \?'[^\W\d]\w*'
)? # opening of group
(?<N>) # increment counter
| \) # closing of group
(?<-N>) # decrement counter
)
(?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
| \| # alternative
)* # repeat content
$ # end of string
(?(N)(?!)) # fail if counter is non-zero.
Compacted:
^(?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>|\?<[^\W\d]\w*>|\?'[^\W\d]\w*')?(?<N>)|\)(?<-N>))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*$(?(N)(?!))
From the comments:
Will this validate substitutions and translations?
It will validate just the regex part of substitutions and translations. s/<this part>/.../
It is not theoretically possible to match all valid regex grammars with a regex.
It is possible if the regex engine supports recursion, such as PCRE, but that can't really be called regular expressions any more.
Indeed, a "recursive regular expression" is not a regular expression. But this an often-accepted extension to regex engines... Ironically, this extended regex doesn't match extended regexes.
"In theory, theory and practice are the same. In practice, they're not." Almost everyone who knows regular expressions knows that regular expressions does not support recursion. But PCRE and most other implementations support much more than basic regular expressions.
using this with shell script in the grep command , it shows me some error.. grep: Invalid content of {} . I am making a script that could grep a code base to find all the files that contain regular expressions
This pattern exploits an extension called recursive regular expressions. This is not supported by the POSIX flavor of regex. You could try with the -P switch, to enable the PCRE regex flavor.
Regex itself "is not a regular language and hence cannot be parsed by regular expression..."
This is true for classical regular expressions. Some modern implementations allow recursion, which makes it into a Context Free language, although it is somewhat verbose for this task.
I see where you're matching []()/\
. and other special regex characters. Where are you allowing non-special characters? It seems like this will match ^(?:[\.]+)$
, but not ^abcdefg$
. That's a valid regex.
[^?+*{}()[\]\\|]
will match any single character, not part of any of the other constructs. This includes both literal (a
- z
), and certain special characters (^
, $
, .
).
The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.
One RFC 5322 compliant regex can be found at the top of the page at http://emailregex.com/ but uses the IP address pattern that is floating around the internet with a bug that allows 00
for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using grep -Po
, including cases domain names, IP addresses, bad ones, and account names with and without quotes.
Correcting the 00
bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for actual code.)
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
or:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Here is diagram of finite state machine for above regexp which is more clear than regexp itself

The more sophisticated patterns in Perl and PCRE (regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it’s best to use a real parser.
It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.
Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down president@whitehouse.gov
, and that will even parse as legal, but it isn't likely to be the person at the other end.
For PHP, you should not use the pattern given in Validate an E-Mail Address with PHP, the Right Way from which I quote:
There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard.
That is no better than all the other non-RFC patterns. It isn’t even smart enough to handle even RFC 822, let alone RFC 5322. This one, however, is.
If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective. A state engine for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid as it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like
The specified e-mail address 'myemail@address,com' is invalid. Did you mean 'myemail@address.com'?
See also Validating Email Addresses, including the comments. Or Comparing E-mail Address Validating Regular Expressions.

Debuggex Demo
Best Solution
The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:
The regex above will match any string, or line without a line break, not containing the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but still, it is possible.
And if you need to match line break chars as well, use the DOT-ALL modifier (the trailing
s
in the following pattern):or use it inline:
(where the
/.../
are the regex delimiters, i.e., not part of the pattern)If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class
[\s\S]
:Explanation
A string is just a list of
n
characters. Before, and after each character, there's an empty string. So a list ofn
characters will haven+1
empty strings. Consider the string"ABhedeCD"
:where the
e
's are the empty strings. The regex(?!hede).
looks ahead to see if there's no substring"hede"
to be seen, and if that is the case (so something else is seen), then the.
(dot) will match any character except a line break. Look-arounds are also called zero-width-assertions because they don't consume any characters. They only assert/validate something.So, in my example, every empty string is first validated to see if there's no
"hede"
up ahead, before a character is consumed by the.
(dot). The regex(?!hede).
will do that only once, so it is wrapped in a group, and repeated zero or more times:((?!hede).)*
. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed:^((?!hede).)*$
As you can see, the input
"ABhedeCD"
will fail because one3
, the regex(?!hede)
fails (there is"hede"
up ahead!).