What is going on behind the scenes when you mark a regular expression as one to be compiled? How does this compare to, or differ from, a cached regular expression?
Using this information, how do you determine when the cost of computation is negligible compared to the performance increase?
Best Solution
RegexOptions.Compiled instructs the regular expression engine to compile the regular expression into IL using lightweight code generation (LCG). This compilation happens during construction of the object and slows construction down considerably. In return, matches using the regular expression are faster.
If you do not specify this flag, your regular expression is considered "interpreted".
Take this example:
It performs 4 tests on 3 different regular expressions. First, it tests a single one-off match (compiled vs. non-compiled). Second, it tests repeated matches that reuse the same regular expression.
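A much smaller sketch of the same kind of measurement is shown here, to make the two scenarios concrete. It is not the benchmark that produced the results reported in this answer: the class name, pattern, input, and iteration counts are all hypothetical, and absolute timings will vary with runtime version and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTimingSketch
{
    // Hypothetical pattern and input, chosen only for illustration.
    private const string Pattern = @"\b\d{3}-\d{4}\b";
    private const string Input = "Call 555-1234 before noon.";

    // Scenario 1: construct a fresh Regex and match once, repeated `count` times.
    public static TimeSpan TimeConstructAndMatch(RegexOptions options, int count)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            Regex regex = new Regex(Pattern, options);
            regex.IsMatch(Input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Scenario 2: build one Regex up front and time only the repeated matches.
    public static TimeSpan TimeReusedMatch(RegexOptions options, int count)
    {
        Regex regex = new Regex(Pattern, options);
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            regex.IsMatch(Input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        Console.WriteLine("Construct + match, interpreted: " + TimeConstructAndMatch(RegexOptions.None, 1000));
        Console.WriteLine("Construct + match, compiled:    " + TimeConstructAndMatch(RegexOptions.Compiled, 1000));
        Console.WriteLine("Reused Regex, interpreted:      " + TimeReusedMatch(RegexOptions.None, 1000000));
        Console.WriteLine("Reused Regex, compiled:         " + TimeReusedMatch(RegexOptions.Compiled, 1000000));
    }
}
```

As with any micro-benchmark, it is worth running this in a release build without a debugger attached and repeating the runs a few times before drawing conclusions.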
The results on my machine (compiled in release, no debugger attached)
1000 single matches (construct Regex, Match and dispose)
1,000,000 matches - reusing the Regex object
These results show that compiled regular expressions can be up to 60% faster for cases where you reuse the Regex object. However, in some cases they can be over 3 orders of magnitude slower to construct.
They also show that the x64 version of .NET can be 5 to 6 times slower when it comes to compiling regular expressions.
The recommendation would be to use the compiled version in cases where either
Spanner in the works, the Regex cache
The regular expression engine contains an LRU cache which holds the last 15 regular expressions that were tested using the static methods on the Regex class.
For example, Regex.Replace, Regex.Match, etc. all use the Regex cache.
The size of the cache can be increased by setting Regex.CacheSize. It accepts changes in size at any time during your application's life cycle.
New regular expressions are only cached by the static helpers on the Regex class. If you construct your own objects, the cache is checked (for reuse, and the entry is bumped); however, the regular expression you construct is not added to the cache.
This cache is a trivial LRU cache implemented using a simple doubly linked list. If you happen to increase its size to 5000 and use 5000 different expressions with the static helpers, every regular expression construction will crawl the 5000 entries to see if it has previously been cached. There is a lock around the check, so the check can decrease parallelism and introduce thread blocking.
The default is set quite low to protect you from cases like this, though in some cases you may have no choice but to increase it.
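The difference between the static helpers and a constructed instance, along with the cache size setting, can be sketched roughly as follows. The class name, pattern, and the new cache size of 50 are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCacheSketch
{
    // Hypothetical pattern used only for this illustration.
    private const string OrderPattern = @"^order-\d+$";

    // Constructed instance: you hold the reference yourself and reuse it.
    private static readonly Regex OrderId = new Regex(OrderPattern);

    // Static helper: the expression goes through the shared Regex cache.
    public static bool ViaStaticHelper(string input)
    {
        return Regex.IsMatch(input, OrderPattern);
    }

    // Instance method: no dependence on what else is competing for the cache.
    public static bool ViaInstance(string input)
    {
        return OrderId.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine("Current cache size: " + Regex.CacheSize);

        // The cache size can be changed at any point while the application runs.
        // 50 is an arbitrary value, not a recommendation.
        Regex.CacheSize = 50;

        Console.WriteLine(ViaStaticHelper("order-42"));
        Console.WriteLine(ViaInstance("order-42"));
    }
}
```

Holding your own instance means the expression's lifetime no longer depends on how many other patterns your code and its libraries push through the static helpers.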
My strong recommendation would be to never pass the RegexOptions.Compiled option to a static helper.
For example:
The reason is that you run a heavy risk of a miss on the LRU cache, which will trigger a very expensive compile. Additionally, you have no idea what the libraries you depend on are doing, so you have little ability to control or predict the best possible size of the cache.
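One way to follow this recommendation is to build the compiled Regex once and keep it in a static field, so the compile cost is paid a single time regardless of what happens to the cache. The sketch below assumes a hypothetical Greeter class and pattern; the commented-out line shows the kind of call the recommendation warns against.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Greeter
{
    // Compiled once when the type is first used, then reused for every call.
    private static readonly Regex HelloPattern =
        new Regex("hello", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static string Rewrite(string input)
    {
        // Pattern to avoid: if this expression has fallen out of the cache,
        // the call pays for a full compile again.
        // return Regex.Replace(input, "hello", "goodbye",
        //     RegexOptions.Compiled | RegexOptions.IgnoreCase);

        return HelloPattern.Replace(input, "goodbye");
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("Hello World")); // prints "goodbye World"
    }
}
```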
See also: BCL team blog
Note: this is relevant for .NET 2.0 and .NET 4.0. There are some expected changes in 4.5 that may cause this to be revised.