Regex – Sed regex and substring negation


What is the correct syntax for finding a substring (a string which is preceded and followed by specific strings) which does not match a specific pattern?

For example, I want to take all substrings that start with BEGIN_ and end with _END, where the substring in between is not equal to FOO, and replace each whole substring with "(inner substring)". The following would match:

  • BEGIN_bar_END -> (bar)
  • BEGIN_buz_END -> (buz)
  • BEGIN_ihfd8f398IHFf9f39_END -> (ihfd8f398IHFf9f39)

But BEGIN_FOO_END would not match.

I have played around with the following, but cannot seem to find the correct syntax:

sed -e 's/BEGIN_(^FOO)_END/($1)/g'
sed -e 's/BEGIN_([^FOO])_END/($1)/g'
sed -e 's/BEGIN_(?!FOO)_END/($1)/g'
sed -e 's/BEGIN_(!FOO)_END/($1)/g'
sed -e 's/BEGIN_(FOO)!_END/($1)/g'
sed -e 's/BEGIN_!(FOO)_END/($1)/g'

Best Solution

sed's regular expressions have no general negation operator (IIRC because compiling regexes with negation to DFAs can take exponential time). You can work around this with a branch:

sed -e '/BEGIN_FOO_END/b; s/BEGIN_\(.*\)_END/(\1)/g'

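where /BEGIN_FOO_END/b means: if the current line contains BEGIN_FOO_END, then branch (jump) to the end of the sed script, so the substitution is skipped for that line. Note also that sed's basic regular expressions write capture groups as \(...\) and refer to them as \1 in the replacement, not (...) and $1 as in the attempts in the question.

Two caveats: the branch skips the whole line, so any other BEGIN_..._END substring on a line that also contains BEGIN_FOO_END is left unchanged; and .* is greedy, so two substrings on the same line are matched as a single one.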
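
To make the workaround concrete, here is a minimal sketch of two variants. The function names skip_foo_lines and skip_foo_matches are made up for illustration, and both assume that the inner substring never contains an underscore, so that [^_]* can stand in for .* and keep each match from running into the next one. The second variant is a different technique from the branch: it temporarily swaps every BEGIN_FOO_END for a newline (which cannot otherwise occur inside a single input line), substitutes, and then swaps it back.

```shell
# Variant 1 (hypothetical name): the branch approach from the answer.
# Any line containing BEGIN_FOO_END is printed unchanged; on every other
# line each BEGIN_<inner>_END becomes (<inner>).
# Assumption: <inner> contains no underscore.
skip_foo_lines() {
  sed -e '/BEGIN_FOO_END/b' -e 's/BEGIN_\([^_]*\)_END/(\1)/g'
}

# Variant 2 (hypothetical name): protect each BEGIN_FOO_END individually.
# Step 1 replaces BEGIN_FOO_END with a newline placeholder,
# step 2 rewrites the remaining BEGIN_<inner>_END substrings,
# step 3 turns the placeholder back into BEGIN_FOO_END.
# Assumption: <inner> contains no underscore.
skip_foo_matches() {
  sed -e 's/BEGIN_FOO_END/\
/g' -e 's/BEGIN_\([^_]*\)_END/(\1)/g' -e 's/\n/BEGIN_FOO_END/g'
}

printf '%s\n' 'BEGIN_bar_END' 'BEGIN_FOO_END' 'x BEGIN_buz_END y BEGIN_qux_END' | skip_foo_lines
# prints:
# (bar)
# BEGIN_FOO_END
# x (buz) y (qux)

printf '%s\n' 'BEGIN_FOO_END and BEGIN_bar_END' | skip_foo_matches
# prints:
# BEGIN_FOO_END and (bar)
```

The first variant is enough when each line holds at most one candidate substring; the second is worth the extra steps only if BEGIN_FOO_END can share a line with substrings that should still be rewritten.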