Regex – better way to write Perl regexes with /x so the code is still easy to read


I ran Perl::Critic on one of my scripts, and got this message:

Regular expression without "/x" flag at line 21, column 26. See page 236 of PBP.

I looked up the documentation for this policy, and I understand that writing regular expressions in extended mode helps anyone who reads the code.

However, I am stuck on how to convert my code to use the /x flag.

CPAN Example:

# Match a single-quoted string efficiently...

m{'[^\\']*(?:\\.[^\\']*)*'};  #Huh?

# Same thing with extended format...

m{
    '           # an opening single quote
    [^\\']*     # any non-special chars (i.e. not backslash or single quote)
    (?:         # then all of...
        \\ .    #    any explicitly backslashed char
        [^\\']* #    followed by any non-special chars
    )*          # ...repeated zero or more times
    '           # a closing single quote
}x;

This makes sense if you only look at the regex.
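To confirm that the two layouts really are the same pattern, here is a small self-contained sketch (the test strings are made up for illustration). It compiles both versions with qr{} and prints what each one matches. /x only tells Perl to ignore the whitespace and # comments inside the pattern, so the two columns should agree.

```perl
use strict;
use warnings;

# Compact form from the example above.
my $compact = qr{'[^\\']*(?:\\.[^\\']*)*'};

# Same pattern with /x: whitespace and # comments inside are ignored.
my $extended = qr{
    '           # an opening single quote
    [^\\']*     # any non-special chars (not backslash or single quote)
    (?:         # then all of...
        \\ .    #    any explicitly backslashed char
        [^\\']* #    followed by any non-special chars
    )*          # ...repeated zero or more times
    '           # a closing single quote
}x;

# Hypothetical sample inputs, including one with an escaped quote.
for my $text ( q{'plain'}, q{say 'it\'s' now}, q{no quotes here} ) {
    my ($short) = $text =~ /($compact)/;
    my ($long)  = $text =~ /($extended)/;
    printf "%-18s compact=%s extended=%s\n", $text,
        $short // '(none)', $long // '(none)';
}
```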

My Code:

if ($line =~ /^\s*package\s+(\S+);/ ) {

I am not exactly sure how to use an extended regex inside of an if statement. I can write it like this:

    if (
        $line =~ /
        ^\s*    # starting with zero or more spaces
        package
        \s+     # at least one space
        (\S+)   # capture any non-space characters
        ;       # ending in a semi-colon
        /x
    ) {

This works, but I think it is almost harder to read than the original. Is there a better way (or a best-practice way) to write this? I guess I could create a variable using qr//.

I'm not really looking for advice on rewriting this specific regex (although if I can improve it, I'll take advice) – I'm looking more for advice on how to expand a regex inside of an if statement.
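One way to follow the qr// idea mentioned in the question is sketched below (the variable name $package_decl and the sample line are hypothetical). The extended pattern is compiled once, before the if, so the condition itself stays on one line. In list context the match returns the captured name, so the code does not need to reach for $1.

```perl
use strict;
use warnings;

# Compile the /x pattern once and give it a descriptive name.
my $package_decl = qr/
    ^ \s*       # optional leading whitespace
    package     # the keyword itself
    \s+         # at least one whitespace character
    ( \S+ )     # capture the package name
    ;           # terminated by a semicolon
/x;

my $line = "package Foo::Bar;";    # hypothetical input line

# List assignment returns the number of captures, so this is true on a match.
if ( my ($name) = $line =~ $package_decl ) {
    print "found package $name\n";
}
```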

I know Perl::Critic is just a guideline, but it would be nice to follow it.

Thanks in advance!

After receiving a few answers, it became clear to me that splitting a regex over multiple lines with comments is not always necessary. Anyone who understands basic regexes should be able to follow what my example does; the comments I added were probably unnecessary and verbose. I like the idea of using the extended regex flag while keeping the regex on one line, with spaces embedded to make each part a little clearer.
Thanks for all the input!
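The approach described in this update, with /x switched on but the pattern kept on one line, might look like the following sketch; the sample lines are invented for illustration.

```perl
use strict;
use warnings;

my @found;
for my $line ( "package Foo::Bar;", "    package Baz;", 'my $package = 1;' ) {
    # Under /x the spaces are ignored; they only separate the parts visually.
    if ( $line =~ / ^ \s* package \s+ (\S+) ; /x ) {
        push @found, $1;
    }
}
print "package: $_\n" for @found;
```

Because /x makes the spaces insignificant, this matches exactly what the original /^\s*package\s+(\S+);/ matches.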

Best Solution

Never write a comment that says what the code says. Comments should tell you why the code says what it says. Take a look at this monstrosity: without the comments it is very difficult to see what is going on, but with them it is clear what it is trying to match:

require 5.010;
my $sep         = qr{ [/.-] }x;               #allowed separators    
my $any_century = qr/ 1[6-9] | [2-9][0-9] /x; #match the century 
my $any_decade  = qr/ [0-9]{2} /x;            #match any decade or 2 digit year
my $any_year    = qr/ $any_century? $any_decade /x; #match a 2 or 4 digit year

#match the 1st through 28th for any month of any year
my $start_of_month = qr/
    (?:                         #match
        0?[1-9] |               #Jan - Sep or
        1[0-2]                  #Oct - Dec
    )
    ($sep)                      #the separator
    (?:
        0?[1-9] |               # 1st -  9th or
        1[0-9]  |               #10th - 19th or
        2[0-8]                  #20th - 28th
    )
    \g{-1}                      #and the separator again
/x;

#match 29th - 31st for any month but Feb for any year
my $end_of_month = qr/
    (?:
        (?: 0?[13578] | 1[02] ) #match Jan, Mar, May, Jul, Aug, Oct, Dec
        ($sep)                  #the separator
        31                      #the 31st
        \g{-1}                  #and the separator again
        |                       #or
        (?: 0?[13-9] | 1[0-2] ) #match all months but Feb
        ($sep)                  #the separator
        (?:29|30)               #the 29th or the 30th
        \g{-1}                  #and the separator again
    )
/x;

#match any non-leap year date and the first part of Feb in leap years
my $non_leap_year = qr/ (?: $start_of_month | $end_of_month ) $any_year/x;

#match 29th of Feb in leap years
#BUG: 00 is treated as a non leap year
#even though 2000, 2400, etc are leap years
my $feb_in_leap = qr/
    0?2                         #match Feb
    ($sep)                      #the separator
    29                          #the 29th
    \g{-1}                      #the separator again
    (?:
        $any_century?           #any century
        (?:                     #and decades divisible by 4 but not 100
            0[48]       |
            [2468][048] |
            [13579][26]
        )
        |
        (?:                     #or match centuries that are divisible by 4
            16          |
            [2468][048] |
            [3579][26]
        )
        00
    )
/x;

my $any_date  = qr/$non_leap_year|$feb_in_leap/;
my $only_date = qr/^$any_date$/;
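The pieces above use \g{-1}, a relative backreference (Perl 5.10+, hence the require 5.010), rather than an absolute one such as \1. A plausible reason is sketched below with hypothetical month/day/year pieces that are not part of the original answer: once a qr// piece is interpolated after another capturing group, an absolute \2 points at a different group of the combined pattern, while \g{-1} still means "the group just before me".

```perl
use strict;
use warnings;
use 5.010;    # \g{-1} (relative backreference) and say() need Perl 5.10

# Hypothetical pieces for illustration only; not part of the answer above.
my $sep  = qr{ [/.-] }x;
my $year = qr{ [0-9]{4} }x;

# "\g{-1}" means "the capture group just before this point".
my $md_relative = qr{ ([0-9]{2}) ($sep) [0-9]{2} \g{-1} }x;

# "\2" means "capture group number 2 of the final, combined pattern".
my $md_absolute = qr{ ([0-9]{2}) ($sep) [0-9]{2} \2 }x;

# Putting another capturing group in front shifts the absolute group
# numbers, but not the relative ones.
my $with_relative = qr{ ^ (x)? $md_relative $year }x;
my $with_absolute = qr{ ^ (x)? $md_absolute $year }x;

my $date = "12/25/2009";
say "relative: ", ( $date =~ $with_relative ? "match" : "no match" );
say "absolute: ", ( $date =~ $with_absolute ? "match" : "no match" );
```

In the combined absolute version, \2 now refers to the month group ("12") instead of the separator, so the date no longer matches; the relative version is unaffected by where the piece ends up.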