Repetition of group patterns in a regex pattern


So, folks, I have this self-crafted pattern that works. After some hours (I am no regex guru), this puppy evolved to parse curl PUT output for me:

   ^\s*([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)
    \s+([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)

(CR in text only for formatting)

It gives me 'groups' that I can access, and it works! Yet the coder in me sees the repetition of a pattern, and it bugs the frack out of me. I've seen Perl how-small-is-your-pattern contests over the years that make me think this could be much smaller. But my attempts to slap a * in it have failed miserably.

So, The Question Is: how do I write this pattern in a more concise way so that I can still pull out my target groups?

It probably doesn't matter, but here are the groups I am after:

$1: percent finished
$2: size uploaded so far
$6: size to upload
$8: average upload rate 

Update: Further background can be found in a blog post of mine (How to configure OnMyCommand to generate a progress bar for curl) that explains what I am doing and why I am after only a regex pattern. I'm not actually coding in a language, per se, but configuring a tool to use a regex.

Best Solution

It looks like this is the best I can do:

^\s*([^ ]+)\s+([^ ]+)\s+(?:[^ ]+\s+){3}([^ ]+)\s+[^ ]+\s+([^ ]+)\s+

I collapsed the fields you do not care about into non-capturing groups and left off the unneeded trailing fields. If it is important to match the whole line (e.g. there are other lines that would also match this shorter pattern), you can say:

^\s*([^ ]+)\s+([^ ]+)\s+(?:[^ ]+\s+){3}([^ ]+)\s+[^ ]+\s+([^ ]+)(?:\s+[^ ]+){4}

Note that my changes also renumber the captures:

  • $1: percent finished
  • $2: size uploaded so far
  • $3: size to upload
  • $4: average upload rate
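As a quick check of the renumbered groups, here is a minimal Python sketch that applies the shortened pattern to one line. The sample line is made up to have the same 12 space-separated columns as the original pattern expects (it is not real captured curl output), and the names PROGRESS, SAMPLE and parse_progress are hypothetical helpers for illustration only.

```python
import re

# Shortened pattern from the answer above: four capturing groups,
# with the unwanted fields folded into non-capturing parts.
PROGRESS = re.compile(
    r"^\s*([^ ]+)\s+([^ ]+)\s+(?:[^ ]+\s+){3}([^ ]+)\s+[^ ]+\s+([^ ]+)\s+"
)

# Made-up 12-column line (an assumption, not real captured output).
SAMPLE = " 45  100M    0     0   45 45.0M      0  1200k  0:01:25  0:00:38  0:00:47 1250k"


def parse_progress(line):
    """Return the four captured fields as a tuple, or None if no match."""
    match = PROGRESS.match(line)
    return match.groups() if match else None


print(parse_progress(SAMPLE))  # ('45', '100M', '45.0M', '1200k')
```

The tuple positions correspond to $1 through $4 in the list above; a line with too few columns simply returns None.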

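If the tool supports \S, you may be able to get away with this:

^\s*(\S+)\s+(\S+)\s+(?:\S+\s+){3}(\S+)\s+\S+\s+(\S+)\s+

but it does not mean exactly the same thing: \S excludes every whitespace character (tabs and newlines included), whereas [^ ] excludes only the space character.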
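The difference between \S and [^ ] only shows up when the input contains whitespace other than plain spaces. A small Python sketch (the test string is invented for illustration) makes it visible:

```python
import re

# A tab between "a" and "b", a space between "b" and "c".
text = "a\tb c"

not_space = re.findall(r"[^ ]+", text)  # breaks on spaces only
non_white = re.findall(r"\S+", text)    # breaks on any whitespace

print(not_space)  # ['a\tb', 'c']
print(non_white)  # ['a', 'b', 'c']
```

If the lines being parsed are padded with spaces only, the two forms should behave the same.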