[oclug] A Friday Regex Head Scratcher

Jon Earle je_oclug at kronos.honk.org
Mon Apr 20 14:15:02 EDT 2009



Dave O'Neill wrote:
> yanick at babyl.dyndns.org wrote:
>
>> $string =~ s[ (?<=PR\s)      # preceded by 'PR '
>>               \s*\d+\s*(,\s*\d+\s*)*
>>             ][
>>                 join( ', ',
>>                      map  {
>>                         s/^\s+|\s+$//g;               # trim whitespaces
>>                         "<a href='http://$_'>$_</a>"; # url-ify
>>                      }
>>                      split ',' => $&
>>                 )
>>                 . ' '     # add a whitespace after the list
>>             ]xeg;
>
> Using $& is generally a bad idea, because using it once imposes a
> performance penalty on all other regular expression matches.  With a
> slight modification, you can avoid the performance hit by placing the
> capturing parens around the entire list of digits, and using
> non-capturing parens for the internal grouping:

Hey Dave,

I was just reading up on the $& stuff, and according to Programming Perl, 3rd
Ed, pg 146, last para:

"Perl uses a similar mechanism to produce $1, $2 and so on, so you also pay a
price for each pattern that contains capturing parentheses."

In my case, the matches are to be done to generate a web page and the results
are plenty fast enough (fast enough that I can't detect a page loading
delay).

It would be interesting to see whether there is in fact a measurable
performance difference between $& and $1 usage, but I don't have the time to
run those tests at the moment.

I ended up studying the heck out of Yanick's example and learned quite a bit
about lookbehinds, the map function and advanced regex usage.  The Perl code
adapted easily and is working (I found out that you cannot do variable-length
lookbehinds, but I got around that by using two fixed-length lookbehinds |'d
together).
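
For anyone curious, here is a minimal sketch of the two-lookbehind trick,
written in Python since its re module has the same fixed-width restriction.
The second prefix ('PRs ') and the names PREFIXED_NUMBER and find_numbers are
made up for illustration; they are not the ones from my actual code.

```python
import re

# A single lookbehind whose width can vary, e.g. (?<=PRs?\s), is rejected.
try:
    re.compile(r"(?<=PRs?\s)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround: one fixed-width lookbehind per prefix, |'d together.
# 'PR ' and 'PRs ' are hypothetical prefixes chosen for this example.
PREFIXED_NUMBER = re.compile(r"(?:(?<=PR\s)|(?<=PRs\s))\d+")


def find_numbers(text):
    """Return the digit runs that directly follow either prefix."""
    return PREFIXED_NUMBER.findall(text)
```

Each alternative has its own fixed width, so the engine always knows how far
to step back before trying it.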

The Python solution was a bit more complex, but made use of the (match
portion of the) Perl regex without modification.  The substitution part of it
went together in much the same fashion as the Perl code, but made more use
of native Python functions (at least, the ones I could find by stabbing
wildly in the dark and googling heavily).
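
Roughly, the Python side can be sketched like this. It is an approximation,
not the code running on the site: the match pattern is the one from Yanick's
Perl (with the /x layout whitespace removed), the 'http://' link target is
carried over from his example as a placeholder, and the names PR_LIST,
linkify and linkify_prs are invented for the sketch.

```python
import re

# Match portion of the quoted Perl regex, written on one line.
PR_LIST = re.compile(r"(?<=PR\s)\s*\d+\s*(,\s*\d+\s*)*")


def linkify(match):
    """Turn a matched '12, 34 ' list into comma-separated links."""
    # group(0) is the whole match, playing the role of Perl's $&.
    numbers = [n.strip() for n in match.group(0).split(",")]
    links = ["<a href='http://%s'>%s</a>" % (n, n) for n in numbers]
    # The pattern swallows trailing whitespace, so add one space back.
    return ", ".join(links) + " "


def linkify_prs(text):
    """Replace every number list that follows 'PR ' with links."""
    return PR_LIST.sub(linkify, text)
```

Passing a function to re.sub stands in for Perl's /e modifier, and the list
comprehensions do the job of map in the Perl version.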

Thanks Dave and Yanick for your advice and suggestions!

Cheers!
Jon


