[oclug] A Friday Regex Head Scratcher

yanick at babyl.dyndns.org
Fri Apr 17 17:40:37 EDT 2009


On Fri, Apr 17, 2009 at 02:38:04PM -0400, Jon Earle wrote:
> Probably simple, but the answer is eluding me at the moment.  I have a block
> of text that contains:
> 
> some text PR 1234 more text
> 
> and I want to convert the number to a link to the issue in the bug reporting
> database.  This is no problem, there is a regex to do just that:
> 
>              (\b(PR|SCR)[:s#]?\s?) # PR or SCR followed by :|s|#|whitespace
>              (\s[a-z0-9-]+\/)?     # a category name & a slash (optional)
>              ([0-9]+)              # the PR number
> 
> and that works perfectly.  Now, suppose that the block of text contains:
> 
> some text PR 1234, 5678 ,2345 more text
> 
> How would I need to adjust the regex to account for the unknown number of
> additional comma-number sequences so that they can each be htmlified?
 
A way of doing it in Perl would be:

=begin code


my $string = "some text PR 1234, 5678 ,2345 more text";

$string =~ s[ (?<=PR\s)      # preceded by 'PR '
              \s*\d+\s*(,\s*\d+\s*)*
            ][
                join( ', ',
                     map  {
                        s/^\s+|\s+$//g;               # trim whitespace
                        "<a href='http://$_'>$_</a>"; # url-ify
                     }
                     split ',' => $&
                )
                . ' '     # add a space after the list
            ]xeg;

print $string;

=end code

Basically, I match the '1234, 5678 ,2345' part of the string
(available in $&), split it on the commas, url-ify each number
independently, and then glue the pieces back together.
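
The same match/split/url-ify/join idea can also be sketched without $&,
by capturing the list in $1 and splitting on the comma together with its
surrounding whitespace, so that no separate trim step is needed.  This is
only a sketch under assumptions: the helper names linkify_pr and
linkify_pr_list and the bugs.example.com URL are made up for
illustration, and, like the snippet above, it only handles the plain
'PR ' prefix, not the SCR or category variants from the original regex.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one PR number in a link.
# The URL layout is a placeholder, not the real bug tracker's.
sub linkify_pr {
    my ($number) = @_;
    return "<a href='http://bugs.example.com/$number'>$number</a>";
}

# Replace every "PR n, n ,n" run with individually linked numbers.
sub linkify_pr_list {
    my ($text) = @_;

    $text =~ s{
        (?<=PR\s)                   # preceded by 'PR '
        ( \d+ (?: \s*,\s* \d+ )* )  # capture the comma-separated numbers
    }{
        join ', ', map { linkify_pr($_) } split /\s*,\s*/, $1
    }xeg;

    return $text;
}

print linkify_pr_list("some text PR 1234, 5678 ,2345 more text"), "\n";
```

Because the pattern stops at the last digit instead of swallowing the
trailing whitespace, there is no need to add a space back after the
list; the text following the numbers is left untouched.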

Joy,
`/anick

