Why is this regex not greedy?
In this regex
$line = 'this is a regular expression';
$line =~ s/^(\w+)\b(.*)\b(\w+)$/$3 $2 $1/;
print $line;
Why is $2 equal to " is a regular "? My thought process is that (.*) should be greedy and match all characters until the end of the line, and therefore $3 would be empty.
That's not happening, though. The regex matcher is somehow stopping right before the last word boundary and populating $3 with what's after the last word boundary and the rest of the string is sent to $2.
Any explanation? Thanks.
$3 can't be empty when using this regex because the corresponding capturing group is (\w+), which must match at least one word character or the whole match will fail.
So what happens is (.*) first matches " is a regular expression", \b matches at the end of the string, and (\w+) fails to match because nothing is left. The regex engine then backtracks, giving back one character at a time, until (.*) matches " is a regular " (note the match includes the trailing space), \b matches the word boundary before e, and (\w+) matches "expression".
If you change (\w+) to (\w*) then you will end up with the result you expected, where (.*) consumes the rest of the string and $3 is empty.
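To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question and prints the captures. The array names @plus and @star are just illustrative names, not anything from the original code.

```perl
use strict;
use warnings;

my $line = 'this is a regular expression';

# Original pattern: the last group needs at least one word character,
# so (.*) has to give some characters back.
my @plus = $line =~ /^(\w+)\b(.*)\b(\w+)$/;

# Variant: the last group may be empty, so (.*) can keep everything.
my @star = $line =~ /^(\w+)\b(.*)\b(\w*)$/;

# In list context a match returns ($1, $2, $3).
printf "plus: 1=[%s] 2=[%s] 3=[%s]\n", @plus;
printf "star: 1=[%s] 2=[%s] 3=[%s]\n", @star;
# plus: 1=[this] 2=[ is a regular ] 3=[expression]
# star: 1=[this] 2=[ is a regular expression] 3=[]
```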
Greedy doesn't mean it gets to match absolutely everything. It just means it can take as much as possible and still have the regex succeed.
This means that since you use the + in group 3, it can't be empty and still succeed, as + means 1 or more.
If you want $3 to be empty, just change (\w+) to (\w?). Now since ? means 0 or 1 it can be empty, and therefore the greedy .* takes everything. Note: This seems to work only in Perl, due to how Perl deals with lines.
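As a quick check, this sketch applies the substitution from the question once with the original last group and once with the optional one. The variable names $swapped and $optional are made up for the example, and the brackets in the output are only there to make the leading spaces visible.

```perl
use strict;
use warnings;

my $original = 'this is a regular expression';

# Substitution from the question: the last group is (\w+).
(my $swapped = $original) =~ s/^(\w+)\b(.*)\b(\w+)$/$3 $2 $1/;

# Same substitution with the last group made optional: (\w?).
(my $optional = $original) =~ s/^(\w+)\b(.*)\b(\w?)$/$3 $2 $1/;

print "[$swapped]\n";    # [expression  is a regular  this]
print "[$optional]\n";   # [  is a regular expression this]
```

In the second result $3 is empty, so the replacement starts with the separator space followed by the leading space that $2 already carries.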
In order for the regex to match the whole string, ^(\w+)\b requires that the entire first word be $1. Likewise, \b(\w+)$ requires that the entire last word be $3. Therefore, no matter how greedy (.*) is, it can only capture ' is a regular ', otherwise the pattern won't match. At some point while matching the string, .* probably did take up the entire ' is a regular expression', but then it found that it had to backtrack and let the \w+ get its match too.
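The backtracking described above can be imitated by hand. This is only a rough sketch of the idea, not how Perl's engine is actually implemented (the real engine has optimizations that may skip steps): it tries the longest candidate for (.*) first and gives back one character at a time until \b(\w+)$ can match what is left. The names $start, $end, $middle, $final and $attempts are invented for the illustration.

```perl
use strict;
use warnings;

my $line  = 'this is a regular expression';
my $start = length 'this';    # (.*) begins right after the first word

my ($middle, $final, $attempts) = ('', '', 0);

# Longest candidate first, then shrink by one character per attempt.
for (my $end = length $line; $end >= $start; $end--) {
    $attempts++;
    pos($line) = $end;                  # pretend (.*) stopped here
    if ($line =~ /\G\b(\w+)$/gc) {      # can the rest of the pattern match?
        $middle = substr $line, $start, $end - $start;
        $final  = $1;
        last;
    }
}

print "attempts=$attempts 2=[$middle] 3=[$final]\n";
```

The first attempt corresponds to .* holding ' is a regular expression'; the loop stops at the first position where a word boundary is followed by word characters running to the end of the string.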