match and lazy quantifier with strange behavior
I know that:
A lazy quantifier matches as few as possible (the shortest match).
I also know that the constructor:
basic_regex( ...,
flag_type f = std::regex_constants::ECMAScript );
And that:
ECMAScript supports non-greedy matches, and the ECMAScript regex "<tag[^>]*>.*?</tag>" would match only until the first closing tag ... (en.cppreference)
And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... (en.cppreference)
And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences ... (cppreference, std::regex_match)
Here is my code:
#include <iostream>
#include <string>
#include <regex>
int main(){
std::string string( "s/one/two/three/four/five/six/g" );
std::match_results< std::string::const_iterator > match;
std::basic_regex< char > regex ( "s?/.+?/g?" ); // non-greedy
bool test = false;
using namespace std::regex_constants;
// okay: regex_search recognizes the lazy operator .+?
test = std::regex_search( string, match, regex );
std::cout << test << '\n';
std::cout << match.str() << '\n';
// regex_match does not seem to recognize the lazy operator .+?
test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
std::cout << test << '\n';
std::cout << match.str() << '\n';
}
and the output:
1
s/one/
1
s/one/two/three/four/five/six/g
std::regex_match should not match anything and should return 0 with the non-greedy quantifier .+?
In fact, here the non-greedy .+? quantifier has the same meaning as the greedy one: both /.+?/ and /.+/ match the same string, even though they are different patterns. So the question is: why is the question mark ignored?
Fast test (Perl):
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
s/one/
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
s/one/two/three/four/five/six/g
NOTE
This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); (non-greedy)
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); (greedy)
produce the same output with std::regex_match: both still match the entire string!
But with std::regex_search they produce different output.
Also, s? or g? does not matter, and even with /.*?/ it still matches the entire string!
More Detail
g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901
I don't see any inconsistency. regex_match tries to match the whole string, so in s?/.+?/g? the lazy .+? keeps expanding, one character at a time, until the whole string is covered.
These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:
Non-greedy:
a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba # let's try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".
Greedy:
a.*a: ababa
a|.*a: a|baba
a.*|a: ababa| # try .* == "baba" first
# backtrack
a.*|a: abab|a # try .* == "bab" now
a.*a|: ababa|
And regex_match(abc) behaves like regex_search(^abc$) in this case.