Regex atomic grouping not greedy enough
I'm trying to build a regex that matches escaped strings like @"hello""world". So far I have this (whitespace is ignored):
@(?=")" #at sign if followed by double quote then double quote
(?> #atomic
""
|
[^"]
)*
"
The problem is that the invalid string @""" (invalid because it is never closed) matches as @"". I thought that with atomic grouping (also known as non-backtracking subexpressions), (?>""|[^"])* would consume the last two double quotes of @""", since the left alternative can match two double quotes, and that the overall match would then fail as desired because the final " of the regex has nothing left to match.
Instead, the group does not seem to be greedy enough, despite the greedy quantifier * and the atomic grouping: once the engine notices that the regex fails, it still backtracks to the point just after the very first " of the regex. A workaround is to put (?!") at the end of the regex, but I would like to know why atomic grouping does not prevent this.
Problem
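The atomic group does match the last two quotes at first, but it does not keep them. Matching always proceeds from left to right: after the opening ", the first iteration of (?>""|[^"]) consumes the remaining "", and the closing " of the regex then finds nothing left to match. When backtracking is needed because a token fails, the engine normally gives back only the most recent choice. An atomic group has no choices inside it to revisit, so the engine gives back the whole group match instead. Because the * quantifier sits outside the group, dropping one complete iteration is still allowed: with zero iterations, the closing " matches the second quote and the result is @"". Atomic groups only discard the backtracking positions inside the group, which is what makes them useful for avoiding catastrophic backtracking.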
Solution
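Put an end-of-string anchor ($) at the end of the pattern. The shorter match @"" is then rejected because a " still follows it. This only works when the string is the last thing in the input: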
@(?=")"(?>""|[^"])*"$