How does this forward reference regex example work?

If forward references are supported, the regex (\2two|(one))+ matches oneonetwo.

At the start of the string, \2 fails. Trying the other alternative,

So the fact that \2 fails means that the following "two" is skipped?

one is matched by the second capturing group, and subsequently by the first group.

I understand the "second capturing group" part, but how does it get matched by the "first group"? And if it gets matched twice, why do we get "oneonetwo" instead of "oneoneonetwo" in the final result?

The first group is then repeated. This time, \2 matches one as captured by the second group. two then matches two. With two repetitions of the first group, the regex has matched the whole subject string.

The example is taken from here:

https://www.regular-expressions.info/backref2.html


(\2two|(one))+ corresponds to the following instructions:

(               # start recording (for capture buffer 1)
    \2          # match the string that is stored in capture buffer 2
    two         # match "two" literally
  |             # or
    (           # start recording (for capture buffer 2)
        one     # match "one" literally
    )           # stop recording; set capture buffer 2
)               # stop recording; set capture buffer 1
+               # repeat the previous thing 1 or more times
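
To make the instruction list concrete, here is a minimal sketch of a backtracking matcher hard-wired to this one pattern, in Python. It is not how any real regex engine is implemented, and the function names (one_iteration, repeat, search) are made up for illustration. It assumes the convention used in this walkthrough, that a backreference to an unset group fails, and it passes the capture buffers around as immutable tuples, so a failed iteration automatically leaves the previous recordings untouched.

```python
from typing import Iterator, Optional, Tuple

# Capture buffers (group 1, group 2); None means "unset".
Caps = Tuple[Optional[str], Optional[str]]


def one_iteration(s: str, pos: int, caps: Caps) -> Iterator[Tuple[int, Caps]]:
    """One pass through group 1, i.e. (\\2two|(one)).

    Yields (new_pos, new_caps) for each way this iteration can succeed,
    in the order a backtracking engine would try them.
    """
    g2 = caps[1]
    # First alternative: \2, then "two". An unset buffer 2 never matches,
    # as described in the walkthrough.
    if g2 is not None and s.startswith(g2, pos):
        p = pos + len(g2)
        if s.startswith("two", p):
            p += 3
            # The closing ")" of group 1 records this iteration's text;
            # buffer 2 keeps whatever it held before.
            yield p, (s[pos:p], g2)
    # Second alternative: (one). Group 2 is nested inside group 1,
    # so the same three characters are recorded in both buffers.
    if s.startswith("one", pos):
        p = pos + 3
        yield p, (s[pos:p], s[pos:p])


def repeat(s: str, pos: int, caps: Caps, count: int) -> Iterator[Tuple[int, Caps]]:
    """Greedy '+': try one more iteration first, then fall back to stopping."""
    for p, new_caps in one_iteration(s, pos, caps):
        yield from repeat(s, p, new_caps, count + 1)
    # Stopping is only allowed after at least one iteration. The caps we
    # yield are the ones from before the failed attempt, so anything a
    # failed iteration started recording is simply gone.
    if count >= 1:
        yield pos, caps


def search(s: str) -> Optional[Tuple[int, int, Caps]]:
    """Leftmost match of (\\2two|(one))+ in s as (start, end, (group1, group2))."""
    for start in range(len(s) + 1):
        for end, caps in repeat(s, start, (None, None), 0):
            return start, end, caps  # the first success wins
    return None


if __name__ == "__main__":
    print(search("oneonetwo"))     # (0, 9, ('onetwo', 'one'))
    print(search("oneoneonetwo"))  # (0, 12, ('onetwo', 'one'))
```

On oneonetwo the result shows that "one" is consumed only once in the first iteration even though it ends up in both buffers. A subject like oneoneonetwo is also matched in full, but via three iterations (one, one, then \2two matching onetwo).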

Let's say the target string is oneonetwo. What happens next?

We start at offset 0 in the target string and the beginning of the regex.

Logically, the first thing to be executed is +; it's the top-level operation in the regex. It tries to match its sub-regex repeatedly (1 or more times).

( starts recording for capture buffer 1, but doesn't really do anything otherwise.

\2 tries to match the string from capture buffer 2, but capture buffer 2 is unset. This behaves like a string that never matches, so the whole first alternative fails to match.

| kicks in and we try the second alternative.

( starts recording for capture buffer 2.

We try to match one and succeed: there's a one at offset 0 in the target string. We advance our position in the string (remaining characters: onetwo) and continue matching.

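) stops recording; capture buffer 2 is now set to one.

) stops recording; capture buffer 1 is now set to one. These are the same three characters: group 2 is nested inside group 1, so one is consumed only once but recorded in both buffers.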

Our first iteration of the loop was successful. We try to match more (because that's what + does):

( starts recording for capture buffer 1 (again).

\2 tries to match the string from capture buffer 2, which is now one. This succeeds because there's a one at the current offset in the target string. We advance our position in the string (remaining characters: two) and continue matching.

We try to match two and succeed. Our position in the target string is now at the very end.

| sees that the first alternative succeeded; we ignore the other alternative for now.

) stops recording; capture buffer 1 is now set to onetwo.

This concludes the second iteration of the loop. Again we try to match more:

( starts recording for capture buffer 1.

\2 tries to match the string from capture buffer 2, which is still one. This fails (there are no characters left in the target string).

| kicks in and we try the second alternative.

( starts recording for capture buffer 2.

We try to match one and fail again (there are no characters left in the target string).

The second alternative fails to match, so the whole group fails for this iteration. We throw away the recordings we started in it, so capture buffers 1 and 2 keep the values they had after the second iteration.

Control returns to +. We have matched two full iterations of the loop (the third one failed). That's fine: two is a perfectly valid instance of "1 or more".

We go on and reach the end of the regex, which means the whole regex matched successfully. In the end, capture buffer 1 contains onetwo and capture buffer 2 contains one.

Specifically, after the first iteration:

oneonetwo
^^^       #1
^^^       #2

After the second iteration:

oneonetwo
   ^^^^^^ #1
^^^       #2
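
The caret diagrams can be reproduced with a short script. This is a sketch that assumes matching starts at offset 0, as in the walkthrough; iteration_spans and draw are made-up helper names, and spans are half-open (start, end) offsets. A plain loop is enough here, with no backtracking machinery, because nothing follows the +: once an iteration succeeds, stopping there is already a complete match, so the engine never has to undo it.

```python
from typing import List, Optional, Tuple

# Half-open (start, end) offsets into the subject; None means "unset".
Span = Optional[Tuple[int, int]]


def iteration_spans(s: str) -> List[Tuple[Span, Span]]:
    """Spans of groups 1 and 2 after each iteration of (\\2two|(one))+ from offset 0."""
    pos = 0
    g1: Span = None
    g2: Span = None
    history: List[Tuple[Span, Span]] = []
    while True:
        start = pos
        backref = s[g2[0]:g2[1]] if g2 is not None else None
        if backref is not None and s.startswith(backref + "two", pos):
            pos += len(backref) + 3
            g1 = (start, pos)        # group 2 keeps its old span
        elif s.startswith("one", pos):
            pos += 3
            g1 = g2 = (start, pos)   # nested groups: same span for both
        else:
            break                    # failed iteration: keep the previous captures
        history.append((g1, g2))
    return history


def draw(s: str, g1: Span, g2: Span) -> str:
    """Render the subject with one caret line per captured group."""
    lines = [s]
    for label, span in (("#1", g1), ("#2", g2)):
        if span is not None:
            a, b = span
            lines.append(" " * a + "^" * (b - a) + " " * (len(s) - b) + " " + label)
    return "\n".join(lines)


if __name__ == "__main__":
    subject = "oneonetwo"
    for i, (g1, g2) in enumerate(iteration_spans(subject), start=1):
        print(f"After iteration {i}:")
        print(draw(subject, g1, g2))
```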
