Regex lookahead, lookbehind and atomic groups

I found these things in my regex body but I haven't got a clue what I can use them for. Does somebody have examples so I can try to understand how they work?

(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind

(?>) - atomic group

Examples

Given the string foobarbarfoo :

bar(?=bar)     finds the 1st bar ("bar" which has "bar" after it)
bar(?!bar)     finds the 2nd bar ("bar" which does not have "bar" after it)
(?<=foo)bar    finds the 1st bar ("bar" which has "foo" before it)
(?<!foo)bar    finds the 2nd bar ("bar" which does not have "foo" before it)

You can also combine them:

(?<=foo)bar(?=bar)    finds the 1st bar ("bar" with "foo" before it and "bar" after it)
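
To check these claims yourself, here is a minimal sketch using Python's built-in re module (the helper name first_match_start is made up for illustration). It reports where the first match of each pattern starts in foobarbarfoo; the 1st bar starts at index 3 and the 2nd at index 6.

```python
import re

text = "foobarbarfoo"  # the 1st "bar" starts at index 3, the 2nd at index 6


def first_match_start(pattern, subject):
    """Return the start index of the first match, or None if nothing matches."""
    m = re.search(pattern, subject)
    return m.start() if m else None


patterns = [
    r"bar(?=bar)",          # bar followed by bar
    r"bar(?!bar)",          # bar not followed by bar
    r"(?<=foo)bar",         # bar preceded by foo
    r"(?<!foo)bar",         # bar not preceded by foo
    r"(?<=foo)bar(?=bar)",  # both conditions combined
]

for p in patterns:
    print(f"{p:22} -> start index {first_match_start(p, text)}")
```

In every case the match itself is just the three characters bar; the lookaround parts only decide which bar qualifies.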

Definitions

Look ahead positive (?=)

Find expression A where expression B follows:

A(?=B)

Look ahead negative (?!)

Find expression A where expression B does not follow:

A(?!B)

Look behind positive (?<=)

Find expression A where expression B precedes:

(?<=B)A

Look behind negative (?<!)

Find expression A where expression B does not precede:

(?<!B)A

Atomic groups (?>)

An atomic group is a non-capturing group that, once the pattern inside it has matched, exits the group and throws away all remaining alternatives, so the engine cannot backtrack into the group.

A non-atomic group allows backtracking: it still finds the first match, but if matching the rest of the expression fails, it backtracks and tries the next alternative, until a match for the entire expression is found or all possibilities are exhausted.

  • A non-atomic group in the expression (foo|foot)s applied to foots will:
      • match its 1st alternative foo, then fail as s does not immediately follow in foots, and backtrack to its 2nd alternative;
      • match its 2nd alternative foot, then succeed as s immediately follows in foots, and stop.
  • An atomic group in the expression (?>foo|foot)s applied to foots will match its 1st alternative foo, then fail as s does not immediately follow, and stop as backtracking is disabled.
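
The difference can be tried out in Python. One caveat: Python's re module only accepts the (?>...) syntax from version 3.11 onward, so this sketch also uses a common emulation, (?=(foo|foot))\1s, which relies on a lookahead not being re-entered once it has succeeded. The emulation is a swapped-in technique for older versions, not the atomic group syntax itself.

```python
import re
import sys

subject = "foots"

# Non-atomic group: backtracks from "foo" to "foot", so the whole string matches.
plain = re.search(r"(foo|foot)s", subject)

# Emulated atomic group: the lookahead captures "foo" and is never retried,
# \1 then consumes "foo", and "s" fails against "t". No match.
emulated = re.search(r"(?=(foo|foot))\1s", subject)

print(plain.group() if plain else None)
print(emulated)

# Native atomic group syntax, only available in Python 3.11 and later.
if sys.version_info >= (3, 11):
    native = re.search(r"(?>foo|foot)s", subject)
    print(native)
```

Swapping the alternatives to (?>foot|foo)s would make the atomic version match as well, because foot is then tried first.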

Some resources

  • http://www.regular-expressions.info/lookaround.html
  • http://www.rexegg.com/regex-lookarounds.html

  • Lookarounds are zero-width assertions. They check for a regex to the right or left of the current position (lookahead or lookbehind), succeed or fail depending on whether a match is found (positive or negative), and then discard the matched portion. They don't consume any characters: the regex that follows them (if any) starts matching at the same cursor position.

    Read regular-expressions.info for more details.
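
    The "zero width" part is easy to observe in Python. In this small sketch (variable names are illustrative), the lookahead inspects bar but leaves it unconsumed, so the match ends right after foo and a following bar in the pattern can still match the same characters.

```python
import re

# The lookahead checks for "bar" but does not include it in the match.
m = re.search(r"foo(?=bar)", "foobar")
matched_text = m.group()
position_after = m.end()

# Because nothing was consumed, "bar" can be matched again right after the assertion.
m2 = re.search(r"foo(?=bar)bar", "foobar")
full_text = m2.group()

print(matched_text, position_after, full_text)
```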

  • Positive lookahead
    Syntax:

    (?=REGEX_1)REGEX_2
    

    Match only if REGEX_1 matches; after matching REGEX_1, the match is discarded and searching for REGEX_2 starts at the same position.

    example:

    (?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}
    

    REGEX_1 is [a-z0-9]{4}$ which matches four alphanumeric chars followed by end of line.
    REGEX_2 is [a-z]{1,2}[0-9]{2,3} which matches one or two letters followed by two or three digits.

    REGEX_1 makes sure that the length of the string is exactly 4, but doesn't consume any characters, so the search for REGEX_2 starts at the same location. REGEX_2 then makes sure that the string matches the remaining rules. Without the look-ahead, the pattern would also match strings of length three or five.
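
    A short sketch of that comparison in Python, with sample strings chosen for illustration. The version with the look-ahead is applied with match (anchored at the start of the string); the version without it is applied with fullmatch so that only the length rule differs.

```python
import re

with_lookahead = re.compile(r"(?=[a-z0-9]{4}$)[a-z]{1,2}[0-9]{2,3}")
without_lookahead = re.compile(r"[a-z]{1,2}[0-9]{2,3}")

samples = ["ab12", "a123", "a12", "ab123"]

# Map each sample to (accepted with look-ahead, accepted without look-ahead).
results = {
    s: (bool(with_lookahead.match(s)), bool(without_lookahead.fullmatch(s)))
    for s in samples
}

for s, (strict, loose) in results.items():
    print(f"{s:6} with look-ahead: {strict!s:5}  without: {loose}")
```

    Only the four-character strings pass the first pattern, while the three- and five-character ones pass only the second.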

  • Negative lookahead
    Syntax:

    (?!REGEX_1)REGEX_2
    

    Match only if REGEX_1 does not match; after checking REGEX_1, the search for REGEX_2 starts at the same position.

    example:

    (?!.*\bFWORD\b)\w{10,30}$
    

    The look-ahead part checks for FWORD in the string and fails if it finds it. If it doesn't find FWORD, the look-ahead succeeds, and the following part verifies that the string's length is between 10 and 30 and that it contains only word characters (a-zA-Z0-9_).
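
    Here is a hedged sketch of the same idea in Python. It uses a simplified variant of the pattern, not the exact one above: it is anchored with ^ and drops the \b word boundaries (both are my assumptions), so that the forbidden word is rejected anywhere inside a run of word characters. The helper name is_acceptable is made up for illustration.

```python
import re

# Simplified variant (assumption): anchored at the start, no \b around FWORD.
pattern = re.compile(r"^(?!.*FWORD)\w{10,30}$")


def is_acceptable(s):
    """True if s is 10 to 30 word characters long and does not contain FWORD."""
    return pattern.match(s) is not None


for candidate in ["hello_world_1", "helloFWORDworld", "short", "has space in it"]:
    print(f"{candidate!r}: {is_acceptable(candidate)}")
```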

    Look-behind is similar to look-ahead: it just looks behind the current cursor position. Some regex flavors do not support look-behind assertions (JavaScript, for example, only gained them in ES2018), and most flavors that do support them (PHP, Python, etc.) require the look-behind portion to have a fixed length.
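
    The fixed-length restriction can be seen directly in Python's re module. In this sketch (sample strings are illustrative), a fixed-width look-behind works, while a variable-width one is rejected when the pattern is compiled.

```python
import re

# Fixed-width look-behind: digits that come right after a dollar sign.
fixed = re.search(r"(?<=\$)\d+", "cost: $42")
fixed_text = fixed.group()

# Variable-width look-behind: a+ can be any length, so compilation fails.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print("rejected:", exc)

print(fixed_text, variable_width_ok)
```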

  • Atomic groups basically discard/forget the subsequent tokens in the group once a token matches. Check this page for examples of atomic groups.

  • Grokking lookaround rapidly.
    How to distinguish lookahead and lookbehind? Take a 2-minute tour with me:

    (?=) - positive lookahead
    (?<=) - positive lookbehind
    

    Suppose

        A  B  C #in a line
    

    Now, we ask B: where are you?
    B has two ways to declare its location:

    One, B has A ahead and has C behind.
    Two, B is ahead (lookahead) of C and behind (lookbehind) A.

    As we can see, behind and ahead are opposite in the two solutions.
    Regex uses solution Two.
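
    In pattern terms, (?<=A)B looks at the text to the left of B, and B(?=C) looks at the text to the right of B. A small sketch in Python, on a made-up sample line, lists where each pattern finds a B:

```python
import re

line = "ABC XBC ABY"  # B appears at index 1, 5 and 9

# Lookbehind: B must have A immediately to its left.
behind = [m.start() for m in re.finditer(r"(?<=A)B", line)]

# Lookahead: B must have C immediately to its right.
ahead = [m.start() for m in re.finditer(r"B(?=C)", line)]

print("B preceded by A:", behind)
print("B followed by C:", ahead)
```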
