A Regex that will never be matched by anything

This might sound like a stupid question, but I had a long talk with some of my fellow developers and it sounded like a fun thing to think of.

So, what are your thoughts: what does a regex look like that will never be matched by any string, ever?

Edit: Why do I want this? Well, firstly because I find it interesting to think up such an expression, and secondly because I need it for a script.

In that script I define a dictionary as Dictionary<string, Regex>. As you can see, it maps a string to an expression.

Based on that dictionary I create methods that all use it as their only reference for how to do their work; one of them matches the regexes against a parsed logfile.

If an expression matches, a value returned by the expression is added to another Dictionary<string, long>. To catch any log messages that are not matched by an expression in the dictionary, I created a new group called "unknown".

Everything that didn't match anything else is added to this group. But to prevent the "unknown" expression from accidentally matching a log message, I had to create an expression that will certainly never match, no matter what string I give it.

Thus, there you have my reason for this "not a real question"...
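The setup described in the question can be sketched roughly as follows. This is a Python approximation rather than the asker's actual script: the category names, the patterns, and the classify helper are all hypothetical, and the counts stand in for whatever value the real expressions return.

```python
import re
from collections import Counter

# Hypothetical categories; names and patterns are illustrative only.
PATTERNS = {
    "error": re.compile(r"\bERROR\b"),
    "warning": re.compile(r"\bWARN(ING)?\b"),
    # Placeholder that should never match, so "unknown" is only ever
    # incremented by the fallback branch in classify().
    "unknown": re.compile(r"(?!x)x"),
}

def classify(lines):
    """Count how many lines fall into each category (hypothetical helper)."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        matched = False
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                matched = True
        if not matched:
            # Nothing in the dictionary matched, so file it under "unknown".
            counts["unknown"] += 1
    return counts

counts = classify(["ERROR disk full", "WARN low memory", "xx hello"])
```

Because the "unknown" entry lives in the same dictionary as the real patterns, every method that iterates over the dictionary can treat it uniformly, and the never-matching pattern keeps it from being counted twice.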


This is actually quite simple, although it depends on the implementation / flags*:

$a

This looks for a character a after the end of the string. Good luck.

WARNING:
This expression is expensive -- it will scan the entire line, find the end-of-line anchor, and only then not find the a and return a negative match. (See comment below for more detail.)


* Originally I did not give much thought to multiline-mode regexps, where $ also matches at the end of a line. In fact, it matches the empty string right before the newline, so an ordinary character like a can never appear immediately after $.
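A quick way to check this claim is to try $a against a few strings, with and without multiline mode. The sketch below uses Python's re module; the sample strings are arbitrary, and behavior in other regex engines may differ.

```python
import re

samples = ["", "a", "a\na", "$a", "line\n"]

# Default mode: $ matches at the end of the string (or just before a
# trailing newline), so no "a" can follow it.
never = re.compile(r"$a")
results = [never.search(s) for s in samples]

# Multiline mode: $ also matches just before every newline, but the next
# character is then the newline itself, never "a".
never_multi = re.compile(r"$a", re.MULTILINE)
results_multi = [never_multi.search(s) for s in samples]

print(results)
print(results_multi)
```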


Leverage negative lookahead:

>>> import re
>>> x=r'(?!x)x'
>>> r=re.compile(x)
>>> r.match('')
>>> r.match('x')
>>> r.match('y')

This RE is a contradiction in terms and will therefore never match anything.

NOTE:
In Python, re.match() only tries to match at the beginning of the string, as if a beginning-of-string anchor ( \A ) had been added to the start of the regular expression. This anchor is important for performance: without it, the entire string will be scanned. Those not using Python will want to add the anchor explicitly:

\A(?!x)x
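As a rough illustration of the anchored form, the sketch below uses re.search(), which (unlike re.match()) tries every position unless the pattern itself is anchored. The long test string is arbitrary; no timing figures are claimed here.

```python
import re

anchored = re.compile(r"\A(?!x)x")    # can only be attempted at position 0
unanchored = re.compile(r"(?!x)x")    # may be attempted at every position

text = "x" * 1000

# Both patterns fail; the anchored one just has fewer places to try.
anchored_result = anchored.search(text)
unanchored_result = unanchored.search(text)

print(anchored_result, unanchored_result)
```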

Look around:

(?=a)b

For regex newbies: The positive lookahead (?=a) makes sure that the next character is a, but doesn't change the search location (or include the 'a' in the matched string). Now that the next character is confirmed to be a, the remaining part of the regex (b) matches only if the next character is b. Thus, this regex matches only if a character is both a and b at the same time.
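To see that the lookahead does not consume input, it can help to compare (?=a)b with the satisfiable (?=a)a. A minimal sketch in Python, with arbitrarily chosen sample strings:

```python
import re

impossible = re.compile(r"(?=a)b")   # next character must be both a and b
possible = re.compile(r"(?=a)a")     # lookahead and match agree on "a"

samples = ["", "a", "b", "ab", "ba"]
impossible_results = [impossible.search(s) for s in samples]

# The lookahead checks for "a" without moving forward, so the following
# literal "a" still matches that same character.
hit = possible.search("ba")
matched_text = hit.group()
matched_at = hit.start()
```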
