Regular expression to avoid a given set of substrings

2018-06-04 01:06:37

This question already has an answer here:

Regular expression to match a line that doesn't contain a word? 25 answers

That's what you use a negative lookahead assertion for:

^(?!.*(abc|def|ghi))

will match as long as the input string doesn't contain any of the "bad" words.

Note that the lookahead assertion itself doesn't match anything, so the match result (in the case of a successful match) will be an empty string.

In Python:

>>> import re
>>> regex = re.compile(r"^(?!.*(abc|def|ghi))")
>>> [bool(regex.match(s)) for s in ("student", "apple", "maria",
...                                 "definition", "ghint", "abc123")]
[True, True, True, False, False, False]
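If the list of "bad" words is only known at runtime, the same pattern can be assembled from it. The following is a minimal sketch, not part of the original answer: the helper name build_reject_regex is hypothetical, and re.escape is used on the assumption that the words should be treated as literal text rather than as regex syntax.

```python
import re

def build_reject_regex(words):
    # Hypothetical helper: escape each word so that characters such as
    # "." or "+" are matched literally, then join them into one alternation.
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r"^(?!.*(?:%s))" % alternation)

regex = build_reject_regex(["abc", "def", "ghi"])

m = regex.match("student")
print(repr(m.group()))             # the successful match is an empty string
print(regex.match("definition"))   # None, because "def" occurs in the input
```

Note that "." does not match a newline by default, so for multi-line input the lookahead only scans the first line unless re.DOTALL is passed.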

You can use lookaheads:

^(?!.*?(?:abc|def|ghi)).*$

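(?!...) is a negative lookahead: it succeeds only if the enclosed pattern cannot match at the current position.

(?:...) is a non-capturing group.

The trailing .*$ consumes the rest of the line, so a successful match returns the whole line rather than an empty string.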
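Because this variant consumes the line, it can be used to pull the "clean" lines out of a larger text. A small sketch, assuming Python and the re.MULTILINE flag so that ^ and $ anchor at every line; the sample text is made up for illustration.

```python
import re

# With re.MULTILINE, ^ and $ match at the start and end of each line.
line_ok = re.compile(r"^(?!.*?(?:abc|def|ghi)).*$", re.MULTILINE)

text = "student\ndefinition\napple\nabc123"

# findall returns the full match here, since the pattern has no capturing groups.
clean_lines = line_ok.findall(text)
print(clean_lines)  # ['student', 'apple']
```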

If you have a string containing the "forbidden" words, like the one below:

student apple maria definition ghint abc123 righit

and you just want to know whether the string contains them, you can use:

(?=def|abc|ghi)

This positive lookahead gives you 4 zero-width matches, one at the first letter of each forbidden substring (*def*inition, *ghi*nt, *abc*123, ri*ghi*t).

If no matches are found in your string, it contains no "forbidden" words.
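One way to locate where the forbidden substrings start is a positive lookahead combined with re.finditer. This is only a sketch in Python using the sample string from this answer; the variable names are illustrative.

```python
import re

text = "student apple maria definition ghint abc123 righit"

# Each match is zero-width, so m.start() is the index where a forbidden
# substring begins.
starts = [m.start() for m in re.finditer(r"(?=abc|def|ghi)", text)]

# Slice three characters from each position to see which substring was hit.
found = [text[i:i + 3] for i in starts]
print(found)  # ['def', 'ghi', 'abc', 'ghi']

# If only a yes/no answer is needed, a plain search is enough.
print(bool(re.search(r"abc|def|ghi", text)))  # True
```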

You can also use a regex replace with:

\w*(abc|def|ghi)\w*

which replaces every word containing a "forbidden" substring with "", so that only the words without forbidden substrings remain.
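In Python, such a replacement could be sketched with re.sub as below. The \w* on both sides makes the pattern swallow the whole word around the forbidden substring; the non-capturing group and the final split() are choices made for this example only.

```python
import re

text = "student apple maria definition ghint abc123 righit"

# Remove every word that contains abc, def or ghi.
cleaned = re.sub(r"\w*(?:abc|def|ghi)\w*", "", text)

# The removed words leave extra spaces behind; split() discards them.
kept = cleaned.split()
print(kept)  # ['student', 'apple', 'maria']
```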

Link: http://www.djcxy.com/p/13390.html