Regular expression to avoid a given set of substrings

  • That's what you use a negative lookahead assertion for:

    ^(?!.*(abc|def|ghi))
    

    will match as long as the input string doesn't contain any of the "bad" words.

    Note that the lookahead assertion itself doesn't match anything, so the match result (in the case of a successful match) will be an empty string.

    In Python:

    >>> import re
    >>> regex = re.compile(r"^(?!.*(abc|def|ghi))")
    >>> [bool(regex.match(s)) for s in ("student", "apple", "maria",
    ...                                 "definition", "ghint", "abc123")]
    [True, True, True, False, False, False]
    

    You can use lookaheads:

    ^(?!.*?(?:abc|def|ghi)).*$
    
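  • (?!...) is a negative lookahead: it succeeds only if the enclosed pattern does not match at that position.
  • (?:...) is a non-capturing group: it groups the alternatives without storing the matched text.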
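
As a minimal Python sketch of the pattern above: because of the trailing `.*$`, a successful match returns the whole input rather than an empty string. The names `ALLOWED` and `is_clean` are illustrative only and do not come from the answer.

```python
import re

# Pattern from the answer above: reject any input containing abc, def or ghi.
ALLOWED = re.compile(r"^(?!.*?(?:abc|def|ghi)).*$")

def is_clean(text):
    """Return True when text contains none of the forbidden substrings."""
    return ALLOWED.match(text) is not None

match = ALLOWED.match("student")
print(match.group(0))          # the whole input string
print(is_clean("definition"))  # contains "def", so no match
```

This sketch assumes single-line input; `.` does not cross newlines unless `re.DOTALL` is set.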


    If you have a string that contains "forbidden" words, such as

    student apple maria definition ghint abc123 righit

    and you just want to know whether it contains any of them, you can search with a positive lookahead:

    (?=def|abc|ghi)
    

    This gives you 4 zero-width matches, one immediately before each of these letters:

  • d
  • g
  • a
  • g

    These are the first letters of the forbidden words ( *def*inition, *ghi*nt, *abc*123, ri*ghi*t ).

    If no matches are found in your string, there are no "forbidden" words.
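
As a rough Python sketch of the check described above, searching for the alternation directly with `finditer` reports where each forbidden word starts. The names `FORBIDDEN`, `hits` and `first_letters` are illustrative and not part of the answer.

```python
import re

# Plain alternation of the forbidden substrings.
FORBIDDEN = re.compile(r"abc|def|ghi")

text = "student apple maria definition ghint abc123 righit"

# Collect (start index, matched substring) for every occurrence.
hits = [(m.start(), m.group(0)) for m in FORBIDDEN.finditer(text)]
first_letters = [text[start] for start, _ in hits]

print(len(hits))       # number of forbidden occurrences
print(first_letters)   # first letter of each occurrence

# An empty list means the string contains no forbidden substring.
contains_forbidden = bool(hits)
```

If only a yes/no answer is needed, `FORBIDDEN.search(text) is not None` avoids building the list.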

    You can also use a regex replace with:

    \w*(abc|def|ghi)\w*
    

    This replaces every word that contains a "forbidden" substring with "", so you keep only the words that contain none.
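
A minimal Python sketch of the replacement approach, assuming the pattern is meant to use `\w` (word characters) on both sides of the alternation. `WORD_WITH_FORBIDDEN` and `tidy` are hypothetical names, and the whitespace cleanup is an extra step that the answer does not mention.

```python
import re

# Match any whole run of word characters that contains a forbidden substring.
WORD_WITH_FORBIDDEN = re.compile(r"\w*(?:abc|def|ghi)\w*")

text = "student apple maria definition ghint abc123 righit"

# Remove the offending words; the spaces around them are left behind.
cleaned = WORD_WITH_FORBIDDEN.sub("", text)

# Optional: collapse the leftover whitespace.
tidy = " ".join(cleaned.split())
print(tidy)
```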
