Help with regex include and exclude

2018-06-27 10:27:15

I would like some help with regex.

I'm trying to create an expression that will include certain strings and exclude certain strings.

For example:

I would like to include any URL containing mobility, such as http://www.something.com/mobility/

However, I would like to exclude any URL containing store, such as http://www.something.com/store/mobility/

FYI, I have many keywords to include. Currently I include them with /mobility|enterprise|products/i, but I can't find a way to exclude links that contain other keywords.

Thank you in advance for any help and insight you can provide.

It's possible to do all this in one regex, but you don't really need to. I think you'll have a better time if you run two separate tests: one for your include rules and one for your exclude rules. Not sure what language you're using, so I'll use JavaScript for the example:

function validate(str) {
    // \b is a word boundary, so each keyword must appear as a whole word
    var required = /\b(mobility|enterprise|products)\b/i;
    var blocked = /\b(store|foo|bar)\b/i;

    return required.test(str) && !blocked.test(str);
}

If you really want to do it in one pattern, try something like this:

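/^(?=.*\b(mobility|enterprise|products)\b)(?!.*\b(store|foo|bar)\b)(.+)/i

The ^ anchor matters: without it the engine can retry the match from a later position, past the blocked word, and still succeed. The i at the end means case-insensitive, so use your language's equivalent if you're not using JavaScript.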
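Here's a quick sketch of how the single-pattern version behaves on the URLs from the question. The function name isAllowed is just something I made up for the example, and foo and bar are the same placeholder keywords as above:

```javascript
// Anchored single-pattern check: the first lookahead requires a keyword,
// the second forbids one. The keyword lists are placeholders.
var pattern = /^(?=.*\b(?:mobility|enterprise|products)\b)(?!.*\b(?:store|foo|bar)\b).+$/i;

// isAllowed is a hypothetical helper name used only for this example.
function isAllowed(url) {
    return pattern.test(url);
}

console.log(isAllowed('http://www.something.com/mobility/'));       // true
console.log(isAllowed('http://www.something.com/store/mobility/')); // false
console.log(isAllowed('http://www.something.com/about/'));          // false
```

The last URL fails because it contains none of the required keywords, not because of a blocked one.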

All that being said, based on your description of the problem, I think what you REALLY want for this is string manipulation. Here's an example, again using JS:

function validate(str) {
    var required = ['mobility','enterprise','products'];
    var blocked = ['store','foo','bar'];
    var lowercaseStr = str.toLowerCase(); //or just use str if you want case sensitivity

    // Reject the string if it contains any blocked keyword.
    for (var j = 0; j < blocked.length; j++) {
        if (lowercaseStr.indexOf(blocked[j]) !== -1) {
            return false;
        }
    }

    // Accept the string if it contains at least one required keyword.
    for (var i = 0; i < required.length; i++) {
        if (lowercaseStr.indexOf(required[i]) !== -1) {
            return true;
        }
    }

    // No required keyword was found.
    return false;
}

To match a string that must contain at least one word from a set of words, you can use a positive lookahead:

^(?=.*(?:inc1|inc2|...))

To reject a string that contains any word from a list of stop words, you can use a negative lookahead:

^(?!.*(?:ex1|ex2|...))

You can combine the two requirements in a single regex:

^(?=.*(?:inc1|inc2|...))(?!.*(?:ex1|ex2|...))REGEX_TO_MATCH_URL$
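If you have many keywords, you could build that combined pattern from two lists rather than typing it by hand. This is only a sketch: buildFilter and escapeRegExp are names I invented, and I've used .* where REGEX_TO_MATCH_URL would go, on the assumption that any string passing both lookaheads is acceptable.

```javascript
// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds ^(?=.*(?:inc...))(?!.*(?:ex...)).*$ from two lists.
// An empty list simply omits its lookahead.
function buildFilter(include, exclude) {
    var source = '^';
    if (include.length > 0) {
        source += '(?=.*(?:' + include.map(escapeRegExp).join('|') + '))';
    }
    if (exclude.length > 0) {
        source += '(?!.*(?:' + exclude.map(escapeRegExp).join('|') + '))';
    }
    // ".*" stands in for REGEX_TO_MATCH_URL (an assumption of this sketch).
    return new RegExp(source + '.*$', 'i');
}

var filter = buildFilter(['mobility', 'enterprise', 'products'], ['store']);
console.log(filter.test('http://www.something.com/mobility/'));       // true
console.log(filter.test('http://www.something.com/store/mobility/')); // false
```

Note that this version has no word boundaries, so a stop word like store would also reject a URL containing restore.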


Why not make two regexes, one for the good keywords and one for the bad, and check both (first the bad, then the good)? You can do it with a single regex, but KISS is always a good rule ( http://en.wikipedia.org/wiki/KISS_principle )

I'll add that you need to consider the "ass" principle: .*ass matches ambassador and cassette, so you'll probably want to require a separator ( [./] ) before and after each word. See "Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?"
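A rough sketch of the separator idea, assuming the keywords are plain alphanumeric words and that . and / are the only separators you care about. hasSegment is a made-up name:

```javascript
// Hypothetical helper: true if any word appears as a whole segment,
// i.e. bounded by "." or "/" (or the start/end of the string).
// Assumes the words contain no regex metacharacters.
function hasSegment(url, words) {
    var pattern = new RegExp('(?:^|[./])(?:' + words.join('|') + ')(?=[./]|$)', 'i');
    return pattern.test(url);
}

console.log(hasSegment('http://www.something.com/store/mobility/', ['store']));   // true
console.log(hasSegment('http://www.something.com/restore/mobility/', ['store'])); // false
console.log(hasSegment('http://store.something.com/', ['store']));                // true
```

With this, restore no longer trips the store rule, but a subdomain like store.something.com still does.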
