What special characters must be escaped in regular expressions?

I am tired of always having to guess whether I should escape special characters such as ()[]{}| when working with different regexp implementations.

The rules differ between, for example, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there any rule set which tells when I should, and when I should not, escape special characters? Does it depend on the regexp type, like PCRE, POSIX or extended regexps?


Which characters you must and which you mustn't escape indeed depends on the regex flavor you're working with.

For PCRE, and most other so-called Perl-compatible flavors, escape these outside character classes:

.^$*+?()[{\|

and these inside character classes:

^-]\
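As a rough illustration of the Perl-style rules, here is a minimal Python sketch. It assumes Python's built-in re module, which is Perl-like but not PCRE itself, and the sample string is made up for the example.

```python
import re

# Characters listed above as special outside a character class,
# including the backslash itself.
SPECIAL_OUTSIDE = ".^$*+?()[{\\|"

# Each one matches itself literally once it is preceded by a backslash.
for ch in SPECIAL_OUTSIDE:
    assert re.fullmatch("\\" + ch, ch)

# Hypothetical sample text containing several metacharacters.
literal = "price (USD): $5.00 [approx]"

# re.escape() adds the backslashes for you.
pattern = re.escape(literal)
assert re.fullmatch(pattern, literal)

# Unescaped, the text is read as a pattern and no longer matches itself.
assert re.fullmatch(literal, literal) is None

# Inside a character class, escape ^ - ] and the backslash.
inside = re.compile(r"[\^\-\]\\]")
assert all(inside.fullmatch(ch) for ch in "^-]\\")
```

Other Perl-compatible engines usually ship a comparable quoting helper, but the name and exact behavior vary, so check the documentation of the one you use.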

For POSIX extended regexes (ERE), escape these outside character classes (same as PCRE):

.^$*+?()[{\|

Escaping any other characters is an error with POSIX ERE.

Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally, e.g.:

[]^-]
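To see the placement trick in action, here is a small sketch. Note the assumption: it uses Python's re module, which is not a POSIX engine but happens to accept the same placement, so it only illustrates how the class is read. With a POSIX tool such as grep you would pass the same bracket expression.

```python
import re

# "]" first, "^" not first, "-" last: no backslashes needed.
bracket = re.compile(r"[]^-]")

# Try a few single characters against the class.
candidates = ["]", "^", "-", "a", "\\"]
matched = [ch for ch in candidates if bracket.fullmatch(ch)]

# Only the three class members match; "a" and the backslash do not.
assert matched == ["]", "^", "-"]
```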

In POSIX basic regular expressions (BRE), these are metacharacters that you need to escape to suppress their meaning:

.^$*[\

Escaping parentheses and curly brackets in BREs gives them the special meaning their unescaped versions have in EREs. Some implementations (e.g. GNU) also give special meaning to other characters when escaped, such as ? and +. Escaping a character other than .^$*(){} is normally an error with BREs.

Inside character classes, BREs follow the same rule as EREs.
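The per-flavor lists above can be folded into a small helper. This is only a sketch: escape_literal and METACHARS are hypothetical names, the character sets are the ones given in this answer, and the result is meant for use outside character classes only.

```python
# Metacharacters to escape outside character classes, per flavor,
# as listed in this answer (the backslash is included in each set).
METACHARS = {
    "pcre": ".^$*+?()[{\\|",
    "ere": ".^$*+?()[{\\|",
    "bre": ".^$*[\\",
}


def escape_literal(text: str, flavor: str) -> str:
    """Backslash-escape only the characters that are special in `flavor`."""
    special = METACHARS[flavor]
    return "".join("\\" + ch if ch in special else ch for ch in text)


# ERE: + and parentheses are special, so they get a backslash.
assert escape_literal("a+b (c)", "ere") == r"a\+b \(c\)"

# BRE: the same text needs no escaping at all; adding backslashes
# there would turn the parentheses into a group.
assert escape_literal("a+b (c)", "bre") == "a+b (c)"

# Characters special in both flavors are escaped in both.
assert escape_literal("1.5*x", "bre") == r"1\.5\*x"
```

Keeping the sets in a plain dictionary makes it easy to adjust them for an implementation that deviates from the standard lists.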

If all this makes your head spin, grab a copy of RegexBuddy. On the Create tab, click Insert Token, and then Literal. RegexBuddy will add escapes as needed.


Modern RegEx Flavors (PCRE)

Includes C, C++, Delphi, EditPad, Java, JavaScript, Perl, PHP (preg), PostgreSQL, PowerGREP, PowerShell, Python, REALbasic, Real Studio, Ruby, TCL, VB.Net, VBScript, wxWidgets, XML Schema, Xojo, XRegExp.
PCRE compatibility may vary

Anywhere: . ^ $ * + - ? ( ) [ ] { } \ |


Legacy RegEx Flavors (BRE/ERE)

Includes awk, ed, egrep, emacs, GNUlib, grep, PHP (ereg), MySQL, Oracle, R, sed.
PCRE support may be enabled in later versions or by using extensions

ERE/awk/egrep/emacs

Outside a character class: . ^ $ * + ? ( ) [ { } \ |
Inside a character class: ^ - [ ]

BRE/ed/grep/sed

Outside a character class: . ^ $ * [ \
Inside a character class: ^ - [ ]
For literals, don't escape: + ? ( ) { } |
For standard regex behavior, escape: + ? ( ) { } |


Notes

  • If unsure about a specific character, it can be written as a hex escape like \xFF
  • Alphanumeric characters cannot be escaped with a backslash
  • Arbitrary symbols can be escaped with a backslash in PCRE, but not in BRE/ERE (there they must only be escaped when required). In PCRE, ] and - only need escaping within a character class, but I kept them in a single list for simplicity
  • Quoted expression strings must also have the surrounding quote characters escaped, often with backslashes doubled up (like "(\")(/)(\\.)" versus /(")(\/)(\.)/ in JavaScript)
  • Aside from escapes, different regex implementations may support different modifiers, character classes, anchors, quantifiers, and other features. For more details, check out regular-expressions.info, or use regex101.com to test your expressions live
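The notes on string quoting and hex escapes can be tried out with a short sketch. It assumes Python rather than JavaScript: Python has no regex literals, so the contrast is between ordinary and raw string literals, and the / needs no escaping.

```python
import re

# Two spellings of the same two-character pattern: backslash, then dot.
plain = "\\."  # ordinary string literal: the backslash is doubled
raw = r"\."    # raw string literal: written the way the regex engine sees it
assert plain == raw

# The escaped dot matches only a literal dot.
assert re.fullmatch(raw, ".")
assert re.fullmatch(raw, "a") is None

# Inside a double-quoted string, the quote character is escaped for the
# string parser; the regex engine never sees that backslash.
quoted = "(\")(/)(\\.)"
match = re.fullmatch(quoted, '"/.')
assert match.groups() == ('"', "/", ".")

# A hex escape names a character without using the character itself:
# \x2B is "+".
assert re.fullmatch(r"\x2B", "+")
```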

Unfortunately, there really isn't a fixed set of escape codes, since it varies with the language you are using.

However, keeping a page like the Regular Expression Tools Page or this Regular Expression Cheatsheet can go a long way to help you quickly filter things out.
