Should hyphens be escaped?
The hyphen is a special character in regex. For instance, to specify a range, I could write something like:
[0-9A-F]
But outside of square brackets it's just a regular character, right? I've tested this on a couple of online regex testers, and a hyphen seems to behave as a normal character outside of square brackets, whether it's escaped or not. The same holds inside square brackets when it isn't between two characters (e.g. [-g] seems to match either - or g). I couldn't find an answer to this, so I'm wondering whether it is conventional to escape hyphens.
Thanks!
Correct on all fronts. Outside of a character class (that's what the "square brackets" are called) the hyphen has no special meaning. Within a character class, you can place a hyphen as the first or last character (e.g. [-a-z] or [0-9-]), OR escape it (e.g. [a-z\-0-9]) in order to add "hyphen" to your class.
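As a quick check of the three placements, here is a minimal sketch using Python's re module (the choice of Python, the pattern names, and the sample string are assumptions made for illustration; other engines may differ in edge cases):

```python
import re

# Three spellings of "a lowercase letter, a digit, or a hyphen".
HYPHEN_FIRST = re.compile(r"[-a-z0-9]")     # hyphen placed first
HYPHEN_LAST = re.compile(r"[a-z0-9-]")      # hyphen placed last
HYPHEN_ESCAPED = re.compile(r"[a-z\-0-9]")  # hyphen escaped in the middle


def matched_chars(pattern, text):
    """Return the characters of text matched by a single-character class."""
    return "".join(pattern.findall(text))


sample = "a-z 0-9 A_Z!"
results = [
    matched_chars(p, sample)
    for p in (HYPHEN_FIRST, HYPHEN_LAST, HYPHEN_ESCAPED)
]
print(results)  # all three entries are "a-z0-9"
```

All three classes pick out the same characters, so the choice between them is a matter of style rather than behavior.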
It's more common to find a hyphen placed first or last within a character class, but by no means will you be lynched by hordes of furious neckbeards for choosing to escape it instead.
(Actually... my experience has been that a lot of regex is written by folks who don't fully grok the syntax. In these cases, you'll typically see everything escaped (e.g. [a\-z\%\$\#\@\!\-\_]) simply because the engineer doesn't know what's "special" and what's not... so they "play it safe" and obfuscate the expression with loads of excessive backslashes. You'll be doing yourself, your contemporaries, and your posterity a huge favor by taking the time to really understand regex syntax before using it.)
Great question!
Outside of character classes, it is conventional not to escape hyphens. If I saw an escaped hyphen outside of a character class, that would suggest to me that it was written by someone who was not very comfortable with regexes.
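To illustrate that the backslash is redundant outside a character class, here is a small sketch in Python's re module (the phone-number-like pattern and the sample text are made up for the example):

```python
import re

# Outside a character class the hyphen is an ordinary literal,
# so the escaped and unescaped patterns behave identically.
plain = re.compile(r"\d{3}-\d{4}")
escaped = re.compile(r"\d{3}\-\d{4}")

text = "call 555-1234 or 5551234"
plain_hits = plain.findall(text)
escaped_hits = escaped.findall(text)
print(plain_hits, escaped_hits)  # both are ['555-1234']
```

The escaped version is legal, just noisier to read.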
Inside character classes, I don't think one way is conventional over the other. In my experience, the usual choice is to put the hyphen either first or last, as in [-._:] or [._:-], to avoid the backslash; but I've also often seen it escaped instead, as in [._\-:], and I wouldn't call that unconventional.
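The reason the hyphen must be escaped when it sits in the middle, as in [._\-:], is that an unescaped hyphen between two characters is read as a range. The sketch below shows this in Python's re module, where the unescaped form is rejected because _ sorts after : (this is Python's behavior; other engines may report the problem differently):

```python
import re

# Escaped hyphen in the middle: matches '.', '_', '-' or ':'.
ESCAPED = re.compile(r"[._\-:]")
found = ESCAPED.findall("a.b_c-d:e")
print(found)  # ['.', '_', '-', ':']

# Unescaped hyphen in the middle: "_-:" is parsed as a range from
# '_' to ':', which is reversed, so compilation fails.
try:
    re.compile(r"[._-:]")
    unescaped_error = None
except re.error as exc:
    unescaped_error = str(exc)
print(unescaped_error)
```

Placing the hyphen first or last avoids the question entirely, which is probably why that style is so common.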
Typically you would put the hyphen first inside the [] character class. For example, to match any alphanumeric character or a hyphen (written the long way), you would use [-a-zA-Z0-9]
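A possible use of that class, sketched in Python (the is_slug helper and its name are hypothetical, introduced only for this example):

```python
import re

# One or more alphanumerics or hyphens; the leading hyphen is a literal.
SLUG = re.compile(r"[-a-zA-Z0-9]+")


def is_slug(value):
    """Return True if value consists only of alphanumerics and hyphens."""
    return SLUG.fullmatch(value) is not None


print(is_slug("my-post-42"))  # True
print(is_slug("my_post"))     # False, underscore is not in the class
```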