Allowed characters for CSS identifiers
What are the (full) valid / allowed charset characters for CSS identifiers id
and class
?
Is there a regular expression that I can use to validate against? Is it browser agnostic?
The charset doesn't matter. The allowed characters matters more. Check the CSS specification. Here's a cite of relevance:
In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9]
and ISO 10646 characters U+00A1
and higher, plus the hyphen ( -
) and the underscore ( _
); they cannot start with a digit, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?"
may be written as "B&W?"
or "B26 W3F"
.
Update : As to the regex question, you can find the grammar here:
ident -?{nmstart}{nmchar}*
Which contains of the parts:
nmstart [_a-z]|{nonascii}|{escape}
nmchar [_a-z0-9-]|{nonascii}|{escape}
nonascii [240-377]
escape {unicode}|[^rnf0-9a-f]
unicode {h}{1,6}(rn|[ trnf])?
h [0-9a-f]
This can be translated to a Java regex as follows (I only added parentheses to parts containing the OR and escaped the backslashes):
String h = "[0-9a-f]";
String unicode = "\{h}{1,6}(rn|[ trnf])?".replace("{h}", h);
String escape = "({unicode}|\[^rnf0-9a-f])".replace("{unicode}", unicode);
String nonascii = "[240-377]";
String nmchar = "([_a-z0-9-]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String nmstart = "([_a-z]|{nonascii}|{escape})".replace("{nonascii}", nonascii).replace("{escape}", escape);
String ident = "-?{nmstart}{nmchar}*".replace("{nmstart}", nmstart).replace("{nmchar}", nmchar);
System.out.println(ident); // The full regex.
Update 2 : oh, you're more a PHP'er, well I think you can figure how/where to do str_replace
?
For anyone looking for something a little more turn-key. The full expression, replaced and all, from @BalusC's answer is:
/-?([_a-z]|[240-377]|([0-9a-f]{1,6}(rn|[ trnf])?|[^rnf0-9a-f]))([_a-z0-9-]|[240-377]|([0-9a-f]{1,6}(rn|[ trnf])?|[^rnf0-9a-f]))*/
And using DEFINE
, which I find a little more readable:
/(?(DEFINE)
(?P<h> [0-9a-f] )
(?P<unicode> (?&h){1,6}(rn|[ trnf])? )
(?P<escape> ((?&unicode)|[^rnf0-9a-f])* )
(?P<nonascii> [240-377] )
(?P<nmchar> ([_a-z0-9-]|(?&nonascii)|(?&escape)) )
(?P<nmstart> ([_a-z]|(?&nonascii)|(?&escape)) )
(?P<ident> -?(?&nmstart)(?&nmchar)* )
) (?:
(?&ident)
)/x
Incidentally, the original regular expression (and @human's contribution) had a few rogue escape characters that allow [
in the name.
Also, it should be noted that the raw regex without, DEFINE
, runs about 2x as fast as the DEFINE
expression, taking only ~23 steps to identify a single unicode character, while the later takes ~40.
This is merely a contribution to @BalusC answer. It is the PHP version of the Java code he provided, I converted it and I thought someone else could find it helpful.
$h = "[0-9a-f]";
$unicode = str_replace( "{h}", $h, "{h}{1,6}(rn|[ trnf])?" );
$escape = str_replace( "{unicode}", $unicode, "({unicode}|[^rnf0-9a-f])");
$nonascii = "[240-377]";
$nmchar = str_replace( array( "{nonascii}", "{escape}" ), array( $nonascii, $escape ), "([_a-z0-9-]|{nonascii}|{escape})");
$nmstart = str_replace( array( "{nonascii}", "{escape}" ), array( $nonascii, $escape ), "([_a-z]|{nonascii}|{escape})" );
$ident = str_replace( array( "{nmstart}", "{nmchar}" ), array( $nmstart, $nmchar ), "-?{nmstart}{nmchar}*");
echo $ident; // The full regex.
链接地址: http://www.djcxy.com/p/22840.html
上一篇: CSS类可以继承一个或多个其他类吗?
下一篇: 允许用于CSS标识符的字符