Why do some character literals cause Syntax Errors in Java?
In the latest edition of the JavaSpecialists newsletter, the author mentions a piece of code that does not compile in Java:
public class A1 {
    Character aChar = '\u000d';
}
Try to compile it, and you will get an error such as:
A1.java:2: illegal line end in character literal
    Character aChar = '\u000d';
                      ^
Why does an equivalent piece of C# code not show such a problem?
public class CharacterFixture
{
    char aChar = '\u000d';
}
Am I missing anything?
EDIT: The original intent of my question was to ask how the C# compiler gets Unicode file parsing right (if it does), and why Java still sticks with the incorrect (if it is) parsing.
EDIT: I would also like my original question title restored. Why such heavy editing? I strongly suspect it has changed my intent.
Java's compiler translates \uxxxx escape sequences as one of the very first steps, even before the tokenizer gets a crack at the code. By the time it actually starts tokenizing, there are no \uxxxx sequences anymore; they have already been turned into the chars they represent, so to the compiler your Java example looks the same as if you had actually typed a carriage return between the quotes. A carriage return is a line terminator, and a line terminator is not allowed inside a character literal, hence the error.
It does this in order to provide a way to use Unicode within the source, regardless of the source file's encoding. Even ASCII text can still fully represent Unicode chars if necessary (at the cost of readability), and since the translation is done so early, you can have the escapes almost anywhere in the code. (You could write \u0063\u006c\u0061\u0073\u0073\u0020\u0053\u0074\u0075\u0066\u0066\u0020\u007b\u007d, and the compiler would read it as class Stuff {}, if you wanted to be annoying or torture yourself.)
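As a rough illustration of the point above, here is a minimal sketch of ways to get a carriage return into a char without writing \u000d inside the literal. The class and constant names (CarriageReturnDemo, CR_ESCAPE, CR_CAST, LETTER_A) are made up for this example. The comments deliberately avoid spelling out any Unicode escape, since escapes are translated inside comments too.

```java
class CarriageReturnDemo {
    // The ordinary backslash-r escape is handled by the tokenizer
    // as part of the literal, so no raw line terminator ever appears.
    static final char CR_ESCAPE = '\r';

    // A numeric cast avoids escape sequences altogether.
    static final char CR_CAST = (char) 0x0d;

    // A Unicode escape for a character that is not a line terminator
    // is fine inside a literal: this one is translated to the letter A.
    static final char LETTER_A = '\u0041';

    public static void main(String[] args) {
        System.out.println(CR_ESCAPE == CR_CAST); // prints true
        System.out.println((int) CR_ESCAPE);      // prints 13
        System.out.println(LETTER_A);             // prints A
    }
}
```

Both CR_ESCAPE and CR_CAST hold the same character that the original snippet was trying to express; only the spelling in the source differs.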
C# doesn't do that. There, \uxxxx is translated later, with the rest of the program, and is only valid in certain types of tokens (namely, identifiers and string/char literals). This means it can't be used in certain places where it can be used in Java. For example, cl\u0061ss is not a keyword in C#.
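For contrast, here is the Java side of that last point as a small sketch: because Java translates the escapes before tokenizing, the same spelling does act as the keyword class there. The names EscapedKeyword, EscapedKeywordDemo and answer are hypothetical, chosen only for this example.

```java
// After escape translation, the compiler sees the keyword "class" here.
cl\u0061ss EscapedKeyword {
    // The method name below is read as "answer".
    static int answ\u0065r() {
        return 42;
    }
}

class EscapedKeywordDemo {
    public static void main(String[] args) {
        // Called by its plain spelling; both spellings name the same method.
        System.out.println(EscapedKeyword.answer()); // prints 42
    }
}
```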