character type int
A character constant has type int in C.
Now suppose my machine's local character set is Windows Latin-1 (http://www.ascii-code.com/), which is a 256-character set, so every character between single quotes, like 'x', is mapped to an int value between 0 and 255, right?
Suppose plain char is signed on my machine and consider the following code:
char ch = 'â';
if (ch == 'â')
{
    printf("ok");
}
Because of integer promotion, ch will be promoted to a negative quantity of type int (because its leading bit is one), while 'â' is mapped to a positive quantity, so ok will not be printed.
But I'm sure I'm missing something. Can you help?
Your C implementation has a notion of an execution character set. Moreover, if your program source code is read from a file (as it always is), the compiler has (or should have) a notion of a source character set. For example, in GCC you can tune those parameters on the command line with the -finput-charset and -fexec-charset options. The combination of those two settings determines the integral value that is assigned to your literal 'â'.
Actually, the initial assignment will not work as expected:
char ch = 'â';
The value of the character constant does not fit in a char here, and gcc will warn about it. Strictly speaking, converting an out-of-range value to a signed char gives an implementation-defined result rather than undefined behavior, and for the very common single-byte char type the behavior is predictable enough: only the low byte is kept. Depending on your source character set, â may be a multibyte character; I get decimal 50082 (0xC3A2, the two UTF-8 bytes of â) if I print it as an integer on my machine.
Furthermore, the comparison can never succeed, again because char is too small to hold the value being compared, and again, a good compiler will warn about it.
ISO C defines wchar_t, a type wide enough to hold extended (i.e., non-ASCII) characters, along with wide character versions of many library functions. Code that must deal with non-ASCII text should use this wide character type as a matter of course.
In a case where char is signed (assuming a single-byte execution character set such as Latin-1 and a 32-bit int):
When processing char ch = 'â', the compiler converts â to 0xFFFFFFE2 and stores 0xE2 in ch. There is no overflow, as the value is signed.
When processing if(ch == 'â'), the compiler extends ch (0xE2) to an int (0xFFFFFFE2) and compares it to 'â' (also 0xFFFFFFE2), so the condition is true.