C encoding of character constants

My programmer's instinct says that a character constant in C (e.g. 'x') is encoded using the character set of the machine on which it is compiled. However, the following excerpt is from "The C Programming Language: ANSI C Edition":

"A character constant is a sequence of one or more characters enclosed in single quotes, as in 'x'. The value of a character constant with only one character is the numeric value of the character in the machine's character set at execution time."

Emphasis on the last 3 words.

Can anyone explain why they would say "at execution time"? Surely the character value is encoded in the compiled binary (ELF, a.out, ...)?

I have wondered about this but couldn't come up with a logical explanation. Surely K&R knew what they were doing!


You will have to tell the compiler what system you are going to run the program on. It will then choose the proper encoding for the characters.

Of course, the default is to run on a system similar to the one running the compiler. In that case the compile-time and run-time character sets will be identical.


C distinguishes the source character set from the execution character set because your compiler could be a cross compiler, e.g. one running on a PC and producing code for a mobile platform. In that case the character set of the build machine and that of the target machine need not agree. The simplest example is the end-of-line encoding, which differs among today's common platforms. The execution character set may also depend on "locales" and other settings chosen dynamically by the user running the program.


Your problem seems to be that you're confusing the machine's character set with the character encoding used.

Read http://www.microsoft.com/typography/unicode/cs.htm to understand what a character set actually means. The problem at the time of K&R (2nd edition) was that there were many different computers, some manufactured for local governments and the local public. As a result, two computers could use different character sets, so the code that meant 'A' on a US machine might be a Cyrillic character (say, Foo) on a Russian machine.

Hence character constants couldn't be TRUSTED. Thanks to modern computer manufacturers, most machines now use the same character sets, which makes exchanging information much simpler.
