Why Are C Character Literals Ints Instead of Chars

Why are C character literals ints instead of chars?

Discussion on the same subject:

"More specifically the integral promotions. In K&R C it was virtually (?)
impossible to use a character value without it being promoted to int first,
so making character constant int in the first place eliminated that step.
There were and still are multi character constants such as 'abcd' or however
many will fit in an int."

Why is sizeof a char literal not the same as sizeof(char)? [duplicate]

In C, as opposed to C++, integer character constants (literals) have the type int.

So the value of the expression sizeof( 'a' ) is equal to the value of sizeof( int ), while sizeof( char ) is always equal to 1.
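A minimal sketch makes the difference visible; the exact number printed for sizeof 'a' depends on the platform's int size (commonly 4):

#include <stdio.h>

int main(void)
{
    /* In C, 'a' has type int, so sizeof 'a' equals sizeof(int). */
    printf("sizeof 'a'   = %zu\n", sizeof 'a');
    printf("sizeof(int)  = %zu\n", sizeof(int));

    /* sizeof(char) is 1 by definition. */
    printf("sizeof(char) = %zu\n", sizeof(char));
    return 0;
}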

From the C Standard (6.5.3.4 The sizeof and alignof operators)

4 When sizeof is applied to an operand that has type char, unsigned
char, or signed char, (or a qualified version thereof) the result is
1...

and (6.4.4.4 Character constants)

10 An integer character constant has type int. The value of an
integer character constant containing a single character that maps to
a single-byte execution character is the numerical value of the
representation of the mapped character interpreted as an integer. The
value of an integer character constant containing more than one
character (e.g., 'ab'), or containing a character or escape sequence
that does not map to a single-byte execution character, is
implementation-defined. If an integer character constant contains a
single character or escape sequence, its value is the one that results
when an object with type char whose value is that of the single
character or escape sequence is converted to type int.

Note that objects of type char used as operands in expressions are usually converted to type int due to the integer promotions.
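A small sketch of those promotions, assuming a typical platform where int is wider than char: a char used as an operand of an arithmetic operator is promoted, and sizeof makes the promotion visible:

#include <stdio.h>

int main(void)
{
    char c = 'a';

    /* The object c itself occupies one byte. */
    printf("sizeof c       = %zu\n", sizeof c);

    /* As an operand of +, c is promoted to int, so the
       expression's type (and size) is that of int.      */
    printf("sizeof (c + 0) = %zu\n", sizeof (c + 0));
    return 0;
}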

Why do we have the type char in C if a character literal is always of type int? Isn't the char type in C redundant?

Just because a character constant in C source code has type int doesn't mean that the type char has no use.

The type char occupies 1 byte of space, so you can use it anywhere the values fit in the range of a char, which includes the ASCII characters. You can read and write those characters from either the console or a file as single-byte entities. The fact that a character constant in source code has a different type doesn't change that.

Using char in an array also means you're using less memory than if you had an array of int, which can be useful in situations where space is at a premium. This is especially true if you're using it as a binary format to store data on disk or send it over a network.

A char * can also be used to access the individual bytes of any object if you need to see how that object is represented.
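A short sketch of both points, assuming a typical platform with 4-byte int: a char array packs one byte per element, and an unsigned char pointer can walk the bytes of any object's representation:

#include <stdio.h>

int main(void)
{
    /* One byte per element: "hello" plus its terminator fits in 6 bytes,
       whereas six ints would typically need 24.                          */
    char text[] = "hello";
    int  wide[6] = {0};
    printf("sizeof text = %zu, sizeof wide = %zu\n", sizeof text, sizeof wide);

    /* Inspecting an object's bytes through an unsigned char pointer. */
    int value = 0x01020304;
    const unsigned char *bytes = (const unsigned char *)&value;
    for (size_t i = 0; i < sizeof value; i++)
        printf("byte %zu: %02x\n", i, bytes[i]);
    return 0;
}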

Difference between char in C and C++? [duplicate]

Is char in C++ an integral type, or strictly a character type?

Character types, such as char, are integral types in C++.

The type of a narrow character constant in C is int, while the type of a narrow character literal in C++ is char.

literal character in C: is it an int or a char?

In C, character constants such as 'A' are of type int. In C++, they're of type char.

In C, the type of a character constant rarely matters. It's guaranteed to be int, but if the language were changed to make it char, most existing code would continue to work properly. (Code that explicitly refers to sizeof 'A' would change behavior, but there's not much point in writing that unless you're trying to distinguish between C and C++, and there are better and more reliable ways to do that. There are cases involving macros where sizeof 'A' might be sensible; I won't get into details here.)

In your code sample:

int i = 0 + 'A';

0 is of type int, and the two operands of + are promoted, if necessary, to a common type, so the behavior is exactly the same either way. Even this:

char A = 'A';
int i = 0 + A;

does the same thing, with A (which is of type char) being promoted to int. Expressions of type char are usually, but not always, implicitly promoted to int.
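For completeness, a runnable version of both snippets (assuming an ASCII execution character set, where 'A' is 65); both initializers yield the same value:

#include <stdio.h>

int main(void)
{
    char A = 'A';
    int i = 0 + 'A'; /* 'A' already has type int in C           */
    int j = 0 + A;   /* A (a char) is promoted to int before +  */

    printf("i = %d, j = %d\n", i, j); /* 65 and 65 on ASCII systems */
    return 0;
}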

In C++, character constants are of type char -- but the same promotion rules apply. When Stroustrup designed C++, he changed the type of character constants for consistency (it's admittedly a bit surprising that 'A' is of type int), and to enable more consistent overloading (which C doesn't support). For example, if C++ character constants were of type int, then this:

std::cout << 'A';

would print 65, the ASCII value of 'A' (unless the system uses EBCDIC); it makes more sense for it to print A.

int i = 0 + (int)'A';

The cast is unnecessary in both C and C++. In C, 'A' is already of type int, so the conversion has no effect. In C++, it's of type char, but without the cast it would be implicitly converted to int anyway.

In both C and C++, casts should be viewed with suspicion. Both languages provide implicit conversions in many contexts, and those conversions usually do the right thing. An explicit cast either overrides the implicit conversion or creates a conversion that would not otherwise take place. In many (but by no means all) cases, a cast indicates a problem that's better solved either by using a language-provided implicit conversion, or by changing a declaration so the thing being converted is of the right type in the first place.

(As Pascal Cuoq reminds me in comments, if plain char is unsigned and as wide as int, then an expression of type char will be promoted to unsigned int, not to int. This can happen only if CHAR_BIT >= 16, i.e., if the implementation has 16-bit or bigger bytes, and if sizeof (int) == 1, and if plain char is unsigned. I'm not sure that any such implementations actually exist, though I understand that C compilers for some DSPs do have CHAR_BIT > 8.)

Why is the size of a character literal in C different than in C++ [duplicate]

In C, 'i' has type int for backwards-compatibility reasons. Thus sizeof('i') shows the size of an int on the chosen compilation platform.

In C++, because overloading made it more urgent to avoid giving surprising types to expressions, it was decided to break backwards compatibility and to give 'i' the type char.

ANSI C: Why character functions accept int argument instead of char argument?

Characters and integers are rather tightly knit in C.

When you receive a character from an input stream, the return type must be able to represent every possible character plus the end-of-file marker.

That means the char type isn't big enough, so a wider type is used.

The C99 rationale document states:

Since these functions are often used primarily as macros, their domain is restricted to the small positive integers representable in an unsigned char, plus the value of EOF. EOF is traditionally -1, but may be any negative integer, and hence distinguishable from any valid character code. These macros may thus be efficiently implemented by using the argument as an index into a small array of attributes.

The standard itself has this to say:

The header <ctype.h> declares several functions useful for classifying and mapping
characters. In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the macro EOF. If the
argument has any other value, the behavior is undefined.
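Hence the usual reading idiom, sketched below: fgetc returns an int wide enough to hold every unsigned char value plus EOF, and that int is passed straight to the <ctype.h> functions:

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    int c; /* must be int, not char, or the EOF comparison can misfire */

    while ((c = fgetc(stdin)) != EOF) {
        /* Here c is an unsigned char value (EOF was excluded above),
           so passing it to the <ctype.h> functions is well defined.  */
        putchar(isalpha(c) ? toupper(c) : c);
    }
    return 0;
}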

Why does C use two single quotes to delimit char literals instead of just one?

cppreference.com says that multicharacter constants were inherited by C from the B programming language, so they have probably existed from the start. Since they can be of various widths, the closing quote is pretty much a requirement.

Apart from that and aesthetics in general, a character constant representing the space character in particular would look somewhat awkward and be a likely magnet for mistakes if it was just ' instead of ' '.
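For illustration, a multicharacter constant is still legal C; it has type int, but its value is implementation-defined, so the number printed varies between compilers (many also warn about it):

#include <stdio.h>

int main(void)
{
    /* Legal, type int, but the value is implementation-defined. */
    int v = 'ab';
    printf("'ab' = %d\n", v);
    return 0;
}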

Why does memset take an int instead of a char?

memset predates (by quite a bit) the addition of function prototypes to C. Without a prototype, you can't pass a char to a function -- when/if you try, it'll be promoted to int when you pass it, and what the function receives is an int.

It's also worth noting that in C (but not in C++), a character literal like 'a' does not have type char -- it has type int, so what you pass will usually start out as an int anyway. Essentially the only way for it to start as a char and get promoted is if you pass a char variable.

In theory, memset could probably be modified so it receives a char instead of an int, but there's unlikely to be any benefit, and a pretty decent possibility of breaking some old code or other. With an unknown but potentially fairly high cost, and almost no chance of any real benefit, I'd say the chances of it being changed to receive a char fall right on the line between "slim" and "none".

Edit (responding to the comments): The CHAR_BIT least significant bits of the int are used as the value to write to the target.
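A short sketch of the interface as it stands: the value parameter is an int, but only its CHAR_BIT low-order bits are written to each byte of the target, so passing a character constant or a char works as expected:

#include <limits.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[8];

    /* 'x' has type int in C; memset uses only its CHAR_BIT
       low-order bits, i.e. the byte value of 'x'.            */
    memset(buf, 'x', sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    printf("buf = \"%s\" (CHAR_BIT = %d)\n", buf, CHAR_BIT);
    return 0;
}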


