Meaning of U Suffix

It stands for unsigned.

When you declare a constant, you can also specify its type. Another common example is L, which stands for long (you write it twice, LL, to specify a 64-bit long long constant).

Example: 1ULL.

It helps avoid explicit casts.
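
For example (a minimal sketch, assuming a typical platform with a 32-bit int), the suffixes pick the constant's type directly, so no cast is needed:

#include <stdio.h>

int main(void)
{
    unsigned int flags = 0x80000000U;     /* U: unsigned constant, no cast needed     */
    long distance = 100000L;              /* L: long constant                         */
    unsigned long long mask = 1ULL << 40; /* ULL: 64-bit-capable unsigned constant    */

    printf("%u %ld %llu\n", flags, distance, mask);
    return 0;
}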

What is the purpose of U's like here 0x01U

u is the "unsigned-suffix" and on its own makes an integer constant an unsigned int, unsigned long int, or unsigned long long int, depending on the value.

In the C standard, see section 6.4.4.1 (Integer constants) paragraph 5.
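
As a quick illustration (a minimal sketch; the sizes printed assume a typical platform with a 32-bit unsigned int and a 64-bit wider unsigned type), the same u suffix yields different types as the value grows:

#include <stdio.h>

int main(void)
{
    printf("%zu\n", sizeof 1u);              /* fits unsigned int: typically 4           */
    printf("%zu\n", sizeof 0xFFFFFFFFFFFFu); /* too big for unsigned int: becomes
                                                unsigned long (long), typically 8        */
    return 0;
}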

Defending U suffix after Hex literals

Appending a U suffix to all hexadecimal constants makes them unsigned as you already mentioned. This may have undesirable side-effects when these constants are used in operations along with signed values, especially comparisons.

Here is a pathological example:

#define MY_INT_MAX  0x7FFFFFFFU   // blindly applying the rule

if (-1 < MY_INT_MAX) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}

The C rules for signed/unsigned conversions are precisely specified but somewhat counter-intuitive, so the above code will indeed print OOPS.
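
To make the mechanism explicit, here is a minimal sketch (assuming a 32-bit int; a compiler run with -Wsign-compare would warn about the mixed comparison): -1 is converted to unsigned int before the comparison takes place.

#include <stdio.h>

int main(void)
{
    unsigned int limit = 0x7FFFFFFFU;
    printf("%u\n", (unsigned int)-1);  /* 4294967295: -1 wraps around to UINT_MAX        */
    printf("%d\n", -1 < limit);        /* 0: 4294967295 < 2147483647 is false, hence OOPS */
    return 0;
}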

The MISRA-C rule is precise: it states that a "U" suffix shall be applied to all constants of unsigned type. The word unsigned has far-reaching consequences, and indeed most constants should not really be considered unsigned.

Furthermore, the C Standard makes a subtle difference between decimal and hexadecimal constants:

  • A hexadecimal constant is given an unsigned type if its value can be represented by the unsigned variant of a type (int or larger) but not by the signed variant of the same size.

This means that on 32-bit 2's complement systems, 2147483648 is a long or a long long whereas 0x80000000 is an unsigned int. Appending a U suffix may make this more explicit in this case but the real precaution to avoid potential problems is to mandate the compiler to reject signed/unsigned comparisons altogether: gcc -Wall -Wextra -Werror or clang -Weverything -Werror are life savers.
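
A minimal C11 sketch of that difference (TYPE_NAME is a helper macro defined here for illustration; the names printed assume a typical platform with a 32-bit int):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void)
{
    puts(TYPE_NAME(2147483648));   /* decimal: stays signed, long or long long */
    puts(TYPE_NAME(0x80000000));   /* hex: unsigned int on a 32-bit-int system */
    puts(TYPE_NAME(0x80000000U));  /* U suffix makes the intent explicit       */
    return 0;
}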

Here is how bad it can get:

if (-1 < 0x8000) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}

The above code should print OK on 32-bit systems and OOPS on 16-bit systems. To make things even worse, it is still quite common to see embedded projects use obsolete compilers which do not even implement the Standard semantics for this issue.

For your specific question, the defined values for microprocessor registers used specifically to set them via assignment (assuming these registers are memory-mapped) need not have the U suffix at all. The register lvalue should have an unsigned type, and the hex value will be signed or unsigned depending on its value, but the operation proceeds the same either way: the opcode for storing a signed number is the same as for an unsigned number, both on your target architecture and on any architecture I have ever seen.
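
A sketch of that point, with a made-up memory-mapped register (the address and names here are hypothetical, not taken from any real device header):

#include <stdint.h>

#define CTRL_REG     (*(volatile uint32_t *)0x40001000u)  /* hypothetical register address */
#define CTRL_ENABLE  0x01    /* no U suffix: a signed int constant */

void enable_peripheral(void)
{
    /* The constant is converted to the unsigned type of the lvalue;
       the store is bit-for-bit identical to writing 0x01U. */
    CTRL_REG = CTRL_ENABLE;
}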

ULL suffix on a numeric literal

From the gcc manual:

ISO C99 supports data types for integers that are at least 64 bits wide (...). To make an integer constant of type long long int, add the suffix LL to the integer. To make an integer constant of type unsigned long long int, add the suffix ULL to the integer.

These suffixes were also added to C++ in C++11, and had already been supported long long (pun intended) before that as compiler extensions.
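
For instance (a minimal sketch):

#include <stdio.h>

int main(void)
{
    long long a = 123456789012345LL;              /* LL: long long int           */
    unsigned long long b = 0xFFFFFFFFFFFFFFFFULL; /* ULL: unsigned long long int */

    printf("%lld %llu\n", a, b);
    return 0;
}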

C literal suffix U, UL problems

A decimal number like -1, 1, 2, 12345678, etc. without any suffix gets the first type in which its value fits, trying int, long, long long in that order.

An octal or hex number like 0, 0123, 0x123, 0X123 without any suffix gets the first type in which its value fits, trying int, unsigned, long, unsigned long, long long, unsigned long long in that order.


The following is a potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 31. Note: unsigned long must be at least 32 bits. It would result in undefined behavior if unsigned long were exactly 32 bits, but in a well-defined, non-zero value if unsigned long were wider.

(0x1UL << AAR_INTENSET_NOTRESOLVED_Pos)

The following is a similar potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 15: int, and hence the unsuffixed constant 0x1, need only be 16 bits wide. Moreover, because 0x1 without a U is a signed int, shifting it left by 15 on a 16-bit-int system already sets the sign bit, which is undefined behavior. So without explicitly using U, 0x1 could be a problem even when AAR_INTENSET_NOTRESOLVED_Pos == 15. [@Matt McNabb]

(0x1 << AAR_INTENSET_NOTRESOLVED_Pos)

Bitwise shift operators
"The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." C11dr §6.5.7 3


Machine width is not the key issue. An 8-bit or 16-bit machine could use a 16-bit, 32-bit, etc. int. Again, 16 bits is the minimum int width for a compliant C compiler.


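One portable way around both problems above is to force the shift into a fixed 64-bit unsigned type. A minimal sketch (the bit position is hypothetical, deliberately chosen above 31 to show the hazard):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define EXAMPLE_BIT_Pos 33   /* hypothetical position, deliberately > 31 */

int main(void)
{
    /* (0x1UL << EXAMPLE_BIT_Pos) is UB where unsigned long is 32 bits;
       casting to uint64_t first makes the shift well-defined everywhere. */
    uint64_t mask = (uint64_t)1 << EXAMPLE_BIT_Pos;
    printf("0x%" PRIX64 "\n", mask);   /* prints 0x200000000 */
    return 0;
}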

what is the reason for explicitly declaring L or UL for long values

When a suffix L or UL is not used, the compiler uses the first type that can contain the constant from a list (see the C99 standard, clause 6.4.4.1, paragraph 5; for a decimal constant, the list is int, long int, long long int).

As a consequence, most of the time it is not necessary to use the suffix. It does not change the meaning of the program. It does not change the meaning of your example initialization of x for most architectures, although it would if you had chosen a number that could not be represented as a long long. See also codebauer's answer for an example where the U part of the suffix is necessary.


There are a couple of circumstances when the programmer may want to set the type of the constant explicitly. One example is when using a variadic function:

printf("%lld", 1LL); // correct, because 1LL has type long long
printf("%lld", 1); // undefined behavior, because 1 has type int

A common reason to use a suffix is ensuring that the result of a computation doesn't overflow. Two examples are:

long x = 10000L * 4096L;
unsigned long long y = 1ULL << 36;

In both examples, without suffixes, the constants would have type int and the computation would be made as int. In each example this incurs a risk of overflow. Using the suffixes means that the computation will be done in a larger type instead, which has sufficient range for the result.

As Lightness Races in Orbit puts it, the literal's suffix comes before the assignment. In the two examples above, simply declaring x as long and y as unsigned long long is not enough to prevent the overflow in the computation of the expressions assigned to them.


Another example is the comparison x < 12U where variable x has type int. Without the U suffix, the compiler types the constant 12 as an int, and the comparison is therefore a comparison of signed ints.

int x = -3;
printf("%d\n", x < 12); // prints 1 because it's true that -3 < 12

With the U suffix, the comparison becomes a comparison of unsigned ints. “Usual arithmetic conversions” mean that -3 is converted to a large unsigned int:

printf("%d\n", x < 12U); // prints 0 because (unsigned int)-3 is large

In fact, the type of a constant may even change the result of an arithmetic computation, again because of the way “usual arithmetic conversions” work.
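
For example (a minimal sketch, assuming a 32-bit int), division by an unsigned constant drags the signed operand through the same conversion:

#include <stdio.h>

int main(void)
{
    printf("%d\n", -3 / 2);    /* -1: ordinary signed division                    */
    printf("%u\n", -3 / 2U);   /* 2147483646: -3 is first converted to 4294967293 */
    return 0;
}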


Note that, for decimal constants, the list of types suggested by C99 does not contain unsigned long long. In C90, the list ended with the largest standardized unsigned integer type at the time (which was unsigned long). A consequence was that the meaning of some programs was changed by adding the standard type long long to C99: the same constant that was typed as unsigned long in C90 could now be typed as a signed long long instead. I believe this is the reason why in C99, it was decided not to have unsigned long long in the list of types for decimal constants.

#define FOO 1u 2u 4u ... What do 1u and 2u mean?

It defines an unsigned integer literal. You can also see where they defined a hex literal to be an unsigned long integer by using 0x...UL.

If you would like to know the bit pattern they produce, simply translate the decimal literals to their equivalent hex or binary literals. 1U becomes 0x01U in hex and 0b1¹ in binary.

Another commonly seen literal uses the f suffix: a single-precision floating-point literal such as 1.0f.

1. for illustration only, not an actual literal per the standard
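
Putting the two kinds of suffix side by side (a minimal sketch):

#include <stdio.h>

int main(void)
{
    unsigned int flag = 1U;   /* unsigned integer literal, bit pattern 0x1 */
    float scale = 1.0f;       /* single-precision floating-point literal   */

    printf("0x%02X %f\n", flag, scale);
    return 0;
}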


