Is char signed or unsigned by default on iOS?
In most cases, char is unsigned on ARM (for performance reasons) and signed on other platforms.

iOS differs from the normal convention for ARM, and char is signed by default. (In particular, this differs from Android, where code compiled in the NDK defaults to unsigned char.)
This can be changed in Xcode; there is a "'char' Type Is Unsigned" option (which defaults to No). If this is changed to Yes, Xcode will pass -funsigned-char to LLVM. (Checked on Xcode 5.0.2.)
The reason why iOS differs is mentioned in the iOS ABI Function Call Guide: ARM64 Function Calling Conventions, which says simply:

In iOS, as with other Darwin platforms, both char and wchar_t are signed types.
Is char signed or unsigned by default?
The book is wrong. The standard does not specify whether plain char is signed or unsigned.

In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out whether plain char is signed or unsigned (CHAR_MIN is less than 0 if it is signed, and equal to 0 if it is unsigned), but even then, the three types are distinct as far as the standard is concerned.
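For instance, a minimal runtime check (my own sketch, not from the original answer) looks like this:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is negative (typically -128) when plain char is
       signed, and 0 when it is unsigned. */
    if (CHAR_MIN < 0)
        printf("plain char is signed (CHAR_MIN = %d)\n", CHAR_MIN);
    else
        printf("plain char is unsigned (CHAR_MIN = %d)\n", CHAR_MIN);
    return 0;
}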
Do note that char is special in this way. If you declare a variable as int, it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
What causes a char to be signed or unsigned when using gcc?
According to the C11 standard (read n1570), char can be signed or unsigned (so you actually have two flavors of C). Exactly which one it is is implementation-specific.

Some processors and instruction set architectures or application binary interfaces favor a signed character (byte) type (e.g. because it maps nicely to some machine code instruction); others favor an unsigned one.
gcc even has -fsigned-char and -funsigned-char options, which you should almost never use (because changing the default breaks some corner cases in calling conventions and ABIs) unless you recompile everything, including your C standard library.
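As an illustration (my own example, not from the original answer), the flag's effect is easy to observe with a small program:

/* signtest.c */
#include <stdio.h>

int main(void)
{
    char c = (char)0xff;
    /* Prints -1 where char is signed, 255 where it is unsigned. */
    printf("%d\n", c);
    return 0;
}

On a typical platform with 8-bit chars, gcc -fsigned-char signtest.c makes this print -1, while gcc -funsigned-char signtest.c makes it print 255.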
You could use feature_test_macros(7) and <endian.h> (see endian(3)) or autoconf on Linux to detect what your system has.
In most cases, you should write portable C code, which does not depend upon those things. And you can find cross-platform libraries (e.g. glib) to help you in that.
BTW, gcc -dM -E -x c /dev/null also gives __BYTE_ORDER__ etc., and if you want an unsigned 8-bit byte you should use <stdint.h> and its uint8_t (more portable and more readable). And the standard limits.h defines CHAR_MIN and SCHAR_MIN and CHAR_MAX and SCHAR_MAX (you could compare them for equality to detect signed-char implementations), etc.

BTW, you should care about character encoding, but most systems today use UTF-8 everywhere. Libraries like libunistring are helpful. See also this, and remember that, practically speaking, a Unicode character encoded in UTF-8 can span several bytes (i.e. several chars).
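A compile-time version of that comparison (my own sketch, using only standard macros) could be:

#include <limits.h>
#include <stdint.h>

/* Plain char is signed exactly when it has signed char's range;
   on unsigned-char platforms CHAR_MAX is UCHAR_MAX instead. */
#if CHAR_MAX == SCHAR_MAX
#define PLAIN_CHAR_IS_SIGNED 1
#else
#define PLAIN_CHAR_IS_SIGNED 0
#endif

/* For raw octets, prefer an explicit unsigned 8-bit type. */
typedef uint8_t byte;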
Why don't the C or C++ standards explicitly define char as signed or unsigned?
Historical reasons, mostly.
Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.

On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, char pretty much has to be unsigned.
The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.
Going back even further, an early version of the C Reference Manual from 1975 says:

A char object may be used anywhere an int may be. In all cases the char is converted to an int by propagating its sign through the upper 8 bits of the resultant integer. This is consistent with the two's complement representation used for both characters and integers. (However, the sign-propagation feature disappears in other implementations.)
This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation feature disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keywords.)
C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page, now moved here.
As for what's happening in your code (applying modern C rules):
char c = 0xff;
bool b = 0xff == c;
If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char -- but since 0xff exceeds CHAR_MAX (assuming CHAR_BIT==8), the result is implementation-defined. In most implementations, the result is -1. In the comparison 0xff == c, both operands are converted to int, making it equivalent to 0xff == -1, or 255 == -1, which is of course false.
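A complete program demonstrating this (my own expansion of the two lines above):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    char c = 0xff;       /* implementation-defined value if char is signed */
    bool b = 0xff == c;  /* both operands promoted to int */
    /* Typically prints 1 where plain char is unsigned (e.g. ARM Linux)
       and 0 where it is signed (e.g. x86). */
    printf("%d\n", (int)b);
    return 0;
}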
Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))
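The distinctness of the three types can be observed directly with C11's _Generic (my own illustration, not part of the original answer):

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),        \
    char:          "char",                \
    signed char:   "signed char",         \
    unsigned char: "unsigned char")

int main(void)
{
    char c = 0;
    signed char sc = 0;
    unsigned char uc = 0;
    /* Listing all three in one _Generic is legal precisely because
       they are distinct types, whichever representation plain char
       shares. Prints: char / signed char / unsigned char */
    printf("%s / %s / %s\n", TYPE_NAME(c), TYPE_NAME(sc), TYPE_NAME(uc));
    return 0;
}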
Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code, and to a lesser extent existing implementations.
Count characters in UTF-8 when plain char is unsigned
It should.
You are only using bitwise operators, and those function the same irrespective of whether the underlying data type is signed or unsigned. The only exception may be the != operator, but you could replace this with a & and then embrace the whole thing with a !, a la:

!((*s & 0xc0) & 0x80)

and then you have solely bitwise operators.

You can verify that the characters are promoted to integers by checking section 3.3.10 of the ANSI C Standard, which states that "Each of the operands [of the bitwise AND] shall have integral type."
EDIT

I amend my answer. Bitwise operations are not the same on signed as on unsigned, as per section 3.3 of the ANSI C Standard:

Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) shall have operands that have integral type. These operators return values that depend on the internal representations of integers, and thus have implementation-defined aspects for signed types.

In fact, performing bitwise operations on signed integers is listed as a possible security hole here.
In the Visual Studio compiler signed and unsigned are treated the same (see here).
As this SO question discusses, it is better to use unsigned char for byte-wise reads and manipulations of memory.
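A counting loop in that style (my own sketch; the question's original code isn't reproduced here) counts only the bytes that begin a character:

#include <stddef.h>

/* Count UTF-8 encoded characters by skipping continuation bytes,
   which have the bit pattern 10xxxxxx. Working through unsigned
   char makes the masking independent of plain char's signedness. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    const unsigned char *p = (const unsigned char *)s;
    for (; *p; p++)
        if ((*p & 0xc0) != 0x80)
            count++;
    return count;
}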
Why are unsigned types more efficient on ARM CPUs?
Prior to ARMv4, ARM had no native support for loading halfwords and signed bytes. To load a signed byte you had to LDRB, then sign-extend the value (LSL it up, then ASR it back down). This is painful, so char is unsigned by default.
In ARMv4 instructions were added to handle halfwords and signed values. These new instructions had to be squeezed into the available instruction space. Limits on the space available meant that they could not be made as flexible as the original instructions, which are able to do various address computations when loading the value.
So you may find that LDRSB, for example, is unable to combine a fetch from memory with an address computation, whereas LDRB could. This can cost cycles. Sometimes we can rework short-heavy code to operate on pairs of ints to avoid this, as in the sketch below.
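As a rough illustration (my own sketch, not from the linked article; the narrowing casts back to int16_t assume the common wrap-around conversion), operating on pairs can mean fetching two 16-bit values with one 32-bit load and splitting them in registers:

#include <stdint.h>
#include <string.h>

/* Sum an array of 16-bit samples two at a time. A single 32-bit
   load fetches two halfwords, avoiding two separate LDRSH loads;
   memcpy keeps the access well-defined under strict aliasing. */
int32_t sum16(const int16_t *buf, int n)
{
    int32_t sum = 0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        uint32_t w;
        memcpy(&w, &buf[i], sizeof w);
        sum += (int16_t)(w & 0xffffu);  /* one halfword       */
        sum += (int16_t)(w >> 16);      /* the other halfword */
    }
    if (n & 1)                           /* odd leftover element */
        sum += buf[n - 1];
    return sum;
}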
There's more info on my site here: http://www.davespace.co.uk/arm/efficient-c-for-arm/memaccess.html
How can I make signed char in a C function act like unsigned?
Change all the char in codeBinPair to unsigned char.
#include <limits.h>  /* for CHAR_BIT */

/* Assumed definitions from the asker's program: */
#define ONE  1
#define TRUE 1

/* Swaps each adjacent pair of bits in a byte:
   bit 0 <-> bit 1, bit 2 <-> bit 3, and so on. */
unsigned char codeBinPair(unsigned char str)
{
    int idx = 0;
    unsigned char ch1 = str, ch2 = 0, mask = ONE;
    while (TRUE)
    {
        mask <<= ONE;   /* select bit idx+1 ...                  */
        mask &= ch1;
        mask >>= ONE;   /* ... and shift it down to bit idx; on
                           unsigned char this right shift cannot
                           smear a sign bit                      */
        ch2 |= mask;
        /* Initialize the mask */
        mask = ONE;
        mask <<= idx;   /* select bit idx ...                    */
        mask &= ch1;
        mask <<= ONE;   /* ... and shift it up to bit idx+1      */
        ch2 |= mask;
        /* Index next loop we want to replace */
        idx += 2;
        /* If we finished the whole byte */
        if (idx == CHAR_BIT)
            return ch2;
        /* Initialize the mask */
        mask = ONE;
        mask <<= idx;
    }
}
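A quick way to check the rewritten function (my own test harness): swapping adjacent bit pairs of 0x61 (binary 01100001) should give 0x92 (binary 10010010).

#include <stdio.h>

int main(void)
{
    unsigned char in = 0x61;
    unsigned char out = codeBinPair(in);
    printf("%#x -> %#x\n", in, out);  /* prints 0x61 -> 0x92 */
    return 0;
}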