Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?
Yes, the argument to toupper needs to be converted to unsigned char to avoid the risk of undefined behavior.
The types char, signed char, and unsigned char are three distinct types. char has the same range and representation as either signed char or unsigned char. (Plain char is very commonly signed and able to represent values in the range -128..+127.)
The toupper function takes an int argument and returns an int result. Quoting the C standard, section 7.4 paragraph 1:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
(C++ incorporates most of the C standard library, and defers its definition to the C standard.)
The [] indexing operator on std::string returns a reference to char. If plain char is a signed type, and if the value of name[0] happens to be negative, then the expression toupper(name[0]) has undefined behavior.
The language guarantees that, even if plain char is signed, all members of the basic character set have non-negative values, so given the initialization
string name = "Niels Stroustrup";
the program doesn't risk undefined behavior. But yes, in general a char value passed to toupper (or to any of the functions declared in <cctype> / <ctype.h>) needs to be converted to unsigned char, so that the implicit conversion to int won't yield a negative value and cause undefined behavior.
The <ctype.h> functions are commonly implemented using a lookup table. Something like:
// assume plain char is signed
char c = -2;
c = toupper(c); // undefined behavior
may index outside the bounds of that table.
Note that converting to unsigned:
char c = -2;
c = toupper((unsigned)c); // undefined behavior
doesn't avoid the problem. If int is 32 bits, converting the char value -2 to unsigned yields 4294967294. This is then implicitly converted to int (the parameter type), which probably yields -2.
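The difference between the two conversions can be checked at compile time. This is a sketch under two assumptions: the helper names via_unsigned and via_unsigned_char are hypothetical, and int is wider than char (true on mainstream platforms). signed char is used so the result does not depend on whether plain char is signed.

```cpp
#include <climits>

// Hypothetical helper: the value produced by converting to unsigned (int).
constexpr unsigned via_unsigned(signed char c)
{
    return static_cast<unsigned>(c);        // -2 becomes UINT_MAX - 1
}

// Hypothetical helper: the value produced by converting to unsigned char.
constexpr int via_unsigned_char(signed char c)
{
    return static_cast<unsigned char>(c);   // -2 becomes UCHAR_MAX - 1
}

static_assert(via_unsigned(-2) == UINT_MAX - 1, "wraps modulo UINT_MAX + 1");
static_assert(via_unsigned_char(-2) == UCHAR_MAX - 1, "stays within 0..UCHAR_MAX");
```

Only the second result is guaranteed to be representable as an unsigned char, which is what the <ctype.h> functions require.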
toupper can be implemented so it behaves sensibly for negative values (accepting all values from CHAR_MIN to UCHAR_MAX), but it's not required to do so. Furthermore, the functions in <ctype.h> are required to accept an argument with the value EOF, which is typically -1.
The C++ standard makes adjustments to some C standard library functions. For example, strchr and several other functions are replaced by overloaded versions that enforce const correctness. There are no such adjustments for the functions declared in <cctype>.
How do I use tolower() with a char array?
As suggested in an answer to this related question, these two warnings are caused by a "bug" in the MSVC Code Analyser.
I even tried the 'fix' I suggested in my answer to that question (that is, using char* unit = malloc(max(strlen(userInput), 0) + 1);), but it didn't work in your code (not sure why).
However, what did work (and I have no idea why) is to use the strdup function in place of your calls to malloc and strcpy; it does the same thing but in one fell swoop.
Adding the casts (correctly; see footnote 1) suggested in the comments, here's a version of your code that doesn't generate the spurious C6385 and C6386 warnings:
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
size_t unit_match_index(char* userInput)
{
    char* unit = strdup(userInput);
    //convert to lowercase
    for (size_t i = 0; i < strlen(unit); ++i) {
        unit[i] = (char)tolower((unsigned char)unit[i]);
    }
    //...
    return 0;
}
However, MSVC will now generate a different (but equally spurious) warning:
warning C4996: 'strdup': The POSIX name for this item is deprecated.
Instead, use the ISO C and C++ conformant name: _strdup. See online
help for details.
As it happens, the strdup function (without the leading underscore) has been part of the ISO C Standard since C23 (it was adopted in 6/2019).
1 On the reasons for the casts when using the tolower function, see: Do I need to cast to unsigned char before calling toupper(), tolower(), et al.? However, simply adding those casts does not silence the two code analysis warnings.
How to use toUpper and toLower in Haskell without importing module Data.Char?
toLower'' is matching the lower-case characters instead of the upper-case ones (toUpper'' works). Fixed:
toLower'' :: [Char] -> [Char]
toLower'' [] = []
toLower'' (x : xs)
  | x `elem` ['A' .. 'Z'] = toEnum (fromEnum x + 32) : toLower'' xs
  | otherwise = x : toLower'' xs
Which tolower in C++?
It should be noted that the language designers were aware of cctype's tolower when locale's tolower was created. It improved in 2 primary ways:
- As is mentioned in progressive_overload's answer, the locale version allowed the use of the facet ctype, even a user-modified one, without requiring the shuffling in of a new LC_CTYPE via setlocale and the restoration of the previous LC_CTYPE
- From section 7.1.6.2 [dcl.type.simple] paragraph 3:
It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed
This creates the potential for undefined behavior with the cctype version of tolower if its argument:
Is not representable as unsigned char and does not equal EOF
So there is an additional input and output static_cast required by the cctype version of tolower, yielding:
transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });
Since the locale version operates directly on chars there is no need for a type conversion.
So if you don't need to perform the conversion in a different facet ctype, it simply becomes a style question of whether you prefer the transform with a lambda required by the cctype version, or whether you prefer the locale version's:
use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));
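For comparison, here is a sketch that puts both variants side by side as complete functions. The names lower_cctype and lower_locale are hypothetical, and the locale version defaults to the classic "C" locale rather than cout.getloc(), which is an assumption made to keep the result predictable.

```cpp
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>

// <cctype> version: the lambda parameter converts each char to unsigned char,
// and the cast in the return statement converts the int result back to char.
inline std::string lower_cctype(std::string foo)
{
    std::transform(foo.cbegin(), foo.cend(), foo.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return foo;
}

// <locale> version: the ctype<char> facet lowercases a [low, high) range of char in place.
inline std::string lower_locale(std::string foo,
                                const std::locale& loc = std::locale::classic())
{
    if (!foo.empty()) {
        char* first = &foo[0];
        std::use_facet<std::ctype<char>>(loc).tolower(first, first + foo.size());
    }
    return foo;
}
```

Both functions take the string by value, so the caller's original is left untouched; passing a different std::locale to lower_locale selects a different ctype facet without touching the global LC_CTYPE.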
toupper tolower not working , help what's wrong with my code
The main problem is that you are trying to modify string literals such as "This is a test of UCase". This is undefined behaviour. You need to copy the literals into a char array that you can modify.
Also note that binding char* to a string literal was deprecated in C++03 and has been ill-formed since C++11, for good reason. This should have emitted at least a warning:
UCase("This is a test of UCase") // not good: binding char* to literal
There are other problems with your code: undefined behaviour (UB) in loops with uninitialized variables,
for ( int i ; i < len ; i++) // using uninitialized i: UB
You should also have a look at the toupper and tolower documentation. They both accept an int with some restrictions on its value. You have to ensure you don't pass a value that causes undefined behaviour, bearing in mind that char can be signed. See for example Do I need to cast to unsigned char before calling toupper?
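One way to avoid all three problems at once is to take the text by value as a std::string. This is only a sketch: upper_copy is a hypothetical stand-in for the UCase function in the question, whose exact signature is not shown here.

```cpp
#include <cctype>
#include <string>

// Hypothetical replacement for a UCase-style function: the parameter is taken
// by value, so a string literal argument is copied into a modifiable std::string.
inline std::string upper_copy(std::string text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i) {   // i is initialized
        text[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(text[i])));  // safe argument
    }
    return text;
}
```

A call such as upper_copy("This is a test of UCase") then works on a copy and never writes to the literal itself.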
What's an alternate way of getting the char's int value to increment?
Although string literals in C have non-constant character array types, you may not modify them.
char *hello = "Hello, world!";
From the C Standard (6.4.5 String literals)
7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
So you should declare the identifier hello as a character array
char hello[] = "Hello, world!";
Within the function you should not use magic numbers like 32. For example, if the compiler uses the EBCDIC encoding, your function will produce a wrong result.
And in the loop you should use the type size_t instead of int, because an object of type int may be unable to store all values of type size_t, which is the type of the result of the sizeof operator and the return type of the function strlen.
This statement
(int) text[i] += 32;
does not make sense, because the cast makes the left-hand side of the assignment an rvalue, which cannot be assigned to.
The function can be implemented in the following way
#include <ctype.h>

char * lowercase( char *text )
{
    for ( char *p = text; *p; ++p )
    {
        if ( isalpha( ( unsigned char )*p ) )
        {
            *p = ( char )tolower( ( unsigned char )*p );
        }
    }
    return text;
}
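A short usage sketch, written as C++ to match the other added examples (the C version differs only in the headers and cast syntax). The loop is repeated so the block is self-contained, and lowercase_demo is a hypothetical name used only for this check.

```cpp
#include <cctype>
#include <cstring>

// Same loop as the function above, repeated so the example is self-contained.
inline char* lowercase(char* text)
{
    for (char* p = text; *p; ++p) {
        *p = static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));
    }
    return text;
}

// Hypothetical check: lowercases a modifiable array (not a literal) and compares.
inline bool lowercase_demo()
{
    char hello[] = "Hello, world!";   // array initialized from the literal: writable
    return std::strcmp(lowercase(hello), "hello, world!") == 0;
}
```

Because the function returns its argument, the call can be used directly inside another expression, as the comparison here does.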