Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?

Yes. A char argument to toupper needs to be converted to unsigned char to avoid the risk of undefined behavior.

The types char, signed char, and unsigned char are three distinct types. char has the same range and representation as either signed char or unsigned char. (Plain char is very commonly signed and able to represent values in the range -128..+127.)
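Whether plain char is signed on a particular implementation can be checked through <limits.h>: CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN when it is signed. A minimal sketch (the function name plain_char_is_signed is hypothetical):

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether plain char is a signed type.
   CHAR_MIN is 0 if char is unsigned, and SCHAR_MIN if it is signed. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```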

The toupper function takes an int argument and returns an int result. Quoting the C standard, section 7.4 paragraph 1:

In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of
the macro EOF. If the argument has any other value, the
behavior is undefined.

(C++ incorporates most of the C standard library, and defers its definition to the C standard.)

For a std::string name (as in the question), the [] indexing operator returns a reference to char. If plain char is a signed type, and if the value of name[0] happens to be negative, then the expression

toupper(name[0])

has undefined behavior.

The language guarantees that, even if plain char is signed, all members of the basic character set have non-negative values, so given the initialization

string name = "Niels Stroustrup";

the program doesn't risk undefined behavior. But yes, in general a char value passed to toupper (or to any of the functions declared in <cctype> / <ctype.h>) needs to be converted to unsigned char, so that the implicit conversion to int won't yield a negative value and cause undefined behavior.
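One way to make the conversion hard to forget is to wrap it in small helpers that take and return char. This is a sketch; char_toupper and char_tolower are hypothetical names, not standard functions:

```c
#include <ctype.h>

/* Hypothetical wrappers: convert to unsigned char first, so a negative
   plain char is never passed to the <ctype.h> function. */
char char_toupper(char c)
{
    return (char)toupper((unsigned char)c);
}

char char_tolower(char c)
{
    return (char)tolower((unsigned char)c);
}
```

In the default "C" locale, char_toupper((char)-2) then just returns its argument unchanged instead of invoking undefined behavior.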

The <ctype.h> functions are commonly implemented using a lookup table, so something like:

// assume plain char is signed
char c = -2;
c = toupper(c); // undefined behavior

may index outside the bounds of that table.
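To illustrate, here is a simplified, hypothetical table-driven toupper. It assumes EOF == -1 and an ASCII-compatible character set; real C libraries organize their tables differently (some deliberately make negative char values valid indices), so this only shows the general mechanism:

```c
#include <limits.h>
#include <stdio.h> /* EOF */

/* This sketch relies on EOF being -1. */
_Static_assert(EOF == -1, "this sketch assumes EOF == -1");

/* Slot 0 holds the result for EOF; slot c + 1 holds the result for each
   c in 0..UCHAR_MAX. */
static int upper_table[UCHAR_MAX + 2];

/* Hypothetical initializer; assumes 'a'..'z' and 'A'..'Z' are contiguous. */
void init_upper_table(void)
{
    upper_table[0] = EOF;
    for (int c = 0; c <= UCHAR_MAX; ++c) {
        upper_table[c + 1] = (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
    }
}

/* Hypothetical toupper: no range check, so c == -2 would read
   upper_table[-1], outside the array. */
int my_toupper(int c)
{
    return upper_table[c + 1];
}
```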

Note that converting to unsigned:

char c = -2;
c = toupper((unsigned)c); // undefined behavior

doesn't avoid the problem. If int is 32 bits, converting the char value -2 to unsigned yields 4294967294. This is then implicitly converted to int (the parameter type). That out-of-range conversion is implementation-defined, and on typical implementations it yields -2 again.
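The difference between the two casts can be seen by looking at the int value that toupper would actually receive. A small sketch (both function names are hypothetical); note that the first result depends on implementation-defined behavior, though GCC and Clang give -2 for a signed char holding -2:

```c
#include <limits.h>

/* The int that toupper((unsigned)c) receives: a negative char wraps to a
   large unsigned value, and converting that back to int typically
   restores the original negative value. */
int arg_via_unsigned(char c)
{
    return (int)(unsigned)c;
}

/* The int that toupper((unsigned char)c) receives: always in
   0..UCHAR_MAX, which is what toupper requires. */
int arg_via_unsigned_char(char c)
{
    return (unsigned char)c;
}
```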

toupper can be implemented so it behaves sensibly for negative values (accepting all values from CHAR_MIN to UCHAR_MAX), but it's not required to do so. Furthermore, the functions in <ctype.h> are required to accept an argument with the value EOF, which is typically -1.
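By contrast, a value returned by fgetc (or getc, or getchar) is already either an unsigned char value converted to int or EOF, which is exactly the domain toupper accepts, so it can be passed without a cast. A sketch, with the hypothetical name copy_upper:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies in to out, upper-casing each character.
   fgetc returns an unsigned char value or EOF, so no cast is needed
   before calling toupper. */
void copy_upper(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        fputc(toupper(c), out);
    }
}
```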

The C++ standard makes adjustments to some C standard library functions. For example, strchr and several other functions are replaced by overloaded versions that enforce const correctness. There are no such adjustments for the functions declared in <cctype>.

How do I use tolower() with a char array?

As suggested in an answer to this related question, these two warnings (C6385 and C6386) are caused by a "bug" in the MSVC Code Analyser.

I even tried the 'fix' I suggested in my answer to that question (that is, using char* unit = malloc(max(strlen(userInput), 0) + 1);) – but it didn't work in your code (not sure why).

However, what did work (and I have no idea why) is to use the strdup function in place of your calls to malloc and strcpy – it does the same thing but in one fell swoop.

Adding the casts suggested (correctly) in the comments (see footnote 1), here's a version of your code that doesn't generate the spurious C6385 and C6386 warnings:

#include <stdlib.h>
#include <string.h>
#include <ctype.h>

size_t unit_match_index(char* userInput)
{
    char* unit = strdup(userInput);
    if (unit == NULL) {
        return 0; // strdup could not allocate memory
    }
    //convert to lowercase
    for (size_t i = 0; i < strlen(unit); ++i) {
        unit[i] = (char)tolower((unsigned char)unit[i]);
    }
    //...
    free(unit); // release the copy made by strdup
    return 0;
}

However, MSVC will now generate a different (but equally spurious) warning:

warning C4996: 'strdup': The POSIX name for this item is deprecated.
Instead, use the ISO C and C++ conformant name: _strdup. See online
help for details.

As it happens, the strdup function (without the leading underscore) was accepted into the draft of the next ISO C standard in 6/2019 and is part of ISO C as of C23 (ISO/IEC 9899:2024).
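On a compiler that provides neither the C23 strdup nor a POSIX one, the same effect can be had with malloc and memcpy. This is a minimal sketch: dup_string is a hypothetical name, and whether MSVC's code analysis accepts it without the C6385/C6386 warnings is untested:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical strdup replacement: allocates strlen(s) + 1 bytes and
   copies s including its terminating '\0'. Returns NULL if allocation
   fails; the caller must free() the result. */
char *dup_string(const char *s)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, s, size);
    }
    return copy;
}
```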


1 On the reasons for the casts when using the tolower function, see "Do I need to cast to unsigned char before calling toupper(), tolower(), et al.?". However, simply adding those casts does not silence the two code analysis warnings.

How to use toUpper and toLower in Haskell without importing module Data.Char?

Your toLower'' tests for lower-case characters instead of upper-case ones (toUpper'' works). Fixed:

toLower'' :: [Char] -> [Char]
toLower'' [] = []
toLower'' (x : xs)
  -- only upper-case ASCII letters are shifted (by 32) to lower case
  | x `elem` ['A' .. 'Z'] = toEnum (fromEnum x + 32) : toLower'' xs
  | otherwise = x : toLower'' xs

Which tolower in C++?

It should be noted that the language designers were aware of cctype's tolower when locale's tolower was created. The locale version improved on it in 2 primary ways:

  1. As mentioned in progressive_overload's answer, the locale version allows the use of a ctype facet, even a user-modified one, without having to swap in a new LC_CTYPE via setlocale and then restore the previous LC_CTYPE
  2. From section 7.1.6.2 [dcl.type.simple] paragraph 3:

It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed

This creates the potential for undefined behavior with the cctype version of tolower if its argument:

Is not representable as unsigned char and does not equal EOF

So the cctype version of tolower requires an extra conversion of its input to unsigned char (and of its output back to char), yielding:

transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });

Since the locale version operates directly on chars, there is no need for a type conversion.

So if you don't need to perform the conversion with a different ctype facet, it simply becomes a question of style: whether you prefer the transform with a lambda required by the cctype version, or the locale version's:

use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));

toupper/tolower not working, help: what's wrong with my code?

The main problem is that you are trying to modify string literals such as "This is a test of UCase". This is undefined behaviour. You need to copy the literals into a char array that you can modify.

Also note that binding char* to a string literal is deprecated (and, since C++11, forbidden), for good reason. This should have emitted a warning:

UCase("This is a test of UCase"); // not good: binding char* to literal

There are other problems with your code, such as undefined behaviour (UB) in loops that use uninitialized variables:

for ( int i ; i < len ; i++) // using uninitialized i: UB

You should also have a look at the toupper and tolower documentation. Both accept an int, with restrictions on its value: you have to make sure you don't pass a value that causes undefined behaviour, bearing in mind that char can be signed. See, for example, "Do I need to cast to unsigned char before calling toupper?"
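For reference, a corrected UCase might look like the following sketch. The original function body isn't shown, so the signature void UCase(char *s) is an assumption. It uses an initialized size_t index and the unsigned char cast, and it must be given a modifiable array rather than a string literal:

```c
#include <ctype.h>
#include <string.h>

/* Upper-cases s in place. s must point to a modifiable array, not to a
   string literal. (Signature assumed; the original code isn't shown.) */
void UCase(char *s)
{
    size_t len = strlen(s);
    for (size_t i = 0; i < len; ++i) { /* i is initialized */
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```

Usage: char text[] = "This is a test of UCase"; UCase(text); copies the literal into an array first, so the modification is well defined.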

What's an alternate way of getting the char's int value to increment?

Although in C string literals have non-const character array types, you may not modify them.

char *hello = "Hello, world!";

From the C Standard (6.4.5 String literals)

7 It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

So you should declare the identifier hello as a character array:

char hello[] = "Hello, world!";

Within the function you should not use magic numbers like 32. For example, if the implementation uses the EBCDIC encoding, your function will produce wrong results.

In the loop, use the type size_t instead of int, because an int may be unable to hold every value of type size_t, which is the type yielded by the sizeof operator and returned by the function strlen.

This statement

(int) text[i] += 32;

is invalid, because the cast makes the left operand of += an rvalue, which cannot be assigned to.

The function can be implemented the following way:

char * lowercase( char *text )
{
    for ( char *p = text; *p; ++p )
    {
        if ( isalpha( ( unsigned char )*p ) )
        {
            *p = tolower( ( unsigned char )*p );
        }
    }

    return text;
}

