What Does the "::" Mean in "::Tolower"

What does the :: mean in ::tolower?

It means that the call explicitly uses the tolower in the global namespace (which is presumably the one from the C standard library).

Example:

void foo() {
    // This is your global foo
}

namespace bar {
    void foo() {
        // This is bar's foo
    }
}

using namespace bar;

void test() {
    foo();   // Error: ambiguous between ::foo and bar::foo
    ::foo(); // This is the global foo()
}
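Applied to tolower itself, the same rule can be sketched as follows. The namespace text and the helpers local_lower and global_lower are made up for illustration; <ctype.h> is included because the C header is the one that declares tolower in the global namespace.

```cpp
#include <ctype.h>  // C header: declares tolower in the global namespace

namespace text {
    // Hypothetical function that hides the C library name inside this namespace.
    // It deliberately returns its argument unchanged.
    int tolower(int c) { return c; }

    // Unqualified call: name lookup stops at text::tolower.
    int local_lower(int c) { return tolower(c); }

    // Qualified call: the leading :: selects the C library function.
    int global_lower(int c) { return ::tolower(c); }
}
```

With the default "C" locale, text::local_lower('A') returns 'A' unchanged, while text::global_lower('A') returns 'a'.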

toupper tolower

toupper and tolower are declared in ctype.h (cctype in C++). To use them, simply add the line #include <ctype.h>.

std::tolower and Visual Studio 2013

First off, note that none of these approaches does the right thing in a portable way! The problem is that char may be signed (and typically is), but the versions of tolower() taking an int only accept values representable as unsigned char (or EOF)! That is, you really want to use std::tolower() using something like this:

std::transform(test.begin(), test.end(), test.begin(),
               [](unsigned char c) { return std::tolower(c); });

(or, of course, using a corresponding function object if you are stuck with C++03). Using std::tolower() (or ::tolower() for that matter) with a negative value other than EOF results in undefined behavior. Of course, this only matters on platforms where char is signed, which seems, however, to be the typical choice.
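For the C++03 case, a function object might look like the sketch below. ToLowerChar and lower_copy are illustrative names, not part of any library.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical C++03-style function object.
struct ToLowerChar {
    char operator()(char c) const {
        // Convert to unsigned char first so tolower never sees a negative value.
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
};

// Returns a lower-cased copy of s (s is taken by value and modified).
std::string lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), ToLowerChar());
    return s;
}
```

Calling lower_copy("Hello, World") should give "hello, world" in the default "C" locale.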

To answer your questions:

  1. When including <cctype> you typically get the various functions and types from the standard C library both in namespace std as well as in the global namespace. Thus, using ::tolower normally works but isn't guaranteed to work.
  2. When including <locale>, there are two versions of std::tolower available: the function int tolower(int) from <cctype>, and a function template, template <class charT> charT tolower(charT, std::locale const&). When just the name std::tolower is passed as an argument, the compiler generally has no way to decide which one to use.
  3. Since std::tolower is ambiguous, using static_cast<int(*)(int)>(std::tolower) disambiguates which version to use. Why the use of static_cast<...>() fails with VC++, I don't know.
  4. You shouldn't use std::tolower() directly on a sequence of chars anyway, as negative values will result in undefined behavior. Use a function object that calls std::tolower internally on an unsigned char.

It is worth noting that using a function object rather than a function pointer is typically a lot faster because it is trivial to inline the function object but not as trivial to inline the function pointer. Compilers are getting better with inlining the use of function pointers where the function is actually known but contemporary compilers certainly don't always inline function calls through function pointers even if all the context would be there.

How does the behavior of std::tolower change in different locales?

Actually, the very example on the site shows a difference:

#include <iostream>
#include <cctype>
#include <clocale>

int main()
{
    unsigned char c = '\xb4'; // the character Ž in ISO-8859-15
                              // but ´ (acute accent) in ISO-8859-1

    std::setlocale(LC_ALL, "en_US.iso88591");
    std::cout << std::hex << std::showbase;
    std::cout << "in iso8859-1, tolower('0xb4') gives "
              << std::tolower(c) << '\n';
    std::setlocale(LC_ALL, "en_US.iso885915");
    std::cout << "in iso8859-15, tolower('0xb4') gives "
              << std::tolower(c) << '\n';
}

Output:

in iso8859-1, tolower('0xb4') gives 0xb4
in iso8859-15, tolower('0xb4') gives 0xb8

Because the C language has no notion of encoding, a char is just a byte (and a char const* just points to a sequence of bytes). When switching locale, you switch the interpretation of those bytes. Here, for example, the byte 0xb4 (180) is outside the ASCII range (0-127), and therefore its meaning changes depending on the locale you switch to:

  • in ISO-8859-1, it means ´, and therefore is unchanged when moving from upper to lower
  • in ISO-8859-15, it means Ž, and therefore changes to ž (0xb8 in this locale) when moving from upper to lower

You would think that in a post-Unicode world, this would be irrelevant, but many have not yet transitioned to Unicode...
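The example above does not check whether setlocale succeeded, and the locale names it uses are not installed on every system. A minimal sketch of a more defensive variant, assuming a hypothetical helper named lower_in_locale that returns -1 when the requested locale is unavailable:

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: lower-case one byte under a named C locale.
// Returns -1 if setlocale rejects the locale name.
int lower_in_locale(const char* locale_name, unsigned char c) {
    // Remember the current LC_CTYPE setting so it can be restored.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = current ? current : "C";

    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return -1;  // locale not available; nothing was changed
    }
    int result = std::tolower(c);

    // Put the previous locale back before returning.
    std::setlocale(LC_CTYPE, saved.c_str());
    return result;
}
```

lower_in_locale("C", 'A') gives 'a'. Whether lower_in_locale("en_US.iso885915", 0xb4) gives 0xb8 or -1 depends on which locales the system has installed.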

Which tolower in C++?

It should be noted that the language designers were aware of cctype's tolower when locale's tolower was created. The locale version improved on it in 2 primary ways:

  1. As is mentioned in progressive_overload's answer, the locale version allows the use of the ctype facet, even a user-modified one, without requiring a new LC_CTYPE to be swapped in via setlocale and the previous LC_CTYPE to be restored afterwards
  2. From section 7.1.6.2[dcl.type.simple]3:

It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed

This creates the potential for undefined behavior with the cctype version of tolower if its argument:

Is not representable as unsigned char and does not equal EOF

So the cctype version of tolower requires an additional conversion on input and output, yielding:

transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });

Since the locale version operates directly on chars there is no need for a type conversion.

So if you don't need to perform the conversion in a different facet ctype it simply becomes a style question of whether you prefer the transform with a lambda required by the cctype version, or whether you prefer the locale version's:

use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));
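Wrapped in a function, the locale version could look like this sketch. The name lower_with_facet and the default of std::locale::classic() are assumptions made for the example; the line above uses cout.getloc() instead.

```cpp
#include <locale>
#include <string>

// Hypothetical wrapper around the ctype<char> facet of a given locale.
std::string lower_with_facet(std::string s,
                             const std::locale& loc = std::locale::classic()) {
    if (!s.empty()) {
        // ctype<char>::tolower(low, high) converts the range [low, high) in place.
        std::use_facet<std::ctype<char> >(loc).tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```

No cast to unsigned char is needed here because the facet member function takes char directly.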

What's going on at tolower?

As covered in my answer, that's a function pointer. You can read more about them here: http://en.cppreference.com/w/cpp/language/pointer#Pointers_to_functions

Fundamentally this is a pointer to a function that takes in an int argument and returns an int.


The reason the transform works when using myToLower and not with an uncast tolower is that the tolower function is overloaded in the std namespace by both the locale library's tolower and the ctype library's tolower. When only the function name is passed, without a cast, the compiler cannot tell which overload is meant, and you'll get an error. When you cast the function pointer, you're telling the compiler which overload you want.
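A small sketch of that disambiguation, with <locale> included on purpose so that both overloads are visible. CTypeFn and lower_via_pointer are hypothetical names. Note that C++20 no longer guarantees that the address of a standard library function can be taken, so treat this as an illustration of overload selection rather than a recommendation.

```cpp
#include <cctype>
#include <locale>   // adds the template overload of std::tolower
#include <string>

// Pointer to a function that takes an int and returns an int.
typedef int (*CTypeFn)(int);

std::string lower_via_pointer(std::string s) {
    // The cast names the target type, so the <cctype> overload is chosen.
    CTypeFn lower = static_cast<CTypeFn>(std::tolower);

    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Convert through unsigned char so no negative value reaches tolower.
        s[i] = static_cast<char>(lower(static_cast<unsigned char>(s[i])));
    }
    return s;
}
```

Without the static_cast, an expression such as std::transform(..., std::tolower) leaves the compiler with two candidates and no target type to choose between them.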

Why do I get a number when I stream the result of tolower() to std::cout, but not when I use putchar()?

The return type of std::tolower() and std::toupper() is int, and std::cout prints an int as a number.

So cout << tolower(s[i]) << endl prints the numeric code of the character (its ASCII value on most systems). When you write putchar(toupper(s[i])), on the other hand, putchar() converts that int back to a character before writing it. That's why you get a character as output.

In order to print a character with cout << tolower(s[i]) << endl, you need to cast the result to char.

So write -

cout << (char)toupper(s[i]) << endl;
cout << (char)tolower(s[i]) << endl;
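The difference can be checked without a terminal by streaming into a std::ostringstream. This is only a sketch: stream_as_int and stream_as_char are made-up names, and the number mentioned afterwards assumes an ASCII-compatible character set.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Streams the int returned by tolower: operator<< formats it as a number.
std::string stream_as_int(char c) {
    std::ostringstream out;
    out << std::tolower(static_cast<unsigned char>(c));
    return out.str();
}

// Casts the result back to char first: operator<< writes a single character.
std::string stream_as_char(char c) {
    std::ostringstream out;
    out << static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out.str();
}
```

On an ASCII system, stream_as_int('A') yields the text "97" while stream_as_char('A') yields "a".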

How to convert a string to lower case in Bash

There are various ways:

POSIX standard

tr

$ echo "$a" | tr '[:upper:]' '[:lower:]'
hi all

AWK

$ echo "$a" | awk '{print tolower($0)}'
hi all

Non-POSIX

You may run into portability issues with the following examples:

Bash 4.0

$ echo "${a,,}"
hi all

sed

$ echo "$a" | sed -e 's/\(.*\)/\L\1/'
hi all
# this also works:
$ sed -e 's/\(.*\)/\L\1/' <<< "$a"
hi all

Perl

$ echo "$a" | perl -ne 'print lc'
hi all

Bash

lc(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}
word="I Love Bash"
for((i=0;i<${#word};i++))
do
    ch="${word:$i:1}"
    lc "$ch"
done

Note: YMMV on this one. It doesn't work for me (GNU bash versions 4.2.46 and 4.0.33, and the same behaviour in 2.05b.0, where nocasematch is not implemented), even with shopt -u nocasematch. Unsetting nocasematch causes [[ "fooBaR" == "FOObar" ]] to match OK, BUT inside case, weirdly, [b-z] are incorrectly matched by [A-Z]. Bash is confused by the double-negative ("unsetting nocasematch")! :-) A more likely cause is locale collation: outside the C locale, a range such as [A-Z] can follow dictionary order and take in lowercase letters, so running the script with LC_ALL=C may help.

How to convert an instance of std::string to lower case

Adapted from Not So Frequently Asked Questions:

#include <algorithm>
#include <cctype>
#include <string>

std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
               [](unsigned char c){ return std::tolower(c); });

You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.

If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:

char asciitolower(char in) {
    if (in <= 'Z' && in >= 'A')
        return in - ('Z' - 'z');
    return in;
}

std::transform(data.begin(), data.end(), data.begin(), asciitolower);

Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
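To make that limitation concrete, here is a sketch that lower-cases a string byte by byte. The function name lower_bytes is made up, and the behavior described afterwards assumes the default "C" locale and UTF-8 encoded input.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Applies std::tolower to every byte of s, independently of its neighbours.
std::string lower_bytes(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

Given "ÉCOLE" in UTF-8, where É is the two bytes 0xC3 0x89, the ASCII letters are lower-cased but the two bytes of É come back unchanged, so the result reads "École" rather than "école".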


