What does the :: mean in ::tolower?
It means that the call explicitly uses the tolower
in the global namespace (which is presumably the C standard library one).
Example:
void foo() {
    // This is your global foo
}
namespace bar {
    void foo() {
        // This is bar's foo
    }
}
using namespace bar;
void test() {
    foo();   // Ambiguous - which one is it?
    ::foo(); // This is the global foo()
}
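Below is a compilable variant of the example above with the ambiguous call commented out. The functions return values (an assumption added for illustration, unlike the original void functions) so the qualified calls can be checked. lower_via_global is a hypothetical helper showing ::tolower itself, which <ctype.h> is guaranteed to declare in the global namespace.

```cpp
#include <ctype.h>  // declares tolower in the global namespace

int foo() { return 1; }  // the global foo

namespace bar {
int foo() { return 2; }  // bar's foo
}

using namespace bar;

int call_both() {
    // foo();  // would not compile: ambiguous between ::foo and bar::foo
    return ::foo() * 10 + bar::foo();  // explicit qualification picks each one
}

// Hypothetical helper: ::tolower names the C library function in the global namespace.
char lower_via_global(char c) {
    // Convert to unsigned char first; a negative argument would be undefined behavior.
    return static_cast<char>(::tolower(static_cast<unsigned char>(c)));
}
```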
toupper tolower
toupper and tolower are defined in ctype.h. Simply include this file with the line #include <ctype.h>.
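As a minimal sketch of using the two functions after that include, here is a hypothetical str_toupper helper that upper-cases a NUL-terminated string in place. It assumes single-byte characters and casts each char to unsigned char before calling toupper.

```cpp
#include <ctype.h>  // toupper and tolower

// Hypothetical helper: upper-cases a NUL-terminated string in place.
void str_toupper(char* s) {
    for (; *s != '\0'; ++s) {
        // Passing a negative char value to toupper is undefined behavior,
        // so go through unsigned char first.
        *s = static_cast<char>(toupper(static_cast<unsigned char>(*s)));
    }
}
```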
std::tolower and Visual Studio 2013
First off, note that none of these approaches does the right thing in a portable way! The problem is that char
may be signed (and typically is), but tolower()
only accepts values that are representable as unsigned char (or equal to EOF)! That is, you really want to use std::tolower()
with something like this:
std::transform(test.begin(), test.end(), test.begin(),
[](unsigned char c) { return std::tolower(c); });
(or, of course, a corresponding function object if you are stuck with C++03). Calling std::tolower()
(or ::tolower()
for that matter) with a negative value other than EOF results in undefined behavior. Of course, this only matters on platforms where char
is signed, which, however, seems to be the typical choice.
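For a single character, the same conversion can be packaged in a small helper. The name to_lower_char is hypothetical; the point is that the unsigned char cast happens before std::tolower sees the value, so a byte such as '\xb4' never reaches it as a negative int.

```cpp
#include <cctype>

// Hypothetical helper: lower-cases one char without risking undefined behavior
// on platforms where char is signed.
char to_lower_char(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
```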
To answer your questions:
- When including <cctype>, you typically get the various functions and types from the standard C library both in namespace std and in the global namespace. Thus, using ::tolower normally works but isn't guaranteed to work.
- When including <locale>, there are two versions of std::tolower available, one as int(*)(int) and one as char(*)(char, std::locale const&). When using just std::tolower, the compiler generally has no way to decide which one to use.
- Since std::tolower is ambiguous, using static_cast<int(*)(int)>(std::tolower) disambiguates which version to use. Why the use of static_cast<...>() fails with VC++, I don't know.
- You shouldn't use std::tolower() with a sequence of chars anyway, as it can result in undefined behavior. Use a function object that calls std::tolower internally on an unsigned char.
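The second overload mentioned in the list, std::tolower(charT, const std::locale&) from <locale>, takes the char directly, so there is no int conversion to get wrong. A sketch, assuming the caller's global locale (std::locale()) is the one wanted; to_lower_locale is a hypothetical name.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lower-cases a copy of s with the two-argument
// std::tolower from <locale>, which forwards to the ctype<char> facet of loc.
std::string to_lower_locale(std::string s, const std::locale& loc = std::locale()) {
    for (char& c : s) {
        c = std::tolower(c, loc);  // char in, char out: no unsigned char cast needed
    }
    return s;
}
```

Calling the two-argument form inside a loop or lambda also avoids naming the overloaded std::tolower as a bare function argument.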
It is worth noting that using a function object rather than a function pointer is typically a lot faster, because it is trivial to inline the function object but not as trivial to inline the function pointer. Compilers are getting better at inlining calls through function pointers when the function is actually known, but contemporary compilers certainly don't always inline such calls, even when all the context is available.
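For those stuck with C++03, the function object mentioned above might look like the following sketch (ToLower and lower_copy are hypothetical names). Every call goes through ToLower::operator(), whose body the compiler can see and inline; the block uses only C++03 features.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical C++03-style function object wrapping std::tolower.
struct ToLower {
    char operator()(char c) const {
        // Convert to unsigned char so negative char values never reach std::tolower.
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
};

// Returns a lower-cased copy of s using the function object.
std::string lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), ToLower());
    return s;
}
```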
How does the behavior of std::tolower change in different locales?
Actually, the very example on the site shows a difference:
#include <iostream>
#include <cctype>
#include <clocale>
int main()
{
    unsigned char c = '\xb4'; // the character Ž in ISO-8859-15
                              // but ´ (acute accent) in ISO-8859-1
    std::setlocale(LC_ALL, "en_US.iso88591");
    std::cout << std::hex << std::showbase;
    std::cout << "in iso8859-1, tolower('0xb4') gives "
              << std::tolower(c) << '\n';
    std::setlocale(LC_ALL, "en_US.iso885915");
    std::cout << "in iso8859-15, tolower('0xb4') gives "
              << std::tolower(c) << '\n';
}
Output:
in iso8859-1, tolower('0xb4') gives 0xb4
in iso8859-15, tolower('0xb4') gives 0xb8
Because the C language has no notion of encoding, a char is just a byte (and a char const* just points to a sequence of bytes). When switching locale, you switch the interpretation of those bytes. For example, here the byte 0xb4 (180) is outside the ASCII range (0-127), and therefore its meaning changes depending on the locale you switch to:
- in ISO-8859-1, it means ´, and therefore is unchanged when moving from upper to lower
- in ISO-8859-15, it means Ž, and therefore changes to ž (0xb8 in this locale) when moving from upper to lower
You would think that in a post-Unicode world, this would be irrelevant, but many have not yet transitioned to Unicode...
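The cppreference example silently assumes that en_US.iso88591 and en_US.iso885915 are installed; if they are not, std::setlocale returns a null pointer and the output no longer shows the difference. A sketch that checks for this, assuming only LC_CTYPE matters for tolower (tolower_in_locale is a hypothetical name). Note that setlocale changes process-wide state and is not thread-safe.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: returns std::tolower(c) evaluated under the named C
// locale, or -1 if that locale is not available on this system.
// The previous LC_CTYPE setting is restored afterwards.
int tolower_in_locale(const char* name, unsigned char c) {
    // Copy the current name: the string setlocale returns may be overwritten.
    std::string previous = std::setlocale(LC_CTYPE, nullptr);
    if (std::setlocale(LC_CTYPE, name) == nullptr) {
        return -1;  // locale not installed; nothing was changed
    }
    int result = std::tolower(c);
    std::setlocale(LC_CTYPE, previous.c_str());
    return result;
}
```

For example, tolower_in_locale("en_US.iso885915", 0xb4) should reproduce the 0xb8 from the output above on a system that has that locale, and return -1 on one that doesn't.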
Which tolower in C++?
It should be noted that the language designers were aware of cctype's tolower when locale's tolower was created. It improved on it in 2 primary ways:
- As is mentioned in progressive_overload's answer, the locale version allowed the use of the ctype facet, even a user-modified one, without requiring a new LC_CTYPE to be swapped in via setlocale and the previous LC_CTYPE to be restored afterwards
- From section 7.1.6.2 [dcl.type.simple] paragraph 3:
It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed
This creates the potential for undefined behavior with the cctype version of tolower if its argument is not representable as unsigned char and does not equal EOF
So the cctype version of tolower requires an additional conversion of its input (and of its output back to char), yielding:
transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });
Since the locale version operates directly on chars, there is no need for a type conversion.
So if you don't need to perform the conversion with a different ctype facet, it simply becomes a question of style whether you prefer the transform with a lambda required by the cctype version, or the locale version's:
use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));
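Wrapped into a function, the facet version might look like this sketch. It relies on ctype<char>'s range overload tolower(char* beg, const char* end); to_lower_facet is a hypothetical name, and the global locale is assumed as the default instead of cout.getloc().

```cpp
#include <locale>
#include <string>

// Hypothetical helper: lower-cases s in place with the ctype<char> facet of loc.
void to_lower_facet(std::string& s, const std::locale& loc = std::locale()) {
    if (s.empty()) return;  // nothing to convert
    char* first = &s[0];    // contiguous, writable buffer (guaranteed since C++11)
    std::use_facet<std::ctype<char>>(loc).tolower(first, first + s.size());
}
```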
What's going on at tolower?
As I explain in my answer, that's a function pointer. You can read more about them here: http://en.cppreference.com/w/cpp/language/pointer#Pointers_to_functions
Fundamentally, this is a pointer to a function that takes an int argument and returns an int.
The reason the transform works when using myToLower and not with an uncast tolower is that the tolower function is overloaded in the std namespace by both the locale library's tolower and the ctype library's tolower. When the bare function name is passed to a function template such as transform, there is no target type from which the compiler could pick an overload, so you get an error. When you cast the function pointer, you're telling the compiler which overload you want.
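The question's myToLower isn't shown here, so my_to_lower below is a hypothetical stand-in. Because it is a single, non-overloaded function, its name converts to exactly one int(*)(int) pointer, which transform can accept. Wrapping std::tolower this way also avoids taking the address of a standard library function, which C++20 no longer guarantees to be valid for most of them.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for myToLower: one overload, so no ambiguity.
int my_to_lower(int c) {
    // Map the value into the unsigned char range before calling std::tolower.
    return std::tolower(static_cast<unsigned char>(c));
}

std::string lower_with_pointer(std::string s) {
    int (*fp)(int) = my_to_lower;  // pointer to a function taking int, returning int
    std::transform(s.begin(), s.end(), s.begin(), fp);
    return s;
}
```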
Why do I get a number when I stream the result of tolower() to std::cout, but not when I use putchar()?
std::tolower() and std::toupper() return an int, and std::cout prints exactly the value it is given.
So cout << tolower(s[i]) << endl prints the numeric (ASCII) value of the character. But when you write putchar(toupper(s[i])), putchar() converts its int argument to unsigned char and writes it as a character. That's why you get a character as output.
In order to use cout << tolower(s[i]) << endl, you need to cast the result back to char.
So write:
cout << (char)toupper((unsigned char)s[i]) << endl;
cout << (char)tolower((unsigned char)s[i]) << endl;
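The difference can be seen by streaming the same result twice, once as the int that std::tolower returns and once cast back to char. A sketch, assuming an ASCII execution character set; stream_both is a hypothetical name and writes to a std::ostringstream instead of std::cout so the text can be inspected.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: streams tolower's int result, then the same value as a char.
std::string stream_both(char c) {
    std::ostringstream out;
    int as_int = std::tolower(static_cast<unsigned char>(c));
    out << as_int << ' ' << static_cast<char>(as_int);  // number, then character
    return out.str();
}
```

With ASCII, stream_both('A') yields "97 a".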
How to convert a string to lower case in Bash
There are various ways:
POSIX standard
tr
$ echo "$a" | tr '[:upper:]' '[:lower:]'
hi all
AWK
$ echo "$a" | awk '{print tolower($0)}'
hi all
Non-POSIX
You may run into portability issues with the following examples:
Bash 4.0
$ echo "${a,,}"
hi all
sed
$ echo "$a" | sed -e 's/\(.*\)/\L\1/'
hi all
# this also works:
$ sed -e 's/\(.*\)/\L\1/' <<< "$a"
hi all
Perl
$ echo "$a" | perl -ne 'print lc'
hi all
Bash
lc(){
    case "$1" in
        [A-Z])
            n=$(printf "%d" "'$1")
            n=$((n+32))
            printf \\$(printf "%o" "$n")
            ;;
        *)
            printf "%s" "$1"
            ;;
    esac
}
word="I Love Bash"
for((i=0;i<${#word};i++))
do
    ch="${word:$i:1}"
    lc "$ch"
done
Note: YMMV on this one. It doesn't work for me (GNU bash versions 4.2.46 and 4.0.33, and the same behaviour with 2.05b.0, where nocasematch is not implemented), even when using shopt -u nocasematch;. Unsetting nocasematch causes [[ "fooBaR" == "FOObar" ]] to match OK, BUT inside case, [b-z] is weirdly matched incorrectly by [A-Z]. Bash is confused by the double negative ("unsetting nocasematch")! :-)
How to convert an instance of std::string to lower case
Adapted from Not So Frequently Asked Questions:
#include <algorithm>
#include <cctype>
#include <string>
std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
[](unsigned char c){ return std::tolower(c); });
You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.
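If you need this in more than one place, the in-place transform can be wrapped in a function that returns a lower-cased copy. A sketch; to_lower_copy is a hypothetical name. Taking the parameter by value lets a caller's temporary be moved in while leaving a named original untouched.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lower-cased copy of s.
std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```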
If you really hate tolower()
, here's a specialized ASCII-only alternative that I don't recommend you use:
char asciitolower(char in) {
    if (in <= 'Z' && in >= 'A')
        return in - ('Z' - 'z');
    return in;
}
std::transform(data.begin(), data.end(), data.begin(), asciitolower);
Be aware that tolower()
can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially when using a multi-byte encoding like UTF-8.