What makes more sense - char* string or char *string?
In the following declaration:
char* string1, string2;
string1 is a character pointer, but string2 is a single character only. For this reason, the declaration is usually formatted like:
char *string1, string2;
which makes it slightly clearer that the * applies to string1 but not string2. Good practice is to avoid declaring multiple variables in one declaration, especially if some of them are pointers.
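The compiler can confirm the claim about the two types. The following is a minimal sketch; the function name pointer_declaration_demo is made up for illustration, and the initializers are added only so the variables have defined values.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical demo function: compiles only if the types are as described above.
bool pointer_declaration_demo() {
    char* string1 = nullptr, string2 = 'x';  // same shape as the declaration above

    // The * belongs to the declarator string1, not to the type name char.
    static_assert(std::is_same<decltype(string1), char*>::value, "string1 is a char*");
    static_assert(std::is_same<decltype(string2), char>::value, "string2 is a plain char");

    // One variable per declaration removes the ambiguity.
    char* first = nullptr;
    char* second = nullptr;
    static_assert(std::is_same<decltype(first), decltype(second)>::value, "both are char*");

    assert(string1 == nullptr && first == second);
    return sizeof(string2) == 1;
}
```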
Difference between string and char[] types in C++
A char array is just that - an array of characters:
- If allocated on the stack (like in your example), it will always occupy its full declared size, e.g. 256 bytes, no matter how long the text it contains is.
- If allocated on the heap (using malloc() or new char[]) you're responsible for releasing the memory afterwards and you will always have the overhead of a heap allocation.
- If you copy a text that does not fit into the array (for a 256-byte array, anything longer than 255 characters plus the terminating \0), it might crash, produce ugly assertion messages or cause unexplainable (mis-)behavior somewhere else in your program.
- To determine the text's length, the array has to be scanned, character by character, for a \0 character.
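The points about fixed size, overflow and length scanning can be sketched as follows. The helper copy_checked is hypothetical; it uses snprintf so that an oversized text is truncated instead of overrunning the buffer.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: copies text into dest (capacity destSize bytes),
// truncating if necessary. Returns true only if the whole text fit.
bool copy_checked(char* dest, std::size_t destSize, const char* text) {
    assert(dest != nullptr && destSize > 0 && text != nullptr);
    // snprintf writes at most destSize bytes and always terminates with '\0'.
    const int needed = std::snprintf(dest, destSize, "%s", text);
    return needed >= 0 && static_cast<std::size_t>(needed) < destSize;
}
```

With an 8-byte buffer, copying "short" succeeds, while "much too long" is cut to 7 characters plus the terminator. In both cases sizeof reports 8, and std::strlen has to walk the buffer to find the actual text length.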
A string is a class that contains a char array, but automatically manages it for you. Most string implementations have a small built-in buffer of roughly 16 bytes (the exact size depends on the implementation), so short strings don't touch or fragment the heap, and use the heap for longer strings.
You can access a string's char array like this:
std::string myString = "Hello World";
const char *myStringChars = myString.c_str();
C++ strings can contain embedded \0 characters, know their length without counting, are faster than heap-allocated char arrays for short texts and protect you from buffer overruns. Plus they're more readable and easier to use.
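A small sketch of the embedded \0 and length behavior described above. The helper names make_embedded and c_length are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: builds an 11-character string with a '\0' in the middle.
std::string make_embedded() {
    // The (pointer, count) constructor copies exactly 11 chars and does not stop at '\0'.
    const std::string text("Hello\0World", 11);
    assert(text.size() == 11);  // size() is stored, not counted
    return text;
}

// Length as C code sees it through c_str(): strlen stops at the first '\0'.
std::size_t c_length(const std::string& text) {
    return std::strlen(text.c_str());
}
```

The std::string keeps all 11 characters, but anything that treats c_str() as a C string only sees the first 5.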
However, C++ strings are not (very) suitable for usage across DLL boundaries, because this would require any user of such a DLL function to make sure he's using the exact same compiler and C++ runtime implementation, lest he risk his string class behaving differently.
Normally, a string class would also release its heap memory on the calling heap, so it will only be able to free memory again if you're using a shared (.dll or .so) version of the runtime.
In short: use C++ strings in all your internal functions and methods. If you ever write a .dll or .so, use C strings in your public (dll/so-exposed) functions.
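One way to follow that advice is sketched below, assuming a caller-supplied buffer so that no memory is allocated on one side of the boundary and freed on the other. The names get_greeting and build_greeting are hypothetical, and a real library would additionally need its export annotations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace {
// Internal code is free to use std::string.
std::string build_greeting(const std::string& name) {
    return "Hello, " + name;
}
}  // namespace

// Hypothetical exported function: only C types cross the boundary.
// Returns the full length of the text (excluding '\0'), so the caller can
// detect truncation and retry with a larger buffer.
extern "C" std::size_t get_greeting(const char* name, char* buffer, std::size_t bufferSize) {
    const std::string text = build_greeting(name ? name : "");
    if (buffer != nullptr && bufferSize > 0) {
        const std::size_t n = text.size() < bufferSize - 1 ? text.size() : bufferSize - 1;
        assert(n < bufferSize);
        std::memcpy(buffer, text.data(), n);
        buffer[n] = '\0';
    }
    return text.size();
}
```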
How much performance difference when using string vs char array?
Let's run the numbers:
2022 edit:
Using Quick-Bench with GCC 10.3 and compiling with C++20 (with some minor changes for constness) demonstrates that std::string is now faster, by almost 3x.
Original answer (2014)
The code (I used PAPI Timers)
main.cpp
#include <iostream>
#include <string>
#include <stdio.h>
#include "papi.h"
#include <vector>
#include <cmath>

#define TRIALS 10000000

// Small stopwatch on top of PAPI's real-time microsecond counter.
class Clock
{
public:
    typedef long_long time;
    time start;
    Clock() : start(now()){}
    void restart(){ start = now(); }
    time usec() const{ return now() - start; }
    time now() const{ return PAPI_get_real_usec(); }
};

int main()
{
    int eventSet = PAPI_NULL;
    PAPI_library_init(PAPI_VER_CURRENT);
    if(PAPI_create_eventset(&eventSet)!=PAPI_OK)
    {
        std::cerr << "Failed to initialize PAPI event" << std::endl;
        return 1;
    }

    Clock clock;
    std::vector<long_long> usecs;

    // char array variant (active); swap with the commented lines for the string variant
    const char* baseLocation = "baseLocation";
    //std::string baseLocation = "baseLocation";
    char fname[255] = {};

    for (int i=0;i<TRIALS;++i)
    {
        clock.restart();
        snprintf(fname, 255, "%s_test_no.%d.txt", baseLocation, i);
        //std::string fname = baseLocation + "_test_no." + std::to_string(i) + ".txt";
        usecs.push_back(clock.usec());
    }

    // mean of the per-iteration timings
    long_long sum = 0;
    for(auto vecIter = usecs.begin(); vecIter != usecs.end(); ++vecIter)
    {
        sum+= *vecIter;
    }
    double average = static_cast<double>(sum)/static_cast<double>(TRIALS);
    std::cout << "Average: " << average << " microseconds" << std::endl;

    //compute variance (strictly, its unit is microseconds squared)
    double variance = 0;
    for(auto vecIter = usecs.begin(); vecIter != usecs.end(); ++vecIter)
    {
        variance += (*vecIter - average) * (*vecIter - average);
    }
    variance /= static_cast<double>(TRIALS);
    std::cout << "Variance: " << variance << " microseconds" << std::endl;
    std::cout << "Std. deviation: " << sqrt(variance) << " microseconds" << std::endl;

    // 95% confidence interval for the mean
    double CI = 1.96 * sqrt(variance)/sqrt(static_cast<double>(TRIALS));
    std::cout << "95% CI: " << average-CI << " usecs to " << average+CI << " usecs" << std::endl;
}
Play with the comments to get one way or the other.
10 million iterations of both methods on my machine with the compile line:
g++ main.cpp -lpapi -DUSE_PAPI -std=c++0x -O3
Using char array:
Average: 0.240861 microseconds
Variance: 0.196387 microseconds
Std. deviation: 0.443156 microseconds
95% CI: 0.240586 usecs to 0.241136 usecs
Using string approach:
Average: 0.365933 microseconds
Variance: 0.323581 microseconds
Std. deviation: 0.568842 microseconds
95% CI: 0.365581 usecs to 0.366286 usecs
So at least on MY machine with MY code and MY compiler settings, I saw that character arrays incur a 34% speedup over strings, using the following formula:
((time for string) - (time for char array) ) / (time for string)
This gives the difference in time between the approaches as a percentage of the time for string alone. My original figure of about 50% was also correct; it used the character array approach as the reference point instead, which shows a 52% slowdown when moving to string, but I found it misleading.
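The two percentages differ only in the reference point, which a short sketch makes explicit. The function names are made up for illustration.

```cpp
#include <cassert>

// Difference as a fraction of the string time (the formula above).
double saving_relative_to_string(double stringTime, double charArrayTime) {
    assert(stringTime > 0.0);
    return (stringTime - charArrayTime) / stringTime;
}

// Same difference as a fraction of the char array time.
double slowdown_relative_to_char_array(double stringTime, double charArrayTime) {
    assert(charArrayTime > 0.0);
    return (stringTime - charArrayTime) / charArrayTime;
}
```

With the averages reported above (0.365933 for string, 0.240861 for char array), the first function gives roughly 0.34 and the second roughly 0.52.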
I'll take any and all comments for how I did this wrong :)
2015 Edit
Compiled with GCC 4.8.4:
string
Average: 0.338876 microseconds
Variance: 0.853823 microseconds
Std. deviation: 0.924026 microseconds
95% CI: 0.338303 usecs to 0.339449 usecs
character array
Average: 0.239083 microseconds
Variance: 0.193538 microseconds
Std. deviation: 0.439929 microseconds
95% CI: 0.238811 usecs to 0.239356 usecs
So the character array approach remains significantly faster although less so. In these tests, it was about 29% faster.
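For readers without PAPI, the same comparison can be approximated with std::chrono. This is only a sketch under stated assumptions, not the setup that produced the numbers above: the helper names are hypothetical, it times the whole loop rather than each iteration (so it yields an average but no variance), and an optimizing compiler may remove work whose result is never used.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// char array variant: formats into a caller-provided buffer.
void format_char_array(char* out, std::size_t outSize, const char* base, int i) {
    std::snprintf(out, outSize, "%s_test_no.%d.txt", base, i);
}

// std::string variant: builds the same name by concatenation.
std::string format_string(const std::string& base, int i) {
    return base + "_test_no." + std::to_string(i) + ".txt";
}

// Hypothetical helper: average microseconds per call of body(i) over `trials` calls.
template <typename Body>
double average_usec(int trials, Body body) {
    assert(trials > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < trials; ++i) {
        body(i);
    }
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / trials;
}

// Example use:
//   char buf[255];
//   double a = average_usec(1000000, [&](int i) {
//       format_char_array(buf, sizeof buf, "baseLocation", i); });
//   std::string base = "baseLocation";
//   double b = average_usec(1000000, [&](int i) { format_string(base, i); });
```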
For a guaranteed single character, is it better to use `char` or `string`?
It is probably better to use a char in this case, assuming you want to store it and process it often later; otherwise you can just directly access the string's individual chars using operator[]. One thing to note is that std::string implements the so-called short string optimization, which should be quite fast. But anyway, you should profile your code, and unless you need a std::string (e.g. to be passed around later to some other functions), you should just use a char.
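As a rough illustration of both options, the sketch below reads a single character out of a string with operator[] and then works with it as a plain char. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: pull out the one character you care about as a plain char.
char first_char(const std::string& text) {
    assert(!text.empty());
    return text[0];  // operator[] accesses the character directly
}

// Processing with a char needs only one comparison per element.
std::size_t count_char(const std::string& text, char wanted) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == wanted) ++n;
    }
    return n;
}
```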
Why do strings use char*?
Jim Balter notes in a comment that
The instructions on the PDP-11 dealing with bytes treated them as signed quantities, so that's how the early C compilers treated them, and unsigned didn't even exist.
I strongly suspect that this is the answer to why the default character type char isn’t required to be unsigned, but one would need a quote from some written historical account in order to be sure.
As to why it isn’t required to be signed either (!), on a non-two's complement machine such as (the only one I know that's possibly still in use) a Clearpath Dorado, a signed char cannot hold all values of an unsigned char, since it's wasting one bit pattern on a negative zero, or whatever that bit pattern is put to use for. If char were required to be signed then this would be a problem for reinterpreting general data as a sequence of char values. Consequently, on such a machine char has to be unsigned, or else the software will have to engage in extreme contortions to deal with it.
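Because the signedness of plain char is left to the implementation, portable code can query it and convert through unsigned char when it needs raw byte values. A minimal sketch with hypothetical helper names, assuming the usual 8-bit char:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Reports what this particular implementation chose for plain char.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: the byte value of c in the range 0..UCHAR_MAX,
// independent of whether plain char is signed here.
unsigned int byte_value(char c) {
    const unsigned int value = static_cast<unsigned char>(c);
    assert(value <= UCHAR_MAX);
    return value;
}
```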
Which one is preferable to store characters, vector<char> or string?
I would suggest using std::string if you are actually working with strings.
This makes more sense to me from a semantic perspective than using std::vector<char>...
Note also that std::string implements an efficient SSO (small string optimization) that avoids expensive heap allocations for small strings. This optimization is not available with vector<char>.
In addition, note that std::string also supports embedded NULs (so you can even store sequences of sub-strings efficiently in cache-friendly contiguous memory in a single std::string object, if that makes sense for your particular context).
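The idea of keeping several sub-strings in one std::string can be sketched like this. The pack and unpack helpers are hypothetical, and the sketch assumes the individual parts do not themselves contain NULs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: stores every part followed by a '\0' in one contiguous string.
std::string pack(const std::vector<std::string>& parts) {
    std::string packed;
    for (const std::string& part : parts) {
        assert(part.find('\0') == std::string::npos);  // assumption: parts contain no NUL
        packed += part;
        packed.push_back('\0');
    }
    return packed;
}

// Splits the packed string back into its parts at each '\0'.
std::vector<std::string> unpack(const std::string& packed) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (start < packed.size()) {
        const std::size_t end = packed.find('\0', start);
        if (end == std::string::npos) {
            parts.push_back(packed.substr(start));
            break;
        }
        parts.push_back(packed.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}
```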