C-string definition in C / C++
A "C string" is an array of characters that ends with a 0 (null character) byte. The array, not any pointer, is the string. Thus, any terminal subarray of a C string is also a C string. Pointers of type char * (or const char *, etc.) are often thought of as pointers to strings, but they're actually pointers to an element of a string, usually treated as a pointer to the initial element of a string.
C string type definition
You can't make variables of type string in C, because "string" is not a type.
A "string" is, by definition, "a contiguous sequence of characters terminated by and including the first null character". It's not a data type, it's a data format.
An array of char may contain a string. A char* may point to a string. Neither of them is a string.
If you like, you can define
typedef char *string; /* not recommended */
but that's misleading, since a variable of type char*, as I mentioned, isn't a string.
The best practice is simply to use char* directly. This makes it clear that your variable is a pointer. It's also consistent with the way the standard library is defined; for example, the strlen function is declared as:
size_t strlen(const char *s);
It's also consistent with the way most experienced C programmers write code that deals with strings.
Because of the way C treats arrays (more or less as second-class citizens), arrays, including arrays that contain strings, are usually manipulated via pointers to their elements. We can use pointer arithmetic to traverse an array. Pretending that the pointer is the array, or that it is a string, is tempting, and might seem to make the code more understandable, but in the long run it just causes confusion.
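As an illustration of pointers to elements and of the "terminal subarray" point made at the top, here is a minimal sketch. It assumes a hypothetical helper, suffix_at, that is not from the original answer. It advances a pointer into an array holding "Hello, world!". The result is still just a pointer to one element, but because every suffix of a C string is itself a C string, string functions can use it directly.

```c
#include <string.h>

/* An array that contains the string "Hello, world!" (13 characters + '\0'). */
static const char greeting[] = "Hello, world!";

/* Hypothetical helper: returns a pointer to the element at index `offset`.
   No copying happens; the pointer just refers to a later element. Because
   the suffix still ends at the same '\0', it is itself a valid C string,
   provided offset <= strlen(greeting). */
const char *suffix_at(size_t offset)
{
    return greeting + offset;   /* pointer arithmetic over the array */
}
```

For example, suffix_at(7) points at the 'w' and compares equal to "world!", and suffix_at(13) points at the terminator, which is the empty string.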
A macro approach like
#define string char*
is even worse than a typedef. Macros are expanded as sequences of tokens; the preprocessor doesn't know about the syntax of C declarations. So given the above definition, this:
string x, y;
expands to
char* x, y;
which defines x as a char* and y as a char. If you need a name for a type, typedef is almost always better than #define.
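The difference between the typedef and the macro can be checked with sizeof. The sketch below is a minimal example and assumes two hypothetical names, string_t and STRING_MACRO. They are used so that both forms can coexist in one file, since `string` can't be both a typedef and a macro at once.

```c
#include <stddef.h>

typedef char *string_t;       /* hypothetical typedef name (not recommended, see above) */
#define STRING_MACRO char *   /* hypothetical macro version, even worse */

/* With a typedef, every declarator gets the full type: a and b are both char *. */
size_t typedef_sizes(void)
{
    string_t a = NULL, b = NULL;
    return sizeof a + sizeof b;
}

/* The macro expands to "char * x = NULL, y = 0;", so only x is a pointer
   and y is a plain char. */
size_t macro_sizes(void)
{
    STRING_MACRO x = NULL, y = 0;
    return sizeof x + sizeof y;
}
```

On any platform, typedef_sizes() returns 2 * sizeof(char *), while macro_sizes() returns sizeof(char *) + 1.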
What are the specifics of the definition of a string in C?
c1 is mostly [1] equivalent to &c1[0], which holds one string, "CS".

There's a second string lurking in there, "324", starting at &c1[3], but as long as you access c1 as c1, the string "CS" is all that strcpy() and similar functions would see.

[1]: c1 is an array, &c1[0] is a pointer.
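The question's declaration of c1 is not shown here. The sketch below therefore assumes a hypothetical array that matches the description: "CS", a null byte, then "324" and a second null byte. String functions called on c1 stop at the first '\0', and the second string is only visible through &c1[3].

```c
#include <string.h>

/* Assumed reconstruction of c1 (the original declaration is not shown):
   index: 0    1    2     3    4    5    6    */
char c1[] = { 'C', 'S', '\0', '3', '2', '4', '\0' };

/* strlen starting at c1[0] stops at c1[2], so it sees only "CS". */
size_t first_len(void)  { return strlen(c1); }

/* Starting at c1[3] reaches the second string, "324". */
size_t second_len(void) { return strlen(&c1[3]); }
```

Note that sizeof c1 is still 7. The array is one object, but it contains two strings.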
Does C have a string type?
C does not have, and never has had, a native string type. By convention, the language uses arrays of char terminated with a null char, i.e., with '\0'. Functions and macros in the language's standard libraries provide support for the null-terminated character arrays; e.g., strlen iterates over an array of char until it encounters a '\0' character, and strcpy copies from the source string up to and including the first '\0'.
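To show how completely these functions depend on the terminator, here is a sketch of two illustrative re-implementations, my_strlen and my_strcpy. They are hypothetical names and not the library's actual code. Neither one knows the size of any array; each simply walks until it finds '\0'.

```c
#include <stddef.h>

/* Illustrative strlen: counts characters before the first '\0'. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* the terminator is the only stopping condition */
        ++n;
    return n;
}

/* Illustrative strcpy: copies bytes up to and including the first '\0'.
   Like the real strcpy, it never checks how big dst is. */
char *my_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')   /* assignment copies, then tests the copied byte */
        ;
    return dst;
}
```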
The use of null-terminated strings in C reflects the fact that C was intended to be only a little more high-level than assembly language. Zero-terminated strings were already directly supported at that time in assembly language for the PDP-10 and PDP-11.
It is worth noting that this property of C strings leads to quite a few nasty buffer overrun bugs, including serious security flaws. For example, if you forget to null-terminate a character string passed as the source argument to strcpy, the function will keep copying sequential bytes from whatever happens to be in memory past the end of the source string until it happens to encounter a 0, potentially overwriting whatever valuable information follows the destination string's location in memory.
In your code example, the string literal "Hello, world!" will be compiled into a 14-byte array of char. The first 13 bytes hold the letters, comma, space, and exclamation mark, and the final byte holds the null terminator '\0', automatically added for you by the compiler. If you were to access the array's last element, you would find it equal to 0. E.g.:
const char foo[] = "Hello, world!";
assert(foo[12] == '!');
assert(foo[13] == '\0');
However, in your example, message is only 10 bytes long. strcpy is going to write all 14 bytes, including the null terminator, into memory starting at the address of message. The first 10 bytes will be written into the memory allocated on the stack for message, and the remaining four bytes will be written past the end of that array, into whatever happens to follow it on the stack. This is undefined behavior. The consequence of writing those four extra bytes is hard to predict (in this simple example, it might not hurt a thing), but in real-world code it usually leads to corrupted data or memory access violation errors.
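One way to avoid this overflow is to compare the source length against the destination size before copying. The sketch below is a minimal version of that check. copy_if_fits is a hypothetical helper, not a standard function, and it assumes the caller passes the true size of dst.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if the whole string,
   including its '\0', fits in dst_size bytes.
   Returns 0 on success, -1 if it does not fit (dst is then left empty). */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the terminator */

    if (dst_size == 0)
        return -1;                     /* nowhere to write even a '\0' */
    if (needed > dst_size) {
        dst[0] = '\0';                 /* leave a valid (empty) string behind */
        return -1;
    }
    memcpy(dst, src, needed);
    return 0;
}
```

With a 10-byte message, copy_if_fits(message, sizeof message, "Hello, world!") returns -1 instead of writing past the array. If truncation is acceptable, the standard snprintf(message, sizeof message, "%s", src) is an alternative. It never writes more than sizeof message bytes and always null-terminates when the size is nonzero.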
Understanding C-strings & string literals in C++
In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which is stored somewhere else, possibly in a read-only part of memory) are copied into this variable.
In the second case, the local variable p is also allocated memory on the stack, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in read-only memory, it is in general not safe to try to change it via the p pointer. Modifying a string literal is undefined behavior, so it may appear to work or it may crash. On the other hand, you can do whatever you like with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable has type array of char, and in most contexts it is converted to a pointer to the first element of that array. One exception is the sizeof operator, which computes the size of the whole array, not just the size of a pointer to the first element.)
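The question's code is not shown. The sketch below therefore assumes the literal "Welcome!", which has 8 characters plus '\0' and so matches the 8+1 bytes mentioned above. It contrasts the array copy with the pointer. The pointer is declared const char * because the literal must not be modified. In C++ (since C++11), a string literal cannot initialize a plain char * at all.

```c
#include <stddef.h>

/* Array case: cstr is a local, modifiable copy of the literal. */
size_t array_size(void)
{
    char cstr[] = "Welcome!";   /* 8 characters + '\0' */
    return sizeof cstr;         /* size of the whole array */
}

/* Pointer case: p only holds the address of the literal. */
size_t pointer_size(void)
{
    const char *p = "Welcome!"; /* do not write through p */
    return sizeof p;            /* size of the pointer, not of the string */
}

/* Writing to the local copy is fine because we own that array. */
char lowercase_first(void)
{
    char cstr[] = "Welcome!";
    cstr[0] = 'w';
    return cstr[0];
}
```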
Define where placed the c-string
You cannot tell whether a pointer points into automatic storage, static storage, or dynamic memory simply by looking at the pointer. You need to store a flag at the time when you set that pointer, for example like this:
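class classWithDynamicData {
private:
    bool needToDelete;   // true only if this object allocated strData itself
    char* strData;       // a pointer (not an array member), so it can hold any address
public:
    // Allocates its own buffer, so the destructor must release it.
    classWithDynamicData(int size) : needToDelete(true), strData(new char[size]) {
    }
    // Borrows a buffer owned elsewhere (static, automatic, or heap); never deleted here.
    classWithDynamicData(char* data) : needToDelete(false), strData(data) {
    }
    ~classWithDynamicData() {
        if (needToDelete) delete[] strData;
    }
    // ... other members ...
    // You need to define a copy constructor and an assignment operator
    // to avoid violating the rule of three
};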
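The same idea can be expressed in plain C, where there is no destructor. Instead, the ownership flag is recorded next to the pointer, and a cleanup function consults it. This is a minimal sketch. The struct str_holder and the holder_* functions are hypothetical names, not part of the original answer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical C counterpart of classWithDynamicData. */
struct str_holder {
    bool need_to_free;   /* true only if this holder allocated data */
    char *data;
};

/* Allocates a zeroed buffer of `size` bytes that the holder owns.
   Returns false if allocation fails. */
bool holder_init_alloc(struct str_holder *h, size_t size)
{
    h->data = calloc(size, 1);
    h->need_to_free = (h->data != NULL);
    return h->data != NULL;
}

/* Borrows a buffer that lives elsewhere; the holder will not free it. */
void holder_init_borrow(struct str_holder *h, char *data)
{
    h->data = data;
    h->need_to_free = false;
}

/* Frees the buffer only if this holder allocated it. */
void holder_destroy(struct str_holder *h)
{
    if (h->need_to_free)
        free(h->data);
    h->data = NULL;
    h->need_to_free = false;
}
```

As in the C++ version, copying a struct str_holder by value would create two owners of one buffer. A real program would need a rule about who calls holder_destroy.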
How do you declare string constants in C?
There's at least one more road to Rome:
static const char HELLO3[] = "Howdy";
(The static keyword, which is optional, is there to prevent the name from conflicting with other files.) I'd prefer this one over const char*, because then you'll be able to use sizeof(HELLO3), and therefore you don't have to postpone until run time what you can do at compile time.
The #define has the advantage of compile-time concatenation, though (think HELLO ", World!"), and you can use sizeof(HELLO) as well.
But then you can also prefer const char* and use it across multiple files, which would save you a morsel of memory.
In short, it depends.
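The question's own definitions of HELLO and the pointer variant are not shown. The sketch below therefore assumes `#define HELLO "Hello"` and a hypothetical `HELLO2` pointer constant. It puts the three options side by side. Both the array and the macro give a compile-time sizeof that includes the '\0', only the macro takes part in literal concatenation, and sizeof on the pointer yields only the pointer's size.

```c
#include <string.h>

#define HELLO "Hello"                     /* assumed macro from the question */
static const char HELLO3[] = "Howdy";     /* array: sizeof gives 6, known at compile time */
static const char *const HELLO2 = "Hi";   /* assumed pointer variant for comparison */

/* Adjacent string literals are joined at compile time, so this works with
   the macro but would not work with HELLO3 or HELLO2. */
static const char greeting[] = HELLO ", World!";
```

Here sizeof HELLO3 and sizeof HELLO are both 6, greeting holds "Hello, World!" (14 bytes), and sizeof HELLO2 is just sizeof(const char *).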