C-String Definition in C/C++

A "C string" is an array of characters that ends with a 0 (null character) byte. The array, not any pointer, is the string. Thus, any terminal subarray of a C string is also a C string. Pointers of type char * (or const char *, etc.) are often thought of as pointers to strings, but they're actually pointers to an element of a string, usually treated as a pointer to the initial element of a string.

C string type definition

You can't make variables of type string in C, because "string" is not a type.

A "string" is, by definition, "a contiguous sequence of characters terminated by and including the first null character". It's not a data type, it's a data format.

An array of char may contain a string. A char* may point to a string. Neither of them is a string.

If you like, you can define

typedef char *string; /* not recommended */

but that's misleading, since a variable of type char*, as I mentioned, isn't a string.

The best practice is simply to use char* directly. This makes it clear that your variable is a pointer. It's also consistent with the way the standard library is defined; for example, the strlen function is declared as:

size_t strlen(const char *s);

It's also consistent with the way most experienced C programmers write code that deals with strings.

Because of the way C treats arrays (more or less as second-class citizens), arrays, including arrays that contain strings, are usually manipulated via pointers to their elements. We can use pointer arithmetic to traverse an array. Pretending that the pointer is the array, or that it is a string, is tempting, and might seem to make the code more understandable, but in the long run it just causes confusion.

A macro approach like

#define string char*

is even worse than a typedef. Macros are expanded as sequences of tokens; the preprocessor doesn't know about the syntax of C declarations. So given the above definition, this:

string x, y;

expands to

char* x, y;

which defines x as a char* and y as a char. If you need a name for a type, typedef is almost always better than #define.

What are the specifics of the definition of a string in C?

c1 is mostly [1] equivalent to &c1[0], which points to one string, "CS".

There's a second string lurking in there, "324", starting at &c1[3] -- but as long as you access c1 as c1, the string "CS" is all the functions strcpy() et al. would see.


[1]: c1 is an array, &c1[0] is a pointer.

Does C have a string type?

C does not have, and never has had, a native string type. By convention, the language uses arrays of char terminated with a null char, i.e., with '\0'. Functions and macros in the language's standard libraries provide support for null-terminated character arrays. For example, strlen iterates over an array of char until it encounters a '\0' character, and strcpy copies characters from the source string up to and including the terminating '\0'.

The use of null-terminated strings in C reflects the fact that C was intended to be only a little more high-level than assembly language. Zero-terminated strings were already directly supported at that time in assembly language for the PDP-10 and PDP-11.

It is worth noting that this property of C strings leads to quite a few nasty buffer overrun bugs, including serious security flaws. For example, if you forget to null-terminate a character string passed as the source argument to strcpy, the function will keep copying sequential bytes from whatever happens to be in memory past the end of the source string until it happens to encounter a 0, potentially overwriting whatever valuable information follows the destination string's location in memory.

In your code example, the string literal "Hello, world!" will be compiled into a 14-byte long array of char. The first 13 bytes will hold the letters, comma, space, and exclamation mark and the final byte will hold the null-terminator character '\0', automatically added for you by the compiler. If you were to access the array's last element, you would find it equal to 0. E.g.:

#include <assert.h>
int main(void) {
    const char foo[] = "Hello, world!";
    assert(foo[12] == '!');
    assert(foo[13] == '\0');
}

However, in your example, message is only 10 bytes long. strcpy is going to write all 14 bytes, including the null terminator, into memory starting at the address of message. The first 10 bytes will be written into the memory allocated on the stack for message, and the remaining four bytes will be written past the end of message, into whatever stack memory happens to follow it. This is undefined behavior. Its consequence is hard to predict in this case (in this simple example, it might not hurt a thing), but in real-world code it usually leads to corrupted data or memory access violation errors.

Understanding C-strings & string literals in C++

In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which is stored somewhere else, possibly in a read-only part of memory) are copied into this variable.

In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.

Thus, since the string literal may be located in read-only memory, it is not safe to try to modify it through the p pointer. Doing so is undefined behavior: it may appear to work, or the program may crash. On the other hand, you can do whatever you like with the cstr array, because that is your local copy that just happens to have been initialized from the literal.

(One note: cstr has type array of char, and in most contexts it decays to a pointer to the array's first element. The sizeof operator is one exception: it computes the size of the whole array, not the size of a pointer to the first element.)

Determine where a C string is placed

You cannot tell whether a pointer points into automatic storage, static storage, or dynamic memory simply by looking at the pointer. You need to store a flag at the time you set that pointer, for example like this:

class classWithDynamicData {
private:
bool needToDelete;
char* strData;
public:
classWithDynamicData(int size) : needToDelete(true), strData(new char[size]) {
}
classWithDynamicData(char* data) : needToDelete(false), strData(data) {
}
~classWithDynamicData() {
if (needToDelete) delete[] strData;
}
...
// You need to define a copy constructor and an assignment operator
// to avoid violating the rule of three
};

How do you declare string constants in C?

There's one more (at least) road to Rome:

static const char HELLO3[] = "Howdy";

(The static keyword is optional; it gives HELLO3 internal linkage, so the name can't conflict with identically named objects in other files.) I'd prefer this over const char*, because then you can use sizeof(HELLO3), and so you don't have to postpone until run time what you can do at compile time.

The #define has the advantage of compile-time concatenation, though (think HELLO ", World!"), and you can apply sizeof to HELLO as well.

But then you can also prefer const char* and use it across multiple files, which would save you a morsel of memory.

In short, it depends.
