Difference Between String and Char[] Types in C++

A char array is just that - an array of characters:

  • If allocated on the stack (like in your example), it will always occupy its full declared size, e.g. 256 bytes, no matter how long the text it contains is.
  • If allocated on the heap (using malloc() or new char[]), you're responsible for releasing the memory afterwards, and you will always have the overhead of a heap allocation.
  • If you copy a text of 256 or more chars into the array (the terminating \0 needs a byte too), you overrun the buffer: it might crash, produce ugly assertion messages or cause unexplainable (mis-)behavior somewhere else in your program.
  • To determine the text's length, the array has to be scanned, character by character, for a \0 character.
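The points above can be sketched in a few lines of C++. This is only an illustration, assuming a 256-byte buffer like the one in the question; the helper names buffer_capacity, text_length and safe_copy are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// The array occupies its full declared size regardless of its contents.
std::size_t buffer_capacity() {
    char buffer[256] = "Hi";
    return sizeof(buffer);            // storage reserved for the array
}

// strlen has to scan for the terminating '\0' to find the length.
std::size_t text_length() {
    char buffer[256] = "Hi";
    return std::strlen(buffer);       // characters before the '\0'
}

// Copy at most capacity - 1 characters and always terminate, so a long
// source text is truncated instead of overrunning the destination.
void safe_copy(char* dest, std::size_t capacity, const char* src) {
    if (capacity == 0) return;
    std::strncpy(dest, src, capacity - 1);
    dest[capacity - 1] = '\0';
}
```

Calling safe_copy with a 4-byte destination and the text "Hello" leaves "Hel" in the buffer rather than writing past its end.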

A string is a class that contains a char array, but automatically manages it for you. Most string implementations have a small built-in buffer, typically around 16 bytes (so short strings don't fragment the heap), and use the heap for longer strings. The exact buffer size is implementation-specific.

You can access a string's char array like this:

std::string myString = "Hello World";
const char *myStringChars = myString.c_str();

C++ strings can contain embedded \0 characters, know their length without counting, are faster than heap-allocated char arrays for short texts and grow as needed, which protects you from most buffer overruns. Plus they're more readable and easier to use.
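A small sketch of the first two claims, assuming nothing beyond the standard library. The function names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Build a 5-character string that contains a '\0' in the middle.
std::string make_with_embedded_null() {
    return std::string("ab\0cd", 5);  // the length is given explicitly
}

// size() reports the stored length without scanning for '\0'.
std::size_t stored_length(const std::string& s) {
    return s.size();
}

// strlen only sees the text up to the first '\0'.
std::size_t c_view_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For the string built here, stored_length gives 5 while c_view_length gives 2, which is why code that passes c_str() to C functions silently loses everything after an embedded \0.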


However, C++ strings are not (very) suitable for usage across DLL boundaries, because every user of such a DLL function would have to use the exact same compiler and C++ runtime implementation, or risk the string class having a different layout or behavior on their side.

Normally, a string class would also release its heap memory on the heap of whichever module destroys it, so memory allocated inside the DLL can only be freed correctly by the caller if both sides use a shared (.dll or .so) version of the runtime.

In short: use C++ strings in all your internal functions and methods. If you ever write a .dll or .so, use C strings in your public (dll/so-exposed) functions.
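One possible shape for such a public function, as a sketch only: make_greeting and build_greeting are hypothetical names, and the platform-specific export annotation (such as __declspec(dllexport)) is left out. The idea is that only plain C types cross the boundary and the caller owns the output buffer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Internal code is free to use std::string.
static std::string build_greeting(const std::string& name) {
    return "Hello, " + name + "!";
}

// Hypothetical exported function. name must be a valid C string.
// The result is written into the caller's buffer (truncated if needed),
// so no memory allocated inside the library has to be freed by the caller.
// Returns the number of characters needed, excluding the '\0'.
extern "C" std::size_t make_greeting(const char* name, char* out,
                                     std::size_t out_size) {
    const std::string text = build_greeting(name);
    if (out != nullptr && out_size > 0) {
        const std::size_t n =
            text.size() < out_size - 1 ? text.size() : out_size - 1;
        std::memcpy(out, text.data(), n);
        out[n] = '\0';
    }
    return text.size();
}
```

A caller can compare the return value with its buffer size to detect truncation and retry with a larger buffer.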

Difference between char[] and strings in C

C is a [relatively] low-level, statically typed programming language.

char c = 'c';
const char* s = "s";

The statements above differ not only in the value of the literal constant (c: single byte storage; s: two bytes storage + 4/8 byte pointer), but also in the type of variables (c: single byte, certain arithmetic ops; s: 4/8 byte pointer, different arithmetic).
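To make the storage and arithmetic differences concrete, here is a short sketch. It is written in C++ rather than C, and the function names are invented for the example; the sizes in the comments are what typical platforms give.

```cpp
#include <cstddef>

char c = 'c';
const char* s = "s";

// Storage of the values themselves.
std::size_t char_size()    { return sizeof(c); }    // always 1
std::size_t literal_size() { return sizeof("s"); }  // 's' plus '\0'
std::size_t pointer_size() { return sizeof(s); }    // usually 4 or 8

// Arithmetic on a char changes the character code...
char next_char(char ch) { return static_cast<char>(ch + 1); }

// ...while arithmetic on a pointer moves to another element.
const char* next_position(const char* p) { return p + 1; }
```

So next_char('c') yields 'd', whereas next_position applied to "st" yields a pointer to the 't'.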

I posit to you that the latter difference is more important; literal constants are there to make the use of variables, function arguments, struct members, etc. easier.

Furthermore, the typical problems solved in C are of a low-level nature, where you are interested in the logical difference between a single character and a string. For example: GPIO, serial ports, substring search algorithms.

[Of course C is also used in other domains; you are not likely to see much character vs. string distinction in higher-level projects like glib or enlightenment.]

Python is a high-level dynamic language.

c = 'c'
s = "s"

In the statements above, the locals/labels c and s point to objects, and the type is determined at runtime, dynamically. Thus a distinction between a "character" and a "string" is simply not needed.

Problems solved in Python are usually of a much higher level; typically you'd deal with JSON blobs, HTTP requests, database queries, virtual machines, etc. Even if you need to deal with single characters, a length-1 string is an acceptable approximation.

[If you used numpy or cffi, you would worry about the storage of characters and strings, and those modules provide mechanisms to do so.]

Differences between int/char arrays/strings

To elaborate on WhozCraig's answer, the trouble you are having does not have to do with strings, but with the individual characters.

Strings in C are stored by and large as arrays of characters (with the caveat that there is a null terminator at the end).

The characters themselves are encoded in a system called ASCII, which assigns codes from 0 to 127 to the characters used in the English language (only). Thus '7' is not stored as 7 but as the ASCII code of '7', which is 55.

I think now you can see why your product got so large.

One elegant way to fix this would be to change

int num = (int) str[n];

to

int num = str[n] - '0';
// single quotes ' ' are used for characters, double quotes " " are used for strings

This solution subtracts the ASCII code for '0' from the ASCII code for your character, say '7'. Since the digits are encoded consecutively, this will work (for single-digit numbers). For larger numbers, you should use atoi or strtol from stdlib.h.
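As a sketch of the fix in C++: digit_value applies the subtraction, digit_product is a hypothetical helper assuming the original code multiplied the digits of a string, and parse_number shows strtol for whole numbers.

```cpp
#include <cstdlib>

// Convert one decimal digit character to its numeric value.
int digit_value(char ch) {
    return ch - '0';   // in ASCII: '7' (code 55) - '0' (code 48) == 7
}

// Multiply the digits of a string such as "234" (hypothetical helper;
// it assumes every character is a decimal digit).
long digit_product(const char* str) {
    long product = 1;
    for (const char* p = str; *p != '\0'; ++p) {
        product *= digit_value(*p);
    }
    return product;
}

// For whole multi-digit numbers, let strtol do the parsing.
long parse_number(const char* str) {
    return std::strtol(str, nullptr, 10);
}
```

With the cast version, "234" would multiply 50 * 51 * 52 instead of 2 * 3 * 4, which is why the product grows so quickly.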

Differences between char * and string

Assuming you're referring to std::string, string is a standard library class modelling a string.

char* is just a pointer to a single char. In C and C++, various functions exist that will take a pointer to a single char as a parameter and will track along the memory until a 0 memory value is reached (often called the null terminator). In that way it models a string of characters; strlen is an example of a function (from the C standard library) that does this.

If you have a choice, use std::string as you don't have to concern yourself with memory.
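For illustration, this is roughly what a function like strlen has to do with a char*, next to the std::string equivalent of joining two texts. my_strlen and join are names made up for this sketch, not library functions.

```cpp
#include <cstddef>
#include <string>

// Hand-written equivalent of strlen: follow the pointer until the
// terminating 0 byte is found.
std::size_t my_strlen(const char* p) {
    std::size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}

// std::string owns its memory, so concatenation needs no manual
// buffer management.
std::string join(const std::string& a, const std::string& b) {
    return a + b;
}
```

Doing the same join with char* would mean computing both lengths, allocating a large enough buffer, copying, and remembering to free it afterwards.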

What are the differences between char* and all other types of pointers in C?

Let's go through each of your questions one by one.

Printing

In your first code snippet, you show that printf is capable of printing strings. Of course it printed a string. You gave it a %s which is meant to be a string. But first, what is a string? To explain that, we need to understand arrays and chars.

What's a String?

First, what's a char? A char is a single character (or an 8-bit number, but for our purposes it's a character). A character can be a letter (a, b, c), or any other symbol (?, !, ., digits; there are also some control characters). Typically, if you only needed one character, you'd declare it like this:

char letter_a = 'a';

So what is an array? An array is a group of values all next to each other. Consider the following code:

int int_array[50];
int_array[0] = 1;
int_array[1] = 2;
...

In this example, what is int_array? The answer seems obvious. It's an array. But there's more to it than that. What if we do this?

printf("%d\n", *int_array);

It prints out 1. Why? Because in most expressions, int_array is converted to a pointer to the first element of the array.
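A brief C++ sketch of this relationship between arrays and pointers, using the same int_array name; the three function names are only for illustration.

```cpp
#include <cstddef>

// In most expressions the array name converts to a pointer to its
// first element, so *int_array is the same as int_array[0].
int first_element() {
    int int_array[50] = {1, 2};
    return *int_array;
}

// Pointer arithmetic reaches the other elements: *(p + 1) is p[1].
int second_element() {
    int int_array[50] = {1, 2};
    const int* p = int_array;
    return *(p + 1);
}

// The array is still not a pointer: sizeof reports all 50 elements.
std::size_t element_count() {
    int int_array[50] = {1, 2};
    return sizeof(int_array) / sizeof(int_array[0]);
}
```

The last function is the usual reminder that the conversion happens in expressions; the array object itself keeps its full size.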

So why am I talking about arrays? Because a string is just an array of characters. When you write char* string = "Hello!", you get an array that looks like this: ['H', 'e', 'l', 'l', 'o', '!', '\0'], and string points to its first element. C's string functions know that the string has ended once they reach the null character ('\0').

In your first snippet, var is a pointer to the letter 'H', and the print statement keeps printing characters until it reaches null.

What about the second snippet?

%d doesn't dereference the variable like %s does. It just prints a number as a signed integer. The integer in this case is the memory address of your integer.
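The difference between the format specifiers can be sketched like this. The original snippets are not shown here, so the parameter name var and the three helper functions are assumptions; snprintf is used so the formatted text can be inspected.

```cpp
#include <cstdio>
#include <string>

// %s follows the pointer and prints characters until '\0'.
std::string format_text(const char* var) {
    char out[64];
    std::snprintf(out, sizeof(out), "%s", var);
    return out;
}

// %d expects an int value, so the pointer is dereferenced by hand.
std::string format_number(const int* var) {
    char out[64];
    std::snprintf(out, sizeof(out), "%d", *var);
    return out;
}

// To show the address itself, %p with a void* is the matching format.
std::string format_address(const int* var) {
    char out[64];
    std::snprintf(out, sizeof(out), "%p", static_cast<const void*>(var));
    return out;
}
```

Passing a pointer where %d expects an int is a mismatch the compiler will usually warn about; dereferencing explicitly or switching to %p avoids it.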

Why can't you assign pointers?

You can. You'll get a warning, and it will probably cause a segmentation fault, but you can try it. I compiled your code example using clang and this is what I got:

test.c:1:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main() {
^
test.c:1:1: note: change return type to 'int'
void main() {
^~~~
int
test.c:2:7: warning: incompatible integer to pointer conversion initializing 'int *' with an expression of type 'int' [-Wint-conversion]
int* var = 5;
^ ~
2 warnings generated.

I will not dare try to run it though. Basically, what you just did is make var point to address 5 in memory, which is most likely some OS stuff. You don't have access to that, so dereferencing var would most likely crash.
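What was probably intended is to point var at an int that actually exists. A minimal sketch, with hypothetical function names:

```cpp
// Point var at an int that actually exists, instead of at address 5.
int read_through_pointer() {
    int value = 5;
    int* var = &value;   // var holds the address of value
    return *var;         // reading through it is fine
}

// Writing through the pointer changes the original variable.
int write_through_pointer() {
    int value = 5;
    int* var = &value;
    *var = 7;
    return value;
}
```

Here the address comes from the & operator, so it always refers to memory the program owns.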

Why did it work for the string?

Because it doesn't point to a specific location. It points to the location of the string that C made for you. Your code is roughly equivalent to this:

char h = 'H';
char e = 'e';
...
char* var = &h;

What is the difference between char stringA[LEN] and char* stringB[LEN] in C

Are they any different?

Yes.

Both variables stringA and stringB are arrays. stringA is an array of char of size LEN and stringB is an array of char * of size LEN.

char and char * are two different types. stringA can hold only one character string of at most LEN - 1 characters (plus the null terminator), while the elements of stringB can point to LEN different strings.

Or does stringB again becomes immutable as in the case before?

Whether the strings pointed to by the elements of stringB are mutable depends on how their memory is allocated. If they are initialized with string literals

char* stringB[LEN] = { "Apple", "Bapple", "Capple"};  

then they are immutable. In case of

for (int i = 0; i < LEN; i++)
    stringB[i] = malloc(30); // Allocating 30 bytes for each element

strcpy(stringB[0], "Apple");
strcpy(stringB[1], "Bapple");
strcpy(stringB[2], "Capple");

they are mutable.
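The two declarations can be compared side by side in a short C++ sketch. LEN is assumed to be 3 here, the function names are invented, and the malloc result needs a cast because this is C++ rather than C.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t LEN = 3;  // assumed value for illustration

// char stringA[LEN]: LEN chars, room for one string of up to LEN - 1
// characters plus the '\0'.
std::size_t bytes_in_stringA() {
    char stringA[LEN] = "Hi";
    return sizeof(stringA);
}

// char* stringB[LEN]: LEN pointers, each of which can refer to its own string.
std::size_t bytes_in_stringB() {
    char* stringB[LEN] = {nullptr, nullptr, nullptr};
    return sizeof(stringB);
}

// Strings in malloc'ed memory are writable; the caller frees each one.
char first_letter_after_edit() {
    char* stringB[LEN];
    for (std::size_t i = 0; i < LEN; i++) {
        stringB[i] = static_cast<char*>(std::malloc(30));
        if (stringB[i] == nullptr) std::abort();
    }
    std::strcpy(stringB[0], "Apple");
    std::strcpy(stringB[1], "Bapple");
    std::strcpy(stringB[2], "Capple");
    stringB[0][0] = 'E';              // fine: this memory is ours
    const char result = stringB[0][0];
    for (std::size_t i = 0; i < LEN; i++) {
        std::free(stringB[i]);
    }
    return result;
}
```

The same write to stringB[0][0] would be undefined behavior if the element pointed at the literal "Apple" instead of at a malloc'ed copy.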

What is the difference between char s[] and char *s?

The difference here is that

char *s = "Hello world";

will typically place "Hello world" in a read-only part of memory and make s a pointer to it, so any write through s is undefined behavior (and often a crash).

While doing:

char s[] = "Hello world";

puts the literal string in read-only memory and copies the string to newly allocated memory on the stack. Thus making

s[0] = 'J';

legal.
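A sketch of both cases in C++, where a pointer to a literal has to be declared const, so the illegal write is caught at compile time. The function names are hypothetical.

```cpp
#include <string>

// The array is a private, writable copy of the literal.
std::string edit_array_copy() {
    char s[] = "Hello world";
    s[0] = 'J';
    return s;
}

// A pointer to the literal should be const: the compiler then rejects
// writes such as p[0] = 'J' instead of letting them fail at run time.
char read_through_pointer_to_literal() {
    const char* p = "Hello world";
    return p[0];
}

// sizeof sees the whole array (11 characters + '\0') for s, whereas for
// a pointer it would only report the size of the pointer itself.
bool array_size_includes_terminator() {
    char s[] = "Hello world";
    return sizeof(s) == 12;
}
```

Declaring such pointers as const char* is a cheap habit in C as well, since it turns an accidental write into a compiler diagnostic.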


