Difference between string and char[] types in C++
A char array is just that - an array of characters:
- If allocated on the stack (like in your example), it will always occupy e.g. 256 bytes, no matter how long the text it contains is.
- If allocated on the heap (using malloc() or new char[]) you're responsible for releasing the memory afterwards and you will always have the overhead of a heap allocation.
- If you copy a text of more than 255 characters (plus the terminating \0) into a 256-byte array, the program might crash, produce ugly assertion messages or misbehave inexplicably somewhere else.
- To determine the text's length, the array has to be scanned, character by character, for a \0 character.
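The points above can be sketched in a few lines. This is only an illustration: the 256-byte buffer mirrors the size used in the list, and the helper names (buffer_capacity, text_length, copy_bounded) are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// The storage size is fixed by the declaration, whatever text is inside.
std::size_t buffer_capacity() {
    char buffer[256] = "Hi";
    return sizeof buffer;            // 256 bytes, although the text has 2 characters
}

// The text length has to be found by scanning for the terminating '\0'.
std::size_t text_length() {
    char buffer[256] = "Hi";
    return std::strlen(buffer);      // walks the array until it hits '\0'
}

// Hypothetical helper: copy only if the text and its terminator fit.
bool copy_bounded(char* dest, std::size_t capacity, const char* src) {
    const std::size_t len = std::strlen(src);
    if (len >= capacity) {
        return false;                // no room for the text plus '\0'
    }
    std::memcpy(dest, src, len + 1); // + 1 copies the terminator as well
    return true;
}
```

Checking the length before copying, as copy_bounded does, is one way to avoid the overrun described in the third bullet.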
A string is a class that contains a char array, but automatically manages it for you. Many string implementations have a small built-in buffer (often around 16 characters) so that short strings don't fragment the heap, and use the heap only for longer strings.
You can access a string's char array like this:
std::string myString = "Hello World";
const char *myStringChars = myString.c_str();
C++ strings can contain embedded \0 characters, know their length without counting, are faster than heap-allocated char arrays for short texts and protect you from buffer overruns. Plus they're more readable and easier to use.
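A small sketch of the first two claims, assuming nothing beyond the standard library. The function names are invented for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Build a std::string that contains a '\0' in the middle.
// The (pointer, count) constructor copies exactly 5 characters.
std::string make_with_embedded_nul() {
    return std::string("ab\0cd", 5);
}

// std::string stores its length, so no scan is needed.
std::size_t stored_length(const std::string& s) {
    return s.size();
}

// A C function looking at the same data stops at the first '\0'.
std::size_t scanned_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For the string built above, stored_length reports 5 while scanned_length reports 2, which is why embedded \0 characters only survive as long as the data stays inside a std::string.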
However, C++ strings are not (very) suitable for usage across DLL boundaries, because this would require any user of such a DLL function to make sure he's using the exact same compiler and C++ runtime implementation, lest he risk his string class behaving differently.
Normally, a string class would also release its heap memory on the calling heap, so it will only be able to free memory again if you're using a shared (.dll or .so) version of the runtime.
In short: use C++ strings in all your internal functions and methods. If you ever write a .dll or .so, use C strings in your public (dll/so-exposed) functions.
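As a rough sketch of that advice, a library could use std::string internally and expose only C strings at its boundary. Everything here is hypothetical (the greet function, its parameters and its return convention), and the platform-specific export attributes are left out.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

namespace {
// Internal code is free to use std::string.
std::string build_greeting(const std::string& name) {
    return "Hello, " + name + "!";
}
}  // namespace

// Hypothetical exported function. The caller owns the output buffer, so no
// std::string object and no heap memory crosses the library boundary.
// Returns the text length, or -1 if the arguments are invalid or the
// buffer is too small.
extern "C" int greet(const char* name, char* out, std::size_t out_size) {
    if (name == nullptr || out == nullptr) {
        return -1;
    }
    const std::string text = build_greeting(name);
    if (text.size() + 1 > out_size) {
        return -1;                                    // text plus '\0' does not fit
    }
    std::memcpy(out, text.c_str(), text.size() + 1);  // copy including '\0'
    return static_cast<int>(text.size());
}
```

Letting the caller supply the buffer is one common design choice here; it sidesteps the question of which heap has to free the memory.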
Difference between char[] and strings in C
C is a [relatively] low-level, statically typed programming language.
char c = 'c';
const char* s = "s";
The statements above differ not only in the value of the literal constant (c: single byte storage; s: two bytes storage + 4/8 byte pointer), but also in the type of variables (c: single byte, certain arithmetic ops; s: 4/8 byte pointer, different arithmetic).
I posit that the latter difference is more important; literal constants are there to make the use of variables, function arguments, struct members, etc. easier.
Furthermore, the typical problems solved in C are of a low-level nature, where you are interested in the logical difference between a single character and a string. Examples are GPIO, serial ports and substring search algorithms.
[Of course C is also used in other domains; you are not likely to see much character vs. string distinction in higher-level projects like glib or enlightenment.]
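The storage and arithmetic differences between the two declarations above can be checked directly. The sketch below is written in C++ but sticks to constructs that behave the same way in C; the function names are made up for the example.

```cpp
#include <cstddef>

// Storage of a single character variable.
std::size_t char_storage() {
    char c = 'c';
    return sizeof c;                 // always 1
}

// Storage of the literal "s": the letter plus the terminating '\0'.
std::size_t literal_storage() {
    return sizeof "s";               // 2
}

// Storage of the pointer itself: typically 4 or 8 bytes.
std::size_t pointer_storage() {
    const char* s = "s";
    return sizeof s;
}

// Arithmetic on a char changes the character code.
char next_char(char c) {
    return static_cast<char>(c + 1);
}

// Arithmetic on a pointer moves to the next element in memory.
char char_after(const char* s) {
    return *(s + 1);
}
```

With these definitions, next_char('c') yields 'd', whereas char_after("s") yields the terminating '\0' that follows the letter.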
Python is a high-level dynamic language.
c = 'c'
s = "s"
In the statements above, the locals/labels c and s point to objects, and the type is determined at runtime, dynamically. Thus a distinction between a "character" and a "string" is simply not needed.
Problems solved in Python are usually of a much higher level: typically you'd deal with JSON blobs, HTTP requests, database queries, virtual machines, etc. Even if you need to deal with single characters, a length-1 string is an acceptable approximation.
[If you used numpy or cffi, you would worry about the storage of characters and strings, and those modules provide mechanisms to do so.]
Differences between int/char arrays/strings
To elaborate on WhozCraig's answer, the trouble you are having does not have to do with strings, but with the individual characters.
Strings in C are stored by and large as arrays of characters (with the caveat that there exists a null terminator at the end).
The characters themselves are encoded in a system called ASCII, which assigns codes from 0 to 127 to the characters used in the English language (only). Thus '7' is not stored as 7 but as the ASCII code of '7', which is 55.
I think now you can see why your product got so large.
One elegant way to fix this would be to change
int num = (int) str[n];
to
int num = str[n] - '0';
//thanks for fixing, ' ' is used for characters, " " is used for strings
This solution subtracts the ASCII code for '0' from the ASCII code for your character, say '7'. Since the digits are encoded consecutively, this works (for single digits). For larger numbers, you should use atoi or strtol from stdlib.h.
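A short sketch of both approaches. The original program is not shown here, so digit_product is only a guess at the kind of digit-by-digit product it computed, and all function names are invented.

```cpp
#include <cstdlib>

// Convert one digit character to its numeric value.
// '0'..'9' are guaranteed to be consecutive, so the subtraction is portable.
int digit_value(char ch) {
    return ch - '0';
}

// Hypothetical helper: multiply the digits of a string such as "234".
long digit_product(const char* digits) {
    long product = 1;
    for (; *digits != '\0'; ++digits) {
        product *= digit_value(*digits);
    }
    return product;
}

// For whole numbers, let the standard library do the parsing (base 10).
long parse_number(const char* text) {
    return std::strtol(text, nullptr, 10);
}
```

With the cast from the question, "234" would multiply the character codes 50, 51 and 52 instead of 2, 3 and 4, which explains the oversized result.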
Differences between char * and string
Assuming you're referring to std::string, string is a standard library class modelling a string.
char* is just a pointer to a single char. In C and C++, various functions exist that take a pointer to a single char as a parameter and walk along the memory until a 0 value is reached (often called the null terminator). In that way it models a string of characters; strlen is an example of a function (from the C standard library) that does this.
If you have a choice, use std::string, as you don't have to concern yourself with memory.
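To make the contrast concrete, here is a hand-written version of the scan that functions like strlen perform, next to the std::string equivalent. Both function names are made up for this sketch.

```cpp
#include <cstddef>
#include <string>

// Walk along the memory from p until the null terminator is reached.
// This is a simplified stand-in for what strlen does.
std::size_t count_until_nul(const char* p) {
    std::size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}

// With std::string, concatenation allocates and frees memory on its own.
std::string join(const std::string& a, const std::string& b) {
    return a + b;
}
```

Doing the same concatenation with char* would mean computing both lengths, allocating a large enough buffer, copying, and freeing it later.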
What are the differences between char* and all other types of pointers in C?
Let's go through each of your questions one by one.
Printing
In your first code snippet, you show that printf is capable of printing strings. Of course it printed a string. You gave it a %s, which is meant to be a string. But first, what is a string? To explain that, we need to understand arrays and chars.
What's a String?
First, what's a char? A char is a single character (or an 8-bit number, but for our purposes it's a character). A character can be a letter (a, b, c), or any other symbol (?, !, ., digits; there are also some control characters). Typically, if you only needed one character, you'd declare it like this:
char letter_a = 'a';
So what is an array? An array is a group of values all next to each other. Consider the following code:
int int_array[50];
int_array[0] = 1;
int_array[1] = 2;
...
In this example, what is int_array? The answer seems obvious. It's an array. But there's more to it than that. What if we do this?
printf("%d\n", *int_array);
It prints out 1. Why? Because in most expressions the name int_array is converted ("decays") to a pointer to the first element of the array.
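One way to see how the array name behaves is to compare it with an explicit pointer. The sketch below uses a local array initialized with the same first two values; the function names are invented for the example.

```cpp
#include <cstddef>

// Dereferencing the array name gives the first element.
int first_element() {
    int int_array[50] = {1, 2};
    return *int_array;               // same as int_array[0]
}

// Pointer arithmetic on the array name reaches the following elements.
int second_element() {
    int int_array[50] = {1, 2};
    return *(int_array + 1);         // same as int_array[1]
}

// sizeof still sees the whole array ...
std::size_t array_bytes() {
    int int_array[50];
    return sizeof int_array;         // 50 * sizeof(int)
}

// ... while an actual pointer to the first element is just a pointer.
std::size_t pointer_bytes() {
    int int_array[50];
    int* p = int_array;
    return sizeof p;                 // size of one pointer
}
```

The last two functions return different values, which shows that the array and a pointer to its first element are distinct objects even though the name converts to such a pointer when it is dereferenced.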
So why am I talking about arrays? Because a string is just an array of characters. When you run char* string = "Hello!", you just create an array that looks like this: ['H', 'e', 'l', 'l', 'o', '!', '\0']. C knows that the string has ended once it reaches the null symbol ('\0').
In your first snippet, var is a pointer to the letter 'H', and the print statement keeps printing characters until it reaches null.
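The behaviour of %s can be imitated with a small loop. This is only a sketch of the idea, not how printf is actually implemented, and read_from and literal_bytes are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// "Hello!" occupies seven bytes: six visible characters plus '\0'.
std::size_t literal_bytes() {
    return sizeof "Hello!";
}

// Collect characters starting at 'start' until the null symbol is reached,
// roughly what printing with %s does.
std::string read_from(const char* start) {
    std::string out;
    for (const char* p = start; *p != '\0'; ++p) {
        out.push_back(*p);
    }
    return out;
}
```

Starting one character later, as in read_from(var + 1) for a var that points to "Hello!", yields "ello!", because the function only ever knows where to start and stops at the terminator.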
What about the second snippet?
%d doesn't dereference the variable like %s does. It just prints a number as a signed integer. The integer in this case is the memory address of your integer.
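The second snippet from the question is not reproduced here, so the following is only a sketch of the two things one usually wants instead: printing the pointed-to value with %d, or printing the address with %p. The function names are made up, and snprintf is used so the result can be inspected as a string.

```cpp
#include <cstdio>
#include <string>

// Print the value the pointer refers to: dereference it yourself.
std::string format_value(const int* p) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d", *p);
    return buf;
}

// Print the address itself: use %p and pass the pointer as void*.
// The exact text produced by %p is implementation-defined.
std::string format_address(const int* p) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%p", static_cast<const void*>(p));
    return buf;
}
```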
Why can't you assign pointers?
You can. You'll get a warning, and it will probably cause a segmentation fault, but you can try it. I compiled your code example using clang and this is what I got:
test.c:1:1: warning: return type of 'main' is not 'int' [-Wmain-return-type]
void main() {
^
test.c:1:1: note: change return type to 'int'
void main() {
^~~~
int
test.c:2:7: warning: incompatible integer to pointer conversion initializing 'int *' with an expression of type 'int' [-Wint-conversion]
int* var = 5;
^ ~
2 warnings generated.
I will not dare try to run it though. Basically, what you just did is make var point at memory address 5, which most likely belongs to the OS. You don't have access to that, so dereferencing the pointer will probably crash.
Why did it work for the string?
Because it doesn't point to a specific location. It points to the location of the string that C made for you. Your code is roughly equivalent to this:
char h = 'H';
char e = 'e';
...
char* var = &h;
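The same idea works for an int: give the pointer the address of a real object instead of a raw number. A minimal sketch, with invented function names:

```cpp
// Point at an int that actually exists, then read it back.
int read_through_pointer() {
    int value = 5;
    int* var = &value;               // address of a real object, not the number 5
    return *var;
}

// The string case: the literal is an object the compiler created for you,
// and the pointer refers to its first character.
char first_letter() {
    const char* var = "Hello!";
    return *var;
}
```

Both pointers are safe to dereference because each one holds the address of storage the program owns.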
What is the difference between char stringA[LEN] and char* stringB[LEN] in C
Are they any different?
Yes.
Both variables stringA and stringB are arrays. stringA is an array of char of size LEN, and stringB is an array of char * of size LEN.
char and char * are two different types. stringA can hold only one character string of at most LEN - 1 characters (one element is needed for the terminating '\0'), while the elements of stringB can point to LEN separate strings.
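The difference shows up in the sizes of the two arrays. In this sketch LEN is set to 3 purely for illustration, and the function names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t LEN = 3;       // illustrative value, not from the question

// An array of LEN chars: room for one string of up to LEN - 1 characters.
std::size_t bytes_in_stringA() {
    char stringA[LEN] = "ab";        // 'a', 'b', '\0' fills it exactly
    return sizeof stringA;           // LEN bytes
}

// An array of LEN pointers: each element can point to a separate string.
std::size_t bytes_in_stringB() {
    const char* stringB[LEN] = {"Apple", "Bapple", "Capple"};
    return sizeof stringB;           // LEN pointers, the texts live elsewhere
}
```

Note that stringB only stores the pointers; the characters of "Apple" and the others are kept wherever the compiler placed the literals.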
Or does stringB again become immutable as in the case before?
Whether the strings pointed to by the elements of stringB are mutable or not depends on how their memory is allocated. If they are initialized with string literals
char* stringB[LEN] = { "Apple", "Bapple", "Capple"};
then they are immutable. In case of
for (int i = 0; i < LEN; i++)
    stringB[i] = malloc(30); // Allocating 30 bytes for each element
strcpy(stringB[0], "Apple");
strcpy(stringB[1], "Bapple");
strcpy(stringB[2], "Capple");
they are mutable.
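A minimal sketch of the heap case for a single element, including the cleanup that the fragment above leaves out. It is written in C++, so malloc needs a cast that plain C would not require; the function name is made up.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Copy a literal into heap memory, modify the copy, and release it again.
std::string modify_heap_copy() {
    char* fruit = static_cast<char*>(std::malloc(30));
    if (fruit == nullptr) {
        return "";                   // allocation failed
    }
    std::strcpy(fruit, "Apple");     // the copy lives in writable memory
    fruit[0] = 'a';                  // fine: we own these 30 bytes
    std::string result = fruit;
    std::free(fruit);                // every malloc needs a matching free
    return result;
}
```

The literal "Apple" itself is never modified; only the heap copy is.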
What is the difference between char s[] and char *s?
The difference here is that
char *s = "Hello world";
will place "Hello world" in the read-only parts of the memory, and making s a pointer to that makes any writing operation on this memory illegal.
While doing:
char s[] = "Hello world";
puts the literal string in read-only memory and copies the string to newly allocated memory on the stack. Thus making
s[0] = 'J';
legal.
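The array form can be tried out safely. The sketch below is in C++, where the pointer form has to be declared const char *, so only the array case is modified; the function names are invented.

```cpp
#include <cstddef>
#include <string>

// The array holds its own copy of the text, so it can be changed.
std::string modify_array_copy() {
    char s[] = "Hello world";
    s[0] = 'J';
    return s;
}

// The array is sized to fit the text plus the terminating '\0'.
std::size_t array_size() {
    char s[] = "Hello world";
    return sizeof s;                 // 11 characters + 1
}

// The pointer form only refers to the literal. Writing through it is
// undefined behavior, so this function only reads.
char first_char_of_literal() {
    const char* s = "Hello world";
    return s[0];
}
```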