Splitting a String into an Array in C++ Without Using Vector


You can turn the string into a stream with the std::stringstream class (its constructor takes a string as a parameter). Once it is built, you can use the >> operator on it, just as on regular file-based streams, to extract whitespace-separated words one at a time:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main() {
    string line = "test one two three.";
    string arr[4];
    int i = 0;
    stringstream ssin(line);
    // Stop when the array is full or an extraction fails
    while (i < 4 && ssin >> arr[i]) {
        ++i;
    }
    // Print only the words that were actually read
    for (int j = 0; j < i; j++) {
        cout << arr[j] << endl;
    }
}
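The same loop can be packaged as a function that reports how many words it stored, which helps when the input may hold fewer words than the array has slots. This is only a sketch: the name split_words and its parameters are made up for illustration and are not part of the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: extracts up to `capacity` whitespace-separated words
// from `line` into `out` and returns how many were actually stored.
std::size_t split_words(const std::string& line, std::string* out,
                        std::size_t capacity) {
    assert(out != nullptr || capacity == 0);
    std::istringstream in(line);
    std::size_t count = 0;
    // Testing the extraction itself means a failed read is never counted.
    while (count < capacity && in >> out[count]) {
        ++count;
    }
    return count;
}
```

A caller would declare something like std::string words[4]; and then loop from 0 to the returned count rather than to the array size.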

Splitting string into words in array without using any pre-made functions in C

You have quite a few errors in your program:

  1. arr = (char **)malloc(size * sizeof(char)); is not right since
    arr is of type char**. You should use sizeof(char*) or better
    (sizeof(*arr)) since sizeof(char) is usually not equal to
    sizeof(char*) for modern systems.

  2. You don't have braces {} around your else statement in
    ft_split_whitespaces which you probably intended. So your
    conditional logic breaks.

  3. You are allocating a new char[] for every non-whitespace
    character in the while loop. You should only allocate one for
    every new word and then just fill in the characters in that array.

  4. *(arr+index2) = &str[index]; This doesn't do what you think it
    does. It just makes the pointer at *(arr+index2) point into str at
    offset index. You need to copy each character individually or
    use memcpy() (which you probably can't use given the question's
    constraints). This explains why your answer just provides offsets
    into the main string and not the actual tokens.

  5. **arr = '\0'; You will lose whatever you store in the 0th index
    of arr. You need to individually append a \0 to each string in
    arr.

  6. *(arr+index2) = (char*) malloc(index * sizeof(char)); You will end up
    allocating progressively larger char arrays because
    you are using index for the count of characters, and index keeps
    increasing. You need to figure out the correct length of each token in
    the string (plus one for the terminating \0) and allocate appropriately.

Also why *(arr + index2)? Why not use the much easier to read arr[index2]?
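One way the six points above could be addressed together is sketched below. It is not the asker's program: the names split_whitespaces and is_space, the choice of separator characters, and the null pointer that ends the array are all assumptions. It is written so that it compiles as C++ like the other examples on this page, but it uses only malloc, free, and manual character copying.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: the characters treated as separators in this sketch.
static bool is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

// Splits `str` on whitespace into a null-terminated array of newly allocated
// strings. The caller frees each string and then the array itself.
char** split_whitespaces(const char* str) {
    assert(str != nullptr);

    // Pass 1: count the words so the pointer array gets the right size.
    std::size_t words = 0;
    for (std::size_t i = 0; str[i] != '\0'; ++i) {
        if (!is_space(str[i]) && (i == 0 || is_space(str[i - 1]))) ++words;
    }

    // sizeof(*arr) is sizeof(char*), not sizeof(char) (point 1).
    char** arr = static_cast<char**>(std::malloc((words + 1) * sizeof(*arr)));
    if (arr == nullptr) return nullptr;

    std::size_t i = 0;  // position in str
    std::size_t w = 0;  // index of the word being filled
    while (str[i] != '\0') {
        if (is_space(str[i])) { ++i; continue; }

        // Measure the word first, then allocate once per word (points 3, 6).
        std::size_t len = 0;
        while (str[i + len] != '\0' && !is_space(str[i + len])) ++len;
        arr[w] = static_cast<char*>(std::malloc(len + 1));
        if (arr[w] == nullptr) {  // undo earlier allocations on failure
            while (w > 0) std::free(arr[--w]);
            std::free(arr);
            return nullptr;
        }

        // Copy the characters instead of pointing into str (point 4).
        for (std::size_t k = 0; k < len; ++k) arr[w][k] = str[i + k];
        arr[w][len] = '\0';  // terminate each word on its own (point 5)
        ++w;
        i += len;
    }
    arr[w] = nullptr;  // sentinel marking the end of the array
    return arr;
}
```

Counting the words in a first pass costs a second scan of the input, but it avoids having to grow the pointer array while splitting.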


Further clarifications:

Consider str = "abc de"

You'll start with

*(arr + 0) = (char*) malloc(0 * sizeof(char));
//ptr from malloc(0) shouldn't be dereferenced and is mostly pointless (no pun), probably NULL
*(arr + 0) = &str[0];

Here str[0] = 'a' is stored somewhere in memory, so &str[0] is its address, and that address is what you store in *(arr + 0).

Now in the next iteration, you'll have

*(arr + 0) = (char*) malloc(1 * sizeof(char)); 
*(arr + 0) = &str[1];

This time you overwrite the pointer stored at the same index2 with a different address, so the block returned by the earlier malloc is lost. The next iteration does *(arr + 0) = (char*) malloc(2 * sizeof(char)); and so on. You keep resetting the same *(arr + index2) slot until you encounter whitespace, after which you do the same thing again for the next word. So don't allocate an array for every value of index, only when a new word starts. This also shows that the size passed to malloc keeps growing with index, which is what #6 pointed out.

Coming to &str[index].

You are setting *(arr + index2), i.e. a char* (pointer to char), to another char*. In C, assigning one pointer to another doesn't copy the contents that the second pointer refers to; it only makes both of them point to the same memory location. So when you set something like *(arr + 1) = &str[4], it's just a pointer into the original string at index = 4. If you try to print this *(arr + 1) you'll just get a substring from index = 4 to the end of the string, not the word you're trying to obtain.
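The aliasing effect is easy to observe in isolation. The sketch below uses a made-up helper, view_from, that returns what printing such a pointer would show. It is written in C++ to match the other examples here; the pointer behaviour is the same in C.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns what printing a pointer into the middle of
// `str` would show, i.e. everything from `index` up to the terminating '\0'.
// Assumes `index` is not past the end of `str`.
std::string view_from(const char* str, std::size_t index) {
    assert(str != nullptr);
    const char* p = &str[index];  // no copy is made: p aliases str's storage
    return std::string(p);        // reads until the '\0' that ends str
}
```

For the string "abc de fg", view_from(str, 4) yields "de fg" rather than just "de", because nothing terminates the view at the end of the word.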

**arr = '\0' is just dereferencing the pointer at *arr and setting its value to \0. So imagine if you had *(arr + 0) = "hello\0", you'll set it to "\0ello\0". If you're ever iterating over this string, you'll never end up traversing beyond the first '\0' character. Hence you lose whatever *arr was earlier pointing to.

Also, *(arr + i) and arr[i] are exactly equivalent, but arr[i] is much more readable. It better conveys that arr is an array and that arr[i] accesses its ith element.

Split string by a character?

stringstream can do all of these.

  1. Split a string and store into int array:

    #include <algorithm>
    #include <sstream>
    #include <string>
    #include <vector>

    std::string str = "102:330:3133:76531:451:000:12:44412";
    std::replace(str.begin(), str.end(), ':', ' '); // replace ':' by ' '

    std::vector<int> array;
    std::stringstream ss(str);
    int temp;
    while (ss >> temp)
        array.push_back(temp); // done! now array={102,330,3133,76531,451,0,12,44412}

  2. Remove unneeded characters from the string before it's processed, such as $ and #, in the same way that ':' is handled above.

PS: The above solution works only for strings that don't contain spaces. To handle strings with spaces, use an approach based on std::string::find() and std::string::substr().
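For a single-character delimiter there is also a stream-based alternative that keeps spaces inside tokens: std::getline with a delimiter argument. This is a different technique from the replace-then-extract approach above, shown only as a sketch. The name split_by_char and the fixed-capacity output array are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: splits `text` at each `delim`, keeping any spaces
// inside the tokens, and stores at most `capacity` tokens in `out`.
// Returns the number of tokens stored.
std::size_t split_by_char(const std::string& text, char delim,
                          std::string* out, std::size_t capacity) {
    assert(out != nullptr || capacity == 0);
    std::istringstream in(text);
    std::size_t count = 0;
    // getline reads up to (and discards) the next delim, or to end of input.
    while (count < capacity && std::getline(in, out[count], delim)) {
        ++count;
    }
    return count;
}
```

Unlike operator>>, this keeps empty tokens between two adjacent delimiters, so "a::b" produces three tokens, the middle one empty.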

C - Split string into an array of strings at certain characters

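You can use strtok(). Suppose str is a modifiable char array holding "apples,cakes,cupcakes,bananas" and delim is ",". Note that strtok() writes into str, so it must not be a string literal.

#include <stdio.h>
#include <string.h>

char *token;
token = strtok(str, delim);
while (token != NULL)
{
    printf("%s\n", token);
    token = strtok(NULL, delim);
}

I hope this helps.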
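Since the question asks for an array of strings rather than printed output, the strtok() loop can store each token pointer instead of printing it. The sketch below is written in C++ with std::strtok to match the other examples on this page; the name tokenize and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: tokenizes the writable buffer `buf` in place and
// stores pointers to at most `capacity` tokens in `out`. Returns the count.
// strtok overwrites delimiters in `buf` with '\0', so `buf` must not be a
// string literal, and the pointers in `out` stay valid only while `buf` does.
std::size_t tokenize(char* buf, const char* delim, char** out,
                     std::size_t capacity) {
    assert(buf != nullptr && delim != nullptr);
    std::size_t count = 0;
    for (char* token = std::strtok(buf, delim);
         token != nullptr && count < capacity;
         token = std::strtok(nullptr, delim)) {
        out[count++] = token;
    }
    return count;
}
```

The tokens are not copied here; each entry in out points into buf, which is why buf has to outlive the array. strtok also keeps internal state between calls, so two tokenizations cannot be interleaved.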

Parse (split) a string in C++ using string delimiter (standard C++)

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or running to the end of the string if n is npos or exceeds the remaining length).
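The way these two functions interact when the delimiter is missing is worth seeing concretely. The following sketch wraps the one-line example in a function; the name first_token is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the text before the first occurrence of
// `delimiter`, or all of `s` when the delimiter does not occur.
std::string first_token(const std::string& s, const std::string& delimiter) {
    assert(!delimiter.empty());
    // find() yields npos on failure, and substr(0, npos) returns all of s,
    // so no separate "not found" branch is needed here.
    return s.substr(0, s.find(delimiter));
}
```

With s = "scott>=tiger", first_token(s, ">=") gives "scott", and s.substr(7, 3) gives "tig" because the count argument is a length, not an end position.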


If the delimiter occurs several times, then after extracting one token you can remove it (delimiter included) and continue with the next extraction. Note that erase() modifies s, so work on a copy if you need to keep the original string. Writing s = s.substr(pos + delimiter.length()); with pos = s.find(delimiter) has the same effect as the erase() call:

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom
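If the input string has to stay intact, the same find()/substr() loop can advance a start position instead of erasing from the front. This is a sketch under assumptions: the name split_by_string and the fixed-capacity output array (chosen to avoid std::vector, in keeping with the page title) are not from the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: splits `s` at every occurrence of the non-empty
// string `delimiter` without modifying `s`. Stores at most `capacity`
// tokens in `out` and returns how many were stored.
std::size_t split_by_string(const std::string& s, const std::string& delimiter,
                            std::string* out, std::size_t capacity) {
    assert(!delimiter.empty());
    std::size_t count = 0;
    std::size_t start = 0;  // where the current token begins
    while (count < capacity) {
        const std::size_t pos = s.find(delimiter, start);
        if (pos == std::string::npos) {
            out[count++] = s.substr(start);  // last token: rest of the string
            break;
        }
        out[count++] = s.substr(start, pos - start);
        start = pos + delimiter.length();  // skip past the delimiter
    }
    return count;
}
```

Passing the start offset to find() also avoids the repeated shifting of characters that erase(0, ...) performs on each iteration.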

How can I split a string by a delimiter into an array?

Here's my first attempt at this using vectors and strings:

vector<string> explode(const string& str, const char& ch) {
    string next;
    vector<string> result;

    // For each character in the string
    for (string::const_iterator it = str.begin(); it != str.end(); it++) {
        // If we've hit the terminal character
        if (*it == ch) {
            // If we have some characters accumulated
            if (!next.empty()) {
                // Add them to the result vector
                result.push_back(next);
                next.clear();
            }
        } else {
            // Accumulate the next character into the sequence
            next += *it;
        }
    }
    if (!next.empty())
        result.push_back(next);
    return result;
}

Hopefully this gives you some sort of idea of how to go about this. On your example string it returns the correct results with this test code:

int main (int, char const **) {
    std::string blah = "___this_ is__ th_e str__ing we__ will use__";
    std::vector<std::string> result = explode(blah, '_');

    for (size_t i = 0; i < result.size(); i++) {
        cout << "\"" << result[i] << "\"" << endl;
    }
    return 0;
}

