Parse (Split) a String in C++ Using String Delimiter (Standard C++)

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or running to the end of the string, if fewer than n characters remain).
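A common slip is treating substr()'s second argument as an end position. The sketch below, using a hypothetical helper named token_at, combines both calls: find() starts searching at an offset, and substr() receives a length computed as end - start. It assumes start is no greater than s.size(), since substr() throws std::out_of_range otherwise.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the token that starts at `start` and runs up to
// (but not including) the next occurrence of `delim`, or to the end of `s`.
// Assumes start <= s.size(); otherwise substr() throws std::out_of_range.
std::string token_at(const std::string& s, const std::string& delim,
                     std::size_t start) {
    // find() begins searching at `start` and returns npos when there is no match.
    std::size_t end = s.find(delim, start);
    if (end == std::string::npos) {
        // substr(pos) with the default n = npos takes everything to the end.
        return s.substr(start);
    }
    // The second argument of substr() is a length, not an end position.
    return s.substr(start, end - start);
}
```

For example, token_at("scott>=tiger>=mushroom", ">=", 7) yields "tiger", because the next ">=" is at position 12 and 12 - 7 = 5 characters are taken.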


If the delimiter occurs more than once, after you have extracted one token you can remove it (delimiter included) to proceed with subsequent extractions. This modifies s, so if you want to preserve the original string, run the extraction on a copy:

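size_t pos = s.find(delimiter);
if (pos != std::string::npos) {
    // Only erase when a delimiter was found: npos + delimiter.length() would wrap around.
    s.erase(0, pos + delimiter.length());
}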

This way you can easily loop to get each token.

Complete Example

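#include <iostream>
#include <string>

int main() {
    std::string s = "scott>=tiger>=mushroom";
    std::string delimiter = ">=";

    size_t pos = 0;
    std::string token;
    // Each pass prints the text before the next delimiter, then erases it.
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());
    }
    std::cout << s << std::endl;  // whatever remains is the last token
}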

Output:

scott
tiger
mushroom
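The loop above consumes s as it goes. If the original string must stay intact, one option is to advance a start offset and pass it as the second argument of find() instead of erasing. The following is a minimal sketch; the function name split is hypothetical, and it assumes the delimiter is non-empty (an empty delimiter would make find() match at every position and loop forever).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on every occurrence of `delim` without
// modifying `s`, by advancing a start offset instead of erasing tokens.
// Assumes `delim` is non-empty.
std::vector<std::string> split(const std::string& s, const std::string& delim) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    std::size_t pos;
    // Search from `start` each time; the original string is never changed.
    while ((pos = s.find(delim, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delim.length();  // skip past the delimiter
    }
    tokens.push_back(s.substr(start));  // the last token (possibly empty)
    return tokens;
}
```

With this approach, two adjacent delimiters produce an empty token, so split("a>=>=b", ">=") gives "a", "" and "b".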

Split string with delimiters in C

You can use the strtok() function to split a string (and specify the delimiter to use). Note that strtok() will modify the string passed into it. If the original string is required elsewhere, make a copy of it and pass the copy to strtok().
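In C++ the same "tokenize a copy" advice might look like the sketch below, which calls std::strtok from <cstring> on a mutable buffer so the caller's std::string is untouched. The helper name strtok_split is hypothetical. Keep in mind that strtok() keeps hidden state between calls, so it is not safe to interleave two tokenizations or use it from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a copy of `s` with std::strtok so the caller's
// string is left unchanged. Any character in `delims` acts as a separator, and
// runs of separators are skipped, so no empty tokens are produced.
std::vector<std::string> strtok_split(const std::string& s, const char* delims) {
    // strtok() writes '\0' into its argument, so work on a mutable copy.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```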


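Example (note that it does not handle consecutive delimiters such as "JAN,,,FEB,MAR": strtok() skips the empty fields, so the token count computed up front no longer matches and the final assert() fails):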

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

char** str_split(char* a_str, const char a_delim)
{
    char** result = 0;
    size_t count = 0;
    char* tmp = a_str;
    char* last_comma = 0;
    char delim[2];
    delim[0] = a_delim;
    delim[1] = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }

    /* Add space for trailing token. */
    count += last_comma < (a_str + strlen(a_str) - 1);

    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;

    result = malloc(sizeof(char*) * count);

    if (result)
    {
        size_t idx = 0;
        char* token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            /* strdup() is POSIX (and part of standard C only since C23). */
            result[idx++] = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        result[idx] = 0;
    }

    return result;
}

int main()
{
    char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
    char** tokens;

    printf("months=[%s]\n\n", months);

    tokens = str_split(months, ',');

    if (tokens)
    {
        int i;
        for (i = 0; tokens[i]; i++)
        {
            printf("month=[%s]\n", tokens[i]);
            free(tokens[i]);
        }
        printf("\n");
        free(tokens);
    }

    return 0;
}

Output:

$ ./main.exe
months=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]

month=[JAN]
month=[FEB]
month=[MAR]
month=[APR]
month=[MAY]
month=[JUN]
month=[JUL]
month=[AUG]
month=[SEP]
month=[OCT]
month=[NOV]
month=[DEC]
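If empty fields matter (for example "JAN,,,FEB,MAR" should yield five fields), one C++ alternative to strtok() is std::getline with a delimiter character, which stops at every delimiter even when the field before it is empty. This is a sketch; the name split_keep_empty is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on a single character and keeps empty fields,
// so "JAN,,,FEB,MAR" yields five fields rather than three.
std::vector<std::string> split_keep_empty(const std::string& s, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(s);
    std::string field;
    // getline() stops at each `delim` (discarding it), including when the
    // field before it is empty.
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

One caveat of this approach: a trailing delimiter does not produce a trailing empty field, because the final getline() call finds nothing left to read.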

Is there a function to split by the FIRST instance of a delimiter in C?

The first time you call strtok, use the delimiter you want to split with.

For the second call, use an empty delimiter string (if you really want the rest of the string) or use "\n", in the case that your string might include a newline character and you don't want that in the split (or even "\r\n"):

const char* first = strtok(buf, ":");
const char* rest = strtok(NULL, "");
/* or: const char* rest = strtok(NULL, "\n"); */
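If you are working with a std::string rather than a modifiable char buffer, a non-destructive way to split at the first delimiter is std::string::find followed by two substr() calls. In this sketch the helper name split_first is hypothetical, and a missing delimiter is assumed to mean "everything is the first part". Unlike strtok(), a leading delimiter here yields an empty first part instead of being skipped.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits `s` at the first occurrence of `delim` without
// modifying `s`. If `delim` is absent, the whole string is the first part.
std::pair<std::string, std::string> split_first(const std::string& s, char delim) {
    std::string::size_type pos = s.find(delim);
    if (pos == std::string::npos) {
        return {s, ""};
    }
    // Everything after the first delimiter stays together, further delimiters included.
    return {s.substr(0, pos), s.substr(pos + 1)};
}
```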

How to split strings in C++ like in python?

If your delimiter is always whitespace and you don't need to split on other characters (as Python's s.split(',') would), you can use a string stream:

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string my_string = " Hello world! ";
    std::string str1, str2;
    std::stringstream s(my_string);

    // operator>> skips leading whitespace and stops at the next whitespace.
    s >> str1 >> str2;

    std::cout << str1 << std::endl;
    std::cout << str2 << std::endl;
    return 0;
}

Keep in mind that this approach only handles whitespace-separated tokens, and reading into a fixed set of variables (str1, str2) does not scale to an unknown number of tokens.
Output:

Hello
world!
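To handle any number of whitespace-separated tokens, roughly like Python's s.split() with no argument, the extraction can run in a loop and collect the words into a vector. This is a minimal sketch with a hypothetical function name, split_whitespace.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: whitespace split similar to Python's s.split() with no
// argument. operator>> skips any run of spaces, tabs, or newlines, so leading,
// trailing, and repeated whitespace never produce empty tokens.
std::vector<std::string> split_whitespace(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> words;
    std::string word;
    // Reading in a loop handles any number of tokens.
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```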

Splitting a string by a delimiter in C

Splitting Unix paths is more than just splitting on /. These all refer to the same path:

  • /foo/bar/baz/
  • /foo/bar/baz
  • /foo//bar/baz

As with many complex tasks, it's best not to do it yourself, but to use existing functions. In this case there are the POSIX dirname and basename functions.

  • dirname returns the parent directory of a file path
  • basename returns the last component of a file path

Using these together, you can split Unix paths.

#include <stdio.h>
#include <string.h>
#include <libgen.h>

int main(void) {
    char filepath[] = "/foo/bar//baz/";

    char *fp = filepath;
    while( strcmp(fp, "/") != 0 && strcmp(fp, ".") != 0 ) {
        /* basename() may modify its argument, so give it a copy of fp. */
        char tmp[sizeof filepath];
        strcpy(tmp, fp);
        puts(basename(tmp));

        fp = dirname(fp);
    }

    // Differentiate between /foo/bar and foo/bar
    if( strcmp(fp, "/") == 0 ) {
        puts(fp);
    }
    return 0;
}

// baz
// bar
// foo
// /

It's not the most efficient approach, since it makes multiple passes through the string, but it is correct.
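In C++17 and later, the standard library offers a comparable existing facility: iterating over a std::filesystem::path yields its components, with repeated separators treated as one. The sketch below (hypothetical name path_components) collects them root-to-leaf, the reverse of the C loop's order. An absolute path yields "/" as its first element, and a trailing separator yields an empty final element, which this sketch skips.

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Splits a path into its components with std::filesystem::path (C++17).
// Iteration yields the root directory ("/") first for absolute paths, treats
// repeated separators as one, and yields an empty element for a trailing
// separator, which is skipped here.
std::vector<std::string> path_components(const std::string& p) {
    std::vector<std::string> parts;
    for (const std::filesystem::path& part : std::filesystem::path(p)) {
        if (!part.empty()) {
            parts.push_back(part.string());
        }
    }
    return parts;
}
```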

Splitting C++ strings using a Delimiter

Let's go step by step, starting with the date:

29/01/2022 -- Day, Month, Year.

Given the following:

unsigned int day = 0u;
std::cin >> day;

When reading an integer, operator>> skips leading whitespace until the first numeric character (a leading '+' or '-' sign is also accepted). It then keeps reading characters, building the number, until it reaches a non-numeric character:

2 --> day.
9 --> day.

The next character is '/', which is not a numeric character, so extraction stops and day holds the value 29. The '/' is left in the input buffer.

The character '/' in this context is known as a delimiter, because it separates the day field from the month field.

Since it's a character, you can read it into a character variable:

char delimiter = '\0';  
std::cin >> delimiter;

Now, the delimiter is no longer in the buffer.
You can check the content of the delimiter variable or move on.

Reading the month is similar:

unsigned int month = 0U;
std::cin >> month;
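Putting these steps together, the whole date can be read with one chain of extractions. The sketch below reads from a std::istringstream instead of std::cin so it can be tested, and checks that each delimiter really is '/'. The function name parse_date and its bool return are assumptions for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads "day/month/year" with operator>>, reading each
// '/' into a char as described above and checking it really is a '/'.
bool parse_date(const std::string& text, unsigned& day, unsigned& month, unsigned& year) {
    std::istringstream in(text);
    char sep1 = '\0';
    char sep2 = '\0';
    // Each >> into an unsigned stops at the first non-digit, leaving the '/'
    // in the stream for the following >> into a char.
    if (!(in >> day >> sep1 >> month >> sep2 >> year)) {
        return false;
    }
    return sep1 == '/' && sep2 == '/';
}
```

For "29-01-2022" every extraction succeeds, but the separator check rejects the input, which is why reading the delimiter into a variable is useful.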

Delimiters and substrings

You could extract the month as a string using a delimiter:

std::string month_as_text;
std::getline(std::cin, month_as_text, '/');

The getline function above reads characters from std::cin, placing them into the string month_as_text, until it finds the delimiter character '/'. You can then convert month_as_text into an integer variable.
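One way to do that conversion is std::stoi, which throws std::invalid_argument when the text does not start with a number. The sketch below applies the getline-then-convert pattern to all three fields of a date string. The name parse_date_fields is hypothetical, and an istringstream stands in for std::cin.

```cpp
#include <array>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: splits "day/month/year" with std::getline(..., '/')
// and converts each text field with std::stoi, which throws
// std::invalid_argument when a field does not start with a number.
std::array<int, 3> parse_date_fields(const std::string& text) {
    std::istringstream in(text);
    std::string day_text, month_text, year_text;
    std::getline(in, day_text, '/');
    std::getline(in, month_text, '/');
    std::getline(in, year_text);  // last field: read to the end of the line
    return {std::stoi(day_text), std::stoi(month_text), std::stoi(year_text)};
}
```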


