Parse (split) a string in C++ using string delimiter (standard C++)
You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.
Example:
std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or running to the end of the string when n is npos).
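One consequence of those two defaults is that the not-found case needs no special handling when you only want the first token. A minimal sketch, assuming a hypothetical helper named first_token (the name is mine, not part of the standard library):

```cpp
#include <string>

// Hypothetical helper for illustration: returns the text before the first
// occurrence of `delimiter`, or the whole string if the delimiter is absent.
std::string first_token(const std::string& s, const std::string& delimiter) {
    const std::size_t pos = s.find(delimiter);
    // When find() returns npos, substr(0, npos) means "up to the end of the
    // string", so the full string comes back unchanged.
    return s.substr(0, pos);
}
```

Calling first_token("scott>=tiger", ">=") gives "scott", and first_token("scott", ">=") also gives "scott".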
If the string contains the delimiter more than once, then after you have extracted one token you can remove it (delimiter included) to proceed with subsequent extractions. Assigning s = s.substr(pos + delimiter.length()); has the same effect as the erase() call below; if you want to preserve the original string, do either one on a copy:
s.erase(0, s.find(delimiter) + delimiter.length());
This way you can easily loop to get each token.
Complete Example
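#include <iostream>
#include <string>

int main() {
    std::string s = "scott>=tiger>=mushroom";
    std::string delimiter = ">=";
    size_t pos = 0;
    std::string token;
    // Extract and print each token, then erase it (and its delimiter) from s.
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());
    }
    // Whatever remains after the last delimiter is the final token.
    std::cout << s << std::endl;
    return 0;
}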
Output:
scott
tiger
mushroom
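If you would rather leave the original string untouched, the same find()/substr() loop can track a start offset instead of erasing. This is a sketch, not the only way to do it; the function name split and the choice to return a std::vector are my assumptions:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration: splits `s` on every occurrence of
// `delimiter` without modifying `s`. Empty fields are kept.
std::vector<std::string> split(const std::string& s, const std::string& delimiter) {
    assert(!delimiter.empty());  // an empty delimiter would never advance
    std::vector<std::string> tokens;
    std::size_t start = 0;
    std::size_t pos = 0;
    // find() takes the offset to search from as its second argument.
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    tokens.push_back(s.substr(start));  // text after the last delimiter
    return tokens;
}
```

With this, split("scott>=tiger>=mushroom", ">=") yields the three tokens from the example above, and the input string is still available afterwards.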
Split string with delimiters in C
You can use the strtok() function to split a string (and specify the delimiter to use). Note that strtok() will modify the string passed into it. If the original string is required elsewhere, make a copy of it and pass the copy to strtok().
EDIT:
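Example (note it does not handle consecutive delimiters, "JAN,,,FEB,MAR" for example: strtok() skips the empty fields, so fewer tokens come back than were counted and the final assert fails):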
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

char** str_split(char* a_str, const char a_delim)
{
    char** result    = 0;
    size_t count     = 0;
    char* tmp        = a_str;
    char* last_comma = 0;
    char delim[2];
    delim[0] = a_delim;
    delim[1] = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }

    /* Add space for trailing token. */
    count += last_comma < (a_str + strlen(a_str) - 1);

    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;

    result = malloc(sizeof(char*) * count);

    if (result)
    {
        size_t idx  = 0;
        char* token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        *(result + idx) = 0;
    }

    return result;
}

int main()
{
    char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
    char** tokens;

    printf("months=[%s]\n\n", months);

    tokens = str_split(months, ',');

    if (tokens)
    {
        int i;
        for (i = 0; *(tokens + i); i++)
        {
            printf("month=[%s]\n", *(tokens + i));
            free(*(tokens + i));
        }
        printf("\n");
        free(tokens);
    }

    return 0;
}
Output:
$ ./main.exe
months=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]
month=[JAN]
month=[FEB]
month=[MAR]
month=[APR]
month=[MAY]
month=[JUN]
month=[JUL]
month=[AUG]
month=[SEP]
month=[OCT]
month=[NOV]
month=[DEC]
Is there a function to split by the FIRST instance of a delimiter in C?
The first time you call strtok, use the delimiter you want to split with.
For the second call, use an empty delimiter string (if you really want the rest of the string) or use "\n", in the case that your string might include a newline character and you don't want that in the split (or even "\r\n"):
const char* first = strtok(buf, ":");
const char* rest = strtok(NULL, "");
/* or: const char* rest = strtok(NULL, "\n"); */
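If you are compiling as C++, the same two-call pattern is available through std::strtok in <cstring>. The wrapper below is only a sketch: split_first is a name I made up, and it works on a by-value copy because strtok writes into its buffer. Keep in mind that strtok skips leading delimiters and keeps hidden static state, so this is not safe to call from several threads at once.

```cpp
#include <cstring>
#include <string>
#include <utility>

// Hypothetical helper for illustration: splits `input` at the first
// character found in `delims`, using the two-call strtok pattern.
std::pair<std::string, std::string> split_first(std::string input, const char* delims) {
    // strtok modifies its buffer, so it operates on the by-value copy.
    char* first = std::strtok(&input[0], delims);
    if (first == nullptr) {
        return std::make_pair(std::string(), std::string());  // no token at all
    }
    // An empty delimiter set makes strtok return everything that is left.
    char* rest = std::strtok(nullptr, "");
    return std::make_pair(std::string(first),
                          std::string(rest ? rest : ""));
}
```

For example, split_first("key:value:more", ":") returns the pair ("key", "value:more"); if there is no delimiter, the second element is empty.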
How to split strings in C++ like in python?
If your delimiter is always whitespace (" ") and you never need to split on other characters (e.g. s.split(',')), you can use a string stream:
#include <iostream>
#include <string>
#include <sstream>

int main() {
    std::string my_string = " Hello world! ";
    std::string str1, str2;
    std::stringstream s(my_string);
    // operator>> skips leading whitespace and reads one word at a time.
    s >> str1 >> str2;
    std::cout << str1 << std::endl;
    std::cout << str2 << std::endl;
    return 0;
}
Keep in mind that this code only works for whitespace-separated tokens, and it does not scale to many tokens because each one needs its own variable.
Output:
Hello
world!
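For an unknown number of words, the usual approach is to loop on the extraction and collect the results, which is closer to what Python's s.split() returns. A sketch, where split_whitespace is a hypothetical name of my own:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for illustration: splits `text` on runs of whitespace,
// similar in spirit to Python's str.split() with no arguments.
std::vector<std::string> split_whitespace(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    // The extraction fails (ending the loop) once no more words remain.
    while (stream >> word) {
        words.push_back(word);
    }
    return words;
}
```

Leading, trailing, and repeated whitespace are all skipped, so " Hello world! " produces exactly two elements.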
Splitting a string by a delimiter in C
Splitting Unix paths is more than just splitting on /. These all refer to the same path...
/foo/bar/baz/
/foo/bar/baz
/foo//bar/baz
As with many complex tasks, it's best not to do it yourself, but to use existing functions. In this case there are the POSIX dirname and basename functions.
dirname returns the parent path in a filepath
basename returns the last portion of a filepath
Using these together, you can split Unix paths.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libgen.h>

int main(void) {
    char filepath[] = "/foo/bar//baz/";
    char *fp = filepath;

    while( strcmp(fp, "/") != 0 && strcmp(fp, ".") != 0 ) {
        char *base = basename(fp);
        puts(base);
        fp = dirname(fp);
    }

    // Differentiate between /foo/bar and foo/bar
    if( strcmp(fp, "/") == 0 ) {
        puts(fp);
    }
}
// baz
// bar
// foo
// /
It's not the most efficient approach, since it makes multiple passes through the string, but it is correct.
Splitting C++ strings using a Delimiter
Let's go step by step, starting with the date:
29/01/2022 -- Day, Month, Year.
Given the following:
unsigned int day = 0u;
std::cin >> day;
Formatted input of an integer skips leading whitespace, then accepts an optional '+' or '-' followed by digits. The extraction operator keeps reading characters, building up the number, until it reaches a non-numeric character:
2 --> day.
9 --> day.
The next character is '/', which is not a numeric character, so extraction stops there and day holds the value 29.
The character '/' in this context is known as a delimiter, because it separates the day field from the month field.
Since it's a character, it has to be read using a character variable:
char delimiter = '\0';
std::cin >> delimiter;
Now, the delimiter is no longer in the buffer.
You can check the content of the delimiter variable or move on.
Reading the month is similar:
unsigned int month = 0U;
std::cin >> month;
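Chaining these reads gives you the whole date in one expression. The sketch below makes some assumptions of its own: the Date struct and the read_date name are hypothetical, and it takes any std::istream so that it can be tried on a string as well as on std::cin.

```cpp
#include <istream>
#include <sstream>

// Hypothetical aggregate for illustration; not part of the original answer.
struct Date {
    unsigned int day = 0;
    unsigned int month = 0;
    unsigned int year = 0;
};

// Reads "day/month/year" from any input stream (std::cin works too).
// Returns false if a number is missing or a separator is not '/'.
bool read_date(std::istream& in, Date& date) {
    char sep1 = '\0';
    char sep2 = '\0';
    // Each number is followed by a one-character read for its delimiter.
    if (!(in >> date.day >> sep1 >> date.month >> sep2 >> date.year)) {
        return false;
    }
    return sep1 == '/' && sep2 == '/';
}
```

For example, wrapping "29/01/2022" in a std::istringstream and passing it to read_date fills in day 29, month 1 and year 2022; checking the separators afterwards is what rejects input such as "29-01-2022".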
Edit 1: delimiter and substrings
You could extract the month as a string using a delimiter:
std::string month_as_text;
std::getline(std::cin, month_as_text, '/');
The getline function above reads characters from std::cin, placing them into the string month_as_text, until it finds the delimiter character '/'. The delimiter itself is extracted and discarded. You can then convert month_as_text into an integer variable.
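One way to do that conversion is std::stoul (or std::stoi for signed values). The helper below is a sketch under my own assumptions: read_field is a hypothetical name, and it accepts any std::istream rather than only std::cin.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper for illustration: reads one field up to the next '/'
// (or end of input) and converts it to a number.
unsigned long read_field(std::istream& in) {
    std::string field_as_text;
    std::getline(in, field_as_text, '/');  // the '/' is consumed and discarded
    // std::stoul throws std::invalid_argument if the text is not numeric.
    return std::stoul(field_as_text);
}
```

Called three times on a stream holding "29/01/2022", it returns 29, 1 and 2022 in turn; the last call stops at end of input instead of at a delimiter.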