Using strtok with a std::string

C++: tokenize a std::string

I always use getline for such tasks.

std::istringstream is(line);   // needs <sstream>; line is a std::string
std::string part;
while (std::getline(is, part, ','))
    std::cout << part << '\n';

C++: split a string in a function without using strtok()

Try the source code below:

#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Split s into the fields separated by a single delimiter character.
std::vector<std::string> split(const std::string& s, char delimiter)
{
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream tokenStream(s);
    while (std::getline(tokenStream, token, delimiter))
    {
        tokens.push_back(token);
    }
    return tokens;
}

int main()
{
    std::string str = "hello,how are you?,3,4";
    std::vector<std::string> vec = split(str, ',');
    if (vec.size() < 4)
        return 1; // not enough fields

    std::string a = vec[0];
    std::string b = vec[1];
    int x = std::stoi(vec[2]);    // C++11; or std::atoi(vec[2].c_str())
    double y = std::stod(vec[3]); // C++11; or std::atof(vec[3].c_str())

    std::cout << a << "," << b << "," << x << "," << y << std::endl;
}

The output will be:

hello,how are you?,3,4

How to use strtok on char*

You are trying to modify a string literal: the function strtok changes the source string by writing null characters ('\0') into it.

char* str ="- This, a sample string.";

First of all, in C++, unlike in C, string literals have types of constant character arrays (here const char[25]), and since C++11 a string literal can no longer be implicitly converted to a plain char *. So in a C++ program you have to declare the pointer with the const qualifier:

const char* str ="- This, a sample string.";

Any attempt to change a string literal in C and C++ results in undefined behavior.

For example, the C Standard says (6.4.5 String literals, paragraph 7):

It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

So even in C it is better to always declare pointers to string literals with the const qualifier.

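Instead of strtok you could use, for example, the C standard string functions strspn and strcspn, which never modify the string: strspn( p, delim ) returns the length of the initial part of p that consists only of delimiter characters, and strcspn( p, delim ) returns the length of the initial part that contains no delimiter characters.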

Here is a demonstration program.

#include <iostream>
#include <iomanip>
#include <string_view>
#include <cstring>

int main()
{
    const char *s = "- This, a sample string.";
    const char *delim = " ., -";

    for (const char *p = s; *( p += std::strspn( p, delim ) ) != '\0'; )
    {
        auto n = std::strcspn( p, delim );

        std::string_view sv( p, n );

        std::cout << std::quoted( sv ) << ' ';

        p += n;
    }

    std::cout << '\n';
}

The program output is

"This" "a" "sample" "string"

You could, for example, declare a vector of string views, std::vector<std::string_view>, and store each substring in it.

For example

#include <iostream>
#include <iomanip>
#include <string_view>
#include <vector>
#include <cstring>

int main()
{
    const char *s = "- This, a sample string.";
    const char *delim = " ., -";

    std::vector<std::string_view> v;

    for (const char *p = s; *( p += std::strspn( p, delim ) ) != '\0'; )
    {
        auto n = std::strcspn( p, delim );

        v.emplace_back( p, n );

        p += n;
    }

    for (auto sv : v)
    {
        std::cout << std::quoted( sv ) << ' ';
    }
    std::cout << '\n';
}

The program output is the same as shown above.

Or, if the compiler does not support C++17, then instead of a vector of type std::vector<std::string_view> you can use a vector of type std::vector<std::pair<const char *, size_t>>.

For example

#include <iostream>
#include <iomanip>
#include <utility>
#include <vector>
#include <cstring>

int main()
{
    const char *s = "- This, a sample string.";
    const char *delim = " ., -";

    std::vector<std::pair<const char *, std::size_t>> v;

    for (const char *p = s; *( p += std::strspn( p, delim ) ) != '\0'; )
    {
        auto n = std::strcspn( p, delim );

        v.emplace_back( p, n );

        p += n;
    }

    for (auto p : v)
    {
        std::cout.write( p.first, p.second ) << ' ';
    }
    std::cout << '\n';
}

The program output is

This a sample string

Or you could use a vector of std::string objects: std::vector<std::string>.

In C you can use a variable length array or a dynamically allocated array whose element type is a structure with two data members of types const char * and size_t, similar to the C++ class template std::pair. But to define the array you first need to count how many words there are in the string literal, using the same for loop. (Note that since C11, support for variable length arrays is optional.)

Here is a C demonstration program.

#include <stdio.h>
#include <string.h>

int main( void )
{
    const char *s = "- This, a sample string.";
    const char *delim = " ., -";

    size_t nmemb = 0;

    /* first pass: count the words */
    for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
    {
        ++nmemb;
        size_t n = strcspn( p, delim );
        p += n;
    }

    /* a VLA must have a positive size */
    if ( nmemb == 0 ) { putchar( '\n' ); return 0; }

    struct SubString
    {
        const char *pos;
        size_t size;
    } a[nmemb];

    size_t i = 0;

    /* second pass: record where each word starts and how long it is */
    for (const char *p = s; *( p += strspn( p, delim ) ) != '\0'; )
    {
        size_t n = strcspn( p, delim );

        a[i].pos = p;
        a[i].size = n;
        ++i;
        p += n;
    }

    for ( i = 0; i < nmemb; i++ )
    {
        printf( "%.*s ", ( int )a[i].size, a[i].pos );
    }

    putchar( '\n' );
}

The program output is

This a sample string

Using strtok to find substring

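As Vlad mentioned, you shouldn't mix C++ standard library code (std::string) with classic C code (strtok()).

Instead, you can use std::string member functions such as find() or find_first_of() to solve your problem. The function below returns true if all characters of word appear in path in the same order (not necessarily adjacent):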

#include <cstddef>
#include <string>

bool match(const std::string &path, const std::string &word) {
    std::size_t pos = 0; // where to start searching for the next character

    // iterate over all characters in 'word'
    for (std::size_t i = 0; i < word.length(); ++i) {
        // look for the next character at or after 'pos'
        pos = path.find(word[i], pos);
        if (pos == std::string::npos)
            return false; // return false if it couldn't be found
        ++pos; // continue after the match so one character isn't matched twice
    }
    return true; // all characters have been found in order
}

Using strtok in C++

I would not use strtok for this job at all (see footnote 1). If you want to use C-like tools, read the data with fscanf instead:

// Note: here `file_name` must be a FILE * rather than an ifstream.
if (fscanf(file_name, "%f:%f:%f %f", &hours, &minutes, &seconds, &price) != 4) {
    // the line did not match the expected format
}

Most people writing C++ would prefer something more type-safe, though. Boost.Format accepts essentially the same printf-style format string, but it only formats output, so it cannot be used to read the data.

Another possibility would be to use stream extractors:

char ignore1, ignore2;
file >> hours >> ignore1 >> minutes >> ignore2 >> seconds >> price;

As to how this works: each extractor reads one item from the input stream. The extractors for float each read a number, and the extractors for char each read one character. In this case we expect input like 99:99:99 99, where 9 means "a digit". So we read a number, a colon, a number, a colon, a number, and another number (the extractors skip leading whitespace automatically, which takes care of the space before the price). The two colons are read into char variables; you can either ignore them or check that they really are colons to verify that the input data was in the correct format.

Here's a complete, compilable demo of that technique:

#include <iostream>

int main() {
    float hours, minutes, seconds, price;
    char ignore1, ignore2;

    std::cin >> hours >> ignore1 >> minutes >> ignore2 >> seconds >> price;

    std::cout << "H:" << hours
              << " M:" << minutes
              << " S:" << seconds
              << " P:" << price << "\n";
    return 0;
}

There are certainly a lot more possibilities, but at least those are a few reasonable ones.


  1. To be honest, I'm not sure there's any job for which I'd use strtok, but there are some where I might be at least a little tempted, or wish strtok weren't so badly designed so I could use it. In this case, however, I don't even see much reason to use anything similar to strtok at all.

Replace a loop with strtok by using Standard Library

Here are two examples of splitting a delimited string.

The first uses std::getline with a string stream, specifying a separator character instead of using the default newline character. Only single-character separators may be used with this technique.

The second example uses the <regex> library, which allows separators of arbitrary length and also gives you more control over how a separator is recognized. Note that the dot character must be escaped in the regex specification, because in the regex language, "." acts as a wildcard.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <regex>

std::vector<std::string> GetlineSplit(std::string const& line) {
    static const char sep = '.';
    std::istringstream liness{line};
    std::vector<std::string> fields;
    for (std::string field; std::getline(liness, field, sep); ) {
        fields.push_back(field);
    }
    return fields;
}

std::vector<std::string> RegexSplit(std::string const& line) {
    std::regex seps("\\."); // the dot character needs to be escaped in a regex
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
    return std::vector<std::string>(rit, std::sregex_token_iterator());
}

int main() {
    std::string line = "abc.def.ghi.klm.nop.qrs.tuv.wxyz";

    std::cout << "getline split result:\n";
    auto fields_getline = GetlineSplit(line);
    for (const auto& field : fields_getline) {
        std::cout << field << '\n';
    }

    std::cout << "\nregex split result:\n";
    auto fields_regex = RegexSplit(line);
    for (const auto& field : fields_regex) {
        std::cout << field << '\n';
    }
}

