Simple String Parsing with C++

How do I parse a string in C?

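for (counter = 0; counter == MAX; counter++) {
    wordArray[counter] = '\0';
}

The loop body never runs: counter == MAX is false on the first check (unless MAX is 0), so the loop exits immediately. The condition should be counter < MAX.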
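A minimal sketch of the corrected loop, assuming wordArray is a char buffer of MAX elements. The names counter, MAX and wordArray come from the question; the value of MAX and the wrapper function clear_word_array are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX 32  /* assumed buffer size; the question does not state it */

/* Hypothetical wrapper: fills the whole buffer with '\0' so that it
   reads as an empty string afterwards. */
void clear_word_array(char wordArray[MAX])
{
    assert(wordArray != NULL);

    /* '<' keeps the loop running for indices 0 .. MAX - 1 */
    for (size_t counter = 0; counter < MAX; counter++) {
        wordArray[counter] = '\0';
    }
}
```

For a plain char buffer, memset(wordArray, 0, MAX) from <string.h> should have the same effect in a single call.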

What's the easiest way to parse a string in C?

You can use sscanf() from the C standard library. The example below extracts the IP address and the port as strings, assuming the part in front of the address is constant:

#include <stdio.h>

int main(void)
{
    const char *input = "XFR 3 NS 207.46.106.118:1863 0 207.46.104.20:1863\r\n";

    const char *format = "XFR 3 NS %15[0-9.]:%5[0-9]";
    char ip[16] = { 0 }; // ip4 addresses have max len 15
    char port[6] = { 0 }; // port numbers are 16bit, ie 5 digits max

    if(sscanf(input, format, ip, port) != 2)
        puts("parsing failed");
    else printf("ip = %s\nport = %s\n", ip, port);

    return 0;
}

The important parts of the format string are the scanset patterns %15[0-9.] and %5[0-9]. The first matches a string of at most 15 characters composed of digits or dots (i.e. IP addresses are not checked for well-formedness). The second matches a string of at most 5 digits (which means invalid port numbers above 2^16 - 1 will slip through).
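If the two gaps mentioned above matter, the matched digit strings can be range-checked after scanning. The following is only a sketch under assumptions: the helper name parse_endpoint is hypothetical, and it validates a standalone "a.b.c.d:port" string rather than the full XFR line from the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original answer): accepts exactly
   "a.b.c.d:port" with every octet in 0..255 and the port in 0..65535.
   Returns 1 on success and 0 on any mismatch. On failure the output
   parameters may have been partially written. */
int parse_endpoint(const char *text, unsigned octets[4], unsigned *port)
{
    assert(text != NULL && octets != NULL && port != NULL);

    char part[4][4] = { { 0 } };  /* up to 3 digits per octet plus '\0' */
    char port_text[6] = { 0 };    /* up to 5 digits plus '\0' */
    char tail = 0;                /* only assigned if something follows the port */

    int fields = sscanf(text, "%3[0-9].%3[0-9].%3[0-9].%3[0-9]:%5[0-9]%c",
                        part[0], part[1], part[2], part[3], port_text, &tail);
    if (fields != 5)              /* fewer: malformed; 6: trailing characters */
        return 0;

    for (int i = 0; i < 4; i++) {
        unsigned long value = strtoul(part[i], NULL, 10);
        if (value > 255)
            return 0;
        octets[i] = (unsigned)value;
    }

    unsigned long port_value = strtoul(port_text, NULL, 10);
    if (port_value > 65535)
        return 0;
    *port = (unsigned)port_value;
    return 1;
}
```

Called as parse_endpoint("207.46.106.118:1863", octets, &port), this should return 1 and fill in the four octets and the port, while inputs such as "300.1.1.1:80" or "1.2.3.4:70000" should be rejected.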

Parse (split) a string in C++ using string delimiter (standard C++)

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"

  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or running to the end of the string when n is npos).


If the string contains several delimiters, extract one token, then remove it (delimiter included) before extracting the next. This modifies s, so work on a copy if you need to preserve the original string (alternatively, keep a start offset and pass it as the second argument of find()):

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

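#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger>=mushroom";
    std::string delimiter = ">=";

    size_t pos = 0;
    std::string token;
    // Each pass prints the text before the next delimiter, then erases it.
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());
    }
    std::cout << s << std::endl; // whatever remains is the last token
}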

Output:

scott
tiger
mushroom

Simple string parsing with C++

This is an attempt using only standard C++.

Most of the time I use a combination of std::istringstream and std::getline (which can split on a delimiter) to get what I want. When I can, I make my config files look like this:

foo=1,2,3,4

which makes it easy.

Suppose the text file looks like this:

foo=1,2,3,4
bar=0

And you parse it like this:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream file( "sample.txt" );

    std::string line;
    while( std::getline( file, line ) )
    {
        std::istringstream iss( line );

        // Everything before '=' is the key.
        std::string result;
        if( std::getline( iss, result, '=' ) )
        {
            if( result == "foo" )
            {
                // The rest of the line is a comma-separated list.
                std::string token;
                while( std::getline( iss, token, ',' ) )
                {
                    std::cout << token << std::endl;
                }
            }
            if( result == "bar" )
            {
                //...
            }
        }
    }
}

C parsing a string into an array of strings

... if parsing a variable into one of the strings of the array is possible at declaration.

At compile time, adjacent string literals can be concatenated as below:

#define STR "1.0.0.1"
char str[] = STR;

char *arr[6] = {
    "example0",
    "example1",
    "example2",
    "example3" " " STR, // Forms "example3 1.0.0.1"
    "example4"
};

Perhaps OP is interested in something formed at run time. This version uses a variable length array (VLA).

#include <stdio.h>

void foobar(const char *str) {
    int n = snprintf(NULL, 0, "example3 %s", str); // length, not counting '\0'
    char a[n + 1]; // VLA; +1 leaves room for the null terminator
    snprintf(a, sizeof a, "example3 %s", str);

    char *arr[6] = {
        "example0",
        "example1",
        "example2",
        a,
        "example4"
    };

    printf("<%s>\n", arr[3]);
}

int main(void) {
    foobar("1.0.0.1");
}

Output

<example3 1.0.0.1>

Alternatively, the space for the string could be obtained via an allocation.

char *a = malloc(n + 1u);
sprintf(a, "example3 %s", str);
....
free(a);
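The allocation variant can be sketched as a complete function. This is an illustration under assumptions, not the answer's own code: the helper name make_example3 is hypothetical, and the caller is assumed to take ownership of the returned buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed "example3 <str>" string, or
   NULL on failure. The caller owns the result and must free() it. */
char *make_example3(const char *str)
{
    assert(str != NULL);

    /* First pass: ask snprintf how long the result is (without '\0'). */
    int n = snprintf(NULL, 0, "example3 %s", str);
    if (n < 0)
        return NULL;

    char *a = malloc((size_t)n + 1u);  /* +1 for the null terminator */
    if (a == NULL)
        return NULL;

    /* Second pass: format into the buffer, bounded by its real size. */
    snprintf(a, (size_t)n + 1u, "example3 %s", str);
    return a;
}
```

The result can be stored in arr[3] like the VLA was, with the difference that it stays valid after the function returns, until free() is called on it.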

Parsing words from string in C

There is at least one mistake in your code: the pointers in your params array point into a local buffer whose lifetime ends when parse() returns, so using them afterwards is undefined behavior.

I'd be surprised if you couldn't use the standard function strtok for this purpose.

It is as simple as:

char input[] = "Some stuff  \t to parse &";  // <- must be an array, not a string literal, because `strtok` modifies the string in place

char * pch = strtok (input," \t");
while (pch != NULL)
{
    printf ("%s\n",pch);
    pch = strtok (NULL, " \t");
}

Functional code:

#include <stdio.h>
#include <string.h>

#define MAX_PARAMS 10 // assumed limit; the question does not give one

void parse(void);
char input[] = "Some stuff \t to parse &";
char* params[MAX_PARAMS+1] = {NULL};

int main(void) {
    parse();
    for (int i = 0; i <= MAX_PARAMS && params[i]; i++) {
        puts(params[i]);
    }
}

void parse(void) {
    int i = 0;
    char * pch = strtok (input," \t");
    while (pch != NULL && i < MAX_PARAMS) // keep one slot for the final NULL
    {
        params[i++] = pch;
        pch = strtok (NULL, " \t");
    }
    params[i] = NULL;
}

Parsing a String in C, Without Strtok()

Since you have solved the actual string parsing, I shall take your last comment as the actual requirement.

"... I want to create a list of words with varying length that can be accessed by index ..."

That is certainly not a task to be solved easily if one is "three weeks into C". The data structure that represents it is the same as the second argument of main():

// array (of unknown size)
// of pointers to char
char * argv[] ;

This can be written as a pointer to pointer:

// same data structure as char * []
char ** list_of_words ;

And this is pushing you straight into the deep waters of C: a non-trivial C data structure. As such, it might require a bit more than four weeks of C.
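To show why this counts as deep water, here is a rough sketch of building such a char ** list from the alphabetic runs of a string, without strtok. The function names split_words and free_words and the NULL-terminated layout are assumptions made for illustration; they do not come from the question.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees a list produced by split_words(). */
void free_words(char **list_of_words)
{
    if (list_of_words == NULL)
        return;
    for (size_t i = 0; list_of_words[i] != NULL; i++)
        free(list_of_words[i]);
    free(list_of_words);
}

/* Hypothetical helper: returns a NULL-terminated array holding malloc'ed
   copies of every run of alphabetic characters in input, or NULL if an
   allocation fails. The number of words is stored in *count. */
char **split_words(const char *input, size_t *count)
{
    assert(input != NULL && count != NULL);

    char **list_of_words = malloc(sizeof *list_of_words); /* room for NULL */
    if (list_of_words == NULL)
        return NULL;
    list_of_words[0] = NULL;
    *count = 0;

    const char *p = input;
    while (*p != '\0') {
        while (*p != '\0' && !isalpha((unsigned char)*p))
            p++;                              /* skip separators */
        const char *start = p;
        while (isalpha((unsigned char)*p))
            p++;                              /* walk over one word */
        size_t length = (size_t)(p - start);
        if (length == 0)
            break;                            /* reached the end */

        /* Grow the pointer array by one slot and copy the word. */
        char **grown = realloc(list_of_words, (*count + 2) * sizeof *grown);
        char *word = malloc(length + 1);
        if (grown == NULL || word == NULL) {
            free(word);
            free_words(grown != NULL ? grown : list_of_words);
            return NULL;
        }
        list_of_words = grown;
        memcpy(word, start, length);
        word[length] = '\0';
        list_of_words[(*count)++] = word;
        list_of_words[*count] = NULL;         /* keep the list terminated */
    }
    return list_of_words;
}
```

After a successful call, list_of_words[i] gives the word at index i for i below *count, and the whole structure is released with a single free_words() call. The amount of bookkeeping involved is the reason this is hard for a beginner.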

But we can be creative. C has one non-trivial data structure built in that we might use: a file.

We can write the words into the file, one word per line. That is our output: a list of words, separated by newline characters, stored in a file.

We can even imagine and write a function that reads a word from that result "by index", as you (it seems) need.

// hint: there is a FILE * behind
int words_count = result_size () ;
const char * word = result_get_word(3) ;
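One possible shape for these two functions is sketched below. It is an assumption, not the answer's implementation: unlike the calls above, the FILE * and an output buffer are passed explicitly, and every word is assumed to fit in the buffer together with its newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: counts the words in a file that holds one word per line. */
int result_size(FILE *fp)
{
    assert(fp != NULL);
    rewind(fp);

    int count = 0;
    int c;
    int previous = '\n';
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            count++;
        previous = c;
    }
    if (previous != '\n')
        count++;  /* the last word had no trailing newline */
    return count;
}

/* Hypothetical: copies the word at the zero-based index into word, which
   has room for size chars. Returns word, or NULL if index is out of range.
   Assumes every line is shorter than size - 1 characters. */
char *result_get_word(FILE *fp, int index, char *word, size_t size)
{
    assert(fp != NULL && word != NULL && size > 1);
    rewind(fp);

    for (int line = 0; fgets(word, (int)size, fp) != NULL; line++) {
        if (line == index) {
            word[strcspn(word, "\n")] = '\0';  /* drop the newline */
            return word;
        }
    }
    return NULL;
}
```

Both functions rescan the file from the start on every call, which keeps the sketch short at the cost of speed. For the file written by the program below, result_get_word(fp, 2, word, sizeof word) should yield "dabra".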

Now, I have boldly gone ahead and written "all" of it, besides that last "crucial" part. After all, I am sure you would like to contribute too.

So the working code (minus result_size() and result_get_word()) is alive and kicking here: https://wandbox.org/permlink/uLpAplNl6A3fgVGw

To avoid the "Wrath of Khan" I have also pasted it here:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
/*
task: remove all non alpha chars from a given string, store the result
*/

int process_and_save (FILE *, const char *) ;
int dump_result(FILE *) ;

int main( const int argc, const char * argv[] )
{
    const char * filename = "words.txt";
    const char * to_parse = "0abra123ka456dabra789" ;

    (void)(&argc) ; (void)argv ; // pacify the compiler warnings

    printf("\nInput: %s", to_parse ) ;
    int retval = process_and_save(fopen(filename, "w"), to_parse ) ;
    if ( EXIT_FAILURE != retval )
    {
        printf("\n\nOutput:\n") ;
        retval = dump_result(fopen(filename, "r"));
    }
    return retval ;
}

int process_and_save (FILE * fp, const char * input )
{
    if(!fp) {
        perror("File opening failed");
        return EXIT_FAILURE;
    }
    // visit every character, starting with the first one
    for ( const char * walker = input ; *walker ; ++walker )
    {
        // the cast keeps isalpha() defined for negative char values
        if ( isalpha((unsigned char)*walker) ) {
            fprintf( fp, "%c", *walker ) ;
            // I am alpha but next one is not
            // so write word end, next
            if ( ! isalpha((unsigned char)*(walker + 1)) )
                fprintf( fp, "\n" ) ;
        }
    }
    fclose(fp);
    return EXIT_SUCCESS;
}

int dump_result(FILE* fp )
{
    if(!fp) {
        perror("\nFile opening failed");
        return EXIT_FAILURE;
    }

    int c;
    while ((c = fgetc(fp)) != EOF) { putchar(c); }

    if (ferror(fp))
        puts("\nI/O error when reading");

    fclose(fp);
    return EXIT_SUCCESS;
}

I think this is functional and does the job of parsing and storing the result, not in a complex data structure but in a simple file. The rest should be easy. If you need help, please do let me know.

C String parsing

Change:

char board[26], Res[26], Ind[26], Cap[26];

to:

int board;
float Res;
float Ind;
float Cap;

And, change:

printf("%3d %6d %8e %8e", board, Res, Ind, Cap);

to (perhaps):

printf("%3d %6f %8e %8e", board, Res, Ind, Cap);
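With numeric variables in place, the reading side has to use matching conversions too. The sketch below assumes a whitespace-separated line in the order board, Res, Ind, Cap; that layout and the helper name parse_board_line are assumptions, since the original input format is not shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses "board Res Ind Cap" from one line of text.
   Returns 1 only if all four fields were converted, 0 otherwise. */
int parse_board_line(const char *line, int *board,
                     float *Res, float *Ind, float *Cap)
{
    assert(line != NULL && board != NULL);
    assert(Res != NULL && Ind != NULL && Cap != NULL);

    /* %d expects int *, %f expects float *; sscanf returns the number
       of fields it managed to convert. */
    return sscanf(line, "%d %f %f %f", board, Res, Ind, Cap) == 4;
}
```

The float values can then be handed to the printf call above as they are: float arguments are promoted to double in a variadic call, which is what %f and %e expect.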

