Safer But Easy-To-Use and Flexible C++ Alternative to Sscanf()

I wrote a bit of code that can read in string and character literals. Like normal stream reads, if it gets invalid data it sets the failbit of the stream. This should work for all types of streams, including wide streams. Put this in a new header:

#include <array>
#include <cstddef>
#include <iostream>

//match a string literal: skip leading whitespace, then require the exact characters
template<class e, class t, std::size_t N>
std::basic_istream<e,t>& operator>>(std::basic_istream<e,t>& in, const e(&sliteral)[N]) {
    std::array<e, N-1> buffer{}; //get buffer (the terminating null is not matched)
    if (N>1) {
        in >> buffer[0]; //skips whitespace
        if (N>2)
            in.read(buffer.data()+1, N-2); //read the rest, whitespace included
    }
    //the traits class compares any character type, so wide streams work too
    if (t::compare(buffer.data(), sliteral, N-1) != 0) //if it failed
        in.setstate(std::ios_base::failbit); //set the state
    return in;
}
//match a single character literal
template<class e, class t>
std::basic_istream<e,t>& operator>>(std::basic_istream<e,t>& in, const e& cliteral) {
    e buffer{}; //get buffer
    in >> buffer; //read data
    if (buffer != cliteral) //if it failed
        in.setstate(std::ios_base::failbit); //set the state
    return in;
}
//redirect mutable char arrays to their normal function
//(C++20 already has its own array overload, so this redirect should not be needed there)
template<class e, class t, std::size_t N>
std::basic_istream<e,t>& operator>>(std::basic_istream<e,t>& in, e(&carray)[N]) {
    return std::operator>>(in, carray);
}

This makes matching literal characters in the input very easy:

std::istringstream input("(1.5,2.5)"); //needs <sstream>
double val1, val2;
if (input >> '(' >> val1 >> ',' >> val2 >> ')') //fewer characters than scanf, I think
{
    // got them!
}

PROOF OF CONCEPT. Now you can use >> with string and character literals, and if the input is not an exact match, the stream fails just like it does for any other type that could not be read. Note that leading whitespace before a string literal is skipped, while whitespace inside the literal (anywhere after its first character) must match the input exactly. It's only a few functions, all of which are brain-dead simple.

EDIT

Parsing with streams is a bad idea. Use a regex.

How to sscanf in C++?

I believe this is what you are after:
What should I use instead of sscanf?

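#include <fstream>
#include <sstream>
#include <string>

std::ifstream file( fileName );

if ( file ) { // check that the file opened correctly
    std::string line;
    if ( std::getline( file, line ) ) { // read one line
        std::istringstream ss( line );
        int a, b, c;
        if ( ss >> a >> b >> c ) {
            // all three values were read
        }
    }
}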

Should I use fgets or scanf with limited input in C?

On most operating systems, user input is, by default, line-based. One reason for this is to allow the user to press the backspace key to correct the input before it is sent to the program.

For line-based user input, it is meaningful and intuitive for a program to read one line of input at a time. This is what the function fgets does (provided that the buffer is large enough to store the entire line of input).

The function scanf, on the other hand, normally does not read one line of input at a time. For example, when you use the %s or %d conversion format specifier with scanf, it will not consume an entire line of input. Instead, it will only consume as much input as matches the conversion format specifier. This means that the newline character at the end of the line will normally not be consumed (which can easily lead to programming bugs). Also, scanf called with the %d conversion format specifier will consider input such as 6sldf23dsfh2 as valid input for the number 6, but any further calls to scanf with the same specifier will fail, unless you discard the remainder of the line from the input stream.
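The partial-match behavior described above can be observed without touching stdin. The following is a minimal sketch that uses sscanf on a string instead of scanf; the helper name leading_int is made up for illustration. The %n directive records how many characters the conversion consumed.

```c
#include <stdio.h>

/* Hypothetical helper: converts a leading integer from `line` and reports
   how many characters the %d conversion consumed. */
int leading_int(const char *line, int *value, int *consumed)
{
    *consumed = 0;
    /* %n stores the number of characters read so far; it does not count
       towards sscanf's return value. */
    return sscanf(line, "%d%n", value, consumed);
}
```

For the input 6sldf23dsfh2 the call reports one successful conversion, the value 6, and a single consumed character. With scanf on a real stream, the unread remainder would stay in the stream, which is why a second %d conversion fails there.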

This behavior of scanf is counter-intuitive, whereas the behavior of fgets is intuitive, when dealing with line-based user input.

After using fgets, you can use the function sscanf on the string, for parsing the contents of an individual line. This will allow you to continue using scansets. Or you can parse the line by some other means. Either way, as long as you are using fgets instead of scanf for reading the input, you will be handling one line of input at a time, which is the natural and intuitive way to deal with line-based user input.
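As a rough sketch of that approach, the code below reads one line with fgets and then parses it with sscanf, using a scanset. The record layout ("name,age"), the buffer sizes, and the function names parse_record and read_record are assumptions made for this example only.

```c
#include <stdio.h>

/* Hypothetical helper: parses "name,age" from one line of text.
   Returns 1 if both fields were converted, 0 otherwise. */
int parse_record(const char *line, char name[32], int *age)
{
    /* %31[^,] is a scanset: up to 31 characters that are not a comma,
       which leaves room for the terminator in name[32]. */
    return sscanf(line, " %31[^,],%d", name, age) == 2;
}

/* Hypothetical helper: reads one line from `in` and parses it. */
int read_record(FILE *in, char name[32], int *age)
{
    char line[128];
    if (fgets(line, sizeof line, in) == NULL)
        return 0; /* end of input or read error */
    return parse_record(line, name, age);
}
```

Because the whole line has already been consumed by fgets, a line that fails to parse cannot leave stray characters behind for the next read.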

When we use fgets, what happens if the user enters more characters than the buffer can hold (I mean a lot of characters)? Does it lead to a buffer overflow? If so, how do we deal with it?

If the user enters more characters than fit in the buffer as specified by the second fgets function argument, then it will not overflow the buffer. Instead, it will only extract as many characters from the input stream as fit in the buffer. The remaining characters stay in the input stream and will be returned by the next read. You can determine whether the entire line was read by checking whether the string contains a newline character '\n' at the end.
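One possible way to act on that newline check is sketched below. The function name read_line and its return convention are assumptions for illustration; the idea is to strip the newline when the line fit, and to discard the unread remainder when it did not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity `size`).
   Returns 1 if the whole line fit, 0 if part of it had to be discarded,
   and -1 on end of input or read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0'; /* complete line: strip the newline */
        return 1;
    }

    /* No newline in the buffer: drop whatever is left of this line. */
    int c;
    int dropped = 0;
    while ((c = getc(in)) != EOF && c != '\n')
        dropped = 1;
    return dropped ? 0 : 1;
}
```

Discarding the remainder keeps the next call aligned with the start of the next line, so one overlong line does not corrupt the lines that follow it.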

What makes a C standard library function dangerous, and what is the alternative?

In the old days, most of the string functions had no bounds checking. Of course they couldn't just delete the old functions or modify their signatures to include an upper bound, because that would break compatibility. Now, for almost every one of those functions, there is an alternative "n" version. For example:

strcpy -> strncpy
strlen -> strnlen
strcmp -> strncmp
strcat -> strncat
strdup -> strndup
sprintf -> snprintf
wcscpy -> wcsncpy
wcslen -> wcsnlen

And more.
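The "n" versions do not all give the same guarantees, so it is worth checking each one before relying on it. For instance, strncpy does not write a terminating null when the source is at least as long as the limit, whereas snprintf always terminates the destination when the size is nonzero and reports the length it would have needed. A minimal sketch of a bounded copy built on snprintf (the name copy_bounded is made up for this example):

```c
#include <stdio.h>

/* Hypothetical helper: copies src into dst (capacity `size`, at least 1).
   Returns 1 if the whole string fit, 0 if it was truncated.
   dst is null-terminated in both cases. */
int copy_bounded(char *dst, size_t size, const char *src)
{
    /* snprintf returns the length the full output would have had */
    int needed = snprintf(dst, size, "%s", src);
    return needed >= 0 && (size_t)needed < size;
}
```

Checking the return value is what turns the bounded call into a safe one: silent truncation can be a bug of its own.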

See also https://github.com/leafsr/gcc-poison which is a project to create a header file that causes gcc to report an error if you use an unsafe function.

How to make scanf optionally ignore one of its conversion specifiers?

scanf("%s %d %d", &value1[0], &value2, &value3)

The problem is that scanf() can't ignore the third conversion specifier. It keeps trying to read the decimal input for the third argument, value3.

Reading the whole input as a string first and then splitting the content of this string into the individual objects may be a better alternative.

fgets() is a little safer than scanf() when taking user input, so I'll use fgets().

char* fgets ( char * str, int num, FILE * stream );

With fgets() you need to specify the number of characters to read (num), which is a great feature for maintaining security, but in this case we don't know how many digits a user will enter for the integer inputs. If we specify too few characters for the digits, the remaining digits will be left in stdin.

A workaround is to pass fgets() a size large enough for the maximum number of digits needed to represent a value of type int in decimal notation.

That is 10 digits for the maximum value 2,147,483,647 where int is 32 bits wide (the case on most current 32-bit and 64-bit platforms), or 5 digits for the maximum value 32,767 where int is 16 bits wide. I go with the 32-bit int case for now. Note that these counts do not include a minus sign for negative values.

So 10 (value1) + 10 (value2) + 10 (value3) + 2 space characters (one between value1 and value2, one between value2 and value3) + the terminating \0 for a string = 33 characters. Note that fgets() also reads the entered newline \n (which we can easily discard from the string later), so we need one more character, for a total of 34 characters:

char buffer[34];
fgets(buffer,sizeof(buffer),stdin);
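The digit counts above can also be derived at run time instead of being hard-coded. The following is only a sketch under the assumption that every field is sized like an int; the helper names max_int_chars and line_buffer_size are hypothetical. Unlike the hand count, it includes room for a minus sign.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: number of characters needed to print any int in
   decimal, including a possible minus sign (terminator not included). */
int max_int_chars(void)
{
    /* With a null buffer and size 0, snprintf only reports the length */
    return snprintf(NULL, 0, "%d", INT_MIN);
}

/* Hypothetical helper: buffer size for `fields` (at least 1) int-sized
   fields separated by single spaces, plus the newline kept by fgets and
   the terminating null. */
size_t line_buffer_size(size_t fields)
{
    return fields * (size_t)max_int_chars() + (fields - 1) + 2;
}
```

On a platform with a 32-bit int, max_int_chars() yields 11, since "-2147483648" has 11 characters.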

After that we need to check how many fields the string stored in buffer contains. We can check this implicitly by counting the space characters:

unsigned int mark = 0;

// stop at the terminating null so that only the characters actually read are counted
for(unsigned int i = 0; buffer[i] != '\0'; i++)
{
    if(buffer[i] == ' ')
        mark++;
}

Thereafter we need to convert the individual pieces of the string in buffer into their own objects by using sscanf(). If the string contains 2 space characters, we can use 3 conversion specifiers in the sscanf() call; otherwise we use a sscanf() call with correspondingly fewer specifiers:

// %9s limits the string field to 9 characters plus the terminator (value1 is char[10])
if(mark == 2)
{
    sscanf(buffer,"%9s %d %d", value1, &value2, &value3);
}
else if(mark == 1)
{
    sscanf(buffer,"%9s %d", value1, &value2);
}
else if(mark == 0)
{
    sscanf(buffer,"%9s", value1);
}
else
{
    printf("The input entered is not valid!\n");
    printf("Please try again!\n");
}

The whole code is then:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char value1[10];
    int value2 = 0;
    int value3 = 0;
    unsigned int mark;
    int parsed;
    char buffer[34];

    for(;;)
    {
        if(fgets(buffer,sizeof(buffer),stdin) == NULL)
            return 1; // end of input or read error
        buffer[strcspn(buffer, "\n")] = 0; // discard the trailing newline

        mark = 0;

        // count the spaces in the string actually read, not in the whole array
        for(unsigned int i = 0; buffer[i] != '\0'; i++)
        {
            if(buffer[i] == ' ')
                mark++;
        }

        // %9s leaves room for the terminator in value1[10]
        if(mark == 2)
            parsed = sscanf(buffer,"%9s %d %d", value1, &value2, &value3);
        else if(mark == 1)
            parsed = sscanf(buffer,"%9s %d", value1, &value2);
        else if(mark == 0)
            parsed = sscanf(buffer,"%9s", value1);
        else
            parsed = -1;

        // accept the line only if every expected field was converted
        if(parsed == (int)mark + 1)
            break;

        printf("The input entered is not valid!\n");
        printf("Please try again!\n");
    }

    printf("value1 = %s\n", value1);

    if(mark == 1 || mark == 2)
        printf("value2 = %d\n", value2);

    if(mark == 2)
        printf("value3 = %d\n", value3);

    return 0;
}

