Semantics of flags on basic_ios

There are three flags that indicate error state:

badbit means something has gone very wrong with the stream. It might be a buffer error or an error in whatever is feeding data to the stream. If this flag is set, it's likely that you aren't going to be using the stream anymore.
failbit means that an extraction or a read from the stream failed (or a write or insertion for output streams) and you need to be aware of that failure.
eofbit means the input stream has reached its end and there is nothing left to read. Note that this is set only after you attempt to read from an input stream that has reached its end (that is, it is set when an error occurs because you try to read data that isn't there).

The failbit may also be set by many operations that reach EOF. For example, if only whitespace remains in the stream and you try to read an int, you will both reach EOF and fail to read the int, so both flags will be set.

The fail() function tests badbit || failbit.

The good() function tests !(badbit || failbit || eofbit). That is, a stream is good when none of the bits are set.
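To see how the three bits combine in practice, here is a small sketch. The helper name flags_after_int_read is made up for this illustration; it tries to read one int from a string and reports which flags are set afterwards.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the standard library): try to read one
// int from `text` and list the state flags that are set afterwards.
std::string flags_after_int_read(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    in >> value;

    const std::ios_base::iostate state = in.rdstate();
    std::string result;
    if (state & std::ios_base::badbit)  result += "bad ";
    if (state & std::ios_base::failbit) result += "fail ";
    if (state & std::ios_base::eofbit)  result += "eof ";
    if (in.good())                      result += "good ";
    return result;
}
```

Calling it with "42" is the interesting case: the read succeeds, but the stream hit the end while scanning digits, so eofbit is set without failbit and good() is already false.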

You can reset the flags with the ios::clear() member function. It sets the stream state to exactly the flags you pass, so it can also be used to set any of the error flags; by default (with no argument), it clears all three flags.
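A short sketch of the difference between clear() and setstate(), using a std::istringstream so it can be run without a console. The function name clear_demo is invented for the example.

```cpp
#include <sstream>

// Hypothetical demo function: returns true if every step behaves as the
// comments describe.
bool clear_demo() {
    std::istringstream in("x 7");
    int n = 0;

    in >> n;                              // fails: 'x' is not a digit
    const bool failed = in.fail();

    in.clear();                           // no argument: all flags cleared
    const bool recovered = in.good();

    in.clear(std::ios_base::eofbit);      // state becomes exactly eofbit
    const bool only_eof = in.eof() && !in.fail();

    in.setstate(std::ios_base::failbit);  // adds failbit, keeps eofbit
    const bool both = in.eof() && in.fail();

    return failed && recovered && only_eof && both;
}
```

In other words, clear(flags) replaces the state, while setstate(flags) adds to it.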

Streams do not overload operator bool(); operator void*() is used to implement a somewhat broken version of the safe bool idiom. This operator overload returns null if badbit or failbit is set, and non-null otherwise. You can use this to support the idiom of testing the success of an extraction as the condition of a loop or other control flow statement:

if (std::cin >> x) {
    // extraction succeeded
}
else {
    // extraction failed
}

The operator!() overload is the opposite of the operator void*(); it returns true if the badbit or failbit is set and false otherwise. The operator!() overload is not really needed anymore; it dates back to before operator overloads were supported completely and consistently (see sbi's question "Why does std::basic_ios overload the unary logical negation operator?").

C++0x (since published as C++11) fixes the problem that forces us to use the safe bool idiom, so there the basic_ios base class template does overload operator bool() as an explicit conversion operator; this operator has the same semantics as the current operator void*().
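As a rough illustration of what "explicit" means here, assuming a C++11 compiler: conditions such as while and if still convert the stream for you, but anywhere else you have to ask for the conversion. The function names count_ints and stream_ok are made up.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Count how many ints can be extracted from the front of `text`.
int count_ints(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    int count = 0;
    while (in >> value)   // contextual conversion to bool: no cast needed
        ++count;
    return count;
}

// Outside a condition the conversion has to be requested explicitly;
// "return in;" would not compile with an explicit operator bool().
bool stream_ok(std::istream& in) {
    return static_cast<bool>(in);
}
```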

Why does std::basic_ios overload the unary logical negation operator?

With old (read: not long after cfront) C++ compilers, the compiler was not guaranteed to implicitly call typecast operators on objects when needed. If iostream didn't have an operator ! declared, then you couldn't expect !cout to work in all cases. C++89 (or whatever the pre-C++98 standard was called) simply left the area undefined.

This is also why operator void*() was overloaded, and not operator int or operator bool. (bool didn't even exist as its own type in the standard at that point.) I remember my professor telling me that if(), under the hood, expected a void* in C++, because that type could act as a "superset" type relative to those expression result types that would be passed to an if statement, but I have not found this spelled out anywhere.

This was around the time of gcc 2, when most folks didn't support templates or exceptions, or if they did, didn't fully support them, so metaprogramming C++ with templates was still a theoretical exercise and you made sure to check that operator new didn't return a null pointer.

This drove me nuts for several years.

An interesting excerpt from Stroustrup's The C++ Programming Language, 3rd ed. (1997), page 276:

The istream and ostream types rely on a conversion function to enable statements such as
while (cin >> x) cout << x;
The input operation cin>>x returns an istream&. That value is implicitly converted to a value indicating the state of cin. The value can then be tested by while. However, it is typically not a good idea to define an implicit conversion from one type to another in such a way that information is lost in the conversion.

There's a lot in C++ that seems to be a victory of cute or clever over consistent. I wouldn't mind one bit if C++ was smart enough to handle the above loop as:

while (!(cin >> x).fail()) cout << x;

because this, while more verbose and more punctuation, is clearer to a beginning programmer.

... Actually, come to think of it, I don't like either of those constructs. Spell it out:

for(;;)
{   cin >> x;
    if(!cin)
        break;
    cout << x;
}

Why do I like this better? Because this version makes it far clearer how to expand the code to, say, handle two reads at a time instead of one. For example, "The existing code copies a sequence of float values. We want you to change it so it pairs up the float values and writes them out, two per line, because we're now using complex numbers."
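For what it's worth, here is one way the "two per line" change might look when starting from the spelled-out loop. The function name copy_pairs is invented, and writing a trailing unpaired value on its own line is my assumption about what the change request would want.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Copy floats from `in` to `out`, two per line. A trailing unpaired value
// is written on a line of its own (an assumption, see above).
void copy_pairs(std::istream& in, std::ostream& out) {
    for (;;) {
        float re = 0;
        float im = 0;
        in >> re;
        if (!in)
            break;                  // nothing more to read
        in >> im;
        if (!in) {                  // odd number of values
            out << re << '\n';
            break;
        }
        out << re << ' ' << im << '\n';
    }
}
```

Each read gets its own check, which is exactly the part that is awkward to express inside a single while condition.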

But I digress.

What are the guidelines regarding parsing with iostreams?

Personally, I think these are reasonable questions and I remember very well that I struggled with them myself. So here we go:

Where is my mistake here ?

I wouldn't call it a mistake but you probably want to make sure you don't have to back off from what you have read. That is, I would implement three versions of the input functions. Depending on how complex the decoding of a specific type is I might not even share the code because it might be just a small piece anyway. If it is more than a line or two I probably would share the code. That is, in your example I would have an extractor for FooBar which essentially reads the Foo or the Bar members and initializes objects correspondingly. Alternatively, I would read the leading part and then call a shared implementation extracting the common data.

Let's do this exercise because there are a few things which may be a complication. From your description of the format it isn't clear to me if the "string" and what follows the string are delimited e.g. by a whitespace (space, tab, etc.). If not, you can't just read a std::string: the default behavior for them is to read until the next whitespace. There are ways to tweak the stream into considering characters as whitespace (using std::ctype<char>) but I'll just assume that there is space. In this case, the extractor for Foo could look like this (note, all code is entirely untested):

std::istream& read_data(std::istream& is, Foo& foo, std::string& s) {
    Foo tmp(s);
    if (is >> get_char<'('> >> tmp.m_x >> get_char<','> >> tmp.m_y >> get_char<')'>)
        std::swap(tmp, foo);
    return is;
}
std::istream& operator>>(std::istream& is, Foo& foo)
{
    std::string s;
    return read_data(is >> s, foo, s);
}

The idea is that read_data() reads the part of a Foo which is different from Bar when reading a FooBar. A similar approach would be used for Bar but I omit this. The more interesting bit is the use of this funny get_char() function template. This is something called a manipulator and is just a function taking a stream reference as argument and returning a stream reference. Since we have different characters we want to read and compare against, I made it a template but you can have one function per character as well. I'm just too lazy to type it out:

template <char Expect>
std::istream& get_char(std::istream& in) {
    char c;
    // Fail the stream unless the next non-whitespace character is the expected one.
    if (in >> c && c != Expect) {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

What looks a bit weird about my code is that there are few checks whether things worked. That is because the stream just sets std::ios_base::failbit when reading a member fails, so I don't really have to bother myself. The only case where special logic is actually added is in get_char(), to deal with expecting a specific character. Similarly, there is no skipping of whitespace characters (i.e. use of std::ws) going on: all the input functions are formatted input functions and these skip whitespace by default (you can turn this off by using e.g. in >> std::noskipws, but then lots of things won't work).
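Since the snippets above are untested fragments, here is a self-contained sketch of the same manipulator technique that can actually be compiled. Point is a hypothetical stand-in for Foo, and expect is just get_char under another name; neither comes from the question.

```cpp
#include <istream>
#include <sstream>

// Manipulator sketch: extract one non-whitespace character and set failbit
// unless it equals Expected.
template <char Expected>
std::istream& expect(std::istream& in) {
    char c;
    if (in >> c && c != Expected)
        in.setstate(std::ios_base::failbit);
    return in;
}

// Hypothetical type, read from text of the form "(x, y)".
struct Point {
    int x = 0;
    int y = 0;
};

std::istream& operator>>(std::istream& in, Point& p) {
    Point tmp;
    // Only commit to `p` if every piece was read successfully.
    if (in >> expect<'('> >> tmp.x >> expect<','> >> tmp.y >> expect<')'>)
        p = tmp;
    return in;
}
```

Reading " ( 3 , 4 )" succeeds because every step skips leading whitespace, while "(3;4)" leaves the target untouched and the stream in a failed state.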

With a similar implementation for reading a Bar, reading a FooBar would look something like this:

std::istream& operator>> (std::istream& in, FooBar& foobar) {
    std::string s;
    if (in >> s) {
        // Skip whitespace, then look at the next character without extracting it.
        switch ((in >> std::ws).peek()) {
        case '(': { Foo foo; read_data(in, foo, s); foobar = foo; break; }
        case '[': { Bar bar; read_data(in, bar, s); foobar = bar; break; }
        default: in.setstate(std::ios_base::failbit);
        }
    }
    return in;
}

This code uses an unformatted input function, peek(), which just looks at the next character. It either returns the next character or it returns std::char_traits<char>::eof() if it fails. So, if there is either an opening parenthesis or an opening bracket, we have read_data() take over. Otherwise we always fail. That solves the immediate problem. On to distributing information...
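To make the peek() behaviour concrete, here is a tiny sketch. The helper name next_opener and the choice of '\0' as the "nothing there" result are mine, not anything from the standard library.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: skip whitespace and report the next character
// without consuming it. Returns '\0' if there is nothing left to look at.
char next_opener(std::istream& in) {
    const int c = (in >> std::ws).peek();
    if (c == std::char_traits<char>::eof())
        return '\0';
    return static_cast<char>(c);
}
```

Because the character is only inspected, a following in.get() or formatted read still sees it, which is what lets the switch hand over to read_data() with the bracket still in the stream.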

Should one write his calls to operator>> to leave the initial data still available after a failure ?

The general answer is: no. If you failed to read, something went wrong and you give up. This might mean that you need to work harder to avoid failing, though. If you really need to back off from the position you were at to parse your data, you might want to read data first into a std::string using std::getline() and then analyze this string. Use of std::getline() assumes that there is a distinct character to stop at. The default is newline (hence the name) but you can use other characters as well:

std::getline(in, str, '!');

This would stop at the next exclamation mark and store all characters up to it in str. It would also extract the termination character but it wouldn't store it. This makes it interesting sometimes when you read the last line of a file which may not have a newline: std::getline() succeeds if it can read at least one character. If you need to know whether the last character in a file is a newline, you can test whether the stream reached EOF:

if (std::getline(in, str) && in.eof()) { std::cout << "file not ending in newline\n"; }
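Both points, the custom delimiter and the missing final newline, can be tried out with a string stream. The helper names split and last_line_unterminated are invented for this sketch.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `text` at each occurrence of `delim` using std::getline().
std::vector<std::string> split(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> parts;
    std::string part;
    while (std::getline(in, part, delim))
        parts.push_back(part);
    return parts;
}

// True if the last line of `text` was ended by end of input rather than
// by a '\n'.
bool last_line_unterminated(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    bool hit_eof = false;
    while (std::getline(in, line))
        hit_eof = in.eof();   // only true if no delimiter was found
    return hit_eof;
}
```

Note that a delimiter at the very end does not produce an extra empty element: the final getline() extracts nothing at all and therefore fails.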

If so, how can I do that efficiently ?

Streams are by their very nature single pass: you receive each character just once and if you skip over one you consume it. Thus, you typically want to structure your data in a way such that you don't have to backtrack. That said, this isn't always possible and most streams actually have a buffer under the hood to which characters can be returned. Since streams can be implemented by a user there is no guarantee that characters can be returned. Even for the standard streams there isn't really a guarantee.

If you want to return a character, you have to put back exactly the character you extracted:

char c;
if (in >> c && c != 'a')
    in.putback(c);
if (in >> c && c != 'b')
    in.unget();

The latter function has slightly better performance because it doesn't have to check that the character is indeed the one which was extracted. It also has fewer chances to fail. Theoretically, you can put back as many characters as you want but most streams won't support more than a few in all cases: if there is a buffer, the standard library takes care of "ungetting" all characters until the start of the buffer is reached. If another character is returned, it calls the virtual function std::streambuf::pbackfail() which may or may not make more buffer space available. In the stream buffers I have implemented it will typically just fail, i.e. I typically don't override this function.
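A typical use of returning a single character is to decide how to parse the next token. The sketch below does that with unget(); the function classify_next and its return strings are made up for the example, and it only ever returns one character, which is the case most likely to work.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: look at the first non-whitespace character, return
// it to the stream with unget(), then read the token as a number or a word.
std::string classify_next(std::istream& in) {
    char c;
    if (!(in >> c))
        return "end";               // nothing left to read
    in.unget();                     // give the character back
    if (!in)
        return "error";             // the stream could not take it back

    if (std::isdigit(static_cast<unsigned char>(c))) {
        int n = 0;
        in >> n;
        return "number";
    }
    std::string word;
    in >> word;
    return "word";
}
```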

If not, is there a way to "store" (and restore) the complete status of an input stream: state and data ?

If you mean to entirely restore the state you were at, including the characters, the answer is: sure there is. ...but no easy way. For example, you could implement a filtering stream buffer and put back characters as described above to restore the sequence to be read (or support seeking or explicitly setting a mark in the stream). For some streams you can use seeking but not all streams support this. For example, std::cin typically doesn't support seeking.
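For streams that do support seeking (string streams and most file streams, but as said typically not std::cin), restoring the characters can be sketched with tellg() and seekg(). The function name try_read_int is invented here, and the sketch does not check whether tellg() itself failed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Try to read an int. On failure, rewind to the starting position so the
// same characters can be parsed some other way. Assumes a seekable stream.
bool try_read_int(std::istream& in, int& out) {
    const std::istream::pos_type mark = in.tellg();
    int tmp = 0;
    if (in >> tmp) {
        out = tmp;
        return true;
    }
    in.clear();        // seekg() refuses to work while failbit is set
    in.seekg(mark);    // go back to where the attempt started
    return false;
}
```

This matters for input like "-x", where the failed attempt may already have consumed the sign before giving up.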

Restoring the characters is only half the story, though. The other things you want to restore are the state flags and any formatting data. In fact, if the stream went into a failed or even bad state you need to clear the state flags before the stream will do most operations (although I think the formatting stuff can be reset anyway):

std::istream fmt(0); // doesn't have a default constructor: create an invalid stream
fmt.copyfmt(in);     // save the current format settings
// use in
in.copyfmt(fmt);     // restore the original format settings

The function copyfmt() copies all fields associated with the stream which are related to formatting. These are:

the locale
the fmtflags
the information storage iword() and pword()
the stream's events
the exceptions
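the tied stream (tie()) and the fill character

Note that the stream's state flags (rdstate()) and its stream buffer (rdbuf()) are not copied.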

If you don't know about most of them, don't worry: most of this stuff you probably won't care about. Well, until you need it, but by then you have hopefully acquired some documentation and read about it (or asked and got a good response).
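The same save and restore pattern works for output streams, which is where I see it used most. A minimal sketch, assuming C++11; the names write_hex and hex_demo are invented, and a buffer-less std::ios object serves purely as storage for the settings.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Write `value` in a temporary hex format, then put back whatever
// formatting `out` had before.
void write_hex(std::ostream& out, int value) {
    std::ios saved(nullptr);   // no stream buffer: only holds settings
    saved.copyfmt(out);        // save the current format settings
    out << std::hex << std::showbase << std::internal
        << std::setfill('0') << std::setw(6) << value;
    out.copyfmt(saved);        // restore the original format settings
}

// Small driver: decimal output before and after is unaffected.
std::string hex_demo(int value) {
    std::ostringstream out;
    out << value << ' ';
    write_hex(out, value);
    out << ' ' << value;
    return out.str();
}
```

Without the second copyfmt() call, the std::hex and fill settings would stick to the stream and affect everything written later.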

What differences are there between failbit and badbit ? When should we use one or the other ?

Finally a short and simple one:

failbit is set when formatting errors are detected, e.g. a number is expected but the character 'T' is found.
badbit is set when something goes wrong in the stream's infrastructure. For example, when the stream buffer isn't set (as in the stream fmt above) the stream has std::ios_base::badbit set. The other reason is if an exception is thrown (and caught by way of the exceptions() mask; by default all exceptions are caught).
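Both cases are easy to reproduce. The two function names below are made up; each returns true if the stream ends up in the state described above.

```cpp
#include <istream>
#include <sstream>

// A formatting error: 'T' where a number is expected sets failbit only.
bool format_error_sets_failbit_only() {
    std::istringstream in("T");
    int n = 0;
    in >> n;
    return in.fail() && !in.bad();
}

// An infrastructure problem: a stream without a stream buffer has badbit.
bool missing_buffer_sets_badbit() {
    std::istream in(nullptr);
    return in.bad();
}
```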

Is there any online reference (or a book) that explains deeply how to deal with iostreams ? not just the basic stuff: the complete error handling.

Ah, yes, glad you asked. You probably want to get Nicolai Josuttis's "The C++ Standard Library". I know that this book describes all the details because I contributed to writing it. If you really want to know everything about IOStreams and locales you want Angelika Langer & Klaus Kreft's "IOStreams and Locales". In case you wonder where I got the information from originally: this was Steve Teale's "IOStreams". I don't know if this book is still in print, and it lacks a lot of the stuff which was introduced during standardization. Since I implemented my own version of IOStreams (and locales) I know about the extensions as well, though.

Reading a line in text file twice

Try

while( myfile >> name >> phone ) {
    // your code here
}

I believe the problem with the other approach is that eof isn't signaled until you actually try to read more than you should. That is, when you attempt to myfile >> name on the last round. That fails, as does myfile >> phone and only then do you break out of the loop.
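The difference can be reproduced with a string stream standing in for the file. Both function names are invented for this sketch, and the duplicate only shows up when the input ends with a newline, as text files usually do.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Flawed pattern: eof() is tested before the read that would detect it,
// so the body runs once more with the previous values still in place.
std::vector<std::string> names_eof_loop(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> names;
    std::string name, phone;
    while (!in.eof()) {
        in >> name >> phone;
        names.push_back(name);
    }
    return names;
}

// Recommended pattern: the read itself is the loop condition.
std::vector<std::string> names_checked_loop(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> names;
    std::string name, phone;
    while (in >> name >> phone)
        names.push_back(name);
    return names;
}
```

With the input "ann 1\nbob 2\n" the first version records bob twice, the second only once.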

Why doesn't the EOF character work if put at the end of a line?

The C and C++ standards allow text streams to do quite Unholy things in text mode, which is the default. These Unholy Things include translation between internal newline markers and external newline control characters, as well as treating certain characters or character sequences as denoting end of file. In Unix-land it's not done, but in Windows-land it's done, so that the code can relate only to the original Unix-land conventions.

This means that in Windows, there is no way to write a portable C or C++ program that will copy its input exactly to its output.

While in Unix-land, that's no problem at all.

In Windows, a line consisting of a single [Ctrl Z] is by convention an End Of File marker. This is so not only in the console, but also in text files (depending a bit on the tools). Windows inherited this from DOS, which in turn inherited the general idea from CP/M.

I'm not sure where CP/M got it from, but it's only similar to, and not at all the same as, Unix's [Ctrl D].

Over in Unix-land the general convention for end of file is just "no more data". In the console a [Ctrl D] will by default send your typed text immediately to the waiting program. When you haven't typed anything on the line yet, 0 bytes are sent, and a read that returns 0 bytes has by convention encountered end-of-file.

The main difference is that internally in Windows the text end of file marker is data, that can occur within a file, while internally in Unix it's lack of data, which can't occur within a file. Of course Windows also supports ordinary end of file (no more data!) for text. Which complicates things – Windows is just more complicated.

#include <iostream>
using namespace std;

int main()
{
    char ch;
    while(cin >> ch) {
        cout << 0+ch << " '" << ch << "'" << endl;
    }
}

Cin not waiting for input despite cin.ignore()

The problem is that, for the user to exit the loop, cin has to be put into a failed state. That is why your

while(cin >> val){ .... }

is working.

Once in a failed state, cin is no longer in a position to supply you with input, so you need to clear() the failed state. You also need to ignore() the non-integer response that triggered the failed state in the first place.
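Put together, a retry loop along these lines might look like the sketch below. The function name read_int_with_retry is invented, and it takes any std::istream so it can be tried with a string stream instead of cin.

```cpp
#include <ios>
#include <istream>
#include <limits>
#include <sstream>

// Keep trying until an int has been read. Returns false if the input ends
// (or the stream breaks) before a valid int is found.
bool read_int_with_retry(std::istream& in, int& out) {
    while (!(in >> out)) {
        if (in.bad() || in.eof())
            return false;          // nothing more will ever arrive
        in.clear();                // leave the failed state
        // Throw away the rest of the offending line.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    return true;
}
```

The eof() check is what keeps this from spinning forever once the user closes the input.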

It would also be of merit to use

if(cin >> n){
    cout << "You entered " << n;
}

This checks that a proper input for n was provided.

Wrong Inputs will cause the program to exit

Answer https://stackoverflow.com/a/10379322/924727 explains what happens.
About the why, we must go a bit into philosophy.

The C++ stream model is not designed for "human interaction": it is a generic converter of a virtually infinite sequence of characters into a list of space-separated "values" to be converted into the supplied "variables".

There is no concept of "input and output interleaving to form a dialog".
If you write your input into a text file like myinput.txt (using correct input)

ABC456 9 7.8 XYZ
Y
ABC456 5 6.7 XYZ
N

and run your program from the command prompt like

   myprogram < myinput.txt

your program will run ... and no "pause" is needed to see the output, since no one is sitting there to read it and answer it.

The program pauses to wait for user input not because of cin >>, but because cin is not in a failed state, the buffer is empty, and the source the buffer maps to is the console. It is the console that waits for '\n' before returning, not cin.

When cin >> n is called...

the operator>> function is called, and it ...
gets the num_get facet from the stream's locale and calls its get function, which ...
calls the stream buffer's sbumpc repeatedly to get the digits and compute the number's value.
If the buffer has content, it just returns its characters one after the other. When no more characters are present (or if it is empty) ...
the buffer asks the operating system to read from the low-level file.
If the file is the console, the console's internal line editor is invoked:
this makes the console block, letting the user type characters and some controls (backspace, for example) until Enter is pressed.
The console line editor then returns the line to the operating system, which makes the contents available to the input CON file ...
which is read by the buffer (after passing the read characters through the codecvt locale facet, but this is a detail), which fills itself up.
Now unwind the returns yourself.

The consequence of all this machinery is that, if you type more than required, the buffer content remains available to the next >> calls, regardless of whether they are on another program line.

A proper "safer" parse requires that, after an input has been read, the stream state be cleared and the following content be ignored up to the next '\n'.
This is typically done with

cin.clear();
cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

So whatever had been typed is discarded (ignore() consumes the '\n' as well), and the next cin >> finds a buffer with no data, thus causing the console to go into line edit mode again.
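The effect of leftover input can be simulated with a string stream that plays the role of a user who types "5 7" at the first prompt and "9" at the second. The struct TwoAnswers and the function ask_twice are invented for this sketch.

```cpp
#include <ios>
#include <istream>
#include <limits>
#include <sstream>

struct TwoAnswers {
    int first;
    int second;
};

// Read two answers from `in`, as two separate "prompts" would. When
// `discard_rest_of_line` is set, everything left on the first line is
// thrown away before the second read.
TwoAnswers ask_twice(std::istream& in, bool discard_rest_of_line) {
    TwoAnswers answers{0, 0};
    in >> answers.first;
    if (discard_rest_of_line) {
        in.clear();
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    }
    in >> answers.second;
    return answers;
}
```

Without the discard step the second read silently picks up the extra 7 from the first line; with it, the second read gets the 9 the user typed in response to the second prompt.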
