C++ istream tellg()/fail() on eof: behavior change; work-around?
The behavior you seem to expect is probably wrong. Both C++11 and C++03 start the description of tellg with "Behaves as an unformatted input function [...]". An "unformatted input function" starts by constructing a sentry object, and will fail, doing nothing and returning a failure status, if the sentry object converts to false. And the sentry object will convert to false if the eofbit is set.
The standard is slightly less clear about whether reading the number sets the eofbit, but only slightly (with the information spread out over several different sections). Basically, when inputting a numeric value, the stream (actually, the num_get facet) must read one character ahead, in order to know where the number ends. In your case, it will see the end of file when this occurs, and will thus set eofbit. So your first assert will fail with a conforming implementation.
One could very easily consider this a defect in the standard, or
unintentional. It's very easy to imagine some implementations
doing the sensible thing (which is what you seem to expect),
perhaps because the original implementors didn't realize the
full implications in the standard (or unconsciously read it as
they thought it should read). I would guess that this is the
case for g++, and when they realized that their behavior was
non-conformant, they fixed it.
As for work-arounds... I'm not sure what the real problem is that you're trying to work around. But I think that if you clear the error bits before the tellg, it should work. (Of course, then iScan.good() will be true, and iScan.eof() false. But can this really matter?) Just be sure to check that the extraction actually succeeded before you clear the status.
Value stored when istream read fails
According to 27.7.2.2.1 [istream.formatted.reqmts] paragraph 1, the first thing a formatted input function does is construct an std::istream::sentry object. Further processing depends on whether this object converts to true or false: nothing happens to the value if the sentry converts to false.
According to 27.7.2.1.3 [istream::sentry] paragraphs 5 and 7, the sentry will convert to false if the stream's flags are not std::ios_base::goodbit. That is, if either a failure happened or EOF is reached, the sentry will convert to false. As a result, the value stays at 5 when EOF is reached after skipping whitespace, assuming std::ios_base::skipws is set. Unsetting std::ios_base::skipws should result in the value becoming 0 if there is at least one space.
Once parsing is actually done, the applicable logic is defined in 22.4.2.1.2 [facet.num.get.virtuals] paragraph 3, Stage 3. The key section on the affected value is:
...
The numeric value to be stored can be one of:
— zero, if the conversion function fails to convert the entire field. ios_base::failbit is assigned to err.
— the most positive representable value, if the field represents a value too large positive to be represented in val. ios_base::failbit is assigned to err.
— the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.
— the converted value, otherwise.
The resultant numeric value is stored in val.
So, the observed behavior is correct.
With pre-C++11 the value was left unchanged in all cases. It was considered desirable to tell errors apart and to indicate, via the stored value, what kind of failure occurred. The discussions on how to change the behavior went on for a rather long time and were actually quite contentious.
That the value isn't changed if EOF is reached before attempting a conversion may be considered an error. I don't recall that case being considered while the change was discussed.
Why does stringstream change value of target on failure?
From this reference:
If extraction fails (e.g. if a letter was entered where a digit is expected), value is left unmodified and failbit is set (until C++11)
If extraction fails, zero is written to value and failbit is set. If extraction results in the value too large or too small to fit in value, std::numeric_limits::max() or std::numeric_limits::min() is written and failbit flag is set. (since C++11)
It seems that your compiler is compiling in C++11 mode, which changes the behavior.
The input operator uses the locale facet std::num_get whose get function invokes do_get. For C++11 it's specified to use std::strtoll et al. types of functions. Before C++11 it apparently used std::scanf style parsing (going by the reference; I don't have access to the C++03 specification) to extract the numbers. The change in behavior is due to this change in parsing the input.
GCC 4.7 istream::tellg() returns -1 after reaching EOF
According to C++11 section 27.7.2.3p40:
if fail() != false, returns pos_type(-1)
So gcc 4.7 has the correct behavior for the current version of C++ (assuming that peek() at end of stream causes failbit to be set, and it does during sentry construction, since skipws is set by default).
Looking at the wording of C++03, it is the same. 27.6.1.3p37. So the behavior you describe in gcc 4.4 is a bug.
Shouldn't istream::peek() always return what you just putback()?
There has actually been a change to the putback function in C++11:
§27.7.2.3/34
basic_istream<charT,traits>& putback(char_type c);
Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1), except that the function first clears eofbit. ...
The second half of that sentence didn't exist in C++03.
So it might depend on whether the compilers have fully implemented this change, or on whether you use the required options (-std=c++11?).
std::istream::unget() setting fail and bad bits if first read failed but not if second or further reads failed
Your mistake seems to be thinking that when reading an integer and a non-digit is found, that character is consumed. It isn't. Just remove the call to unget().
Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?
The eof() flag is only useful to determine whether you hit end of file after some operation. The primary use is to avoid an error message if reading reasonably failed because there wasn't anything more to read. Trying to control a loop or something using eof() is bound to fail. In all cases you need to check, after you tried to read, whether the read was successful. Before the attempt the stream can't know what you are going to read.
The semantics of eof() comes down to "this flag gets set when reading the stream caused the stream buffer to return a failure". It isn't quite as easy to find this statement in the standard, if I recall correctly, but this is what it amounts to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().
If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:
if (stream.peek() != std::char_traits<char>::eof()) {
do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
do_something_else();
}
Are stringstream read failures non-deterministic?
MSVC bug. [facet.num.get.virtuals]/6:
Effects: If (str.flags() & ios_base::boolalpha) == 0 then input proceeds as it would for a long except that if a value is being stored into val, the value is determined according to the following: If the value to be stored is 0 then false is stored. If the value is 1 then true is stored. Otherwise true is stored and ios_base::failbit is assigned to err.
How a stream error indicator affects following input code?
Assuming no UB, are 5 of the 8 cases possible, and not the 3 unexpected ones? In particular, is valid input possible with the error indicator set?
Speaking specifically to the provisions of the standard, I'm inclined to agree with your analysis:
Few functions are specified to clear the error indicator of a stream, and fgetc() is not one of them. More generally, none of them are data-transfer functions. Therefore, if the error indicator is set for a stream before that stream is presented to fgetc() for reading, then it should still be set when that function returns, all other considerations notwithstanding. That covers these cases:*
1 0 0 Unexpected
1 1 0 Unexpected
1 1 1 Input error or end-of-file
It also covers this case with respect to the expected value of the error indicator, though it does not speak to whether it can actually happen:
1 0 1 Normal reading of valid data with error indicator set!
fgetc() is specified to return EOF in every situation in which it is specified to set the end-of-file indicator on a stream. Therefore, if fgetc() returns anything other than EOF then it will not, on that call, have set the stream's error (or end-of-file) indicator. That covers these cases:
0 0 0 Normal reading of valid data
0 0 1 Unexpected
On the other hand, if fgetc() does return EOF then either the stream's end-of-file indicator or its error indicator should afterward be found set. But the standard distinguishes between these cases, and specifies that the user can distinguish them via the feof() and ferror() functions. That covers these cases:*
0 1 0 End-of-file
0 1 1 Input error
Finally, I concur that none of the behavior of fgetc() is conditioned on the initial state of the stream's error indicator. Provided only that the stream is not initially positioned at its end, and its end-of-file indicator is not initially set, "the fgetc function returns the next character from the input stream pointed to by stream." That establishes that this, the case of most interest, is in fact allowed:
1 0 1 Normal reading of valid data with error indicator set!
However, that the case is allowed in the abstract does not imply that it can be observed in practice. The details seem unspecified, and I would expect them to depend on the implementation of the driver serving the stream in question. It is entirely possible that having once encountered an error, the driver will continue to report an error on subsequent reads until reset appropriately, and perhaps longer. From the C perspective, that would be interpreted as an (additional) error occurring on each subsequent read, and nothing in the language specifications prevents that. Not even use of one of the functions that clear a stream's error indicator.
If code does not clear the error indicator beforehand and wants to detect whether a line of input had a rare input error, it seems to make sense to test !feof() and not ferror() to detect it. Is checking ferror() potentially misleading? Or have I missed something about the error indicator?
I agree that if a stream's error indicator is initially set, its end-of-file indicator is not, and reading it with fgetc() returns EOF, then ferror() does not usefully distinguish between the end-of-file and error cases, whereas feof() should.
On the other hand, whether one can usefully continue to read a given stream after an error has been encountered on it depends on the implementation and possibly on specific circumstances. That applies even if the error indicator is cleared via a clearerr() call, not to mention if the error indicator is not cleared.
* Although I agree that there is an ambiguity with respect to EOF in the event that UCHAR_MAX > INT_MAX, I assert that that is just one of several reasons why such an implementation would be problematic. As a practical matter, therefore, I disregard such implementations as entirely hypothetical.