C++ istream tellg()/fail() on eof: behavior change; work-around?
The behavior you seem to expect is probably wrong. Both C++11 and C++03 start the description of tellg with "Behaves as an unformatted input function [...]". An "unformatted input function" starts by constructing a sentry object, and will fail, doing nothing and returning a failure status, if the sentry object converts to false. And the sentry object will convert to false if the eofbit is set.
The standard is slightly less clear about whether reading the number sets the eofbit, but only slightly (with the information spread out over several different sections). Basically, when inputting a numeric value, the stream (actually, the num_get facet) must read one character ahead, in order to know where the number ends. In your case, it will see the end of file when this occurs, and will thus set eofbit. So your first assert will fail with a conforming implementation.
One could very easily consider this a defect in the standard, or
unintentional. It's very easy to imagine some implementations
doing the sensible thing (which is what you seem to expect),
perhaps because the original implementors didn't realize the
full implications in the standard (or unconsciously read it as
they thought it should read). I would guess that this is the
case for g++, and when they realized that their behavior was
non-conformant, they fixed it.
As for work-arounds... I'm not sure what the real problem is that you're trying to work around. But I think that if you clear the error bits before the tellg, it should work. (Of course, then iScan.good() will be true, and iScan.eof() false. But can this really matter?) Just be sure to check that the extraction actually succeeded before you clear the status.
Value stored when istream read fails
According to 27.7.2.2.1 [istream.formatted.reqmts] paragraph 1, the first thing a formatted input function does is construct an std::istream::sentry object. Further processing depends on whether this object converts to true or false: nothing happens to the value if the sentry converts to false.
According to 27.7.2.1.3 [istream::sentry] paragraphs 5 and 7, the sentry will convert to false if the stream's flags are not std::ios_base::goodbit. That is, if either a failure happened or EOF is reached, the sentry will convert to false. As a result, the value stays at 5 when EOF is reached after skipping whitespace, assuming std::ios_base::skipws is set. Unsetting std::ios_base::skipws should result in the value becoming 0 if there is at least one space.
Once parsing is actually done, the applicable logic is defined in 22.4.2.1.2 [facet.num.get.virtuals] paragraph 3, Stage 3. The key section on the affected value is:
...
The numeric value to be stored can be one of:
— zero, if the conversion function fails to convert the entire field. ios_base::failbit is assigned to err.
— the most positive representable value, if the field represents a value too large positive to be represented in val. ios_base::failbit is assigned to err.
— the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.
— the converted value, otherwise.
The resultant numeric value is stored in val.
So, the observed behavior is correct.
With pre-C++11 the value was left unchanged in all cases. It was considered desirable to tell errors apart and to indicate, via the stored value, what kind of failure occurred. The discussions on how to change the behavior went on for a rather long time and were actually quite contentious.
That the value isn't changed if EOF is reached before attempting a conversion may be considered an error. I don't recall that case being considered while the change was discussed.
Why does stringstream change value of target on failure?
From this reference:
If extraction fails (e.g. if a letter was entered where a digit is expected), value is left unmodified and failbit is set (until C++11)
If extraction fails, zero is written to value and failbit is set. If extraction results in the value too large or too small to fit in value, std::numeric_limits::max() or std::numeric_limits::min() is written and failbit flag is set. (since C++11)
It seems that your compiler is compiling in C++11 mode, which changes the behavior.
The input operator uses the locale facet std::num_get whose get function invokes do_get. For C++11 it's specified to use std::strtoll et al. types of functions. Before C++11 it apparently used std::scanf style parsing (going by the reference; I don't have access to the C++03 specification) to extract the numbers. The change in behavior is due to this change in parsing the input.
GCC 4.7 istream::tellg() returns -1 after reaching EOF
According to C++11 section 27.7.2.3p40:
if fail() != false, returns pos_type(-1)
So gcc 4.7 has the correct behavior for the current version of C++ (assuming that peek() at end of stream causes failbit to be set, and it does during sentry construction, since skipws is set by default).
Looking at the wording of C++03, it is the same. 27.6.1.3p37. So the behavior you describe in gcc 4.4 is a bug.
Shouldn't istream::peek() always return what you just putback()?
There has actually been a change to the putback function in C++11:
§27.7.2.3/34
basic_istream<charT,traits>& putback(char_type c);
Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1), except that the function first clears eofbit. ...
The second half of that sentence didn't exist in C++03.
So it might depend on whether the compilers have fully implemented this change, or on whether you use the required options (-std=c++11?).
std::istream::unget() setting fail and bad bits if first read failed but not if second or further reads failed
Your mistake seems to be thinking that when reading an integer and a non-digit is found, that character is consumed. It isn't. Just remove the call to unget().
Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?
The eof() flag is only useful to determine whether you hit end of file after some operation. The primary use is to avoid an error message if reading reasonably failed because there wasn't anything more to read. Trying to control a loop or something using eof() is bound to fail. In all cases you need to check, after you tried to read, whether the read was successful. Before the attempt the stream can't know what you are going to read.
The semantics of eof() comes down to "this flag gets set when reading the stream caused the stream buffer to return a failure". It isn't quite as easy to find this statement in the standard, if I recall correctly, but this is what it amounts to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().
If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:
if (stream.peek() != std::char_traits<char>::eof()) {
do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
do_something_else();
}
Are stringstream read failures non-deterministic?
MSVC bug. [facet.num.get.virtuals]/6:
Effects: If (str.flags() & ios_base::boolalpha) == 0 then input proceeds as it would for a long except that if a value is being stored into val, the value is determined according to the following: If the value to be stored is 0 then false is stored. If the value is 1 then true is stored. Otherwise true is stored and ios_base::failbit is assigned to err.
How a stream error indicator affects following input code?
Assuming no UB, are 5 of the 8 cases possible, and not the 3 unexpected ones? In particular, is valid input possible with the error indicator set?
Speaking specifically to the provisions of the standard, I'm inclined to agree with your analysis:
Few functions are specified to clear the error indicator of a stream, and fgetc() is not one of them. More generally, none of them are data-transfer functions. Therefore, if the error indicator is set for a stream before that stream is presented to fgetc() for reading, then it should still be set when that function returns, all other considerations notwithstanding. That covers these cases:*
1 0 0 Unexpected
1 1 0 Unexpected
1 1 1 Input error or end-of-file
It also covers this case with respect to the expected value of the error indicator, though it does not speak to whether it can actually happen:
1 0 1 Normal reading of valid data with error indicator set!
fgetc() is specified to return EOF in every situation in which it is specified to set the end-of-file indicator on a stream. Therefore, if fgetc() returns anything other than EOF then it will not, on that call, have set the stream's error (or end-of-file) indicator. That covers these cases:
0 0 0 Normal reading of valid data
0 0 1 Unexpected
On the other hand, if fgetc() does return EOF then either the stream's end-of-file indicator or its error indicator should afterward be found set. But the standard distinguishes between these cases, and specifies that the user can distinguish them via the feof() and ferror() functions. That covers these cases:*
0 1 0 End-of-file
0 1 1 Input error
Finally, I concur that none of the behavior of fgetc() is conditioned on the initial state of the stream's error indicator. Provided only that the stream is not initially positioned at its end, and its end-of-file indicator is not initially set, "the fgetc function returns the next character from the input stream pointed to by stream." That establishes that this, the case of most interest, is in fact allowed:
1 0 1 Normal reading of valid data with error indicator set!
However, that the case is allowed in the abstract does not imply that it can be observed in practice. The details seem unspecified, and I would expect them to depend on the implementation of the driver serving the stream in question. It is entirely possible that having once encountered an error, the driver will continue to report an error on subsequent reads until reset appropriately, and perhaps longer. From the C perspective, that would be interpreted as an (additional) error occurring on each subsequent read, and nothing in the language specifications prevents that. Not even use of one of the functions that clear a stream's error indicator.
If code does not clear the error indicator beforehand and wants to detect whether a line of input had a rare input error, it seems to make sense to test !feof() and not ferror() to detect it. Is checking ferror() potentially misleading? Or have I missed something about the error indicator?
I agree that if a stream's error indicator is initially set, its end-of-file indicator is not, and reading it with fgetc() returns EOF, then ferror() does not usefully distinguish between the end-of-file and error cases, whereas feof() should.
On the other hand, whether one can usefully continue to read a given stream after an error has been encountered on it depends on the implementation and possibly on specific circumstances. That applies even if the error indicator is cleared via a clearerr() call, not to mention if the error indicator is not cleared.
* Although I agree that there is an ambiguity with respect to EOF in the event that UCHAR_MAX > INT_MAX, I assert that that is just one of several reasons why such an implementation would be problematic. As a practical matter, therefore, I disregard such implementations as entirely hypothetical.