Why does string extraction from a stream set the eof bit?
std::stringstream is a basic_istream, and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as
explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw
ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or
extractors) and the unformatted input functions. Both groups of input functions are described as if they
obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use
other public members of istream.
So eof must be set.
Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
Because iostream::eof will only return true after reading the end of the stream. It does not indicate that the next read will be the end of the stream.
Consider this (and assume the next read will be at the end of the stream):
while(!inStream.eof()){
    int data;
    // yay, not end of stream yet, now read ...
    inStream >> data;
    // oh crap, now we read the end and *only* now the eof bit will be set (as well as the fail bit)
    // do stuff with (now uninitialized) data
}
Against this:
int data;
while(inStream >> data){
    // when we land here, we can be sure that the read was successful.
    // if it wasn't, the returned stream from operator>> would be converted to false
    // and the loop wouldn't even be entered
    // do stuff with correctly initialized data (hopefully)
}
And on your second question: Because
if(scanf("...",...)!=EOF)
is the same as
if(!(inStream >> data).eof())
and not the same as
if(!inStream.eof())
inFile >> data
How does ifstream's eof() work?
-1 is get's way of saying you've reached the end of file. Compare it using std::char_traits<char>::eof() (or std::istream::traits_type::eof()) - avoid -1, it's a magic number. (Although the other one is a bit verbose - you can always just call istream::eof.)
The EOF flag is only set once a read tries to read past the end of the file. If I have a 3 byte file, and I only read 3 bytes, EOF is false, because I've not tried to read past the end of the file yet. While this seems confusing for files, which typically know their size, EOF is not known until a read is attempted on some devices, such as pipes and network sockets.
The second example works as inf >> foo will always return inf, with the side effect of attempting to read something and store it in foo. inf, in an if or while, will evaluate to true if the file is "good": neither failbit nor badbit is set (a read that runs into EOF without extracting anything sets failbit as well as eofbit). Thus, when a read fails, inf evaluates to false, and your loop properly aborts. However, take this common error:
while(!inf.eof())  // EOF is false here
{
    inf >> x;      // read fails, EOF becomes true, x is not set
    // use x       // we use x, despite our read failing.
}
However, this:
while(inf >> x)  // Attempt read into x, return false if it fails
{
    // will only be entered if read succeeded.
}
Which is what we want.
istream::peek curious behavior wrt. EOF
in.get(); // get a char (which turns out to be the last)
// curiously ios::eof bit doesn't get set just yet
This is not "curious". The stream's EOF bit gets set when a read fails due to having reached eof; it does not mean "the last read took us to eof".
c = in.peek(); // attempt to peek, return EOF and now set ios::eof bit
Like now.
Also, why unget subsequently doesn't work? Does the standard mandates all operations to be nop when good() is false or something?
... which is what's happening. You failed to otherwise define "doesn't work".
You'll have to clear the stream's state yourself when EOF was reached, if you want to unget that character you retrieved on line 3.
istream in;
// ...
in.get();         // assume this succeeds (*)
c = in.peek();    // assume this hits the end of the stream and sets the EOF bit
if (in.eof()) {   // peek() sets only eofbit, not failbit, so `!in` would not detect this
    in.clear();   // clear stream error state for use
    in.unget();   // "put back" that character (*)
}
else {
    // next char from .get() will be equal to `c`
}
Why is failbit set when I enter EOF?
eofbit is set when a read operation encounters EOF while reading data into the stream's buffer. The data hasn't been processed yet.
failbit is set when the requested data fails to be extracted from the buffer, such as when reading an integer with operator>>. While waiting for digits to arrive, EOF could occur. eofbit alone is not enough to enter an error state, as there may be usable data in the buffer.
So, for example, imagine a while (cin >> num) loop is used and the user enters 123<Ctrl-Z>.
On the 1st iteration, operator>> reads 1, 2, 3 into the buffer, then encounters Ctrl-Z, so it sets eofbit and stops reading. 123 is then extracted from the buffer into num and the operator exits. At this point, the stream is not yet in an error state. When the stream's bool conversion is evaluated by while, it returns true, allowing the while body to be entered so it can process num.
On the next iteration, operator>> sees eofbit is set, preventing further reading. There is nothing left in the buffer to extract into num, so the operator sets failbit and exits. The stream is now in an error state. When the stream's bool conversion is evaluated by while, it returns false, breaking the while loop.
Why is failbit set when eof is found on read?
Improving on @absence's answer, here is a function readeof() that does the same as read() but doesn't set failbit on EOF. Real read failures have also been tested, such as a transfer interrupted by hard removal of a USB stick or a link drop while accessing a network share. It has been tested on Windows 7 with VS2010 and VS2013 and on Linux with gcc 4.8.1. On Linux only USB stick removal has been tried.
#include <iostream>
#include <fstream>
#include <stdexcept>
using namespace std;
streamsize readeof(istream &stream, char *buffer, streamsize count)
{
    if (count == 0 || stream.eof())
        return 0;
    streamsize offset = 0;
    streamsize reads;
    do
    {
        // This consistently fails on gcc (linux) 4.8.1 with failbit set on read
        // failure. This apparently never fails on VS2010 and VS2013 (Windows 7)
        reads = stream.rdbuf()->sgetn(buffer + offset, count);
        // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
        // failure of the previous sgetn()
        (void)stream.rdstate();
        // On gcc (linux) 4.8.1 and VS2010/VS2013 (Windows 7) this consistently
        // sets eofbit when stream is EOF as a consequence of sgetn(). It
        // should also throw if exceptions are set, or return on the contrary,
        // and previous rdstate() restored a failbit on Windows. On Windows most
        // of the time it sets eofbit even on real read failure
        (void)stream.peek();
        if (stream.fail())
            throw runtime_error("Stream I/O error while reading");
        offset += reads;
        count -= reads;
    } while (count != 0 && !stream.eof());
    return offset;
}
#define BIGGER_BUFFER_SIZE 200000000
int main(int argc, char* argv[])
{
    ifstream stream;
    stream.exceptions(ifstream::badbit | ifstream::failbit);
    stream.open("<big file on usb stick>", ios::binary);
    char *buffer = new char[BIGGER_BUFFER_SIZE];
    streamsize reads = readeof(stream, buffer, BIGGER_BUFFER_SIZE);
    if (stream.eof())
        cout << "eof" << endl << flush;
    delete[] buffer;  // allocated with new[], so it must be released with delete[]
    return 0;
}
Bottom line: on Linux the behavior is more consistent and meaningful. With exceptions enabled, real read failures will throw on sgetn(). Windows, on the contrary, will treat read failures as EOF most of the time.
Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?
The eof() flag is only useful to determine if you hit end of file after some operation. The primary use is to avoid an error message if reading reasonably failed because there wasn't anything more to read. Trying to control a loop or something using eof() is bound to fail. In all cases you need to check after you tried to read if the read was successful. Before the attempt the stream can't know what you are going to read.
The semantics of eof() is defined thoroughly as "this flag gets set when reading the stream caused the stream buffer to return a failure". It isn't quite as easy to find this statement, if I recall correctly, but this is what it comes down to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().
If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:
if (stream.peek() != std::char_traits<char>::eof()) {
    do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
    do_something_else();
}