What's the Real Reason to Not Use the Eof Bit as Our Stream Extraction Condition

Why does string extraction from a stream set the eof bit?

std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).

27.7.2.1 Class template basic_istream

2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as
explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw
ios_base::failure (27.5.5.4), before returning.

Also, "extracting" means calling these two functions.

3 Two groups of member function signatures share common properties: the formatted input functions (or
extractors) and the unformatted input functions. Both groups of input functions are described as if they
obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use
other public members of istream.

So eof must be set.
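To see this in practice, here is a minimal sketch. The names ExtractionResult and extract_word are made up for illustration, and std::istringstream stands in for whatever stream you are reading from. It extracts one word and reports the flags afterwards.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: the stream flags seen after one `in >> word`.
struct ExtractionResult {
    std::string word;
    bool eof;
    bool fail;
};

// Extracts a single word from `text` and records the resulting stream state.
ExtractionResult extract_word(const std::string& text) {
    std::istringstream in(text);
    ExtractionResult r;
    in >> r.word;        // keeps consuming characters until whitespace or end of data
    r.eof = in.eof();    // true if the extractor ran into the end while reading
    r.fail = in.fail();  // true only if nothing could be extracted
    return r;
}
```

With "hello" the word is extracted successfully and eofbit is set anyway, because the extractor had to look past the final 'o' to know the word was over. With "hello world" the extractor stops at the space, so eofbit stays clear.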

Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?

Because iostream::eof only returns true after an attempt to read past the end of the stream. It does not indicate that the next read will hit the end of the stream.

Consider this (and assume the next read will be at the end of the stream):

while (!inStream.eof()) {
    int data;
    // yay, not end of stream yet, now read ...
    inStream >> data;
    // oops, this read hit the end, and *only* now is the eof bit set (as well as the fail bit)
    // do stuff with data, which was never successfully read
}

Against this:

int data;
while (inStream >> data) {
    // when we land here, we can be sure that the read was successful.
    // if it wasn't, the returned stream from operator>> would be converted to false
    // and the loop wouldn't even be entered
    // do stuff with correctly initialized data (hopefully)
}
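If you want to watch the difference, here is a rough sketch that runs both loop styles over the same text and collects what each one "processes". The function names are hypothetical and std::istringstream is only a stand-in for a file or console stream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Anti-pattern: checks eof() before reading, so a failed final read is
// still treated as a value.
std::vector<int> read_with_eof_loop(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> values;
    while (!in.eof()) {
        int data = -1;            // initialised so the bogus entry is well defined
        in >> data;               // may fail, but the result is used regardless
        values.push_back(data);
    }
    return values;
}

// Recommended: the extraction itself is the loop condition.
std::vector<int> read_with_extraction_loop(const std::string& text) {
    std::istringstream in(text);
    std::vector<int> values;
    int data;
    while (in >> data) {
        values.push_back(data);   // reached only after a successful read
    }
    return values;
}
```

For the input "1 2 3\n" the eof() loop collects four entries while the extraction loop collects three. Without the trailing newline both collect three, which is exactly why this bug tends to show up only with some input files. Do not feed the eof() loop non-numeric text: failbit stops all further reading, eofbit is never set, and the loop never ends.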

And on your second question: Because

if(scanf("...",...)!=EOF)

is the same as

if(!(inStream >> data).eof())

and not the same as

if(!inStream.eof())
    inStream >> data;

How does ifstream's eof() work?

-1 is get's way of saying you've reached the end of file. Compare it using the std::char_traits<char>::eof() (or std::istream::traits_type::eof()) - avoid -1, it's a magic number. (Although the other one is a bit verbose - you can always just call istream::eof)

The EOF flag is only set once a read tries to read past the end of the file. If I have a 3 byte file, and I only read 3 bytes, EOF is false, because I've not tried to read past the end of the file yet. While this seems confusing for files, which typically know their size, EOF is not known until a read is attempted on some devices, such as pipes and network sockets.
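The 3-byte case can be reproduced with a small sketch. EofProbe and probe_three_bytes are made-up names, and an in-memory std::istringstream holding "abc" plays the role of the 3-byte file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: what eof() reports at each stage.
struct EofProbe {
    bool eof_after_exact_read;  // after reading exactly the 3 available bytes
    bool extra_get_was_eof;     // did the 4th get() return traits_type::eof()?
    bool eof_after_extra_read;  // after trying to read past the end
};

EofProbe probe_three_bytes() {
    std::istringstream in("abc");
    char buffer[3];
    EofProbe p;

    in.read(buffer, 3);                   // consumes all 3 bytes, nothing more
    p.eof_after_exact_read = in.eof();

    // Compare against traits_type::eof() instead of the magic number -1.
    p.extra_get_was_eof = (in.get() == std::istream::traits_type::eof());
    p.eof_after_extra_read = in.eof();
    return p;
}
```

The first eof() check comes back false even though nothing is left to read; only the extra get() flips it to true.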

The second example works because inf >> foo always returns inf, with the side effect of attempting to read something and store it in foo. In an if or while condition, inf evaluates to true as long as no read has failed, i.e. neither failbit nor badbit is set; eofbit on its own does not make it false. Thus, when a read fails, inf evaluates to false, and your loop properly terminates. However, take this common error:

while(!inf.eof())  // EOF is false here
{
    inf >> x; // read fails, EOF becomes true, x is not set
    // use x // we use x, despite our read failing.
}

However, this:

while(inf >> x)  // Attempt read into x, return false if it fails
{
    // will only be entered if read succeeded.
}

Which is what we want.

istream::peek curious behavior wrt. EOF

in.get();      // get a char (which turns out to be the last)
// curiously ios::eof bit doesn't get set just yet

This is not "curious". The stream's EOF bit gets set when a read fails due to having reached eof; it does not mean "the last read took us to eof".

c = in.peek(); // attempt to peek, return EOF and now set ios::eof bit

Like now.

Also, why doesn't the subsequent unget work? Does the standard mandate that all operations be no-ops when good() is false or something?

... which is what's happening. You failed to otherwise define "doesn't work".

You'll have to clear the stream's state yourself once EOF has been reached if you want to unget the character retrieved by the get() call marked (*) below.

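std::istringstream in("x"); // illustrative one-character stream (needs <sstream>)
// ...
in.get(); // assume this succeeds (*)

int c = in.peek(); // assume this hits the end and sets the EOF bit

// peek() at the end sets eofbit only, so test eof() rather than !in
if (in.eof()) {
    in.clear(); // clear stream error state for use
    in.unget(); // "put back" that character (*)
}
else {
    // next char from .get() will be equal to `c`
}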
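Here is the same sequence as a self-contained sketch you can poke at. PeekResult and get_peek_unget are hypothetical names, and the one-character std::istringstream is an assumption made so that the first get() really does consume the last character.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: the stream state observed after each step.
struct PeekResult {
    bool eof_after_get;   // state right after reading the last character
    bool eof_after_peek;  // state after peeking past the end
    bool unget_ok;        // did unget() succeed once the state was cleared?
    char reread;          // character returned by get() after the unget()
};

PeekResult get_peek_unget() {
    std::istringstream in("x");
    PeekResult r;

    in.get();                        // reads 'x', the last character
    r.eof_after_get = in.eof();      // nothing has tried to read past the end yet

    in.peek();                       // now a read attempt hits the end
    r.eof_after_peek = in.eof();

    in.clear();                      // reset the state flags
    in.unget();                      // put 'x' back
    r.unget_ok = !in.fail();
    r.reread = static_cast<char>(in.get());
    return r;
}
```

The eof bit is clear after the get() and set after the peek(), and once the state has been cleared the unget() goes through, so the following get() returns 'x' again.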

Why is failbit set when I enter EOF?

eofbit is set when a read operation encounters EOF while reading data into the stream's buffer. The data hasn't been processed yet.

failbit is set when the requested data fails to be extracted from the buffer, such as when reading an integer with operator>>. While waiting for digits to arrive, EOF could occur. eofbit alone is not enough to enter an error state, as there may be usable data in the buffer.

So, for example, imagine a while (cin >> num) loop is used and the user enters 123<Ctrl-Z>.

  • on the 1st iteration, operator>> reads 1, 2, 3 into the buffer, then encounters Ctrl-Z, so it sets eofbit and stops reading. 123 is then extracted from the buffer into num and the operator exits. At this point, the stream is not yet in an error state. When the stream's bool conversion is evaluated by while, it returns true, allowing the while body to be entered so it can process num.

  • on the next iteration, operator>> sees eofbit is set, preventing further reading. There is nothing left in the buffer to extract into num, so the operator sets failbit and exits. The stream is now in an error state. When the stream's bool conversion is evaluated by while, it returns false, breaking the while loop.
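The two iterations can be replayed with a short sketch. Step and extract_twice are made-up names, and a std::istringstream containing "123" is assumed to behave like console input that ends right after the digits.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: result and flags after one `in >> num`.
struct Step {
    bool ok;    // what the while condition would see
    bool eof;
    bool fail;
    int value;
};

// Performs two extractions in a row and records the state after each one.
std::vector<Step> extract_twice(const std::string& text) {
    std::istringstream in(text);
    std::vector<Step> steps;
    for (int i = 0; i < 2; ++i) {
        int num = 0;
        bool ok = static_cast<bool>(in >> num);
        Step s;
        s.ok = ok;
        s.eof = in.eof();
        s.fail = in.fail();
        s.value = num;
        steps.push_back(s);
    }
    return steps;
}
```

For "123" the first step succeeds with eofbit already set and failbit clear, so the loop body would still run with 123. The second step fails with both bits set, which is what ends the loop.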

Why is failbit set when eof is found on read?

Building on @absence's answer, here is a function readeof() that does the same as read() but doesn't set failbit on EOF. Real read failures were also tested, such as a transfer interrupted by pulling out a USB stick or by a dropped link to a network share. It was tested on Windows 7 with VS2010 and VS2013 and on Linux with gcc 4.8.1. On Linux, only USB stick removal was tried.

#include <iostream>
#include <fstream>
#include <stdexcept>

using namespace std;

streamsize readeof(istream &stream, char *buffer, streamsize count)
{
    if (count == 0 || stream.eof())
        return 0;

    streamsize offset = 0;
    streamsize reads;
    do
    {
        // This consistently fails on gcc (Linux) 4.8.1 with failbit set on read
        // failure. This apparently never fails on VS2010 and VS2013 (Windows 7)
        reads = stream.rdbuf()->sgetn(buffer + offset, count);

        // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
        // failure of the previous sgetn()
        (void)stream.rdstate();

        // On gcc (Linux) 4.8.1 and VS2010/VS2013 (Windows 7) this consistently
        // sets eofbit when the stream is at EOF as a consequence of sgetn(). It
        // should also throw if exceptions are enabled (or otherwise just return)
        // when the previous rdstate() restored a failbit on Windows. On Windows
        // it sets eofbit most of the time, even on a real read failure
        (void)stream.peek();

        if (stream.fail())
            throw runtime_error("Stream I/O error while reading");

        offset += reads;
        count -= reads;
    } while (count != 0 && !stream.eof());

    return offset;
}

#define BIGGER_BUFFER_SIZE 200000000

int main(int argc, char* argv[])
{
    ifstream stream;
    stream.exceptions(ifstream::badbit | ifstream::failbit);
    stream.open("<big file on usb stick>", ios::binary);

    char *buffer = new char[BIGGER_BUFFER_SIZE];

    streamsize reads = readeof(stream, buffer, BIGGER_BUFFER_SIZE);

    if (stream.eof())
        cout << "eof" << endl << flush;

    delete[] buffer; // array form, since buffer was allocated with new[]

    return 0;
}

Bottom line: on Linux the behavior is more consistent and meaningful. With exceptions enabled, a real read failure throws at sgetn(). Windows, by contrast, treats read failures as EOF most of the time.
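For context, this is the plain read() behavior that readeof() works around, shown as a minimal sketch. ReadOutcome and read_bytes are hypothetical names, and an in-memory std::istringstream is assumed in place of a real file, so no device errors can occur here.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type: what a single read() call left behind.
struct ReadOutcome {
    std::streamsize got;  // bytes actually delivered, from gcount()
    bool eof;
    bool fail;
};

// Asks read() for `want` bytes from a stream holding `text`.
ReadOutcome read_bytes(const std::string& text, std::size_t want) {
    std::istringstream in(text);
    std::vector<char> buffer(want);
    in.read(buffer.data(), static_cast<std::streamsize>(want));

    ReadOutcome r;
    r.got = in.gcount();  // still valid after a short read
    r.eof = in.eof();
    r.fail = in.fail();   // set together with eofbit when fewer bytes arrived
    return r;
}
```

Asking for 10 bytes from "abc" delivers 3 bytes and sets both eofbit and failbit, so failbit alone cannot tell a short read at the end of the data from a genuine I/O error. Asking for exactly 3 bytes leaves both flags clear.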

Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?

The eof() flag is only useful to determine if you hit end of file after some operation. The primary use is to avoid an error message if reading reasonably failed because there wasn't anything more to read. Trying to control a loop or something using eof() is bound to fail. In all cases you need to check after you tried to read if the read was successful. Before the attempt the stream can't know what you are going to read.

The semantics of eof() are defined precisely as "this flag gets set when reading the stream caused the stream buffer to return a failure". This statement isn't that easy to find in the standard, if I recall correctly, but this is what it comes down to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().

If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:

if (stream.peek() != std::char_traits<char>::eof()) {
    do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
    do_something_else();
}

