std::istream_iterator with copy_n() and friends

Unfortunately, the implementer of copy_n failed to account for the read-ahead in the copy loop. The Visual C++ implementation works as you would expect on both std::stringstream and std::cin. I also checked the case from the original example, where the istream_iterator is constructed inline.

Here is the key piece of code from the Visual C++ STL implementation:

template<class _InIt,
    class _Diff,
    class _OutIt> inline
    _OutIt _Copy_n(_InIt _First, _Diff _Count,
        _OutIt _Dest, input_iterator_tag)
    {   // copy [_First, _First + _Count) to [_Dest, ...), arbitrary input
    *_Dest = *_First;   // 0 < _Count has been guaranteed
    while (0 < --_Count)
        *++_Dest = *++_First;
    return (++_Dest);
    }
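
To make the difference concrete, here is a minimal sketch that contrasts the two possible loop shapes. The helpers copy_n_eager, copy_n_lazy and next_after_copy are hypothetical names made up for this illustration; they are not part of any standard library. The eager version increments once per element (n increments), the lazy one mirrors the loop above (n - 1 increments).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: one increment per element (n increments in total),
// so an istream_iterator extracts one value beyond the n it copies.
template <class InIt, class OutIt>
OutIt copy_n_eager(InIt first, std::size_t n, OutIt dest) {
    for (; n > 0; --n, ++first)
        *dest++ = *first;
    return dest;
}

// Hypothetical helper: n - 1 increments, the same shape as the loop above.
template <class InIt, class OutIt>
OutIt copy_n_lazy(InIt first, std::size_t n, OutIt dest) {
    if (n == 0) return dest;
    *dest++ = *first;          // the first value was read at construction
    while (--n > 0)
        *dest++ = *++first;    // each increment extracts exactly one value
    return dest;
}

// Copies two ints from "1 2 3 4", then returns what the stream yields next.
int next_after_copy(bool eager) {
    std::istringstream ss("1 2 3 4");
    std::vector<int> numbers(2);
    std::istream_iterator<int> it(ss);   // construction already extracts 1
    if (eager) copy_n_eager(it, 2, numbers.begin());
    else       copy_n_lazy(it, 2, numbers.begin());
    assert(numbers[0] == 1 && numbers[1] == 2);
    int next = 0;
    ss >> next;
    return next;
}
```

With the eager loop the value 3 is swallowed by the extra increment, so the stream continues at 4. With the lazy loop it continues at 3.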

Here is the test code:

#include <algorithm>
#include <iostream>
#include <istream>
#include <sstream>
#include <vector>
#include <iterator>

int main()
{
    std::stringstream ss;
    ss << 1 << ' ' << 2 << ' ' << 3 << ' ' << 4 << std::endl;
    ss.seekg(0);
    std::vector<int> numbers(2);
    std::istream_iterator<int> ii(ss);
    std::cout << *ii << std::endl;  // shows that read ahead happened.
    std::copy_n(ii, 2, numbers.begin());
    int i = 0;
    ss >> i;
    std::cout << numbers[0] << ' ' << numbers[1] << ' ' << i << std::endl;

    std::istream_iterator<int> ii2(std::cin);
    std::cout << *ii2 << std::endl;  // shows that read ahead happened.
    std::copy_n(ii2, 2, numbers.begin());
    std::cin >> i;
    std::cout << numbers[0] << ' ' << numbers[1] << ' ' << i << std::endl;

    return 0;
}

/* Output (the first "4 5 6" is the input typed on std::cin)
1
1 2 3
4 5 6
4
4 5 6
*/

Why std::istream_iterator with multiple copy_n() always writes first value

This comes down to a perhaps unintuitive fact of istream_iterator: it doesn't read when you dereference it, but instead when you advance (or construct) it.



(x indicates a read)

Normal forward iterators:

Data:         1 2 3 (EOF)

Construction
*it           x
++it
*it             x
++it
*it               x
++it                (`it` is now the one-past-the-end iterator)
Destruction

Stream iterators:

Data:         1 2 3 (EOF)

Construction  x
*it
++it            x
*it
++it              x
*it
++it                (`it` is now the one-past-the-end iterator)
Destruction

We still expect the data to be provided to us via *it. So, to make this work, each bit of read data has to be temporarily stored in the iterator itself until we next do *it.

So, when you create iit, it's already pulling the first number out for you, 1. That data is stored in the iterator. The next available data in the stream is 2, which you then pull out using copy_n. In total that's two pieces of information delivered, out of a total of two that you asked for, so the first copy_n is done.

The next time, you're using a copy of iit in the state it was in before the first copy_n. So, although the stream is ready to give you 3, you still have a copy of that 1 "stuck" in your copied stream iterator.
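
A small sketch of that "stuck" value, assuming the usual implementation strategy where the iterator caches the last extracted value. The function name stale_copy_demo is made up for this example. Strictly speaking, the input iterator requirements promise nothing about old copies once the original has been incremented, so treat this as an illustration of common behaviour rather than a guarantee.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <utility>

// Returns {value seen through the old copy, value seen through the advanced iterator}.
std::pair<int, int> stale_copy_demo() {
    std::istringstream ss("1 2 3 4");
    std::istream_iterator<int> it(ss);      // extracts and caches 1
    assert(*it == 1);
    std::istream_iterator<int> saved = it;  // copies the cached 1, not a stream position
    ++it;                                   // extracts 2
    ++it;                                   // extracts 3
    return std::make_pair(*saved, *it);
}
```

On the mainstream standard libraries the saved copy still reports 1 while the stream itself has moved on to 3, which is the situation described above.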


Why do stream iterators work this way? Because you cannot detect EOF on a stream until you've tried and failed to obtain more data. If it didn't work this way, you'd have to do a dereference first to trigger this detection, and then what should the result be if we've reached EOF?

Furthermore, we expect that any dereference operation produces an immediate result; with a container that's a given, but with streams you could otherwise be blocking waiting for data to become available. It makes more logical sense to do this blocking on the construction/increment, instead, so that your iterator is always either valid, or it isn't.


If you drop the copies and construct a fresh stream iterator for each copy_n, you should be fine. That said, I would generally recommend using only one stream iterator per stream, as that avoids anyone having to worry about this.
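
One way to follow the "one stream iterator per stream" advice is to pass the iterator by reference and let it carry its state from one chunk to the next. The sketch below uses a hypothetical helper take_n (not a standard algorithm). It advances once per element, so the iterator always holds the next unread value; that read-ahead is harmless as long as nothing else reads from the stream directly.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: copies up to n values and leaves `it` holding the
// next unread value, ready for the following call.
template <class OutIt>
OutIt take_n(std::istream_iterator<int>& it, std::size_t n, OutIt dest) {
    const std::istream_iterator<int> end;   // default-constructed: end of stream
    for (; n > 0 && it != end; --n, ++it)
        *dest++ = *it;
    return dest;
}

// Reads two chunks of two ints through a single iterator and joins them.
std::vector<int> read_two_chunks() {
    std::istringstream ss("1 2 3 4 5");
    std::istream_iterator<int> it(ss);
    std::vector<int> first, second;
    take_n(it, 2, std::back_inserter(first));
    take_n(it, 2, std::back_inserter(second));
    assert(first.size() == 2 && second.size() == 2);
    first.insert(first.end(), second.begin(), second.end());
    return first;
}
```

Because the same iterator object is reused, the second chunk starts at 3 instead of repeating the 1.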

Limiting the range for std::copy with std::istream_iterator

As you requested a non-C++0x solution, here's an alternative that uses std::generate_n and a generator functor rather than std::copy_n and iterators:

#include <algorithm>
#include <string>
#include <istream>
#include <ostream>
#include <iostream>

template<
    typename ResultT,
    typename CharT = char,
    typename CharTraitsT = std::char_traits<CharT>
>
struct input_generator
{
    typedef ResultT result_type;

    explicit input_generator(std::basic_istream<CharT, CharTraitsT>& input)
      : input_(&input)
    { }

    ResultT operator ()() const
    {
        // value-initialize so primitives like float
        // have a defined value if extraction fails
        ResultT v((ResultT()));
        *input_ >> v;
        return v;
    }

private:
    std::basic_istream<CharT, CharTraitsT>* input_;
};

template<typename ResultT, typename CharT, typename CharTraitsT>
inline input_generator<ResultT, CharT, CharTraitsT> make_input_generator(
    std::basic_istream<CharT, CharTraitsT>& input
)
{
    return input_generator<ResultT, CharT, CharTraitsT>(input);
}

int main()
{
    float values[4];
    std::generate_n(values, 4, make_input_generator<float>(std::cin));
    std::cout << "Read exactly 4 floats" << std::endl;
}

If you wanted to, you could then use this generator in conjunction with boost::generator_iterator to use the generator as an input iterator.
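
For readers who are not restricted to C++03, the same generate_n idea can be sketched with a lambda in place of the functor. This is only an illustration that assumes C++11 is available; read_floats is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical helper: extracts exactly n floats from `in`, with no read-ahead,
// because each generator call performs exactly one extraction.
std::vector<float> read_floats(std::istream& in, std::size_t n) {
    std::vector<float> values;
    values.reserve(n);
    std::generate_n(std::back_inserter(values), n, [&in] {
        float v = 0.0f;   // defined value even if extraction fails
        in >> v;
        return v;
    });
    return values;
}
```

After read_floats(in, 4) returns, the next extraction from in picks up at the fifth value, since nothing was cached in an iterator.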

C++ best way to split vector into n vector

Best is a matter of opinion, but you could do something like the following (with bunch_size being 10):

for(size_t i = 0; i < strings.size(); i += bunch_size) {
    auto last = std::min(strings.size(), i + bunch_size);
    bunches.emplace_back(strings.begin() + i, strings.begin() + last);
}
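
For completeness, here is the copying loop wrapped into a self-contained function. The name split_into_bunches and the guard against a zero bunch size are additions made for this sketch, not part of the snippet above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper: splits `strings` into consecutive bunches of at most
// `bunch_size` elements; the last bunch may be shorter.
std::vector<std::vector<std::string>>
split_into_bunches(const std::vector<std::string>& strings, std::size_t bunch_size) {
    std::vector<std::vector<std::string>> bunches;
    if (bunch_size == 0) return bunches;   // avoid an endless loop
    bunches.reserve((strings.size() + bunch_size - 1) / bunch_size);
    for (std::size_t i = 0; i < strings.size(); i += bunch_size) {
        const std::size_t last = std::min(strings.size(), i + bunch_size);
        bunches.emplace_back(strings.begin() + i, strings.begin() + last);
    }
    return bunches;
}
```

Splitting five strings with a bunch size of 2 gives three bunches holding 2, 2 and 1 elements.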

If your strings are large and you want to avoid copying, you can go with the move version:

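// bunches must already contain one empty vector per bunch, i.e.
// (strings.size() + bunch_size - 1) / bunch_size of them
for(size_t i = 0; i < strings.size(); i += bunch_size) {
    auto last = std::min(strings.size(), i + bunch_size);
    auto index = i / bunch_size;
    auto& vec = bunches[index];
    vec.reserve(last - i);
    std::move(strings.begin() + i, strings.begin() + last, std::back_inserter(vec));
}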

Error message: 'value_type' : is not a member of

According to the standard (24.5.2.1 [back.insert.iterator]), back_insert_iterator requires that your Container type contain a value_type typedef, which should name the base type of the (const reference or rvalue reference) argument to push_back:

class TextFileLineBuffer
{
public:
    // ...
    typedef std::string value_type;

For compatibility with C++98, you should also define const_reference, as explained in the question "std::back_inserter needs const_reference on older GCC. Why?":

    typedef const std::string &const_reference;
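
Putting both typedefs together, a minimal container that works with std::back_inserter could look like the sketch below. LineBuffer and collect are hypothetical stand-ins written for this illustration; they are not the TextFileLineBuffer from the question, whose remaining members are not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a user-defined line buffer.
class LineBuffer {
public:
    typedef std::string value_type;               // required by back_insert_iterator
    typedef const std::string& const_reference;   // needed by some older libraries

    void push_back(const std::string& line) { lines_.push_back(line); }
    std::size_t size() const { return lines_.size(); }
    const std::string& operator[](std::size_t i) const { return lines_[i]; }

private:
    std::vector<std::string> lines_;
};

// Copies every input line into a LineBuffer through std::back_inserter.
LineBuffer collect(const std::vector<std::string>& input) {
    LineBuffer buffer;
    std::copy(input.begin(), input.end(), std::back_inserter(buffer));
    return buffer;
}
```

Removing the value_type typedef from this class should reproduce the kind of "is not a member of" error from the heading, because back_insert_iterator's assignment operator is declared in terms of Container::value_type.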

