Equivalent C++ to Python Generator Pattern

Equivalent C++ to Python generator pattern

Generators exist in C++, just under another name: Input Iterators. For example, reading from std::cin is similar to having a generator of char.

You simply need to understand what a generator does:

  • there is a blob of data: the local variables define a state
  • there is an init method
  • there is a "next" method
  • there is a way to signal termination

In your trivial example, it's easy enough. Conceptually:

struct State { unsigned i, j; };

State make();

void next(State&);

bool isDone(State const&);
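
To make the analogy concrete, driving this conceptual interface would look something like the following (a hypothetical sketch, just to show the shape of the loop):

for (State s = make(); !isDone(s); next(s)) {
    // consume s.i and s.j here
}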

Of course, we wrap this as a proper class:

// Headers needed by this class:
#include <cassert>
#include <cstddef>
#include <iterator>
#include <limits>
#include <utility>

class PairSequence:
    // (implicit aliases)
    public std::iterator<
        std::input_iterator_tag,
        std::pair<unsigned, unsigned>
    >
{
    // C++03
    typedef void (PairSequence::*BoolLike)();
    void non_comparable();
public:
    // Keep only one of the two alias blocks below; they are alternatives.

    // C++11 (explicit aliases)
    using iterator_category = std::input_iterator_tag;
    using value_type = std::pair<unsigned, unsigned>;
    using reference = value_type const&;
    using pointer = value_type const*;
    using difference_type = ptrdiff_t;

    // C++03 (explicit aliases)
    typedef std::input_iterator_tag iterator_category;
    typedef std::pair<unsigned, unsigned> value_type;
    typedef value_type const& reference;
    typedef value_type const* pointer;
    typedef ptrdiff_t difference_type;

    PairSequence(): done(false) {}

    // C++11
    explicit operator bool() const { return !done; }

    // C++03
    // Safe Bool idiom
    operator BoolLike() const {
        return done ? 0 : &PairSequence::non_comparable;
    }

    reference operator*() const { return ij; }
    pointer operator->() const { return &ij; }

    PairSequence& operator++() {
        static unsigned const Max = std::numeric_limits<unsigned>::max();

        assert(!done);

        if (ij.second != Max) { ++ij.second; return *this; }
        if (ij.first != Max) { ij.second = 0; ++ij.first; return *this; }

        done = true;
        return *this;
    }

    PairSequence operator++(int) {
        PairSequence const tmp(*this);
        ++*this;
        return tmp;
    }

private:
    bool done;
    value_type ij;
};
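
For illustration, consuming this iterator-style generator might look like the following (a hypothetical usage sketch, not part of the original answer):

#include <iostream>

int main() {
    PairSequence seq;
    // Prints the first five pairs: (0, 0) through (0, 4).
    for (unsigned n = 0; seq && n < 5; ++seq, ++n)
        std::cout << seq->first << ", " << seq->second << '\n';
}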

So hum yeah... might be that C++ is a tad more verbose :)

Equivalent of a Python generator in C++ for buffered reads

In order to disguise an ifstream (or really, any input stream) in a form that acts like an iterator, you want to use the istream_iterator or the istreambuf_iterator template class. The former is useful for files where the formatting is of concern. For example, a file full of whitespace-delimited integers can be read into the vector's iterator range constructor as follows:

#include <fstream>
#include <vector>
#include <iterator> // needed for istream_iterator

using namespace std;

int main(int argc, char** argv)
{
    ifstream infile("my-file.txt");

    // It isn't customary to declare these as standalone variables,
    // but see below for why it's necessary when working with
    // initializing containers.
    istream_iterator<int> infile_begin(infile);
    istream_iterator<int> infile_end;

    vector<int> my_ints(infile_begin, infile_end);

    // You can also do stuff with the istream_iterator objects directly:
    // Careful! If you run this program as is, this won't work because we
    // used up the input stream already with the vector.

    int total = 0;
    while (infile_begin != infile_end) {
        total += *infile_begin;
        ++infile_begin;
    }

    return 0;
}

istreambuf_iterator is used to read through files a single character at a time, disregarding the formatting of the input. That is, it will return you all characters, including spaces, newline characters, and so on. Depending on your application, that may be more appropriate.
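
For example, slurping an entire file into a string, whitespace and all, is a one-liner with istreambuf_iterator (a small sketch; the file name is just a placeholder). The braces sidestep the declaration ambiguity discussed just below:

#include <fstream>
#include <iterator>
#include <string>

using namespace std;

int main()
{
    ifstream infile("my-file.txt"); // placeholder file name

    // Every character, including spaces and newlines, ends up in the string.
    string contents{istreambuf_iterator<char>(infile),
                    istreambuf_iterator<char>()};

    return 0;
}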

Note: Scott Meyers explains in Effective STL why the separate variable declarations for istream_iterator are needed above. Normally, you would do something like this:

ifstream infile("my-file.txt");
vector<int> my_ints(istream_iterator<int>(infile), istream_iterator<int>());

However, C++ actually parses the second line in an incredibly bizarre way. It sees it as the declaration of a function named my_ints that takes two parameters and returns a vector<int>. The first parameter is of type istream_iterator<int> and is named infile (the parentheses are ignored). The second parameter is an unnamed function pointer that takes zero arguments (because of the parentheses) and returns an object of type istream_iterator<int>.

Pretty cool, but also pretty aggravating if you're not watching out for it.
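
If you'd rather not introduce named iterator variables, two other common workarounds are shown below (a sketch, not from the original answer): extra parentheses around the first argument, or C++11 list-initialization.

#include <fstream>
#include <iterator>
#include <vector>

using namespace std;

int main()
{
    ifstream infile("my-file.txt");

    // Extra parentheses force the first argument to be read as an
    // expression rather than a parameter declaration:
    vector<int> my_ints((istream_iterator<int>(infile)), istream_iterator<int>());

    // Alternatively, C++11 brace-initialization avoids the ambiguity entirely:
    // vector<int> my_ints{istream_iterator<int>(infile), istream_iterator<int>()};

    return 0;
}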


EDIT

Here's an example using the istreambuf_iterator to read in a file of 64-bit numbers laid out end-to-end:

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

int main(int argc, char** argv)
{
    // Open in binary mode so that no newline translation mangles the raw bytes.
    ifstream input("my-file.txt", ios::binary);
    istreambuf_iterator<char> input_begin(input);
    istreambuf_iterator<char> input_end;

    // Fill a char vector with input file's contents:
    vector<char> char_input(input_begin, input_end);
    input.close();

    // Convert it to an array of unsigned long with a cast:
    unsigned long* converted = reinterpret_cast<unsigned long*>(&char_input[0]);
    size_t num_long_elements = char_input.size() * sizeof(char) / sizeof(unsigned long);

    // Put that information into a vector:
    vector<unsigned long> long_input(converted, converted + num_long_elements);

    return 0;
}

Now, I personally rather dislike this solution (using reinterpret_cast, exposing char_input's array), but I'm not familiar enough with istreambuf_iterator to comfortably use one templatized over 64-bit characters, which would make this much easier.
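
For what it's worth, a variant that avoids the intermediate char vector is to size a vector of 64-bit integers up front and fill it with a single unformatted read (a sketch, not from the original answer; it still needs a cast for the read() call, uses uint64_t rather than unsigned long to pin down the width, and assumes the file really is a whole number of 8-byte values):

#include <cstdint>
#include <fstream>
#include <vector>

using namespace std;

int main()
{
    ifstream input("my-file.txt", ios::binary);

    // Find the file size, then rewind.
    input.seekg(0, ios::end);
    size_t num_bytes = static_cast<size_t>(input.tellg());
    input.seekg(0, ios::beg);

    // One 64-bit value per 8 bytes of input.
    vector<uint64_t> values(num_bytes / sizeof(uint64_t));
    input.read(reinterpret_cast<char*>(values.data()), values.size() * sizeof(uint64_t));

    return 0;
}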

Making a Python generator via C++20 coroutines

You have essentially two problems to overcome if you want to do this.

The first is that C++ is a statically typed language. This means that the types of everything involved need to be known at compile time. This is why your generator type needs to be a template, so that the user can specify what type it shepherds from the coroutine to the caller.

So if you want to have this bi-directional interface, then something on your hello function must specify both the output type and the input type.

The simplest way to go about this is to just create an object and pass a non-const reference to that object to the generator. Each time it does a co_yield, the caller can modify the referenced object and then ask for a new value. The coroutine can read from the reference and see the given data.
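
As a sketch (assuming an ordinary single-type generator<T> coroutine, not the two-type one developed below), that reference-passing approach might look like this:

// Hypothetical: `generator<char>` is a plain output-only generator type.
generator<char> hello(std::string &word) {
    for (std::size_t i = 0; i < word.size(); ++i)
        co_yield word[i];   // re-reads `word` on every resume
}

// Caller side: the coroutine sees any change made to `word`
// between successive resumptions, as long as `word` outlives it.
// std::string word = "hi";
// auto gen = hello(word);
// word = "bye";   // visible to the coroutine from now on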

However, if you insist on using the future type for the coroutine as both output and input, then you need to both solve the first problem (by making your generator template take OutputType and InputType) as well as this second problem.

See, your goal is to get a value to the coroutine. The problem is that the source of that value (the function calling your coroutine) has a future object. But the coroutine cannot access the future object. Nor can it access the promise object that the future references.

Or at least, it can't do so easily.

There are two ways to go about this, with different use cases. The first manipulates the coroutine machinery to backdoor a way into the promise. The second manipulates a property of co_yield to do basically the same thing.

Transform

The promise object for a coroutine is usually hidden and inaccessible from the coroutine. It is accessible to the future object, which the promise creates and which acts as an interface to the promised data. But it is also accessible during certain parts of the co_await machinery.

Specifically, when you perform a co_await on any expression in a coroutine, the machinery looks at your promise type to see if it has a function called await_transform. If so, it will call that promise object's await_transform on every expression you co_await on (at least, in a co_await that you directly write, not implicit awaits, such as the one created by co_yield).

As such, we need to do two things: create an overload of await_transform on the promise type, and create a type whose sole purpose is to allow us to call that await_transform function.

So that would look something like this:

struct generator_input {};

...

//Within the promise type:
auto await_transform(generator_input);

One quick note. The downside of using await_transform like this is that, by specifying even one overload of this function for our promise, we impact every co_await in any coroutine that uses this type. For a generator coroutine, that's not very important, since there's not much reason to co_await unless you're doing a hack like this. But if you were creating a more general mechanism that could distinctly await on arbitrary awaitables as part of its generation, you'd have a problem.

OK, so we have this await_transform function; what does this function need to do? It needs to return an awaitable object, since co_await is going to await on it. But the purpose of this awaitable object is to deliver a reference to the input type. Fortunately, the mechanism co_await uses to convert the awaitable into a value is provided by the awaitable's await_resume method. So ours can just return an InputType&:

//Within the `generator<OutputType, InputType>`:
struct passthru_value
{
    InputType &ret_;

    bool await_ready() {return true;}
    void await_suspend(coro_handle) {}
    InputType &await_resume() { return ret_; }
};

//Within the promise type:
auto await_transform(generator_input)
{
    return passthru_value{input_value}; //Where `input_value` is the `InputType` object stored by the promise.
}

This gives the coroutine access to the value, by invoking co_await generator_input{};. Note that this returns a reference to the object.

The generator type can easily be modified to allow the ability to modify an InputType object stored in the promise. Simply add a pair of send functions for overwriting the input value:

void send(const InputType &input)
{
    coro.promise().input_value = input;
}

void send(InputType &&input)
{
    coro.promise().input_value = std::move(input);
}

This represents an asymmetric transport mechanism. The coroutine retrieves a value at a place and time of its own choosing. As such, it is under no real obligation to respond instantly to any changes. This is good in some respects, as it allows a coroutine to insulate itself from deleterious changes. If you're using a range-based for loop over a container, that container cannot be directly modified (in most ways) by the outside world or else your program will exhibit UB. So if the coroutine is fragile in that way, it can copy the data from the user and thus prevent the user from modifying it.

All in all, the needed code isn't that large. Here's a runnable example of your code with these modifications:

#include <coroutine>
#include <exception>
#include <string>
#include <iostream>

struct generator_input {};

template <typename OutputType, typename InputType>
struct generator {
    struct promise_type;
    using coro_handle = std::coroutine_handle<promise_type>;

    struct passthru_value
    {
        InputType &ret_;

        bool await_ready() {return true;}
        void await_suspend(coro_handle) {}
        InputType &await_resume() { return ret_; }
    };

    struct promise_type {
        OutputType current_value;
        InputType input_value;

        auto get_return_object() { return generator{coro_handle::from_promise(*this)}; }
        auto initial_suspend() { return std::suspend_always{}; }
        auto final_suspend() noexcept { return std::suspend_always{}; } // final_suspend must be noexcept
        void unhandled_exception() { std::terminate(); }
        auto yield_value(OutputType value) {
            current_value = value;
            return std::suspend_always{};
        }

        void return_void() {}

        auto await_transform(generator_input)
        {
            return passthru_value{input_value};
        }
    };

    bool next() { return coro ? (coro.resume(), !coro.done()) : false; }
    OutputType value() { return coro.promise().current_value; }

    void send(const InputType &input)
    {
        coro.promise().input_value = input;
    }

    void send(InputType &&input)
    {
        coro.promise().input_value = std::move(input);
    }

    generator(generator const &rhs) = delete;
    generator(generator &&rhs)
        : coro(rhs.coro)
    {
        rhs.coro = nullptr;
    }
    ~generator() {
        if (coro)
            coro.destroy();
    }
private:
    generator(coro_handle h) : coro(h) {}
    coro_handle coro;
};

generator<char, std::string> hello(){
    auto word = co_await generator_input{};

    for(auto &ch: word){
        co_yield ch;
    }
}

int main(int, char**)
{
    auto test = hello();
    test.send("hello world");

    while(test.next())
    {
        std::cout << test.value() << ' ';
    }
}

Be more yielding

An alternative to using an explicit co_await is to exploit a property of co_yield. Namely, co_yield is an expression and therefore it has a value. Specifically, it is (mostly) equivalent to co_await p.yield_value(e), where p is the promise object (ohh!) and e is what we're yielding.

Fortunately, we already have a yield_value function; it returns std::suspend_always. But it could instead return an object that always suspends and that co_await can unpack into an InputType&:

struct yield_thru
{
    InputType &ret_;

    bool await_ready() {return false;}
    void await_suspend(coro_handle) {}
    InputType &await_resume() { return ret_; }
};

...

//in the promise
auto yield_value(OutputType value) {
    current_value = value;
    return yield_thru{input_value};
}

This is a symmetric transport mechanism; for every value you yield, you receive a value (which may be the same one as before). Unlike the explicit co_await method, you can't receive a value before you start to generate them. This could be useful for certain interfaces.
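
Assuming the modified yield_value above, the coroutine can then read the caller's latest input straight out of the co_yield expression (a hypothetical sketch):

generator<char, std::string> hello(std::string word) {
    for (auto ch : word) {
        // The co_yield expression now evaluates to an InputType&,
        // i.e. whatever the caller last passed to send().
        std::string &latest = co_yield ch;
        if (latest == "stop")   // hypothetical protocol: the caller can abort the stream
            co_return;
    }
}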

And of course, you could combine them as you see fit.

yield keyword for C++, How to Return an Iterator from my Function?

No, that’s not what Jerry means, at least not directly.

yield in Python implements coroutines. C++ doesn’t have them (they can of course be emulated, but that’s a bit involved if done cleanly).

But what Jerry meant is simply that you should pass in an output iterator which is then written to:

template <typename O>
void process_data(pqxx::result const &input_data, O iter) {
    for (auto row = input_data.begin(); row != input_data.end(); ++row)
        *iter++ = some_value;   // `some_value` stands in for whatever you compute from `row`
}

And call it:

std::vector<result_data> result;
process_data(input, std::back_inserter(result));

I’m not convinced though that this is generally better than just returning the vector.
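
For comparison, the value-returning version looks like this (same placeholder some_value as above; with NRVO and move semantics the return is cheap):

std::vector<result_data> process_data(pqxx::result const &input_data) {
    std::vector<result_data> result;
    for (auto row = input_data.begin(); row != input_data.end(); ++row)
        result.push_back(some_value);   // per-row computation goes here
    return result;
}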

Translate Recursive Python List Generator to C++

First, you can check this thread for more information on your algorithm. You will discover that the number of lists you generate is (n+r-1)C(r-1), which may help. For the parameters used below (n = 6, r = 4), that is C(9, 3) = 84 lists. There are multiple ways to translate this code, but I'll give you two.

The iterative way

First, in C++ the generator pattern is not very common. Depending on what you want to do, most of the time you'd rather allocate the memory for the whole output up front, compute the data, then return the full matrix. Second, you cannot recurse this way in C++: you would blow through your stack very quickly. So you need an iterative version of your algorithm. Here is how to do it (with iterators, as we like them in C++).

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <boost/math/special_functions/binomial.hpp>
#include <boost/numeric/conversion/cast.hpp>

using namespace std;

vector<vector<size_t>> gen_matrix(unsigned int n, unsigned int r)
{
    vector<vector<size_t>> res;
    if(r < 1) return res;
    // reserve memory space
    // this will throw positive_overflow if size is too big to be represented as size_t
    // this can also throw out_of_memory if this is size_t-representable but memory is too small.
    double result_size = boost::math::binomial_coefficient<double>(n + r - 1, r - 1);
    res.reserve(boost::numeric_cast<size_t>(result_size));
    vector<size_t> current(r, 0);
    current.front() = n;
    res.push_back(current);
    vector<size_t>::iterator inc = next(current.begin()); // what we increment
    while(inc != current.end())
    {
        while(current.front() != 0)
        {
            (*inc)++;
            current.front()--;
            res.push_back(current);
            while(prev(inc) != current.begin())
                inc--;
        }
        swap(current.front(), *inc++);
    }
    return res; // NRVO / implicit move; no need for std::move here
}

int main()
{
    auto r = gen_matrix(6, 4);
    for(auto v : r)
    {
        copy(v.begin(), v.end(), ostream_iterator<int>(cout, ", "));
        cout << endl;
    }
}

Note: The generation is done in reverse compared to your example, because this order is much more natural with C++ containers (because of the iterator comparison against the container's end()). Also, the Boost part is used to pre-compute the size and throw an exception early on to avoid memory depletion (and to reserve memory to avoid reallocations). This is not mandatory; you can just as well comment this part out (at your own risk ^^).

The generator way

But you may need a generator, say, if you're coding a program that will write tera-octets of integer lists into big files saved on peta-disks (well, who knows?). Or you may want to be able to compute on n=100, r=80, skip 2 or 3 million vectors and then pick a bunch of them. Or you just want to avoid heavy memory usage. Either way, a generator might come in handy; here it is.

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <stdexcept>
#include <boost/math/special_functions/binomial.hpp>
#include <boost/numeric/conversion/cast.hpp>

using namespace std;

struct sum_list_generator
{
    typedef vector<unsigned int> result_type;

    sum_list_generator(unsigned int n, unsigned int r):
        current(r, 0),
        inc(current.begin())
    {
        if(inc != current.end()) *inc++ = n;
    }

    result_type operator()()
    {
        if(inc == current.end())
            throw out_of_range("end of iteration");
        result_type res = current;
        if(current.front() == 0)
            swap(current.front(), *inc++);
        if(inc != current.end())
        {
            (*inc)++;
            current.front()--;
            if(current.front() != 0)
                while(prev(inc) != current.begin())
                    inc--;
        }
        return res;
    }

    // helper function: number of output vectors
    static size_t count(unsigned int n, unsigned int r)
    {
        return boost::numeric_cast<size_t>(
            boost::math::binomial_coefficient<double>(n + r - 1, r - 1)
        );
    }

private:
    result_type current;
    result_type::iterator inc;
};

int main()
{
    sum_list_generator g(6, 4);
    try
    {
        while(true)
        {
            auto v = g();
            copy(v.begin(), v.end(), ostream_iterator<int>(cout, ", "));
            cout << endl;
        }
    }
    catch(out_of_range const&) {}
}

Note: Again, the count member function can be removed. Also, in C++ (as opposed to Python) you usually avoid throwing an exception on an expected execution path. Here the generator can be used to fill some other structure and, if your parameters are well chosen, it will not throw. If you overrun it, of course, it throws an out_of_range. Ultimately, catching the exception and silencing it as done here in main is very bad design; this is just an example you can use to try some fun parameters like (100, 80). The count() function gives you exact boundaries if you need a full list of vectors: use it.

Generator list with the C++ standard library?

Algorithms won't give you the index while scanning over the range. You can use Boost.Iterator (or Boost.Range) to help you write your iterator (boost::iterator_facade or boost::function_input_iterator, for instance).
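
If you just want a counting "generator" without pulling in Boost, a standard-library approximation is a lambda that carries its own index and is driven by std::generate (a sketch of an alternative technique, not the Boost approach named above):

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    // A "generator" of squares: the lambda keeps its own index as state.
    std::vector<int> squares(10);
    int i = 0;
    std::generate(squares.begin(), squares.end(),
                  [&i] { int v = i * i; ++i; return v; });

    for (int v : squares)
        std::cout << v << ' ';
    std::cout << '\n';
}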

Is it possible to implement Python yield functionality in freestanding C?

I'm going to answer using setjmp and longjmp since those interfaces are standard and you can easily find their implementations for any HW platform. They are freestanding, but HW dependent.

#include <setjmp.h>

struct _yield_state {
    jmp_buf buf;
    _Bool yielded;
};

#define yieldable static struct _yield_state _state; \
    if (_state.yielded) longjmp(_state.buf, 1); else {}

#define yield(x) if (setjmp(_state.buf)) { _state.yielded = false; } \
    else { _state.yielded = true; return x; }

int func(int a, int b)
{
    yieldable;

    if (a > b)
        yield(0);

    return a + b;
}

You can find an example setjmp and longjmp implementation here. It is pure assembly, specific only to the underlying hardware.

Python generators in various languages

Here is an example in C++ that simulates generators using fibers:

Yield Return Iterator for Native C++ Using Fibers

The "yield return" iterator is a
language feature that was created for
one reason: simplicity. It is
generally much easier to iterate
across whole collectionl, storing all
context needed in local variables,
rather than crafting a complicated,
custom iterator object that stores its
state across subsequent retrieval
operations.

There are also the primitive C routines setjmp, longjmp to achieve similar results.

(Lua coroutines are implemented with the above method)
