Flattening Iterator

Flattening iterator

I don't know of any implementation in a major library, but it looked like an interesting problem so I wrote a basic implementation. I've only tested it with the test case I present here, so I don't recommend using it without further testing.

The problem is a bit trickier than it looks because some of the "inner" containers may be empty and you have to skip over them. This means that advancing the flattening_iterator by one position may actually advance the iterator into the "outer" container by more than one position. Because of this, the flattening_iterator needs to know where the end of the outer range is so that it knows when it needs to stop.

This implementation is a forward iterator. A bidirectional iterator would also need to keep track of the beginning of the outer range. The flatten function templates are used to make constructing flattening_iterators a bit easier.

#include <iterator>

// A forward iterator that "flattens" a container of containers.  For example,
// a vector<vector<int>> containing { { 1, 2, 3 }, { 4, 5, 6 } } is iterated as
// a single range, { 1, 2, 3, 4, 5, 6 }.
template <typename OuterIterator>
class flattening_iterator
{
public:

    typedef OuterIterator                                outer_iterator;
    typedef typename OuterIterator::value_type::iterator inner_iterator;

    typedef std::forward_iterator_tag                iterator_category;
    typedef typename inner_iterator::value_type      value_type;
    typedef typename inner_iterator::difference_type difference_type;
    typedef typename inner_iterator::pointer         pointer;
    typedef typename inner_iterator::reference       reference;

    flattening_iterator() { }
    flattening_iterator(outer_iterator it) : outer_it_(it), outer_end_(it) { }
    flattening_iterator(outer_iterator it, outer_iterator end) 
        : outer_it_(it), 
          outer_end_(end)
    { 
        if (outer_it_ == outer_end_) { return; }

        inner_it_ = outer_it_->begin();
        advance_past_empty_inner_containers();
    }

    reference operator*()  const { return *inner_it_;  }
    pointer   operator->() const { return &*inner_it_; }

    flattening_iterator& operator++()
    {
        ++inner_it_;
        if (inner_it_ == outer_it_->end())
            advance_past_empty_inner_containers();
        return *this;
    }

    flattening_iterator operator++(int)
    {
        flattening_iterator it(*this);
        ++*this;
        return it;
    }

    friend bool operator==(const flattening_iterator& a, 
                           const flattening_iterator& b)
    {
        if (a.outer_it_ != b.outer_it_)
            return false;

        if (a.outer_it_ != a.outer_end_ && 
            b.outer_it_ != b.outer_end_ &&
            a.inner_it_ != b.inner_it_)
            return false;

        return true;
    }

    friend bool operator!=(const flattening_iterator& a,
                           const flattening_iterator& b)
    {
        return !(a == b);
    }

private:

    void advance_past_empty_inner_containers()
    {
        while (outer_it_ != outer_end_ && inner_it_ == outer_it_->end())
        {
            ++outer_it_;
            if (outer_it_ != outer_end_) 
                inner_it_ = outer_it_->begin();
        }
    }

    outer_iterator outer_it_;
    outer_iterator outer_end_;
    inner_iterator inner_it_;
};

template <typename Iterator>
flattening_iterator<Iterator> flatten(Iterator it)
{
    return flattening_iterator<Iterator>(it, it);
}

template <typename Iterator>
flattening_iterator<Iterator> flatten(Iterator first, Iterator last)
{
    return flattening_iterator<Iterator>(first, last);
}

The following is a minimal test stub:

#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

int main()
{
    // Generate some test data:  it looks like this:
    // { { 0, 1, 2, 3 }, { 4, 5, 6, 7 }, { 8, 9, 10, 11 } }
    std::vector<std::vector<int>> v(3);
    int i(0);
    for (auto it(v.begin()); it != v.end(); ++it)
    {
        it->push_back(i++); it->push_back(i++);
        it->push_back(i++); it->push_back(i++);
    }

    // Flatten the data and print all the elements:
    for (auto it(flatten(v.begin(), v.end())); it != v.end(); ++it)
    {
        std::cout << *it << ", ";
    }
    std::cout << "\n";

    // Or, since the standard library algorithms are awesome:
    std::copy(flatten(v.begin(), v.end()), flatten(v.end()), 
              std::ostream_iterator<int>(std::cout, ", "));
}

Like I said at the beginning, I haven't tested this thoroughly. Let me know if you find any bugs and I'll be happy to correct them.

What does it mean to flatten an iterator?

In this context, to flatten means to remove nesting. For instance, an array of arrays (an array where each element is an array) of integers is nested; if we flatten it we get an array of integers which contains the same values in the same order, but next to each other in a single array, rather than split into several arrays: [[1 2] [3 4]] -> [1 2 3 4]. Same difference with iterators, other collections, and deeper nesting (array of array of sets of iterators of strings).

As for idioms, there aren't really many -- it's not a common task, and often simple. Note that in the case of regular arrays (all nested arrays are of the same size), nested[i][j] is equivalent to nested[i * INNER_ARRAY_SIZE + j]. This is sometimes used to avoid nesting, especially in languages which treat arrays as reference types and thus require many separately-allocated arrays if you nest them. In Python, you can flatten iterables with itertools.chain(*iterable_of_iterables).

How to flatten iterators of nested containers?

Ok, so this isn't a full solution - but I ran out of time. So this currently implements not a full iterator but a cut down iterator-like class which defines something like this interface, and requires C++11. I've tested it on g++4.7:

template<typename NestedContainerType, typename Terminator>
class flatten_iterator
{
    bool complete();
    void advance();
    Terminator& current();
};

Where NestedContainerType is the nested container type (surprisingly), and Terminator is the type of the innermost thing that you're wanting to get out of the flatten.

The code below works, but this is certainly not extensively tested. Wrapping it up fully (assuming you're happy with forward advance only) shouldn't be too much work, in particular if you use boost::iterator_facade.

#include <list>
#include <deque>
#include <vector>

#include <iostream>

template<typename ContainerType, typename Terminator>
class flatten_iterator
{
public:
    typedef flatten_iterator<typename ContainerType::value_type, Terminator> inner_it_type;
    typedef typename inner_it_type::value_type value_type;

    flatten_iterator() {}
    
    flatten_iterator( ContainerType& container ) : m_it( container.begin() ), m_end( container.end() )
    {
        skipEmpties();
    }
    
    bool complete()
    {
        return m_it == m_end;
    }
    
    value_type& current()
    {
        return m_inner_it.current();
    }
    
    void advance()
    {
        if ( !m_inner_it.complete() )
        {
            m_inner_it.advance();
        }
        if ( m_inner_it.complete() )
        {
            ++m_it;
            skipEmpties();
        }
    }
    
private:
    void skipEmpties()
    {
        while ( !complete() )
        {
            m_inner_it = inner_it_type(*m_it);
            if ( !m_inner_it.complete() ) break;
            ++m_it;
        }
    }

private:
    inner_it_type                    m_inner_it;
    typename ContainerType::iterator m_it, m_end;
};


template<template<typename, typename ...> class ContainerType, typename Terminator, typename ... Args>
class flatten_iterator<ContainerType<Terminator, Args...>, Terminator>
{
public:
    typedef typename ContainerType<Terminator, Args...>::value_type value_type;
    
public:
    flatten_iterator() {}
    
    flatten_iterator( ContainerType<Terminator, Args...>& container ) :
        m_it( container.begin() ), m_end( container.end() )
    {
    }
    
    bool complete()
    {
        return m_it == m_end;
    }
    
    value_type& current() { return *m_it; }
    void advance() { ++m_it; }
    
private:
    typename ContainerType<Terminator, Args...>::iterator m_it, m_end;
};

And with the following test cases, it does what you would expect:

int main( int argc, char* argv[] )
{   
    typedef std::vector<int> n1_t;
    typedef std::vector<std::deque<short> > n2_t;
    typedef std::list<std::vector<std::vector<std::vector<double> > > > n4_t;
    typedef std::vector<std::deque<std::vector<std::deque<std::vector<std::list<float> > > > > > n6_t;
    
    n1_t n1 = { 1, 2, 3, 4 };
    n2_t n2 = { {}, { 1, 2 }, {3}, {}, {4}, {}, {} };
    n4_t n4 = { { { {1.0}, {},  {}, {2.0}, {} }, { {}, {} }, { {3.0} } }, { { { 4.0 } } } };
    n6_t n6 = { { { { { {1.0f}, {},  {}, {2.0f}, {} }, { {}, {} }, { {3.0f} } }, { { { 4.0f } } } } } };
    
    flatten_iterator<n1_t, int> i1( n1 );
    while ( !i1.complete() )
    {
        std::cout << i1.current() << std::endl;
        i1.advance();
    }
    
    flatten_iterator<n2_t, short> i2( n2 );
    while ( !i2.complete() )
    {
        std::cout << i2.current() << std::endl;
        i2.advance();
    }
    
    flatten_iterator<n4_t, double> i4( n4 );
    while ( !i4.complete() )
    {
        std::cout << i4.current() << std::endl;
        i4.advance();
    }
    
    flatten_iterator<n6_t, float> i6( n6 );
    while ( !i6.complete() )
    {
        std::cout << i6.current() << std::endl;
        i6.advance();
    }
}

So prints the following for each of the container types:

Note that it doesn't yet work with sets because there's some foo required to deal with the fact that set iterators return const references. Exercise for the reader... :-)

How to flatten iterator that uses references

This is problematic. Because the IntoIterator impls borrow self you need to hold both the iterable and the iterator together, and that creates a self-referential struct. See Why can't I store a value and a reference to that value in the same struct?.

It looks to me, even though I haven't digged deep, that this is not necessary and this is actually the result of a wrong design of midasio. But you can't do much regarding that, other than patching the library or sending a PR and hoping for it to be accepted soon (if you want to change that, I think it is enough to change the &'a FileView<'a> and &'a EventView<'a> to &'_ FileView<'a> and &'_ EventView<'a> respectively, though I'm unsure).https://github.com/DJDuque/midasio/pull/8

I don't think there is a good solution. Using iterator adapters is unlikely to work, and creating your own iterator type will require unsafe code or at the very least using a crate like ouroboros.

Edit: With my PR #8, it still doesn't work verbatim because the Mmap is dropped at the end of the map() but you still need to access it, however this is fixable pretty easily by collecting all Mmaps into a Vec:

fn main() {
    let args: Vec<PathBuf> = Vec::new();
    let mmaps = args
        .iter()
        .map(|path| {
            let file = File::open(path).unwrap();
            unsafe { Mmap::map(&file).unwrap() }
            
        })
        .collect::<Vec<_>>();
    let iterator = mmaps
        .iter()
        .map(|mmap| FileView::try_from(&mmap[..]).unwrap())
        .flat_map(|file_view| file_view.into_iter())
        .flat_map(|event_view| event_view.into_iter());
}

Returning this iterator from a function is still not going to work, unfortunately.

How do I flatten an iterator of Result<Vec<T>,E> to return an iterator of Result<T,E> in rust?

While in the general case I would use the either crate, in this specific case itertools has gotten you covered with flatten_ok():

use itertools::Itertools;

let flattened_iter1 = iter1.map(Result::as_ref).flatten_ok();

Flatten tuple from iterator in Python

You can do it like this:

>>> for (index_x, item_x), (index_y, item_y) in itertools.combinations(enumerate(mylist), 2):
...     print(f"index_x: {index_x}, item_x: {item_x}, index_y: {index_y}, item_y: {item_y}")
    
index_x: 0, item_x: a, index_y: 1, item_y: b
index_x: 0, item_x: a, index_y: 2, item_y: c
index_x: 1, item_x: b, index_y: 2, item_y: c

How do I flatten a recursive structure using recursive iterators?

It's a good idea to be familiar with common data structures. You have a tree, and there are several ways to traverse a tree. You haven't precisely specified which method to use, so I chose one arbitrarily that's easy to implement.

The key here is to implement an iterator that keeps track of some state: all of the nodes yet to be visited. On each call to Iterator::next, we take the next value, save aside any new nodes to visit, and return the value.

Once you have the iterator, you can collect it into a Vec.

use std::collections::VecDeque;

impl IntoIterator for A {
    type IntoIter = IntoIter;
    type Item = String;

    fn into_iter(self) -> Self::IntoIter {
        IntoIter {
            remaining: self.vb.into_iter().flatten().collect(),
        }
    }
}

struct IntoIter {
    remaining: VecDeque<B>,
}

impl Iterator for IntoIter {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        self.remaining.pop_front().and_then(|b| {
            b.c.map(|C { name, vb }| {
                self.remaining.extend(vb.into_iter().flatten());

                name
            })
        })
    }
}

fn to_strings(a: A) -> Vec<String> {
    a.into_iter().collect()
}

#[derive(Debug, Clone)]
struct A {
    vb: Option<Vec<B>>,
}

#[derive(Debug, Clone)]
struct B {
    c: Option<C>,
}

#[derive(Debug, Clone)]
struct C {
    name: String,
    vb: Option<Vec<B>>,
}

fn main() {
    let example: A = A {
        vb: Some(vec![
            B {
                c: Some(C {
                    name: "Hello ".to_string(),
                    vb: None,
                }),
            },
            B {
                c: Some(C {
                    name: "World!".to_string(),
                    vb: None,
                }),
            },
        ]),
    };
    println!("The example struct: {:?}", example);
    //clone a copy for a second example, because to_strings() takes ownership of the example A struct
    let receipt: A = example.clone();
    println!("Iterated: {:?}", to_strings(example));
    // another example of using to_strings()
    println!(
        "As a string: {:?}",
        to_strings(receipt).into_iter().collect::<String>()
    );
}

From here, it should be straight-forward to create an iterator of B if that's what you need. Having all of the None values seemed silly, so I left them out and directly returned Strings.

I also made this a by-value iterator. You could follow the same pattern to create an iterator that returned references to the B / String and only clone them as needed.

Flattening Iterator