How to Build a Numpy Array from a Generator

How do I build a numpy array from a generator?

NumPy arrays require their length to be set explicitly at creation time, unlike Python lists. This is necessary so that space for each item can be allocated consecutively in memory. Consecutive allocation is the key feature of NumPy arrays: combined with a native-code implementation, it lets operations on them execute much more quickly than on regular lists.

Keeping this in mind, it is technically impossible to take a generator object and turn it into an array unless you either:

1. can predict how many elements it will yield when run:

my_array = numpy.empty(predict_length())
for i, el in enumerate(gimme()): my_array[i] = el

2. are willing to store its elements in an intermediate list:
```
my_array = numpy.array(list(gimme()))
```
3. can make two identical generators, run through the first one to find the total length, initialize the array, and then run through the second one to fill in each element:
```
length = sum(1 for el in gimme())
my_array = numpy.empty(length)
for i, el in enumerate(gimme()): my_array[i] = el
```

Option 1 is probably what you're looking for. Option 2 is space-inefficient, and option 3 is time-inefficient (you have to go through the generator twice).
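When the length can be predicted, `numpy.fromiter` with its `count` argument is a compact way to write the first option: it allocates the output once and fills it straight from the generator. A minimal sketch, assuming a hypothetical `gimme(n)` generator that yields exactly `n` floats:

```python
import numpy as np

def gimme(n):
    """Hypothetical generator standing in for the real data source."""
    for i in range(n):
        yield i * 0.5

n = 5  # assumed to be known in advance
# count=n lets NumPy allocate the full array up front instead of growing it
my_array = np.fromiter(gimme(n), dtype=float, count=n)
print(my_array)
```

If `count` is omitted, `fromiter` still works, but it has to grow its buffer as elements arrive.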

How to read generator data as numpy array

Take a look at some of these other posts which seem to answer the basic question of "convert a generator to an array":

How do I build a numpy array from a generator?
How to construct an np.array with fromiter
How to fill a 2D Python numpy array with values from a generator?
numpy fromiter with generator of list

Without knowing exactly what your generator is returning, the best I can do is provide a somewhat generic (but not particularly efficient) example:

#!/usr/bin/env python

import numpy as np

# Sample generator of (x, y, z) tuples
def my_generator():
    for i in range(10):
        yield (i, i*2, i*2 + 1)

def gen_to_numpy(gen):
    return np.array(list(gen))

gen = my_generator()
array = gen_to_numpy(gen)

print(type(array))
print(array)

Output:

<class 'numpy.ndarray'>
[[ 0  0  1]
 [ 1  2  3]
 [ 2  4  5]
 [ 3  6  7]
 [ 4  8  9]
 [ 5 10 11]
 [ 6 12 13]
 [ 7 14 15]
 [ 8 16 17]
 [ 9 18 19]]

Again though, I cannot comment on the efficiency of this. You mentioned that it takes a long time to plot by reading points directly from the generator, but converting to a NumPy array will still require going through the whole generator to get the data. It would probably be much more efficient if the laser-to-pointcloud implementation you are using could provide the data directly as an array, but that is a question for the ROS Answers forum (I notice you already asked this there).
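If every yielded tuple has the same known width, one way to skip the intermediate list of tuples is to flatten the stream and reshape afterwards. This is a sketch under that assumption, reusing the sample generator from the example above; the width of 3 is specific to this example:

```python
import itertools
import numpy as np

def my_generator():
    # Same sample generator of (x, y, z) tuples as in the example above
    for i in range(10):
        yield (i, i * 2, i * 2 + 1)

# Flatten the tuples into one stream of scalars, then restore the 3 columns
flat = itertools.chain.from_iterable(my_generator())
array = np.fromiter(flat, dtype=np.int64).reshape(-1, 3)
print(array.shape)
```

Whether this is faster than building a list first depends on the data, so it is worth timing on the real generator.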

How can I make a generator which iterates over 2D numpy array?

Consider the array a:

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

Option 1
Use a generator

def get_every_n(a, n=2):
    for i in range(a.shape[0] // n):
        yield a[n*i:n*(i+1)]

for sa in get_every_n(a):
    print(sa)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]

Option 2
Use reshape and //

a.reshape(a.shape[0] // 2, -1, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Option 3
If you wanted groups of two rather than two groups

a.reshape(-1, 2, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Since you explicitly stated that you need a generator, option 1 is the appropriate reference.
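Note that the generator in option 1 drops any trailing rows when the row count is not a multiple of `n`. If those rows matter, a variant that steps through the start indices keeps them in a shorter final chunk. A sketch; the name `get_chunks` is hypothetical:

```python
import numpy as np

def get_chunks(a, n=2):
    """Yield consecutive blocks of up to n rows; the last block may be shorter."""
    for start in range(0, a.shape[0], n):
        yield a[start:start + n]

a = np.arange(1, 16).reshape(5, 3)  # 5 rows, so the last chunk has one row
for chunk in get_chunks(a):
    print(chunk.shape)
```

Each chunk is a view into `a`, not a copy, so modifying a chunk in place also modifies the original array.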

How do I use a generator to initialize a numpy array?

You're looking for np.fromiter.

Here's a simpler example to demonstrate how it works:

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> np.fromiter((i + j for (i, j) in zip(a, b)), float)
array([ 5.,  7.,  9.])

Note you have to supply the data type as the second argument, and that the generator expression must be parenthesized since it's not the sole argument.

When I tried this with your sample code, I got an error saying shapes are not aligned... I'm guessing it's an issue with the dot product.

Create numpy array from generator, list of lists

To use a compound dtype, the function has to return tuples, not lists:

In [977]: def foo(bar): 
     ...:   return (bar,) * 3 # so for 4 it returns (4, 4, 4), .. 
     ...:  
     ...: a = [1,2,3,4,5,6,7] 
     ...: b = map(foo,a)                                                                               
In [978]: list(b)                                                                                      
Out[978]: [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6), (7, 7, 7)]
In [979]: def foo(bar): 
     ...:   return (bar,) * 3 # so for 4 it returns (4, 4, 4), .. 
     ...:  
     ...: a = [1,2,3,4,5,6,7] 
     ...: b = map(foo,a)                                                                               
In [980]: np.fromiter(b, 'i,i,i')                                                                      
Out[980]: 
array([(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6),
       (7, 7, 7)], dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])

Some timings:

In [981]: %%timeit b = map(foo,a) 
     ...: np.array(list(b)) 
     ...:  
     ...:                                                                                              
1.9 µs ± 55.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [982]: %%timeit b = map(foo,a) 
     ...: np.fromiter(b, 'i,i,i') 
     ...:  
     ...:                                                                                              
17.2 µs ± 9.72 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
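The structured array that `np.fromiter` returns has one field per tuple element rather than a second axis. If a plain 2D array is the goal, `numpy.lib.recfunctions.structured_to_unstructured` (available in NumPy 1.16 and later) can convert it. A sketch reusing `foo` from above:

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

def foo(bar):
    return (bar,) * 3  # e.g. 4 -> (4, 4, 4)

a = [1, 2, 3, 4, 5, 6, 7]
structured = np.fromiter(map(foo, a), dtype='i,i,i')  # shape (7,), 3 fields
plain = structured_to_unstructured(structured)        # shape (7, 3)
print(plain.shape)
```

This only makes sense when all fields share a compatible type, as they do here.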

Python generator with numpy array

That is because c always points to the same numpy array; the generator function only changes the elements inside c.

When you simply print, you see the complete contents of c at that particular moment, so the printed values are correct.

But when you use list(my_gen()), you keep adding a reference to the same c array to the list, so any later change to that array also shows up in the elements added earlier.

It works when you do yield c.tolist() because that creates a new list from the numpy array each time. You keep adding new list objects to the list, so future changes to c are not reflected in the previously added lists.
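The effect is easy to reproduce. The original generator is not shown here, so `my_gen` below is a hypothetical stand-in that reuses one buffer. Yielding `c.copy()` is an alternative to `c.tolist()` when the results should stay NumPy arrays:

```python
import numpy as np

def my_gen():
    """Hypothetical generator that mutates and re-yields one shared array."""
    c = np.zeros(2, dtype=int)
    for i in range(3):
        c[:] = i
        yield c  # the same object every time

def my_gen_copy():
    """Same loop, but each yielded array is an independent snapshot."""
    c = np.zeros(2, dtype=int)
    for i in range(3):
        c[:] = i
        yield c.copy()

shared = [x.tolist() for x in list(my_gen())]
copied = [x.tolist() for x in list(my_gen_copy())]
print(shared)  # [[2, 2], [2, 2], [2, 2]]
print(copied)  # [[0, 0], [1, 1], [2, 2]]
```

In the first case, every list entry is the same array, so all three show its final state.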