Numpy: Get Random Set of Rows from 2D Array

>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for a general case:

A[np.random.randint(A.shape[0], size=2), :]

To sample without replacement (NumPy 1.7.0+):

A[np.random.choice(A.shape[0], 2, replace=False), :]

I do not believe there is a good way to generate a random list without replacement before 1.7. Perhaps you can set up a small function that ensures the two values are not the same.
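
One possible workaround that does not rely on np.random.choice is to shuffle all row indices with np.random.permutation and keep the first few, which guarantees distinct rows. A minimal sketch; the example array A and the sample size k are assumptions made for illustration:

```python
import numpy as np

# Hypothetical example data: 10 rows, 3 columns.
A = np.arange(30).reshape(10, 3)
k = 2  # number of rows to draw (assumed)

# permutation(n) returns the integers 0..n-1 in random order,
# so its first k entries are k distinct row indices.
idx = np.random.permutation(A.shape[0])[:k]
sample = A[idx, :]
```

The cost is a full shuffle of all row indices, which is usually acceptable unless the array has a very large number of rows.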

how to randomly sample in 2D matrix in numpy

Just use random indices (in your case two, because your array has 3 dimensions):

import numpy as np

Space_Position = np.array(Space_Position)

random_index1 = np.random.randint(0, Space_Position.shape[0])
random_index2 = np.random.randint(0, Space_Position.shape[1])

Space_Position[random_index1, random_index2]  # get the random element.

The alternative is to actually make it 2D:

Space_Position = np.array(Space_Position).reshape(-1, 2)

and then use one random index:

Space_Position = np.array(Space_Position).reshape(-1, 2)      # make it 2D
random_index = np.random.randint(0, Space_Position.shape[0])  # generate a random index
Space_Position[random_index]                                  # get the random element.

If you want N samples with replacement:

N = 5

Space_Position = np.array(Space_Position).reshape(-1, 2)                # make it 2D
random_indices = np.random.randint(0, Space_Position.shape[0], size=N)  # generate N random indices
Space_Position[random_indices]  # get N samples with replacement

or without replacement:

Space_Position = np.array(Space_Position).reshape(-1, 2)  # make it 2D
random_indices = np.arange(0, Space_Position.shape[0])    # array of all indices
np.random.shuffle(random_indices)                         # shuffle the array
Space_Position[random_indices[:N]]  # get N samples without replacement
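
On newer NumPy releases (1.17 or later, which is an assumption about your environment), the Generator API can sample whole rows directly, so the index array does not have to be built by hand. A sketch with made-up data standing in for the reshaped Space_Position:

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical stand-in for the reshaped (n, 2) Space_Position array.
space_position = np.arange(20).reshape(-1, 2)
N = 5

# axis=0 tells choice to sample whole rows.
with_repl = rng.choice(space_position, size=N, replace=True, axis=0)
without_repl = rng.choice(space_position, size=N, replace=False, axis=0)
```

Both results have shape (N, 2); only the second is guaranteed to contain N different rows of the input.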

How to create 2d array with numpy random.choice for every rows?

Here is a constructive approach: draw the first number (50 choices), then the second (49 choices), and so on. For large sets it is quite competitive (pp in the table):

# n = 10
# pp                    0.18564210 ms
# Divakar               0.01960790 ms
# James                 0.20074140 ms
# CK                    0.17823420 ms
# n = 1000
# pp                    0.80046050 ms
# Divakar               1.31817130 ms
# James                18.93511460 ms
# CK                   20.83670820 ms
# n = 1000000
# pp                  655.32905590 ms
# Divakar            1352.44713990 ms
# James             18471.08987370 ms
# CK                18369.79808050 ms
# pp     checking plausibility...
#     var (exp obs) 208.333333333 208.363840259
#     mean (exp obs) 25.5 25.5064865
# Divakar     checking plausibility...
#     var (exp obs) 208.333333333 208.21113972
#     mean (exp obs) 25.5 25.499471
# James     checking plausibility...
#     var (exp obs) 208.333333333 208.313436938
#     mean (exp obs) 25.5 25.4979035
# CK     checking plausibility...
#     var (exp obs) 208.333333333 208.169585249
#     mean (exp obs) 25.5 25.49
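
The constructive idea (first draw from 50 values, second from 49, and so on) may be easier to follow for a single row before looking at a vectorized version. The following is only an illustrative sketch; the function name draw_distinct and its parameters are hypothetical and it makes no attempt to be fast:

```python
import numpy as np

def draw_distinct(n_items=50, k=6, rng=None):
    """Draw k distinct numbers from 1..n_items for one row (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    chosen = []
    for i in range(k):
        # Rank of the new number among the n_items - i values still free.
        value = int(rng.integers(0, n_items - i))
        # Turn the rank into an actual value: step over every already
        # chosen number that is not larger, smallest first.
        for c in sorted(chosen):
            if c <= value:
                value += 1
        chosen.append(value)
    return np.array(chosen) + 1  # shift from 0-based to 1..n_items

row = draw_distinct()
```

Stepping over the occupied values in ascending order is what makes each increment safe: a value that is "overtaken" after an increment is picked up by a later iteration of the inner loop.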

Code, including benchmarking. The algorithm is a bit complicated because mapping the draws to the free spots is hairy:

import numpy as np
import types
from timeit import timeit

def f_pp(n):
    draw = np.empty((n, 6), dtype=np.int64)
    # generating random numbers is expensive, so draw a large one and
    # make six out of one (the upper bound exceeds 2**31, hence int64)
    draw[:, 0] = np.random.randint(0, 50*49*48*47*46*45, (n,), dtype=np.int64)
    draw[:, 1:] = np.arange(50, 45, -1)
    draw = np.floor_divide.accumulate(draw, axis=-1)
    draw[:, :-1] -= draw[:, 1:] * np.arange(50, 45, -1)
    # map the shorter ranges (:49, :48, :47) to the non-occupied
    # positions; this amounts to incrementing for each number on the
    # left that is not larger. the nasty bit: if due to incrementing
    # new numbers on the left are "overtaken" then for them we also
    # need to increment.
    for i in range(1, 6):
        coll = np.sum(draw[:, :i] <= draw[:, i, None], axis=-1)
        collidx = np.flatnonzero(coll)
        if collidx.size == 0:
            continue
        coll = coll[collidx]
        tot = coll
        while True:
            draw[collidx, i] += coll
            coll = np.sum(draw[collidx, :i] <= draw[collidx, i, None], axis=-1)
            relidx = np.flatnonzero(coll > tot)
            if relidx.size == 0:
                break
            coll, tot = coll[relidx]-tot[relidx], coll[relidx]
            collidx = collidx[relidx]

    return draw + 1

def check_result(draw, name):
    print(name[2:], '    checking plausibility...')
    import scipy.stats
    assert all(len(set(row)) == 6 for row in draw)
    assert len(set(draw.ravel())) == 50
    print('    var (exp obs)', scipy.stats.uniform(0.5, 50).var(), draw.var())
    print('    mean (exp obs)', scipy.stats.uniform(0.5, 50).mean(), draw.mean())

def f_Divakar(n):
    return np.random.rand(n, 50).argpartition(6,axis=1)[:,:6]+1

def f_James(n):
    return np.stack([np.random.choice(np.arange(1,51),size=6,replace=False) for i in range(n)])

def f_CK(n):
    return np.array([np.random.choice(np.arange(1, 51), size=6, replace=False) for _ in range(n)])

for n in (10, 1_000, 1_000_000):
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(n)', globals={'f':func, 'n':n}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))
    if n >= 10000:
        for name, func in list(globals().items()):
            if name.startswith('f_') and isinstance(func, types.FunctionType):
                check_result(func(n), name)
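
Another vectorized option, which is not part of the benchmark above and whose relative speed is therefore unknown, is Generator.permuted (NumPy 1.20 or later, assumed available). It shuffles every row independently, so the first six columns of each row are six distinct numbers. A sketch; the name f_permuted is hypothetical:

```python
import numpy as np

def f_permuted(n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # One copy of 1..50 per row; permuted shuffles each row on its own.
    pool = np.tile(np.arange(1, 51), (n, 1))
    return rng.permuted(pool, axis=1)[:, :6]

draws = f_permuted(4)
```

This materializes an (n, 50) array, so memory use grows with n in the same way as the argpartition variant.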

Random sample from specific rows and columns of a 2d numpy array (essentially sampling by ignoring edge effects)

Would something like this solve your problem?

import numpy as np

np.random.seed(0)
mat = np.random.random(size=(100,100))

x_indices = np.random.randint(low=10, high=90, size=250)
y_indices = np.random.randint(low=10, high=90, size=250)

coordinates = list(zip(x_indices,y_indices))

flat_mat = mat.flatten()
flat_index = x_indices * 100 + y_indices

Then you can access elements using any value from the coordinates list, e.g. mat[coordinates[0]] returns the matrix value at coordinates[0]. The value of coordinates[0] is (38, 45) in my case. If the matrix is flattened, you can calculate the 1D index of the corresponding element. In this case, mat[coordinates[0]] == flat_mat[flat_index[0]] holds, where flat_index[0] == 3845 == 100*38 + 45.

Please also note that the same element of the original data can be sampled more than once this way, because the indices are drawn independently.
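
The relationship between the 2D coordinates and the flat index can be checked for all samples at once. The sketch below uses the Generator API instead of np.random.seed, so the drawn coordinates will differ from the (38, 45) quoted above; np.ravel_multi_index is used in place of the manual x * 100 + y arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)
mat = rng.random((100, 100))

# Row and column indices restricted to the interior [10, 90).
x_indices = rng.integers(10, 90, size=250)
y_indices = rng.integers(10, 90, size=250)

# Fancy indexing picks all 250 elements in one step.
samples = mat[x_indices, y_indices]

# Equivalent lookup in the flattened array.
flat_index = np.ravel_multi_index((x_indices, y_indices), mat.shape)
same = np.array_equal(samples, mat.ravel()[flat_index])
```

Indexing with mat[x_indices, y_indices] avoids building the list of coordinate tuples when all sampled values are needed together.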

Using your notation:

import numpy as np
np.random.seed(0)
gridsize = 100
new_abundances = np.zeros([gridsize, gridsize], dtype=np.uint8)
min_select = int(np.around(gridsize * 0.10))
max_select = int(gridsize - (np.around(gridsize * 0.10)))

x_indices = np.random.randint(low=min_select, high=max_select, size=250)
y_indices = np.random.randint(low=min_select, high=max_select, size=250)
coords = list(zip(x_indices,y_indices))

flat_new_abundances = new_abundances.flatten()
flat_index = x_indices * gridsize  + y_indices

Sampling rows in 2D numpy arrays with replacement

As per this issue, the feature was considered in 2014, but no substantial additions have been made to the API since then. There is, however, a better solution that cleverly makes use of numpy.random.choice and numpy's fancy indexing:

Starting with

In [102]: x = numpy.eye(3); x
Out[102]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

You may use numpy.random.choice to generate a list of random indices, like this:

In [103]: i = numpy.random.choice(3, 10); i
Out[103]: array([2, 2, 0, 2, 1, 1, 2, 0, 0, 1])

Then use i to index x:

In [104]: x[i]
Out[104]: 
array([[ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.]])

With a workaround this efficient, I don't believe a change to the API is necessary.

Do note that, for generating rows with a certain probability distribution, the procedure is the same: specify a probability distribution on the indices themselves, using the p argument of numpy.random.choice.
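
As a sketch of the weighted case, the probabilities go in through the p argument of numpy.random.choice. The weights below are made up for illustration:

```python
import numpy as np

x = np.eye(3)
p = [0.7, 0.2, 0.1]  # assumed probabilities for rows 0, 1, 2; must sum to 1

# Draw 10 row indices according to p, then index x with them.
i = np.random.choice(x.shape[0], size=10, p=p)
rows = x[i]
```

The length of p has to match the number of rows being sampled from.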

Randomly selecting rows from numpy array

You can make any number of row-wise random partitions of A by slicing a shuffled sequence of row indices:

ind = numpy.arange( A.shape[ 0 ] )
numpy.random.shuffle( ind )
B = A[ ind[ :6 ], : ]
C = A[ ind[ 6: ], : ]

If you don't want to change the order of the rows in each subset, you can sort each slice of the indices:

B = A[ sorted( ind[ :6 ] ), : ]
C = A[ sorted( ind[ 6: ] ), : ]

(Note that the solution provided by @MaxNoe also preserves row order.)
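
A self-contained version of the order-preserving split, assuming a hypothetical 10-row array so that the two parts hold 6 and 4 rows. It uses np.random.permutation and np.sort as stand-ins for the arange/shuffle pair and the built-in sorted:

```python
import numpy as np

# Hypothetical data: 10 rows, 3 columns.
A = np.arange(30).reshape(10, 3)

ind = np.random.permutation(A.shape[0])  # shuffled row indices
B = A[np.sort(ind[:6]), :]   # 6 random rows, original order kept
C = A[np.sort(ind[6:]), :]   # the remaining 4 rows, original order kept
```

Because B and C are built from disjoint slices of the same permutation, every row of A ends up in exactly one of them.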

How to generate a numpy array with random values that are all different from each other

Not sure if this will be ok for all your needs, but it will work for your example:

np.random.choice(np.arange(100, dtype=np.int32), size=(5, 5), replace=False)
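
A quick way to confirm that the values are distinct, and to see the main limitation: the pool must hold at least as many values as the requested shape needs. A sketch; the variable names are illustrative only:

```python
import numpy as np

values = np.random.choice(np.arange(100, dtype=np.int32), size=(5, 5), replace=False)

# Every entry is distinct if the number of unique values equals the size.
all_distinct = np.unique(values).size == values.size

# Asking for 25 distinct values from a pool of 10 raises ValueError.
error = None
try:
    np.random.choice(10, size=(5, 5), replace=False)
except ValueError as exc:
    error = exc
```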
