Numpy Random Choice to Produce a 2D-Array with All Unique Values

One trick I have used often is to generate an array of random floats and argsort it: the sort order of each row is a random permutation of the indices 0..n-1, and those indices serve as the required unique numbers. Thus, we could do -

import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output
    return np.random.rand(m, n).argsort(axis=axis)

Sample runs -

In [98]: random_choice_noreplace(3,7)
Out[98]: 
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])

In [99]: random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out[99]: 
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
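A quick way to check that the rows really are duplicate-free is to sort each row and compare it with 0..n-1. The sketch below also shows an assumed extension that is not part of the answer above: slicing off the first k columns to get k unique values per row out of n. The helper name `random_k_of_n` is hypothetical.

```python
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # Same trick as above: argsort of random floats is a random permutation
    return np.random.rand(m, n).argsort(axis=axis)

def random_k_of_n(m, n, k):
    # Hypothetical helper: keep only the first k columns of each permutation,
    # which leaves k distinct values from range(n) in every row
    return random_choice_noreplace(m, n)[:, :k]

perm = random_choice_noreplace(3, 7)
# Every row, once sorted, must equal [0, 1, ..., 6]
rows_are_permutations = bool((np.sort(perm, axis=1) == np.arange(7)).all())

subset = random_k_of_n(4, 10, 3)
```

If only a few values per row are needed, the slice avoids a second sampling step, at the cost of still sorting all n columns.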

Runtime test -

# Original approach
def loopy_app(m,n):
    # np.vstack expects a sequence, so build a list rather than a generator
    a = [np.random.choice(n,size=n,replace=False) for _ in range(m)]
    return np.vstack(a)

Timings -

In [108]: %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop

In [109]: %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
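With the newer `Generator` API, a similar result can probably be obtained without sorting, by shuffling each row of a tiled `arange` independently. This is only a sketch and was not part of the timings above. It assumes NumPy 1.20 or later for `Generator.permuted`, and the function name `permuted_rows` is made up for illustration.

```python
import numpy as np

def permuted_rows(m, n, seed=None):
    # Hypothetical alternative to the argsort trick
    rng = np.random.default_rng(seed)
    # m copies of [0, 1, ..., n-1], one per output row
    base = np.tile(np.arange(n), (m, 1))
    # axis=1 shuffles every row independently of the others
    return rng.permuted(base, axis=1)

out = permuted_rows(4, 6, seed=0)
```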

How to create 2d array with numpy random.choice for every rows?

Here is a constructive approach: draw the first number (50 choices), then the second (49 choices), and so on. Each row picks 6 distinct numbers from 1 to 50. For large sets it is quite competitive (pp in the table):

# n = 10
# pp                    0.18564210 ms
# Divakar               0.01960790 ms
# James                 0.20074140 ms
# CK                    0.17823420 ms
# n = 1000
# pp                    0.80046050 ms
# Divakar               1.31817130 ms
# James                18.93511460 ms
# CK                   20.83670820 ms
# n = 1000000
# pp                  655.32905590 ms
# Divakar            1352.44713990 ms
# James             18471.08987370 ms
# CK                18369.79808050 ms
# pp     checking plausibility...
#     var (exp obs) 208.333333333 208.363840259
#     mean (exp obs) 25.5 25.5064865
# Divakar     checking plausibility...
#     var (exp obs) 208.333333333 208.21113972
#     mean (exp obs) 25.5 25.499471
# James     checking plausibility...
#     var (exp obs) 208.333333333 208.313436938
#     mean (exp obs) 25.5 25.4979035
# CK     checking plausibility...
#     var (exp obs) 208.333333333 208.169585249
#     mean (exp obs) 25.5 25.49

The code, including benchmarking, follows. The algorithm is a bit complicated because mapping each draw to the remaining free spots is fiddly:

import numpy as np
import types
from timeit import timeit

def f_pp(n):
    # int64 is needed: 50*49*48*47*46*45 does not fit in 32 bits
    draw = np.empty((n, 6), dtype=np.int64)
    # generating random numbers is expensive, so draw a large one and
    # make six out of one
    draw[:, 0] = np.random.randint(0, 50*49*48*47*46*45, (n,), dtype=np.int64)
    draw[:, 1:] = np.arange(50, 45, -1)
    draw = np.floor_divide.accumulate(draw, axis=-1)
    draw[:, :-1] -= draw[:, 1:] * np.arange(50, 45, -1)
    # map the shorter ranges (:49, :48, :47) to the non-occupied
    # positions; this amounts to incrementing for each number on the
    # left that is not larger. the nasty bit: if due to incrementing
    # new numbers on the left are "overtaken" then for them we also
    # need to increment.
    for i in range(1, 6):
        coll = np.sum(draw[:, :i] <= draw[:, i, None], axis=-1)
        collidx = np.flatnonzero(coll)
        if collidx.size == 0:
            continue
        coll = coll[collidx]
        tot = coll
        while True:
            draw[collidx, i] += coll
            coll = np.sum(draw[collidx, :i] <= draw[collidx, i, None], axis=-1)
            relidx = np.flatnonzero(coll > tot)
            if relidx.size == 0:
                break
            coll, tot = coll[relidx]-tot[relidx], coll[relidx]
            collidx = collidx[relidx]

    return draw + 1

def check_result(draw, name):
    print(name[2:], '    checking plausibility...')
    import scipy.stats
    assert all(len(set(row)) == 6 for row in draw)
    assert len(set(draw.ravel())) == 50
    print('    var (exp obs)', scipy.stats.uniform(0.5, 50).var(), draw.var())
    print('    mean (exp obs)', scipy.stats.uniform(0.5, 50).mean(), draw.mean())

def f_Divakar(n):
    return np.random.rand(n, 50).argpartition(6,axis=1)[:,:6]+1

def f_James(n):
    return np.stack([np.random.choice(np.arange(1,51),size=6,replace=False) for i in range(n)])

def f_CK(n):
    return np.array([np.random.choice(np.arange(1, 51), size=6, replace=False) for _ in range(n)])

for n in (10, 1_000, 1_000_000):
    print(f'n = {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(n)', globals={'f':func, 'n':n}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))
    if n >= 10000:
        for name, func in list(globals().items()):
            if name.startswith('f_') and isinstance(func, types.FunctionType):
                check_result(func(n), name)
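The idea behind the constructive approach may be easier to follow in a scalar, unvectorized form. The sketch below is not the benchmarked `f_pp`; it is a hypothetical reference version (the name `draw_unique` and its parameters are assumptions) that draws one row at a time and maps each draw onto the numbers that are still free.

```python
import bisect
import random

def draw_unique(k=6, n=50, seed=None):
    # Hypothetical scalar reference for the "draw from 50, then 49, ..." idea
    rng = random.Random(seed)
    taken = []   # values drawn so far (0-based), kept sorted
    result = []
    for i in range(k):
        # Rank of the new value among the n - i values that are still free
        value = rng.randrange(n - i)
        # Walk the taken values in ascending order: each one at or below
        # the candidate pushes it up by one. Going in sorted order also
        # covers values that are only "overtaken" after an earlier increment.
        for t in taken:
            if t > value:
                break
            value += 1
        bisect.insort(taken, value)
        result.append(value + 1)  # shift to 1..n, as f_pp does
    return result

row = draw_unique(seed=1)
full = draw_unique(k=50, n=50, seed=3)
```

Drawing all 50 values this way yields a permutation of 1..50, which is a convenient sanity check for the mapping step.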

Numpy: Get random set of rows from 2D array

>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for a general case (note that randint samples the row indices with replacement, so the same row can appear twice):

A[np.random.randint(A.shape[0], size=2), :]

Without replacement (NumPy 1.7.0+):

A[np.random.choice(A.shape[0], 2, replace=False), :]

I do not believe there is a good way to generate a random list without replacement before 1.7. Perhaps you can set up a small helper function that ensures the two values are not the same.
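One option that does not depend on `np.random.choice` is to shuffle all row indices with `np.random.permutation` and keep the first few; it should also work on older NumPy releases, although that is an assumption I have not checked against every version. On NumPy 1.17 or later, the `Generator` API can sample rows directly along an axis. Both are sketched below on a made-up array `A`.

```python
import numpy as np

A = np.arange(30).reshape(10, 3)

# Option 1: shuffle all row indices and keep the first two,
# which guarantees two different rows
idx = np.random.permutation(A.shape[0])[:2]
rows_perm = A[idx, :]

# Option 2 (Generator API): sample whole rows along axis 0
rng = np.random.default_rng(0)
rows_gen = rng.choice(A, size=2, replace=False, axis=0)
```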

generate a 2D array of numpy.random.choice without replacement

I don't know how np.random.choice is implemented, but I am guessing it is optimized for the general case. Your numbers, on the other hand, are unlikely to produce repeated values or repeated sequences, so sets may be more efficient for this case, building from scratch:

import random
import numpy as np

def gen_2d(iterations, Chr_size, num_mut):
    randbps = set()
    while len(randbps) < iterations:
        listed = set()
        while len(listed) < num_mut:
            listed.add(random.choice(range(Chr_size)))
        randbps.add(tuple(sorted(listed)))
    return np.array(list(randbps))

This function starts with an empty set, draws a single number from range(Chr_size), and adds it to the set. Because a set cannot hold duplicates, a repeated draw is silently ignored and the loop continues until num_mut distinct numbers have been collected. The same applies to randbps, so each row of the result is unique as well. Note that each row comes out sorted.
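The inner loop can probably be replaced by `random.sample`, which draws distinct values from a `range` without materializing it. The sketch below is an assumed variant, not the code that was timed; the name `gen_2d_sample` and the `seed` parameter are hypothetical.

```python
import random
import numpy as np

def gen_2d_sample(iterations, chr_size, num_mut, seed=None):
    # Hypothetical variant of gen_2d built on random.sample
    rng = random.Random(seed)
    rows = set()
    while len(rows) < iterations:
        # num_mut distinct values in one call; sorting makes rows that hold
        # the same values compare equal, so the set drops repeated rows
        rows.add(tuple(sorted(rng.sample(range(chr_size), num_mut))))
    return np.array(list(rows))

result = gen_2d_sample(5, 1_000_000, 20, seed=0)
```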

For only one iteration of np.random.choice vs gen_2d:

iterations=1000
Chr_size=1000000
num_mut=500

%timeit np.random.choice(range(Chr_size),num_mut,replace=False)
10 loops, best of 3: 141 ms per loop

%timeit gen_2d(1, Chr_size, num_mut)
1000 loops, best of 3: 647 µs per loop

How to generate a 2D array of fixed size containing Random Unique Numbers (with Random.Sample)

Base Python solution:

from random import shuffle

n = 5
# create all combinations of index values
l = [[i, j] for i in range(n) for j in range(n)]
# shuffle the list
shuffle(l)
# subdivide into chunks
res = [l[i:i+n] for i in range(0, n**2, n)]
print(res)

Sample output:

[[[0, 2], [3, 3], [4, 3], [4, 0], [3, 4]], [[1, 0], [0, 4], [0, 1], [4, 4], [2, 1]], [[0, 0], [2, 3], [2, 2], [2, 0], [1, 1]], [[1, 2], [3, 0], [4, 1], [3, 1], [2, 4]], [[0, 3], [1, 4], [1, 3], [3, 2], [4, 2]]]
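Since the question mentions `random.sample`, here is a sketch of the same idea built on it: sample every flat index 0..n*n-1 exactly once, then convert each flat index back to an [i, j] pair with `divmod`. The variable names are illustrative only.

```python
import random

n = 5
# A random ordering of 0 .. n*n - 1, each value exactly once
flat = random.sample(range(n * n), n * n)
# divmod(v, n) turns a flat index into a (row, col) pair
pairs = [list(divmod(v, n)) for v in flat]
# Subdivide into n chunks of n pairs each
res = [pairs[r * n:(r + 1) * n] for r in range(n)]
```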

How to generate a numpy array with random values that are all different from each other

Not sure if this will be ok for all your needs, but it will work for your example:

np.random.choice(np.arange(100, dtype=np.int32), size=(5, 5), replace=False)
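Because `replace=False` applies to the whole draw rather than to each row, all 25 entries differ from one another. The sketch below checks that, and shows an assumed alternative with the `Generator` API that shuffles 0..99 and reshapes the first 25 values.

```python
import numpy as np

arr = np.random.choice(np.arange(100, dtype=np.int32), size=(5, 5), replace=False)
# No value may occur twice anywhere in the 5x5 array
all_unique = np.unique(arr).size == arr.size

# Assumed alternative: take the first 25 entries of a random permutation
rng = np.random.default_rng()
alt = rng.permutation(100)[:25].reshape(5, 5)
```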

random.choice for 2D array, bigger numbers with higher probability?

You can use apply_along_axis:

q = np.random.random((10,10))

def choice(row, n, replace=False):
    return np.random.choice(row, size=n, p=row/row.sum(), replace=replace)

np.apply_along_axis(func1d=choice, axis=1, arr=q, n=2)

I don't know what array you have, but you should probably check that row.sum() is not 0 to avoid errors when computing p=row/row.sum().
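One possible way to guard against an all-zero row is to fall back to uniform sampling for that row. This fallback is an assumption about what is wanted, not something the question specifies, and the name `weighted_choice` is hypothetical. Note that with replace=False a row also needs at least n non-zero entries, which this sketch does not check.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_choice(row, n, replace=False):
    total = row.sum()
    # Assumption: an all-zero row is sampled uniformly instead of failing
    p = None if total == 0 else row / total
    return rng.choice(row, size=n, p=p, replace=replace)

q = rng.random((10, 10))
q[3] = 0.0  # force one degenerate row for the demonstration
picked = np.apply_along_axis(weighted_choice, 1, q, n=2)
```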

How to keep a fixed size of unique values in random positions in an array while replacing others with a mask?

My strategy is:

1. Create a new array initialized to all zeros.
2. Find the elements in each class.
3. For each class:
   - Randomly sample two of its elements to keep.
   - Set those elements of the new array to the class value.

The trick is keeping the shape of the indexes appropriate so you retain the shape of the original array.

import numpy as np
test_array = np.array([[0,0,0,0,0],
                      [1,1,1,1,1],
                      [0,0,0,0,0],
                      [2,2,2,4,4],
                      [4,4,4,2,2],
                      [0,0,0,0,0]])

def sample_classes(arr, n_keep=2, random_state=42):
    # Use the argument, not the global test_array
    classes, counts = np.unique(arr, return_counts=True)
    rng = np.random.default_rng(random_state)
    out = np.zeros_like(arr)
    for klass, count in zip(classes, counts):
        # Find locations of the class elements
        indexes = np.nonzero(arr == klass)
        # Sample up to n_keep elements of the class
        keep_idx = rng.choice(count, min(n_keep, count), replace=False)
        # Select the kept elements and reformat for indexing the output array and retaining its shape
        keep_idx_reshape = tuple(ind[keep_idx] for ind in indexes)
        out[keep_idx_reshape] = klass
    return out

You can use it like

In [3]: sample_classes(test_array)
Out[3]:
array([[0, 0, 0, 0, 0],
       [0, 1, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [2, 0, 0, 4, 0],
       [4, 0, 0, 2, 0],
       [0, 0, 0, 0, 0]])

In [4]: sample_classes(test_array, n_keep=3)
Out[4]:
array([[0, 0, 0, 0, 0],
       [1, 0, 1, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 2, 0, 4, 0],
       [4, 4, 0, 2, 2],
       [0, 0, 0, 0, 0]])

In [5]: sample_classes(test_array, random_state=88)
Out[5]:
array([[0, 0, 0, 0, 0],
       [0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [4, 0, 4, 2, 2],
       [0, 0, 0, 0, 0]])

In [6]: sample_classes(test_array, random_state=88, n_keep=4)
Out[6]:
array([[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [2, 2, 0, 4, 4],
       [4, 4, 0, 2, 2],
       [0, 0, 0, 0, 0]])
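The indexing step is the part that keeps the output the same shape as the input: `np.nonzero` returns one index array per dimension, and selecting the same positions from each of them gives coordinates that can be used directly on the output array. A small illustration on a made-up 2x3 array:

```python
import numpy as np

arr = np.array([[0, 2, 2],
                [2, 0, 2]])

# One index array per axis; together they list the matches in row-major order
rows, cols = np.nonzero(arr == 2)   # rows = [0, 0, 1, 1], cols = [1, 2, 0, 2]

# Keep the first and last match (chosen by hand here instead of randomly)
keep = np.array([0, 3])

out = np.zeros_like(arr)
# Indexing with a tuple of index arrays writes to (0, 1) and (1, 2)
out[rows[keep], cols[keep]] = 2
```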
