Better Way to Shuffle Two Numpy Arrays in Unison

Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
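The state-resetting approach described above can be sketched roughly as follows. This is a minimal illustration, assuming the legacy global generator accessed through numpy.random.get_state() and numpy.random.set_state(); the arrays a and b are made up for the example.

```python
import numpy as np

# Hypothetical example data: b[i] is always 10 * a[i].
a = np.arange(10)
b = a * 10

# Remember the generator state, shuffle a, then restore the state
# so that the second shuffle consumes the same random numbers.
state = np.random.get_state()
np.random.shuffle(a)
np.random.set_state(state)
np.random.shuffle(b)

# Both arrays were permuted identically, so the pairing survives.
print(np.array_equal(b, a * 10))
```

Both arrays must have the same length along the first axis for this to work, since the number of random draws depends on that length.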

If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let's assume the arrays a and b look like this:

import numpy

a = numpy.array([[[ 0.,  1.,  2.],
                  [ 3.,  4.,  5.]],

                 [[ 6.,  7.,  8.],
                  [ 9., 10., 11.]],

                 [[12., 13., 14.],
                  [15., 16., 17.]]])

b = numpy.array([[0., 1.],
                 [2., 3.],
                 [4., 5.]])

We can now construct a single array containing all the data:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[ 0.,  1.,  2.,  3.,  4.,  5.,  0.,  1.],
#        [ 6.,  7.,  8.,  9., 10., 11.,  2.,  3.],
#        [12., 13., 14., 15., 16., 17.,  4.,  5.]])

Now we create views simulating the original a and b:

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).

In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
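As a rough sketch of that last point, the combined array can be allocated first and then filled through the views. The shapes and fill values below are assumptions carried over from the example, not requirements.

```python
import numpy as np

n = 3            # number of records
a_cols = 2 * 3   # flattened size of one record of a
b_cols = 2       # flattened size of one record of b

# Allocate the combined array first, then create the two views.
c = np.empty((n, a_cols + b_cols))
a2 = c[:, :a_cols].reshape(n, 2, 3)
b2 = c[:, a_cols:].reshape(n, 2)

# Fill the data through the views; this writes into c.
a2[...] = np.arange(18.0).reshape(n, 2, 3)
b2[...] = np.arange(6.0).reshape(n, 2)

# Sanity check that neither reshape silently made a copy.
assert np.shares_memory(a2, c) and np.shares_memory(b2, c)

np.random.shuffle(c)  # shuffles the rows of c, and therefore a2 and b2
```

The np.shares_memory check is worth keeping during development, because reshape falls back to copying when a view is not possible.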

This solution could be adapted to the case that a and b have different dtypes.
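One possible adaptation, sketched here under the assumption that a structured array is acceptable, stores each record as a single element with one field per original array. The field names "a" and "b" and the dtypes are illustrative choices.

```python
import numpy as np

n = 4
# One record holds a (2, 3) float block and a length-2 integer block.
record = np.dtype([("a", np.float64, (2, 3)), ("b", np.int64, (2,))])
c = np.zeros(n, dtype=record)

# Field access returns views into c with the original shapes.
a2 = c["a"]  # shape (4, 2, 3), float64
b2 = c["b"]  # shape (4, 2), int64

# Mark record i with the value i in both fields.
for i in range(n):
    a2[i] = i
    b2[i] = i

np.random.shuffle(c)  # permutes whole records, so both fields move together
```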

How to shuffle two numpy arrays so that record indices stay aligned in both after shuffling?

This shuffles both arrays together:

import numpy as np

data = np.random.randn(10, 1, 5, 5) # num_records, depth, height, width
labels = np.array([1,1,1,1,1,0,0,0,0,0])

# shuffle indices
idx = np.random.permutation(range(len(labels)))

# shuffle together
data, labels = data[idx,:,:,:], labels[idx]

Numpy: shuffle arrays in unison multiple times with different seeds

I don't know what you are doing wrong with the way you set the state. However, I found an alternative solution: instead of shuffling n arrays, shuffle their indices only once with numpy.random.choice and then reorder all the arrays.

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    # draw every index exactly once, in random order
    indices = np.random.choice(n_elem, size=n_elem, replace=False)
    return a[indices], b[indices]

for i in range(5):
    a, b = shuffle_in_unison(a, b)
    print(a, b)

I get:

[5 2 4 3 1] [50 20 40 30 10]
[1 3 4 2 5] [10 30 40 20 50]
[1 2 5 4 3] [10 20 50 40 30]
[3 2 1 4 5] [30 20 10 40 50]
[1 2 5 3 4] [10 20 50 30 40]

edit

Thanks to @Divakar for the suggestion.
Here is a more readable way to obtain the same result using numpy.random.permutation:

def shuffle_in_unison(a, b):
    n_elem = a.shape[0]
    indices = np.random.permutation(n_elem)
    return a[indices], b[indices]
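
For the "different seeds" part of the question, one option is to pass a seeded generator explicitly. The sketch below assumes NumPy 1.17 or later, where numpy.random.default_rng is available; the function name shuffle_in_unison_seeded and the seed values are illustrative.

```python
import numpy as np

def shuffle_in_unison_seeded(a, b, seed):
    # A fresh generator per call makes each shuffle reproducible for its seed.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(a.shape[0])
    return a[indices], b[indices]

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

for seed in range(3):
    a_shuf, b_shuf = shuffle_in_unison_seeded(a, b, seed)
    print(seed, a_shuf, b_shuf)
```

Because the generator is local to the function, the result does not depend on the global numpy.random state.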

Randomize 2 numpy arrays the same way

Using Numpy

You can use np.c_ to shuffle them together and then put them back into your separate arrays -

import numpy as np

#Creating same X and y for demonstration
X = np.arange(0,10).reshape(5,2)
y = np.arange(0,10).reshape(5,2)

c = np.c_[X,y]
np.random.shuffle(c)

# the first X.shape[1] columns belong to X, the remaining columns to y
X1, y1 = c[:, :X.shape[1]], c[:, X.shape[1]:]

print(X1)
print(y1)
# Note, same order remains

[[8 9]
[0 1]
[4 5]
[6 7]
[2 3]]

[[8 9]
[0 1]
[4 5]
[6 7]
[2 3]]


Using Sklearn

A better option would be to use sklearn api -

from sklearn.utils import shuffle
X, y = shuffle(X, y, random_state=0)

Shuffling numpy arrays keeping the respective values

Since both arrays have the same length, you can use NumPy array indexing.

import numpy

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)  # not needed if a and b are known to have the same length
    p = numpy.random.permutation(len(a))  # generate the shuffled indices
    return a[p], b[p]  # shuffled copies of both arrays, arranged according to p

This is referencing this answer, with some changes.

Shuffling two numpy arrays for a NN

You can keep an array of indices with the same shape as the respective arrays and shuffle the index array each time. You can then use the shuffled indices to reorder both arrays in the same way.

In [122]: indices = np.indices((2, 2))

In [125]: np.random.shuffle(indices)

In [126]: indices
Out[126]:
array([[[0, 0],
        [1, 1]],

       [[0, 1],
        [0, 1]]])

In [127]: x[indices[0], indices[1]]
Out[127]:
array([[ 2.,  3.],
       [16.,  4.]])

In [128]: y[indices[0], indices[1]]
Out[128]:
array([[1., 0.],
       [0., 1.]])
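
A simpler variant of the same idea, assuming the samples lie along the first axis of both arrays, shuffles a one-dimensional index array once per epoch. The names X, y, and n_epochs are hypothetical, and this is a sketch, not the session shown above.

```python
import numpy as np

X = np.arange(12.0).reshape(6, 2)  # 6 samples with 2 features each
y = np.arange(6)                   # label i belongs to sample i

indices = np.arange(len(X))
n_epochs = 3
for epoch in range(n_epochs):
    np.random.shuffle(indices)  # new sample order for this epoch
    X_epoch, y_epoch = X[indices], y[indices]
    # X_epoch[k] and y_epoch[k] still refer to the same sample
```

The original X and y are left untouched; only the index array is shuffled in place.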

Shuffle two lists at once with same order

You can do it as:

import random

a = ['a', 'b', 'c']
b = [1, 2, 3]

c = list(zip(a, b))

random.shuffle(c)

a, b = zip(*c)  # a and b are now tuples; wrap them in list() if lists are needed

print(a)
print(b)

[OUTPUT]
('a', 'c', 'b')
(1, 3, 2)

Of course, this was an example with simpler lists, but the adaptation will be the same for your case.
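
If the results need to be lists again, the same zip-based idea can be wrapped in a small helper. This is only a sketch; the function name shuffle_lists and its optional seed parameter are made up here.

```python
import random

def shuffle_lists(a, b, seed=None):
    # A private Random instance avoids touching the global generator.
    rng = random.Random(seed)
    pairs = list(zip(a, b))
    rng.shuffle(pairs)
    if not pairs:
        return [], []
    a_shuf, b_shuf = zip(*pairs)
    return list(a_shuf), list(b_shuf)

a, b = shuffle_lists(['a', 'b', 'c'], [1, 2, 3])
print(a, b)
```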

Shuffle a numpy array with the largest element at the beginning in Python

For clarity, here is how you should proceed:

import numpy as np

r1 = np.array([[150.        , 132.5001244 , 115.00024881],
               [ 97.50037321,  80.00049761,  62.50062201],
               [ 45.00074642,  27.50087082,  10.00099522]])

# make a copy and shuffle the copy
r2 = r1.copy()
np.random.shuffle(r2.ravel())

# get the index of the max
idx = np.unravel_index(r2.argmax(), r2.shape)

# swap first and max
r2[idx], r2[(0, 0)] = r2[(0, 0)], r2[idx]

print(r2)

Alternative

If the largest element is already first (for example, because the array is initially sorted in descending order), we can use a flat view of the array:

r1 = np.array([[150.        , 132.5001244 , 115.00024881],
               [ 97.50037321,  80.00049761,  62.50062201],
               [ 45.00074642,  27.50087082,  10.00099522]])

r2 = r1.copy()

# r2.ravel() returns a view of the original array
# so we can shuffle only the items starting from 1
np.random.shuffle(r2.ravel()[1:])

Possible output:

[[150.          80.00049761  62.50062201]
 [ 97.50037321 132.5001244  115.00024881]
 [ 45.00074642  27.50087082  10.00099522]]
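
The two variants above can be combined so that the input does not need to be sorted: move the maximum to the front first, then shuffle everything after it. The helper name shuffle_with_max_first is hypothetical, and this is a sketch under the assumption that a shuffled copy is wanted.

```python
import numpy as np

def shuffle_with_max_first(arr):
    out = arr.copy()   # C-contiguous copy, so ravel() below returns a view
    flat = out.ravel()
    k = flat.argmax()
    # put the largest element at position 0
    flat[0], flat[k] = flat[k], flat[0]
    # shuffle the remaining elements in place through the view
    np.random.shuffle(flat[1:])
    return out

r1 = np.array([[3.0, 9.0],
               [1.0, 5.0]])
r2 = shuffle_with_max_first(r1)
print(r2)
```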

