Consistently Create Same Random Numpy Array


Simply seed the random number generator with a fixed value, e.g.

numpy.random.seed(42)

This way, you'll always get the same random number sequence.

This function seeds the global default random number generator, and any call to a function in numpy.random uses and alters its state. This is fine for many simple use cases, but it is a form of global state, with all the problems global state brings. For a cleaner solution, see Robert Kern's answer to the same question.
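A minimal sketch of both points: reseeding the global generator replays the same sequence, but any other call to numpy.random in between (here a stand-in for some library drawing a number behind your back) shifts what you get. A separate RandomState instance keeps its own state and is unaffected. The variable names are illustrative only.

```python
import numpy as np

# Reseeding the global generator replays the same sequence
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# One extra global draw in between (e.g. from another library) shifts the stream
np.random.seed(42)
np.random.rand()
third = np.random.rand(3)
print(np.array_equal(first, third))   # False

# A local RandomState keeps its own state, so global calls cannot disturb it
local = np.random.RandomState(42)
np.random.rand(5)                     # unrelated global draw
fourth = local.rand(3)
print(np.array_equal(first, fourth))  # True
```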

How to generate same random arrays while using np.random.normal()

After some digging, I found a way: create a local RandomState with a fixed seed and draw from it.

import numpy as np
import matplotlib.pyplot as plt

# Local generator with a fixed seed: every run draws the same numbers
rng = np.random.RandomState(0)
X_p = rng.normal(0, 0.05, size=(100, 2))
X_n = rng.normal(0.13, 0.02, size=(50, 2))

plt.scatter(X_p[:, 0], X_p[:, 1])
plt.scatter(X_n[:, 0], X_n[:, 1], color='red')
plt.show()


Now every time you run this code, you'll get the same output.
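To check this without plotting, you can draw the data twice from fresh generators with the same seed and compare. This sketch reuses the parameters from the snippet above; the helper name draw_clusters is hypothetical.

```python
import numpy as np

def draw_clusters(seed):
    """Hypothetical helper: draw both point clouds from a fresh, seeded RandomState."""
    rng = np.random.RandomState(seed)
    X_p = rng.normal(0, 0.05, size=(100, 2))
    X_n = rng.normal(0.13, 0.02, size=(50, 2))
    return X_p, X_n

a_p, a_n = draw_clusters(0)
b_p, b_n = draw_clusters(0)
print(np.array_equal(a_p, b_p) and np.array_equal(a_n, b_n))  # True

# A different seed gives different (but equally reproducible) data
c_p, _ = draw_clusters(1)
print(np.array_equal(a_p, c_p))  # False
```

Note that X_n depends on X_p having been drawn first from the same generator: reordering the two calls changes both arrays, even with the same seed.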

Does numpy.random.seed() always give the same random number every time?

The np.random documentation describes the PRNGs used. Since NumPy 1.17 there are two: the legacy RandomState (and the np.random.* functions built on it) uses MT19937, while the newer Generator defaults to PCG64. If you want consistency, you need to:

  1. fix the PRNG used, and
  2. use a local handle (e.g. RandomState, Generator), so that other libraries calling the np.random global functions can't disturb your stream.
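A short sketch of both points, assuming NumPy 1.17 or newer: default_rng builds a Generator on PCG64, the legacy RandomState reports MT19937, and a local Generator keeps producing the same values even when other code reseeds and draws from the global np.random functions.

```python
import numpy as np

rng = np.random.default_rng(1234)              # local Generator
print(type(rng.bit_generator).__name__)        # PCG64
print(np.random.RandomState(0).get_state()[0]) # MT19937 (legacy)

expected = np.random.default_rng(1234).uniform(0, 10, 5)

np.random.seed(0)            # other code reseeding the global generator...
np.random.uniform(size=100)  # ...and drawing from it
actual = rng.uniform(0, 10, 5)
print(np.array_equal(expected, actual))  # True: the local stream is untouched
```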

In this example, we use the newer BitGenerator API, which offers a selection of PRNGs.

from numpy.random import Generator, PCG64

rg = Generator(PCG64(1234))

Which may be used as follows:

>>> rg.uniform(0, 10, 10)
array([9.767, 3.802, 9.232, 2.617, 3.191, 1.181, 2.418, 3.185, 9.641,
2.636])

If we re-run this any number of times (even within the same REPL!), we always obtain the same random numbers. PCG64, like MT19937, provides the following guarantee:

Compatibility Guarantee

PCG64 makes a guarantee that a fixed seed will always produce the same random integer stream.

However, as @user2357112 supports Monica noted, the functions that turn this integer stream into values from a distribution (e.g. np.random.Generator.uniform) could still technically change between NumPy versions, though this is unlikely.
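The guarantee covers the raw integer stream, which you can inspect directly through the bit generator's random_raw method. This sketch checks, within a single NumPy installation, that two PCG64 instances seeded with 1234 produce identical raw output and identical uniform draws; it cannot say anything about whether uniform itself stays stable across versions.

```python
import numpy as np
from numpy.random import Generator, PCG64

# Raw 64-bit integer stream: this is what the PCG64 guarantee is about
raw_a = PCG64(1234).random_raw(5)
raw_b = PCG64(1234).random_raw(5)
print(np.array_equal(raw_a, raw_b))  # True

# Distribution methods built on top of that stream, e.g. Generator.uniform
u_a = Generator(PCG64(1234)).uniform(0, 10, 10)
u_b = Generator(PCG64(1234)).uniform(0, 10, 10)
print(np.array_equal(u_a, u_b))  # True
```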

To create multiple generators, use SeedSequence.spawn(k) to derive k child SeedSequences. This is useful for reproducible concurrent computations:

from numpy.random import Generator, PCG64, SeedSequence

sg = SeedSequence(1234)
rgs = [Generator(PCG64(s)) for s in sg.spawn(10)]
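One subtlety: spawn advances the parent SeedSequence, so calling spawn twice on the same object yields different children. To reproduce the same set of generators on every run, rebuild them from a fresh SeedSequence(1234). The helper make_generators below is hypothetical.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

def make_generators(seed, k):
    """Hypothetical helper: k generators derived from one root seed."""
    return [Generator(PCG64(s)) for s in SeedSequence(seed).spawn(k)]

run1 = [g.random(3) for g in make_generators(1234, 4)]
run2 = [g.random(3) for g in make_generators(1234, 4)]

# Same root seed -> same per-worker streams on every run
print(all(np.array_equal(x, y) for x, y in zip(run1, run2)))  # True

# The child streams differ from one another
print(np.array_equal(run1[0], run1[1]))  # False

# Spawning again from the same SeedSequence object gives new children
sg = SeedSequence(1234)
first_batch = sg.spawn(2)
second_batch = sg.spawn(2)
print(first_batch[0].spawn_key, second_batch[0].spawn_key)  # (0,) (2,)
```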

Create random numpy matrix of same size as another.

np.random.randn takes the dimensions of the array as separate arguments, not as a tuple. You can get them from the shape attribute of the first array, but you have to unpack a.shape with the * operator to pass them to np.random.randn.

import numpy as np

a = np.zeros([2, 3])
print(a.shape)
# outputs: (2, 3)

# *a.shape expands to np.random.randn(2, 3)
b = np.random.randn(*a.shape)
print(b.shape)
# outputs: (2, 3)
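If you'd rather not unpack, several sampling functions accept the shape tuple directly as their size argument. A sketch using np.random.standard_normal (the same distribution as randn) and the Generator equivalents; the seed 0 is arbitrary.

```python
import numpy as np

a = np.zeros([2, 3])

# standard_normal takes a shape tuple, so no * unpacking is needed
b = np.random.standard_normal(a.shape)

# Same idea with a local Generator
rng = np.random.default_rng(0)
c = rng.standard_normal(a.shape)
d = rng.random(a.shape)  # uniform on [0, 1), same shape

print(b.shape, c.shape, d.shape)  # (2, 3) (2, 3) (2, 3)
```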

Why does numpy return different random numbers from the same random state?

To summarize the question: the first 5 numbers of the state array are sometimes the same, yet the output of the random generator is different.

The short answer is: the random state does change, but the first 5 numbers you are looking at remain the same. The change is in the number at index 2:

import numpy as np
import pandas as pd

for i in range(3):
    # Index 2 of the legacy MT19937 state tuple is the position counter
    state = np.random.get_state()[2]
    randints = np.random.randint(-10, 10, size=5)
    df = pd.DataFrame.from_dict({'randints': randints, 'state': state})
    print(df)

Output:

   randints  state
0        -9    624
1         6    624
2         4    624
3        -5    624
4         5    624
   randints  state
0        -9      5
1        -5      5
2         4      5
3        -4      5
4        -4      5
   randints  state
0         5     10
1        -8     10
2         8     10
3       -10     10
4        -3     10

NumPy's legacy generator (behind np.random.randint and the other np.random.* functions) is the Mersenne Twister, which generates 32-bit random numbers in blocks of 624 at a time. So we might expect the big state array to remain the same until all these numbers have been consumed and the Twister has to run again.

Index 2 of the state stores how many of these numbers have already been consumed. It starts out at 624, so the Twister runs once at the start, before generating any output. After that, the state array stays the same until all 624 numbers have been consumed; then the Twister runs again, the counter is reset to 0, and the whole cycle starts over.
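The counter behavior can be observed on a local RandomState, which uses the same legacy MT19937 state layout as np.random.get_state(). This sketch assumes the legacy tuple form of get_state(): element 1 is the 624-word key array and element 2 the position counter.

```python
import numpy as np

rs = np.random.RandomState(0)

# Legacy state tuple: ('MT19937', key array of 624 uint32, position counter, ...)
_, keys0, pos0, *_ = rs.get_state()
print(pos0)  # 624: the block counts as used up, so the next draw regenerates it

rs.random_sample()  # first draw triggers the twist, which rewrites the key array
_, keys1, pos1, *_ = rs.get_state()
print(np.array_equal(keys0, keys1))  # False
print(pos1 < 624)                    # True: counter restarted near 0

rs.random_sample(10)  # further draws only advance the counter
_, keys2, pos2, *_ = rs.get_state()
print(np.array_equal(keys1, keys2))  # True: same block, nothing regenerated
print(pos2 > pos1)                   # True
```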

Making numpy random draws consistent for reproducibility

np.random.seed is a function; assigning to it just replaces the function with an integer and never seeds anything. Replace:

np.random.seed = 198908

With:

np.random.seed(198908)
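A quick check that the call form makes draws repeatable, plus what the assignment form actually does. The helper name draws is hypothetical, and the sketch restores the real function afterwards so later code is not broken.

```python
import numpy as np

def draws():
    """Hypothetical helper: reseed the global generator, then draw."""
    np.random.seed(198908)  # calling the function resets the global state
    return np.random.normal(size=3)

print(np.array_equal(draws(), draws()))  # True

# The buggy version rebinds the name instead of seeding anything
original_seed = np.random.seed
np.random.seed = 198908
buggy_type = type(np.random.seed).__name__
print(buggy_type)  # int: no longer a function
np.random.seed = original_seed  # restore the real function
```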

Details

The argument provided to seed can be (1) an integer between 0 and 2**32 - 1, (2) an array (or other sequence) of such integers, of any length, or (3) None. If it is None, NumPy selects a seed from the best available source of randomness, which on Linux is /dev/urandom.
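A sketch of the three accepted forms with the legacy np.random.seed; the particular numbers are arbitrary. It also shows that an integer outside the 32-bit unsigned range is rejected with a ValueError.

```python
import numpy as np

np.random.seed(12345)         # (1) an integer in [0, 2**32 - 1]
np.random.seed([1, 2, 3, 4])  # (2) a sequence of such integers
np.random.seed(None)          # (3) fresh entropy from the operating system

# Equal sequence seeds reproduce the same draws
np.random.seed([1, 2, 3, 4])
x = np.random.rand(2)
np.random.seed([1, 2, 3, 4])
y = np.random.rand(2)
print(np.array_equal(x, y))  # True

# Integers outside the 32-bit unsigned range are rejected
rejected = False
try:
    np.random.seed(2**32)
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```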


