Consistently create same random numpy array
Simply seed the random number generator with a fixed value, e.g.
numpy.random.seed(42)
This way, you'll always get the same random number sequence.
This function seeds the global default random number generator, and any call to a function in numpy.random will use and alter its state. This is fine for many simple use cases, but it's a form of global state with all the problems global state brings. For a cleaner solution, see Robert Kern's answer below.
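As a rough sketch of what avoiding global state can look like, the snippet below passes a local generator into a function instead of relying on numpy.random's module-level functions. The helper name sample_noise and the seed values are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def sample_noise(rng, n):
    """Draw n standard-normal values from the generator passed in (hypothetical helper)."""
    return rng.standard_normal(n)

# A local generator with a fixed seed (42 is an arbitrary choice).
rng_a = np.random.default_rng(42)
first = sample_noise(rng_a, 3)

# Calls on the global generator do not touch a local generator's state.
rng_b = np.random.default_rng(42)
np.random.seed(0)
np.random.rand(1000)
second = sample_noise(rng_b, 3)

print(np.array_equal(first, second))  # True
```

Because the generator is an explicit argument, other code that seeds or draws from the global generator cannot change what sample_noise returns.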
How to generate same random arrays while using np.random.normal()
After some digging, I found a way:
import numpy as np
import matplotlib.pyplot as plt

# A local generator with a fixed seed, so every run draws the same numbers
rng = np.random.RandomState(0)
X_p = rng.normal(0, 0.05, size=(100, 2))
X_n = rng.normal(0.13, 0.02, size=(50, 2))

plt.scatter(X_p[:, 0], X_p[:, 1])
plt.scatter(X_n[:, 0], X_n[:, 1], color='red')
plt.show()
Now every time you run this code, you'll get the same output.
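One caveat worth checking for yourself: the result depends on the order of the draws as well as the seed, because each call consumes part of the same stream. The sketch below reuses the parameters from the code above and skips the plotting; the variable names are only for illustration.

```python
import numpy as np

# Same seed, same order of calls: the arrays match.
rng1 = np.random.RandomState(0)
X_p1 = rng1.normal(0, 0.05, size=(100, 2))
X_n1 = rng1.normal(0.13, 0.02, size=(50, 2))

rng2 = np.random.RandomState(0)
X_p2 = rng2.normal(0, 0.05, size=(100, 2))
X_n2 = rng2.normal(0.13, 0.02, size=(50, 2))
same_order_matches = np.array_equal(X_p1, X_p2) and np.array_equal(X_n1, X_n2)

# Same seed, but X_n is drawn first: X_p now comes from a later part of the stream.
rng3 = np.random.RandomState(0)
X_n3 = rng3.normal(0.13, 0.02, size=(50, 2))
X_p3 = rng3.normal(0, 0.05, size=(100, 2))
swapped_order_matches = np.array_equal(X_p1, X_p3)

print(same_order_matches, swapped_order_matches)
```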
Does numpy.random.seed() always give the same random number every time?
The np.random documentation describes the PRNGs used. Apparently, there was a partial switch from MT19937 to PCG64 in the recent past. If you want consistency, you'll need to:
- fix the PRNG used, and
- ensure that you're using a local handle (e.g. RandomState, Generator) so that any changes to other external libraries don't mess things up by calling np.random globals themselves.
In this example, we make use of the newer BitGenerator API, which provides a selection of various PRNGs.
from numpy.random import Generator, PCG64
rg = Generator(PCG64(1234))
Which may be used as follows:
>>> rg.uniform(0, 10, 10)
array([9.767, 3.802, 9.232, 2.617, 3.191, 1.181, 2.418, 3.185, 9.641,
2.636])
If we re-run this any number of times (even within the same REPL!), we will always obtain the same random numbers. PCG64, like MT19937, provides the following guarantee:
Compatibility Guarantee
PCG64 makes a guarantee that a fixed seed will always produce the same random integer stream.
As @user2357112 supports Monica noted, changes to the random API functions that consume the random integer stream (e.g. np.random.Generator.uniform) are still technically possible, though unlikely.
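A quick way to convince yourself of both points (same seed gives the same stream, and the choice of PRNG matters) is to rebuild the generators and compare the draws. This is a minimal sketch; the variable names are made up, and 1234 is just the seed used above.

```python
import numpy as np
from numpy.random import Generator, MT19937, PCG64

# Rebuilding the generator from the same bit generator and seed replays the stream.
pcg_first = Generator(PCG64(1234)).uniform(0, 10, 10)
pcg_again = Generator(PCG64(1234)).uniform(0, 10, 10)

# Same seed, different bit generator: a different stream.
mt_first = Generator(MT19937(1234)).uniform(0, 10, 10)

print(np.array_equal(pcg_first, pcg_again))
print(np.array_equal(pcg_first, mt_first))
```

Naming the bit generator explicitly, rather than relying on whatever the default happens to be, is what "fixing the PRNG" amounts to here.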
To create multiple generators, one can use SeedSequence.spawn(k) to produce k different SeedSequences. This is useful for consistent concurrent computations:
from numpy.random import Generator, PCG64, SeedSequence
sg = SeedSequence(1234)
rgs = [Generator(PCG64(s)) for s in sg.spawn(10)]
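The following sketch checks the two properties one would want from spawned generators: the children are reproducible from the root seed, and each child yields its own stream. The helper spawn_draws and its arguments are hypothetical names chosen for this example.

```python
import numpy as np
from numpy.random import Generator, PCG64, SeedSequence

def spawn_draws(root_seed, k, n):
    """Spawn k child generators from root_seed and draw n normals from each."""
    children = SeedSequence(root_seed).spawn(k)
    return [Generator(PCG64(s)).standard_normal(n) for s in children]

run_1 = spawn_draws(1234, 4, 3)
run_2 = spawn_draws(1234, 4, 3)

# The same root seed yields the same children, in the same order.
reproducible = all(np.array_equal(a, b) for a, b in zip(run_1, run_2))
# Different children yield different streams.
distinct = not np.array_equal(run_1[0], run_1[1])

print(reproducible, distinct)
```

In a concurrent setting, each worker would receive one child generator, so the results do not depend on how the workers are scheduled.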
Create random numpy matrix of same size as another.
np.random.randn takes the dimensions of the array as separate arguments, which you can get directly from the shape property of the first array. You have to unpack a.shape with the * operator in order to get the proper input for np.random.randn.
import numpy as np

a = np.zeros([2, 3])
print(a.shape)
# outputs: (2, 3)
# randn expects each dimension as a separate argument, hence the unpacking
b = np.random.randn(*a.shape)
print(b.shape)
# outputs: (2, 3)
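If the unpacking feels awkward, an alternative (not part of the original answer) is standard_normal, which accepts the shape tuple directly. The sketch below shows both the module-level function and the Generator method.

```python
import numpy as np

a = np.zeros([2, 3])

# standard_normal takes the shape as a single tuple, so no unpacking is needed.
b = np.random.standard_normal(a.shape)

# The same call on a local Generator.
rng = np.random.default_rng()
c = rng.standard_normal(a.shape)

print(b.shape, c.shape)  # (2, 3) (2, 3)
```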
Why does numpy return different random numbers from the same random state?
So to summarize the question: the first 5 numbers of a part of the random state are sometimes the same, but the output of the random generator is different.
The short answer is: the random state does change, but the first 5 numbers you are looking at remain the same. The change is in the number at index 2:
import numpy as np
import pandas as pd

for i in range(3):
    # Index 2 of the state tuple is the position within the current block of numbers
    state = np.random.get_state()[2]
    randints = np.random.randint(-10, 10, size=5)
    df = pd.DataFrame.from_dict({'state': state, 'randints': randints})
    print(df)
Output:
randints state
0 -9 624
1 6 624
2 4 624
3 -5 624
4 5 624
randints state
0 -9 5
1 -5 5
2 4 5
3 -4 5
4 -4 5
randints state
0 5 10
1 -8 10
2 8 10
3 -10 10
4 -3 10
Numpy uses the Mersenne Twister algorithm, which generates 32-bit random numbers in groups of 624 at a time. So we might expect the big state array to remain the same until all these numbers have been consumed and the Twister needs to be called again.
At index 2 of the state, it stores how many of these numbers have already been consumed. This starts out at 624, so the Twister is run once at the start, before generating any output. After that, you'll see the list remain the same until all 624 numbers have been consumed. Then the Twister is called again, the counter is reset to 0, and the entire thing starts over.
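The behaviour described above can be observed directly on a local RandomState. This is a small sketch that assumes the default (legacy tuple) return format of get_state; the seed 0 and the variable names are arbitrary.

```python
import numpy as np

rs = np.random.RandomState(0)

# Right after seeding, the position counter sits at the end of the block.
_, key0, pos0, _, _ = rs.get_state()
print(pos0)  # 624

# The first draw triggers the Twister and starts consuming the new block.
rs.random_sample()
_, key1, pos1, _, _ = rs.get_state()

# A second draw inside the same block advances the counter but leaves the key array alone.
rs.random_sample()
_, key2, pos2, _, _ = rs.get_state()

key_unchanged = np.array_equal(key1, key2)
print(key_unchanged, pos1, pos2)
```

So comparing only the key array (or its first few entries) is not enough to tell whether two states are equal; the position counter has to match as well.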
Making numpy random draws consistent for reproducibility
np.random.seed is a function. Replace:
np.random.seed = 198908
With:
np.random.seed(198908)
Details
The argument provided to seed
can be (1) any integer or (2) an array (or other sequence) of integers of any length, or (3) None. If it is None
, then numpy
will select a seed from the best available random source which on Linux would be /dev/urandom
.
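The three kinds of argument can be tried out as follows. This is only an illustrative sketch; the sequence [1, 2, 3] and the variable names are arbitrary choices.

```python
import numpy as np

# (1) An integer seed: re-seeding with the same value replays the same numbers.
np.random.seed(198908)
from_int_1 = np.random.rand(3)
np.random.seed(198908)
from_int_2 = np.random.rand(3)

# (2) A sequence of integers works the same way.
np.random.seed([1, 2, 3])
from_seq_1 = np.random.rand(3)
np.random.seed([1, 2, 3])
from_seq_2 = np.random.rand(3)

# (3) None re-seeds from fresh entropy, so the result is not reproducible.
np.random.seed(None)
unseeded = np.random.rand(3)

print(np.array_equal(from_int_1, from_int_2))  # True
print(np.array_equal(from_seq_1, from_seq_2))  # True
```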