Creating Same Random Number Sequence in Python, Numpy and R

Consistently create the same random NumPy array

Simply seed the random number generator with a fixed value, e.g.

numpy.random.seed(42)

This way, you'll always get the same random number sequence.
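As a quick check, the sketch below reseeds the global generator twice with the same value and compares the draws. The helper name `seeded_draw` is made up for this illustration.

```python
import numpy as np

def seeded_draw(seed, n=5):
    """Seed NumPy's global generator, then draw n uniform samples."""
    np.random.seed(seed)
    return np.random.rand(n)

first = seeded_draw(42)
second = seeded_draw(42)

# Reseeding with the same value replays the same sequence.
print(np.array_equal(first, second))  # True
```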

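This function seeds the global default random number generator, and any call to a function in numpy.random will use and alter its state. This is fine for many simple use cases, but it is a form of global state, with all the problems global state brings. A cleaner solution is to create a dedicated generator object, such as numpy.random.RandomState(42), and call its methods instead of the module-level functions (the approach recommended in Robert Kern's answer to the same question).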
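A minimal sketch of the local-generator approach, assuming the legacy `numpy.random.RandomState` class is what you want to reproduce. The names `rng_a` and `rng_b` and the seed 42 are arbitrary.

```python
import numpy as np

# Each RandomState owns its own state; the seed value 42 is arbitrary.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

a = rng_a.rand(3)

# Drawing from the global generator does not disturb rng_b.
np.random.rand(1000)

b = rng_b.rand(3)
print(np.array_equal(a, b))  # True
```

NumPy 1.17 and later also offer numpy.random.default_rng(seed). It uses a different bit generator, so its numbers will not match those from RandomState with the same seed.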

Why do the numpy and random modules give different random numbers for the same seed?

The random module and the legacy numpy.random functions both use the Mersenne Twister (MT19937) to generate random numbers. Because of this, we can copy the state from one generator to the other and check whether they share the same underlying implementation.

import random as rnd
import numpy as np

# seed numpy
np.random.seed(1)

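# get state from numpy: element 1 holds the 624 key words, element 2 the position
np_state = np.random.get_state()
keys = [int(s) for s in np_state[1]]
keys.append(int(np_state[2]))  # position in the key array; 624 right after seeding
# python's random expects (version, 624 words plus position, gauss_next)
state = (3, tuple(keys), None)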

# set state for python
rnd.setstate(state)

print(rnd.random())
print(np.random.rand())

0.417022004702574

0.417022004702574

It looks like the MT19937 engines give identical results when the state is manually set to be the same. This seems to imply that the two seed functions are implemented differently, so the same seed value leads to different initial states.
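To see the seeding difference directly, this sketch seeds one local generator from each library with the same integer and compares the first draw. The helper name `first_draws` is made up for this illustration.

```python
import random
import numpy as np

def first_draws(seed):
    """Return the first uniform draw from each library for the same seed."""
    py_rng = random.Random(seed)
    np_rng = np.random.RandomState(seed)
    return py_rng.random(), np_rng.random_sample()

py_value, np_value = first_draws(1)

# The two values differ even though both engines are MT19937.
print(py_value, np_value)
```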

`numpy.random.normal` generates different numbers on different systems

The differences are all very small, which suggests that the underlying bit generators are doing the same thing and that the discrepancy comes from differences between the systems' math libraries.

NumPy's legacy generator uses sqrt and log from libm, and you can see that it pulls in these symbols by first finding the shared object providing the generator via:

import numpy as np

print(np.random.mtrand.__file__)

then dumping symbols with:

nm -C -gD mtrand.*.so | grep GLIBC

where that mtrand filename comes from the above output.

The output lists plenty of other symbols as well, but the dependence on libm's sqrt and log might explain the differences.
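To show where sqrt and log enter, here is a sketch of the Marsaglia polar method. It assumes this is how the legacy generator draws normals, which I have not checked for every NumPy version. The function name `polar_normal_pair` is made up for this illustration.

```python
import math
import numpy as np

def polar_normal_pair(rng):
    """Draw two standard normals with the Marsaglia polar method.

    rng is any object whose random_sample() returns a float in [0, 1).
    """
    while True:
        x1 = 2.0 * rng.random_sample() - 1.0
        x2 = 2.0 * rng.random_sample() - 1.0
        r2 = x1 * x1 + x2 * x2
        # Reject points outside the unit circle, and the origin.
        if 0.0 < r2 < 1.0:
            break
    # This is where the math library's log and sqrt enter the result.
    f = math.sqrt(-2.0 * math.log(r2) / r2)
    return f * x2, f * x1

mine = polar_normal_pair(np.random.RandomState(0))
reference = np.random.RandomState(0).standard_normal(2)
print(mine, reference)
```

If the two printed pairs agree on your machine, any cross-system difference in the normals has to come from log or sqrt, because the uniform inputs are identical.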

At a guess it's to do with the log implementation, so you could test with:

import numpy as np

np.random.seed(0)

x = 2 * np.random.rand(2, 10**5) - 1
r2 = np.sum(x * x, axis=0)

np.save('test-log.npy', np.log(r2))

and compare the saved file between the two systems.
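One way to do the comparison is sketched below. The helper `compare_arrays` and the file names in the commented-out line are hypothetical, and the stand-in data only exists so the sketch runs on its own.

```python
import numpy as np

def compare_arrays(a, b):
    """Summarize element-wise differences between two float64 arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    abs_diff = np.abs(a - b)
    return {
        "n_differing": int(np.count_nonzero(a != b)),
        "max_abs_diff": float(np.max(abs_diff)),
        # Largest difference in units of the floating-point spacing at a.
        "max_diff_in_ulps": float(np.max(abs_diff / np.spacing(np.abs(a)))),
    }

# Hypothetical file names: copy each system's test-log.npy under its own name.
# report = compare_arrays(np.load('test-log-a.npy'), np.load('test-log-b.npy'))

# Stand-in data: nudge one value by a single floating-point step.
a = np.log(np.array([0.25, 0.5, 0.75]))
b = a.copy()
b[1] = np.nextafter(b[1], 0.0)

report = compare_arrays(a, b)
print(report)
```

Differences of an ULP or two in the logarithms would point at the log implementation rather than at the bit generator.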

Comparing Matlab and Numpy code that uses random number generation

  1. One way to ensure the same numbers are fed to your process is to generate them in one of the two languages, save them, and import them into the other language. This is fairly easy: you could write them to a simple text file.

  2. If this is not possible or desirable, you can also make sure the numbers are the same by generating the pseudorandom numbers yourself. Here is a site that shows a very simple example of an easy-to-implement algorithm: Build your own simple random numbers

  3. If the quality of your homemade random generator is not sufficient, you can build a random generation function in one language and call it from the other. The easiest path is probably to call MATLAB from Python.

  4. If you are feeling lucky, try playing around with the settings. For example, try using the (outdated) seed input to MATLAB's random functions, or try different kinds of generators. I believe the default in both languages is the Mersenne Twister, but if the implementations are not the same, perhaps a simpler generator is.
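For option 2, a homemade generator can be as small as the sketch below. It is a Lehmer (multiplicative congruential) generator with the classic "minimal standard" constants, which is my choice for illustration and not necessarily the algorithm from the linked site. The class name `MinimalLCG` is made up.

```python
class MinimalLCG:
    """Lehmer generator: state = (A * state) % M."""

    A = 16807        # multiplier
    M = 2**31 - 1    # modulus, a Mersenne prime

    def __init__(self, seed=1):
        if not 0 < seed < self.M:
            raise ValueError("seed must be in 1 .. M - 1")
        self.state = seed

    def next_int(self):
        """Advance the state and return it as an integer in 1 .. M - 1."""
        self.state = (self.A * self.state) % self.M
        return self.state

    def next_float(self):
        """Uniform value strictly between 0 and 1."""
        return self.next_int() / self.M

gen = MinimalLCG(seed=1)
print([gen.next_int() for _ in range(3)])
# [16807, 282475249, 1622650073]
```

The product A * state always stays below 2^53, so the same recurrence written with double-precision arithmetic and mod in MATLAB should reproduce these integers exactly. The statistical quality is poor by modern standards, which is the limitation option 3 refers to.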


