Differences Between numpy.random and random.random in Python

You have made many correct observations already!

Unless you'd like to seed both of the random generators, it's probably simpler in the long run to choose one generator or the other. But if you do need to use both, then yes, you'll also need to seed them both, because they generate random numbers independently of each other.

For numpy.random.seed(), the main difficulty is that it is not thread-safe - that is, it's not safe to use if you have many different threads of execution, because it's not guaranteed to work if two different threads call the function at the same time. If you're not using threads, and if you can reasonably expect that you won't need to rewrite your program this way in the future, numpy.random.seed() should be fine. If there's any reason to suspect that you may need threads in the future, it's much safer in the long run to do as suggested, and to create a local instance of the numpy.random.RandomState class (or, in newer NumPy versions, a Generator via numpy.random.default_rng()). As far as I can tell, random.seed() is thread-safe (or at least, I haven't found any evidence to the contrary).
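
For illustration, here is a minimal sketch of that instance-based approach (the variable names are just examples):

import random
import numpy as np

# Each instance carries its own state, so seeding one never
# affects the other or any global generator.
np_rng = np.random.RandomState(12345)   # legacy NumPy class, still supported
py_rng = random.Random(12345)           # stdlib equivalent

print(np_rng.random_sample())   # draws from the local NumPy state
print(py_rng.random())          # draws from the local stdlib state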

The numpy.random library contains a few extra probability distributions commonly used in scientific research, as well as a couple of convenience functions for generating arrays of random data. The standard library's random module is a little more lightweight, and should be fine if you're not doing scientific research or other kinds of work in statistics.

Otherwise, they both use the Mersenne Twister algorithm to generate their random numbers, and they're both completely deterministic - that is, if you know a few key bits of information, it's possible to predict with absolute certainty what number will come next. For this reason, neither numpy.random nor random is suitable for any serious cryptographic use. But because the period of the sequence is so long (2^19937 - 1 values before it repeats), both are fine for generating random numbers in cases where you aren't worried about people trying to reverse-engineer your data. This is also why seeding matters: if you start in the same place each time, you'll always get the same sequence of random numbers!
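
A quick demonstration of that determinism, using the stdlib module (the same holds for numpy):

import random

random.seed(42)
first = [random.random() for _ in range(3)]

random.seed(42)                  # reset to the same starting point...
second = [random.random() for _ in range(3)]

print(first == second)           # True: identical sequences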

As a side note, if you do need cryptographic-level randomness, you should use the secrets module, or something like Crypto.Random if you're using a Python version earlier than 3.6.
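
For example, with the secrets module:

import secrets

token = secrets.token_hex(16)   # 32 hex characters from the OS entropy source
n = secrets.randbelow(100)      # cryptographically secure int in [0, 100)
print(token, n)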

Performance difference between numpy.random and random.random in Python

numpy.random and Python's random module work in different ways, although, as you say, they use the same underlying algorithm.

In terms of seed: You can use the set_state and get_state functions from numpy.random (called getstate and setstate in Python's random module) and pass the state from one to the other. The structure is slightly different (in the Python version, the pos integer is appended as the last element of the state tuple). See the docs for numpy.random.get_state() and random.getstate():

import random
import numpy as np

random.seed(10)

# numpy state: ('MT19937', array of 624 uint32 keys, pos, has_gauss, cached_gaussian)
# stdlib state: (version, tuple of 624 key words plus pos, gauss_next)
s1 = list(np.random.get_state())
s2 = list(random.getstate())

# Copy the stdlib key words into the numpy state, then split off pos.
s1[1] = np.array(s2[1][:-1]).astype('uint32')
s1[2] = s2[1][-1]

np.random.set_state(tuple(s1))

print(np.random.random())   # 0.5714025946899135
print(random.random())      # 0.5714025946899135

In terms of efficiency: it depends on what you want to do, but numpy is usually better because you can create arrays of elements without the need of a loop:

%timeit np.random.random(10000)
142 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit [random.random() for i in range(10000)]
1.48 ms ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In terms of "randomness", numpy is (according to their docs), also better:

Notes: The Python stdlib module "random" also contains a Mersenne Twister pseudo-random number generator with a number of methods that
are similar to the ones available in RandomState. RandomState,
besides being NumPy-aware, has the advantage that it provides a much
larger number of probability distributions to choose from.

What is the difference between numpy.random's Generator class and np.random methods?

numpy.random.* functions (including numpy.random.binomial) make use of a global pseudorandom number generator (PRNG) object which is shared across the application. On the other hand, default_rng() is a self-contained Generator object that doesn't rely on global state.

If you don't care about reproducible "randomness" in your application, these two approaches are equivalent for the time being. Although NumPy's new RNG policy discourages the use of global state in general, it did not deprecate any numpy.random.* functions in version 1.17, although a future version of NumPy might.
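
For reproducibility, both styles can be seeded, though the two seeded streams are not interchangeable. A sketch (the seed value 42 is arbitrary):

import numpy as np

# Global state (legacy style): reproducible across runs.
np.random.seed(42)
print(np.random.binomial(10, 0.5, size=3))

# Self-contained Generator: also reproducible, but it uses a different
# bit generator (PCG64), so its stream differs from the legacy one above.
rng = np.random.default_rng(42)
print(rng.binomial(10, 0.5, size=3))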

Note also that because numpy.random.* functions rely on a global PRNG object that isn't thread-safe, these functions can cause race conditions if your application uses multiple threads. (Generator objects are not thread-safe, either, but there are ways to generate pseudorandom numbers via multithreading, without the need to share PRNG objects across threads.)
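
One such approach from the NumPy docs is to spawn independent child seeds with SeedSequence, one Generator per thread. A minimal sketch:

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# One statistically independent child seed per worker thread.
seed_seq = np.random.SeedSequence(12345)
generators = [np.random.default_rng(s) for s in seed_seq.spawn(4)]

def draw(rng):
    return rng.random(5)   # each thread touches only its own Generator

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(draw, generators))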

Differences between numpy.random.rand vs numpy.random.randn in Python

First, as you see from the documentation, numpy.random.randn generates samples from the normal distribution, while numpy.random.rand generates samples from a uniform distribution (in the range [0, 1)).
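
You can see the difference empirically (a quick sketch; the exact printed values will vary from run to run):

import numpy as np

u = np.random.rand(100000)    # uniform on [0, 1)
n = np.random.randn(100000)   # standard normal: mean 0, std 1

print(u.min() >= 0, u.max() < 1)               # True True
print(round(u.mean(), 2))                      # ~0.5
print(round(n.mean(), 2), round(n.std(), 2))   # ~0.0 1.0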

Second, why did the uniform distribution not work? The main reason is the activation function, especially in your case where you use the sigmoid function. The plot of the sigmoid looks like the following:

[Figure: plot of the sigmoid function, which is steep near 0 and flattens out for inputs far from 0]

So you can see that if your input is far from 0, the slope of the function decreases quite fast, and as a result you get a tiny gradient and a tiny weight update. And if you have many layers, those gradients get multiplied many times in the backward pass, so even "proper" gradients become small after the multiplications and stop having any influence. So if you have a lot of weights which push your inputs into those regions, your network is hardly trainable. That's why it is usual practice to initialize network variables around zero. This is done to ensure that you get reasonable gradients (close to 1) to train your net.

However, the uniform distribution is not completely undesirable; you just need to make the range smaller and centered on zero. One good practice is Xavier initialization, in which you initialize your weights with either of the following (a sketch of both follows the list):

  1. A normal distribution with mean 0 and standard deviation sqrt(2 / (in + out)), where in is the number of inputs to the neurons and out is the number of outputs.

  2. A uniform distribution on the range [-sqrt(6 / (in + out)), +sqrt(6 / (in + out))].
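
Here is a minimal sketch of both variants (the function names xavier_normal and xavier_uniform are just illustrative):

import numpy as np

def xavier_normal(n_in, n_out, rng=None):
    # Mean 0, standard deviation sqrt(2 / (n_in + n_out)).
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

def xavier_uniform(n_in, n_out, rng=None):
    # Uniform on [-limit, +limit] with limit = sqrt(6 / (n_in + n_out)).
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W = xavier_uniform(256, 128)
print(W.shape, W.min(), W.max())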

np.random.rand vs np.random.random

First note that numpy.random.random is actually an alias for numpy.random.random_sample; I'll use the latter in the following.

Both functions generate samples from the uniform distribution on [0, 1). The only difference is in how the arguments are handled. With numpy.random.rand, the length of each dimension of the output array is a separate argument. With numpy.random.random_sample, the shape argument is a single tuple.

For example, to create an array of samples with shape (3, 5), you can write

sample = np.random.rand(3, 5)

or

sample = np.random.random_sample((3, 5))

(Really, that's it.)


Update

As of version 1.17, NumPy has a new random API. The recommended method for generating samples from the uniform distribution on [0, 1) is:

>>> rng = np.random.default_rng()  # Create a default Generator.
>>> rng.random(size=10) # Generate 10 samples.
array([0.00416913, 0.31533329, 0.19057857, 0.48732511, 0.40638395,
0.32165646, 0.02597142, 0.19788567, 0.08142055, 0.15755424])

The new Generator class does not have the rand() or random_sample() methods. There is a uniform() method that allows you to specify the lower and upper bounds of the distribution. E.g.

>>> rng.uniform(1, 2, size=10)
array([1.75573298, 1.79862591, 1.53700962, 1.29183769, 1.16439681,
1.64413869, 1.7675135 , 1.02121057, 1.37345967, 1.73589452])

The old functions in the numpy.random namespace will continue to work, but they are considered "frozen", with no ongoing development. If you are writing new code, and you don't have to support pre-1.17 versions of numpy, it is recommended that you use the new random API.
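
A rough migration sketch for the functions discussed here (assuming NumPy >= 1.17):

import numpy as np

rng = np.random.default_rng()

legacy = np.random.rand(3, 5)           # old: global state, per-dimension args
modern = rng.random((3, 5))             # new: Generator method, shape tuple
bounded = rng.uniform(1, 2, size=10)    # new: replaces np.random.uniform(1, 2, 10)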

Difference between Numpy Randn and RandomState

np.random uses a pseudorandom number generator (PRNG) to generate a sequence of numbers which look random. Basically, it has an internal state to which it applies a function in order to produce the next number in the sequence. That function call also updates the internal state, so the next number in the sequence will be different.

np.random.RandomState(2) creates a new PRNG with its internal seed set to 2. This generator will produce numbers from a fixed sequence, which is why every time you call np.random.RandomState(2).randn(2), you get the same 2 numbers. If you instead save the RandomState object and repeatedly call randn(2) on it, you'll get the same sequence of numbers as from any other RandomState seeded with 2:

>>> rs1 = np.random.RandomState(2)
>>> rs2 = np.random.RandomState(2)
>>> rs1.randn(2), rs2.randn(2)
(array([-0.41675785, -0.05626683]), array([-0.41675785, -0.05626683]))
>>> rs1.randn(2), rs2.randn(2)
(array([-2.1361961 , 1.64027081]), array([-2.1361961 , 1.64027081]))

np.random.seed(2) sets the seed of the global instance of this PRNG to 2. Normally it's seeded with something like the timestamp from when the process started, so you get new random numbers every time you run a program. Setting this seed makes calls like np.random.randn(2), which use the global PRNG, produce a deterministic sequence of numbers.

>>> np.random.seed(2)
>>> np.random.randn(2)
array([-0.41675785, -0.05626683])
>>> np.random.randn(2)
array([-2.1361961 , 1.64027081])

Difference Between np.random.uniform() and uniform() using built-in python packages

Create one module, say, blankpaper.py, with only two lines of code

import numpy as np
np.random.seed(420)

Then, in your main script, execute

import numpy as np 
import blankpaper

print(np.random.uniform())

You should be getting exactly the same number on every run.

When a module or library sets np.random.seed(some_number), it is global. Behind the numpy.random.* functions is an instance of the global RandomState generator, emphasis on the global.

It is very likely that something that you are importing is doing the aforementioned.

Change the main script to

import numpy as np
import blankpaper

rng = np.random.default_rng()
print(rng.uniform())

and you should be getting new numbers each time.

default_rng is a constructor for the random number class, Generator. As stated in the documentation,

This function does not manage a default global instance.

In reply to the question, "[a]re you setting a seed first?", you said

Yes, I'm using it but it doesn't matter if I don't use a seed or
change the seed number. I checked it several times.

Imagine we redefine blankpaper.py to contain the lines

import numpy as np

def foo():
    np.random.seed(420)
    print("I exist to always give you the same number.")

and suppose your main script is

import numpy as np
import blankpaper

np.random.seed(840)

blankpaper.foo()
print(np.random.uniform())

then you should be getting the same numbers as were obtained from executing the first main script (top of the answer).

In this case, the setting of the seed is hidden in one of the functions in the blankpaper module, but the same thing would happen if blankpaper.foo were a class and blankpaper.foo's __init__() method set the seed.

So this setting of the global seed can be quite "hidden".

Note also that the above applies to the functions in the random module as well. As the docs state:

The functions supplied by this module are actually bound methods of a
hidden instance of the random.Random class. You can instantiate your
own instances of Random to get generators that don’t share state.

So when uniform() from the random module was generating different numbers each time for you, it was very likely because neither you nor some other module had set the seed shared by the functions from the random module.

In both numpy and random, if your class or application wants to have its own state, create an instance of Generator from numpy or Random from random (or SystemRandom for cryptographically secure randomness). This will be something you can pass around within your application. Its methods will be the functions in the numpy.random or random module, only they will have their own state (unless you explicitly set them to be equal).
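
A minimal sketch of that instance-based pattern (the seed values are arbitrary):

import random
import numpy as np

my_rng = np.random.default_rng(123)   # NumPy Generator with its own state
my_random = random.Random(123)        # stdlib Random with its own state
crypto = random.SystemRandom()        # OS entropy; seeding is ignored

print(my_rng.uniform(0, 1))   # same as any other default_rng(123) stream
print(my_random.random())     # same as any other random.Random(123) stream
print(crypto.random())        # different on every run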

Finally, I am not claiming that this is exactly what is causing your problem (I had to make a few inferences since I cannot see your code), but this is a very likely reason.

Any questions/concerns please let me know!

What's the difference between numpy.random vs numpy.random.Generator

The Generator referred to in the documentation is a class, introduced in NumPy 1.17: it's the core class responsible for adapting values from an underlying bit generator to generate samples from various distributions. numpy.random.exponential is part of the (now) legacy Mersenne-Twister-based random framework. You probably shouldn't worry about the legacy functions being removed any time soon - doing so would break a huge amount of code, but the NumPy developers recommend that for new code, you should use the new system, not the legacy system.

Your best source for the rationale for the change to the system is probably NEP 19: https://numpy.org/neps/nep-0019-rng-policy.html

To use Generator.exponential as recommended by the documentation, you first need to create an instance of the Generator class. The easiest way to create such an instance is to use the numpy.random.default_rng() function.

So you want to start with something like:

>>> import numpy
>>> my_generator = numpy.random.default_rng()

At this point, my_generator is an instance of numpy.random.Generator:

>>> type(my_generator)
<class 'numpy.random._generator.Generator'>

and you can use my_generator.exponential to get variates from an exponential distribution. Here we take 10 samples from an exponential distribution with scale parameter 3.2 (or equivalently, rate 0.3125):

>>> my_generator.exponential(3.2, size=10)
array([6.26251663, 1.59879107, 1.69010179, 4.17572623, 5.94945358,
1.19466134, 3.93386506, 3.10576934, 1.26095418, 1.18096234])

Your Generator instance can of course also be used to get any other random variates you need:

>>> my_generator.integers(0, 100, size=3)
array([56, 57, 10])

np.random.rand() or random.random()

np.random.rand(len(df)) returns an array of uniform random numbers in [0, 1); np.random.rand(len(df)) < 0.8 transforms it into an array of booleans based on the condition.

As each number has an 80% chance of being below 0.8, about 80% of the values are True.

A more explicit approach would be to use numpy.random.choice:

np.random.choice([True, False], p=[0.8, 0.2], size=len(df))

An even better approach, if your goal is to subset a dataframe, would be to use:

df.sample(frac=0.8)

To split a dataframe 0.8/0.2:

df1 = df.sample(frac=0.8)
df2 = df.drop(df1.index)
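
If the split needs to be reproducible, df.sample accepts a random_state seed (42 here is arbitrary, and df is your dataframe as above):

df1 = df.sample(frac=0.8, random_state=42)
df2 = df.drop(df1.index)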

