Create an Array With a Predetermined Mean and Standard Deviation

Python Numpy Standard deviation and mean

See https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html

In short, you can draw samples from a normal distribution with

numpy.random.normal([mean], [standard deviation], [array size])

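So, for your example (the samples are random draws, so the array's actual mean and standard deviation will only approximate 75 and 12):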

numpy.random.normal(75, 12, [array size])
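If the array has to hit the target mean and standard deviation exactly rather than approximately, one option is to standardize the draws and rescale them. The following is a minimal sketch under that assumption; the helper name array_with_exact_stats and the seed are hypothetical and not part of NumPy.

```python
import numpy as np

def array_with_exact_stats(mean, std, size, seed=None):
    """Hypothetical helper: draw normal samples, then rescale them so the
    sample mean and population standard deviation match the targets."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=mean, scale=std, size=size)
    # Standardize to mean 0 and std 1, then scale and shift to the targets.
    standardized = (samples - samples.mean()) / samples.std()
    return standardized * std + mean

arr = array_with_exact_stats(75, 12, size=1000, seed=0)
print(arr.mean(), arr.std())  # 75 and 12, up to floating-point rounding
```

Note that the rescaled values are no longer independent draws from the original distribution, so this only fits cases where matching the two statistics matters more than that.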

Combined mean and standard deviation from a collection of NumPy arrays of different shapes

We can apply the formulas for the mean and standard deviation directly to compute both scalar values across all input arrays without concatenating or stacking them, which could be costly, especially on large NumPy arrays. Let's do it in steps, mean first and then standard deviation, since the standard deviation computation reuses the mean.

Getting the combined mean value:

We start with the mean. For each array, compute the sum of its elements. Then add those sums together and divide by the total number of elements across all arrays.

Getting the combined standard deviation value:

For the standard deviation, the formula is:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

So, using the combined mean value obtained in the previous step, we sum the squared differences from that mean over every element of every array, divide by the total number of elements across all arrays, and then take the square root.
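Written out for several arrays, the two steps described above amount to the following. The symbols K (number of arrays), n_k (size of array k), and x_{k,i} (element i of array k) are notation introduced here for illustration, with N the total element count and \mu the combined mean:

```latex
N = \sum_{k=1}^{K} n_k, \qquad
\mu = \frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} x_{k,i}, \qquad
\sigma = \sqrt{\frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left(x_{k,i} - \mu\right)^2}
```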

Implementation

Let's say the input arrays are a and b. Then one solution would be -

N = float(a.size + b.size)
mean_ = (a.sum() + b.sum())/N
std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)

Sample run for verification

In [266]: a = np.random.rand(3,4,2)
...: b = np.random.rand(2,5,3)
...:

In [267]: N = float(a.size + b.size)
...: mean_ = (a.sum() + b.sum())/N
...: std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)
...:

In [268]: mean_
Out[268]: 0.47854757879348042

In [270]: std_
Out[270]: 0.27890341338373376

Now, to verify, let's stack the flattened arrays and then use the built-in mean and std methods -

In [271]: A = np.hstack((a.ravel(), b.ravel()))

In [273]: A.mean()
Out[273]: 0.47854757879348037

In [274]: A.std()
Out[274]: 0.27890341338373376

List of arrays as input

For a list holding all those arrays, we need to iterate through them, like so -

A = [a,b,c] # input list of arrays

N = float(sum([i.size for i in A]))
mean_ = sum([i.sum() for i in A])/N
std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)

Sample run -

In [301]: a = np.random.rand(3,4,2)
...: b = np.random.rand(2,5,3)
...: c = np.random.rand(7,4)
...:

In [302]: A = [a,b,c] # input list of arrays
...: N = float(sum([i.size for i in A]))
...: mean_ = sum([i.sum() for i in A])/N
...: std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
...: print(mean_, std_)
...:
0.47703535428 0.293308550786

In [303]: A = np.hstack((a.ravel(), b.ravel(), c.ravel()))
...: print(A.mean(), A.std())
...:
0.47703535428 0.293308550786
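For reuse, the list-based approach can be wrapped in a small function. This is a sketch only: the name combined_mean_std and the seeded test arrays are hypothetical, and it uses Python 3 syntax, where / already performs float division.

```python
import numpy as np

def combined_mean_std(arrays):
    """Hypothetical helper: mean and population standard deviation over
    all elements of the given arrays, without concatenating them."""
    n = sum(a.size for a in arrays)
    mean = sum(a.sum() for a in arrays) / n
    # Sum of squared differences from the combined mean, array by array.
    var = sum(((a - mean) ** 2).sum() for a in arrays) / n
    return mean, np.sqrt(var)

rng = np.random.default_rng(0)
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]
mean_, std_ = combined_mean_std(arrays)

# Cross-check against the flattened, concatenated data.
flat = np.concatenate([a.ravel() for a in arrays])
print(np.isclose(mean_, flat.mean()), np.isclose(std_, flat.std()))
```

Generator expressions are used here instead of list comprehensions so that no intermediate list of partial sums is built.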

Calculate the 3rd standard deviation for an array

NumPy's std yields the standard deviation, which is usually denoted by "sigma". To get the 2-sigma or 3-sigma range, multiply sigma by 2 or 3 and take that distance on either side of the mean:

print([x.mean() - 3 * x.std(), x.mean() + 3 * x.std()])

Output:

[-27.545797458510656, 52.315028227741429]

For more detailed information, you might refer to the documentation, which states:

The standard deviation is the square root of the average of the
squared deviations from the mean, i.e., std = sqrt(mean(abs(x -
x.mean())**2)).

http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
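As a self-contained illustration of the 3-sigma range, the sketch below uses made-up, normally distributed sample data (the seed, loc, and scale are arbitrary assumptions) and also counts how many samples fall inside the range.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
x = rng.normal(loc=10.0, scale=5.0, size=10_000)  # made-up sample data

mu, sigma = x.mean(), x.std()
lower, upper = mu - 3 * sigma, mu + 3 * sigma

# Fraction of samples inside the 3-sigma range; for normally distributed
# data this is expected to be close to 0.997.
inside = np.mean((x >= lower) & (x <= upper))
print(lower, upper, inside)
```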

Numpy array with different standard deviation per row

np.random.normal() is vectorized over its scale argument, so you can generate one column per standard deviation and then transpose the result:

np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T

print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]

That is, the scale parameter is the column-wise standard deviation, hence the need to transpose via .T since you want row-wise inputs.
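An alternative that avoids the transpose is to give scale a column shape so that it broadcasts along rows directly. This is a sketch assuming the newer Generator API (np.random.default_rng); the variable name scales is introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(444)
scales = np.array([1.0, 2.0, 3.0])

# A (3, 1) scale broadcasts across each row of the (3, 1000) output,
# so row i is drawn with standard deviation scales[i].
arr = rng.normal(loc=0.0, scale=scales[:, np.newaxis], size=(3, 1000))

print(arr.shape)
print(arr.std(axis=1))  # roughly 1, 2, 3
```

The exact values differ from the legacy np.random.seed(444) output because the two APIs use different random streams.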
