Python NumPy Standard Deviation and Mean
See https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html
In short, you can use
numpy.random.normal([mean], [standard deviation], [array size])
(the parameters are named loc, scale and size), so for your example, with mean 75 and standard deviation 12:
numpy.random.normal(75, 12, [array size])
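A quick sanity check. The sample size of 10,000 and the seed value are arbitrary choices for illustration; with that many draws, the sample mean and standard deviation should land close to the requested 75 and 12.

```python
import numpy as np

np.random.seed(0)  # arbitrary fixed seed so the run is reproducible

# loc (mean) = 75, scale (standard deviation) = 12, size = 10000 draws
samples = np.random.normal(75, 12, 10000)

print(samples.shape)   # (10000,)
print(samples.mean())  # close to 75
print(samples.std())   # close to 12
```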
Combined mean and standard deviation from a collection of NumPy arrays of different shapes
We could use the formulas for the mean and standard deviation to compute those two scalar values for all input arrays without concatenating/stacking them (which could be costly, especially on large NumPy arrays). Let's do it in steps, the mean and then the standard deviation, since the mean is needed in the std computation.
Getting the combined mean value:
We start with the mean. Compute the sum of each array, add those sums to get the total, and finally divide by the total number of elements across all arrays.
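In symbols, writing A_1, ..., A_K for the input arrays and n_k for the number of elements in A_k (notation introduced here for illustration), the combined mean is:

```latex
N = \sum_{k=1}^{K} n_k, \qquad
\mu = \frac{1}{N} \sum_{k=1}^{K} \sum_{x \in A_k} x
```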
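Getting the combined standard deviation value:
For the standard deviation, we have the formula
$\sigma = \sqrt{\frac{1}{N}\sum_{k}\sum_{x \in A_k}(x - \mu)^2}$
where $\mu$ is the combined mean, $N$ is the total number of elements, and the double sum runs over every element $x$ of every input array $A_k$.
So, we use the combined mean obtained in the previous step, apply the std formula to get the squared deviations of each array, sum them, divide by the total number of elements across all arrays, and then take the square root.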
Implementation
Let's say the input arrays are a and b. We would have one solution, like so -
import numpy as np
N = float(a.size + b.size)                  # total number of elements
mean_ = (a.sum() + b.sum())/N               # combined mean
std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)  # combined (population) std
Sample run for verification
In [266]: a = np.random.rand(3,4,2)
...: b = np.random.rand(2,5,3)
...:
In [267]: N = float(a.size + b.size)
...: mean_ = (a.sum() + b.sum())/N
...: std_ = np.sqrt((((a - mean_)**2).sum() + ((b - mean_)**2).sum())/N)
...:
In [268]: mean_
Out[268]: 0.47854757879348042
In [270]: std_
Out[270]: 0.27890341338373376
Now, to verify, let's stack the flattened arrays and use the built-in mean and std methods -
In [271]: A = np.hstack((a.ravel(), b.ravel()))
In [273]: A.mean()
Out[273]: 0.47854757879348037
In [274]: A.std()
Out[274]: 0.27890341338373376
List of arrays as input
For a list holding all those arrays, we need to iterate through them, like so -
A = [a,b,c] # input list of arrays
N = float(sum([i.size for i in A]))
mean_ = sum([i.sum() for i in A])/N
std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
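The loop above can also be wrapped in a reusable helper. This is a minimal sketch: the name combined_mean_std is hypothetical, it passes generator expressions to sum() instead of list comprehensions (so no intermediate lists are built), and, like the code above, it computes the population std (dividing by N).

```python
import numpy as np

def combined_mean_std(arrays):
    """Population mean and std (ddof=0) over all elements of several arrays.

    Two passes, no concatenation: the first computes the combined mean,
    the second sums squared deviations from that mean.
    """
    n = sum(arr.size for arr in arrays)
    if n == 0:
        raise ValueError("need at least one element")
    mean = sum(arr.sum() for arr in arrays) / n
    sq_dev = sum(((arr - mean) ** 2).sum() for arr in arrays)
    return mean, np.sqrt(sq_dev / n)

# Usage: arrays of different shapes, checked against concatenation
rng = np.random.default_rng(0)
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]
mean, std = combined_mean_std(arrays)

flat = np.concatenate([arr.ravel() for arr in arrays])
print(np.isclose(mean, flat.mean()), np.isclose(std, flat.std()))  # True True
```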
Sample run -
In [301]: a = np.random.rand(3,4,2)
...: b = np.random.rand(2,5,3)
...: c = np.random.rand(7,4)
...:
In [302]: A = [a,b,c] # input list of arrays
...: N = float(sum([i.size for i in A]))
...: mean_ = sum([i.sum() for i in A])/N
...: std_ = np.sqrt(sum([((i-mean_)**2).sum() for i in A])/N)
...: print(mean_, std_)
...:
0.47703535428 0.293308550786
In [303]: A = np.hstack((a.ravel(), b.ravel(), c.ravel()))
...: print(A.mean(), A.std())
...:
0.47703535428 0.293308550786
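If the arrays arrive one at a time (for example, loaded from disk) and you don't want to keep them all around for a second pass over the whole collection, a different technique can be swapped in: reduce each array to a summary (count, mean, sum of squared deviations from its own mean) and merge the summaries with the pairwise update attributed to Chan et al. This is a sketch, not the answer's method; the helper names summarize and merge are hypothetical, and it assumes every array is non-empty. Unlike the E[x²] − μ² shortcut, this update avoids subtracting two large, nearly equal numbers.

```python
import numpy as np

def summarize(arr):
    """Per-array summary: (count, mean, sum of squared deviations from its mean)."""
    n = arr.size
    m = arr.mean()
    return n, m, ((arr - m) ** 2).sum()

def merge(s1, s2):
    """Combine two summaries with the pairwise mean/variance update."""
    n1, m1, q1 = s1
    n2, m2, q2 = s2
    n = n1 + n2
    delta = m2 - m1
    mean = m1 + delta * n2 / n
    q = q1 + q2 + delta ** 2 * n1 * n2 / n
    return n, mean, q

rng = np.random.default_rng(1)
arrays = [rng.random((3, 4, 2)), rng.random((2, 5, 3)), rng.random((7, 4))]

total = summarize(arrays[0])
for arr in arrays[1:]:
    total = merge(total, summarize(arr))  # each array is only needed once

n, mean, q = total
std = np.sqrt(q / n)  # population std (ddof=0), matching np.std's default

flat = np.concatenate([arr.ravel() for arr in arrays])
print(np.isclose(mean, flat.mean()), np.isclose(std, flat.std()))  # True True
```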
Calculate the 3rd standard deviation for an array
NumPy's std yields the standard deviation, which is usually denoted by sigma (σ). To get the 2-sigma or 3-sigma range, multiply sigma by 2 or 3 and subtract it from and add it to the mean (here x is the input array):
print([x.mean() - 3 * x.std(), x.mean() + 3 * x.std()])
Output:
[-27.545797458510656, 52.315028227741429]
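The same idea as a small helper, with a check of how much data falls inside the range. The helper name sigma_range and the standard-normal test data are assumptions for illustration (the original x isn't shown); for normally distributed data, roughly 99.7% of points should fall within 3 sigma of the mean.

```python
import numpy as np

def sigma_range(x, k=3):
    """Return (mean - k*std, mean + k*std) for array x, using the population std."""
    mu, sigma = x.mean(), x.std()
    return mu - k * sigma, mu + k * sigma

np.random.seed(0)
x = np.random.normal(0.0, 1.0, 100000)  # hypothetical standard-normal data

low, high = sigma_range(x, k=3)
inside = np.mean((x >= low) & (x <= high))  # fraction of points within the range
print(low, high)
print(inside)  # about 0.997 for normally distributed data
```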
For more details, see the documentation, which states:
The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).
http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
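A small check that the quoted formula matches np.std, on a hand-picked array (values chosen for illustration so the result is exactly 2). By default np.std uses ddof=0, i.e. it divides by N; passing ddof=1 divides by N - 1 instead, giving the sample standard deviation.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean is 5

# The documented formula: square root of the mean squared deviation
manual = np.sqrt(np.mean(np.abs(x - x.mean()) ** 2))
print(manual, np.std(x))  # 2.0 2.0

# ddof=1 divides the sum of squared deviations (32) by N - 1 = 7
print(np.std(x, ddof=1))  # about 2.138
```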
Numpy array with different standard deviation per row
np.random.normal() is vectorized; you can switch axes and transpose the result:
import numpy as np
np.random.seed(444)
arr = np.random.normal(loc=0., scale=[1., 2., 3.], size=(1000, 3)).T
print(arr.mean(axis=1))
# [-0.06678394 -0.12606733 -0.04992722]
print(arr.std(axis=1))
# [0.99080274 2.03563299 3.01426507]
That is, scale=[1., 2., 3.] is broadcast along the last axis of size=(1000, 3), so each column gets its own standard deviation; transposing with .T turns those columns into the rows you want.
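Alternatively (a variation, not the answer's approach), you can skip the transpose by giving scale the shape (3, 1), so it broadcasts along each row of a size=(3, 1000) output. A sketch; the printed values won't match the transposed version above, because the samples end up in a different order.

```python
import numpy as np

np.random.seed(444)

# One standard deviation per row: shape (3, 1) broadcasts against size=(3, 1000)
scales = np.array([1.0, 2.0, 3.0])[:, np.newaxis]
arr = np.random.normal(loc=0.0, scale=scales, size=(3, 1000))

print(arr.shape)        # (3, 1000)
print(arr.std(axis=1))  # roughly [1, 2, 3]
```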