How to Calculate the Inverse of the Normal Cumulative Distribution Function in Python

How to calculate the inverse of the normal cumulative distribution function in Python?

NORMSINV (mentioned in a comment) is the inverse of the CDF of the standard normal distribution. Using scipy, you can compute this with the ppf method of the scipy.stats.norm object. The acronym ppf stands for percent point function, which is another name for the quantile function.

In [20]: from scipy.stats import norm

In [21]: norm.ppf(0.95)
Out[21]: 1.6448536269514722

Check that it is the inverse of the CDF:

In [34]: norm.cdf(norm.ppf(0.95))
Out[34]: 0.94999999999999996

By default, norm.ppf uses mean=0 and stddev=1, which is the "standard" normal distribution. You can use a different mean and standard deviation by specifying the loc and scale arguments, respectively.

In [35]: norm.ppf(0.95, loc=10, scale=2)
Out[35]: 13.289707253902945
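A quick way to convince yourself of what loc and scale do: in scipy's location-scale convention, the call above is the same as shifting and stretching the standard normal quantile by hand. A small check, using only the functions already shown:

from scipy.stats import norm

# ppf(q, loc=mu, scale=sigma) is the same as mu + sigma * ppf(q)
print(norm.ppf(0.95, loc=10, scale=2))   # ~13.2897
print(10 + 2 * norm.ppf(0.95))           # same value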

If you look at the source code for scipy.stats.norm, you'll find that the ppf method ultimately calls scipy.special.ndtri. So to compute the inverse of the CDF of the standard normal distribution, you could use that function directly:

In [43]: from scipy.special import ndtri

In [44]: ndtri(0.95)
Out[44]: 1.6448536269514722

ndtri is much faster than norm.ppf:

In [46]: %timeit norm.ppf(0.95)
240 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [47]: %timeit ndtri(0.95)
1.47 µs ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

How to calculate the inverse of the log-normal cumulative distribution function in Python?

Use scipy.stats.lognorm, which also has a ppf method:

from scipy import stats
import numpy as np

# shape s=0.5 and scale=exp(2) mean that log(X) ~ N(mu=2, sigma=0.5)
stats.lognorm(0.5, scale=np.exp(2)).ppf(0.005)

See http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.stats.lognorm.html

Be careful about the meaning of the quantities: here 2 and 0.5 are the mean and the standard deviation of the underlying normal variable Y = log(X), where X is the log-normal variable defined in the code (this is also how the Excel documentation defines its parameters). The mean and the standard deviation of the log-normal distribution defined in the code are approximately 8.37 and 4.46.
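To check this in code, the frozen distribution's mean() and std() methods can be compared against the closed-form log-normal formulas (the values are approximate):

from scipy import stats
import numpy as np

dist = stats.lognorm(0.5, scale=np.exp(2))   # log(X) ~ N(mu=2, sigma=0.5)
print(dist.mean(), dist.std())               # ~8.37 and ~4.46

# the same values from the closed-form expressions
mu, sigma = 2.0, 0.5
print(np.exp(mu + sigma**2 / 2))                                   # mean of X
print(np.sqrt(np.exp(sigma**2) - 1) * np.exp(mu + sigma**2 / 2))   # std of X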

Calculate the Cumulative Distribution Function (CDF) in Python

(It is possible that my interpretation of the question is wrong. If the question is how to get from a discrete PDF to a discrete CDF, then np.cumsum divided by a suitable normalizing constant will do if the samples are equispaced. If the grid is not equispaced, then np.cumsum of the PDF values multiplied by the local spacing between the points will do.)
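A minimal sketch of that cumsum route, using a made-up grid and standard normal PDF samples purely for illustration:

import numpy as np

# hypothetical equispaced grid with standard normal PDF samples
x = np.linspace(-5, 5, 1001)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# equispaced grid: cumulative sum, normalized by a suitable constant
cdf = np.cumsum(pdf)
cdf /= cdf[-1]

# non-equispaced grid: weight each sample by its local spacing first
cdf_uneven = np.cumsum(pdf * np.gradient(x))

print(cdf[-1], cdf_uneven[-1])   # both ~1.0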

If you have a discrete array of samples and you would like to know the empirical CDF of the sample, you can just sort the array. Looking at the sorted result, the smallest value represents 0 % and the largest value represents 100 %. If you want to know the value at 50 % of the distribution, just look at the element in the middle of the sorted array.

Let us have a closer look at this with a simple example:

import matplotlib.pyplot as plt
import numpy as np

# create some randomly distributed data:
data = np.random.randn(10000)

# sort the data:
data_sorted = np.sort(data)

# calculate the proportional values of samples
p = 1. * np.arange(len(data)) / (len(data) - 1)

# plot the sorted data:
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(p, data_sorted)
ax1.set_xlabel('$p$')
ax1.set_ylabel('$x$')

ax2 = fig.add_subplot(122)
ax2.plot(data_sorted, p)
ax2.set_xlabel('$x$')
ax2.set_ylabel('$p$')

plt.show()

This gives the following plot, where the right-hand plot is the traditional cumulative distribution function. It should reflect the CDF of the process behind the points, but naturally it is only an approximation as long as the number of points is finite.

[Figure: cumulative distribution function]

This function is easy to invert, and which form you need depends on your application; one way is sketched below.
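One way to invert it is plain linear interpolation with np.interp over the p and data_sorted arrays from the example above (a minimal sketch; the helper names empirical_quantile and empirical_cdf are just for illustration):

import numpy as np

# continuing the example above (redefined here so the snippet runs on its own)
data_sorted = np.sort(np.random.randn(10000))
p = np.arange(len(data_sorted)) / (len(data_sorted) - 1)

def empirical_quantile(q):
    # inverse of the empirical CDF: probability -> sample value
    return np.interp(q, p, data_sorted)

def empirical_cdf(x):
    # empirical CDF: sample value -> probability
    return np.interp(x, data_sorted, p)

print(empirical_quantile(0.5))   # ~0 for standard normal data
print(empirical_cdf(0.0))        # ~0.5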

Why is applying the standard normal CDF and then its inverse (or vice versa) in scipy.stats not exactly the identity?

Because they are numerical approximations computed in floating-point arithmetic. It is just like sqrt(2) * sqrt(2) not being exactly 2: there is simply no floating-point number whose square (in floating-point precision) is exactly 2.
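A quick illustration in Python (the exact residuals depend on the platform and the scipy version):

import math
from scipy.stats import norm

# squaring the floating-point square root of 2 does not give back exactly 2
print(math.sqrt(2) * math.sqrt(2))        # 2.0000000000000004
print(math.sqrt(2) * math.sqrt(2) == 2)   # False

# a cdf/ppf round trip is subject to the same kind of rounding error:
# the result is close to the input, but not guaranteed to be bitwise equal
x = 0.975
print(norm.cdf(norm.ppf(x)) - x)   # a tiny residual (possibly exactly zero)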

Check out David Goldberg's classic "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys 23:1 (March 1991), pp. 5-48.


