Generating Random Numbers from Skewed Normal Distribution

generate N random numbers from a skew normal distribution using numpy

I start by generating the PDF curves for reference:

NUM_SAMPLES = 100000
SKEW_PARAMS = [-3, 0]

def skew_norm_pdf(x,e=0,w=1,a=0):
    # adapated from:
    # http://stackoverflow.com/questions/5884768/skew-normal-distribution-in-scipy
    t = (x-e) / w
    return 2.0 * w * stats.norm.pdf(t) * stats.norm.cdf(a*t)

# generate the skew normal PDF for reference:
location = 0.0
scale = 1.0
x = np.linspace(-5,5,100) 

plt.subplots(figsize=(12,4))
for alpha_skew in SKEW_PARAMS:
    p = skew_norm_pdf(x,location,scale,alpha_skew)
    # n.b. note that alpha is a parameter that controls skew, but the 'skewness'
    # as measured will be different. see the wikipedia page:
    # https://en.wikipedia.org/wiki/Skew_normal_distribution
    plt.plot(x,p)

PDFs of skew normal distributions

Next I found a VB implementation of sampling random numbers from the skew normal distribution and converted it to python:

# literal adaption from:
# http://stackoverflow.com/questions/4643285/how-to-generate-random-numbers-that-follow-skew-normal-distribution-in-matlab
# original at:
# http://www.ozgrid.com/forum/showthread.php?t=108175
def rand_skew_norm(fAlpha, fLocation, fScale):
    sigma = fAlpha / np.sqrt(1.0 + fAlpha**2) 

    afRN = np.random.randn(2)
    u0 = afRN[0]
    v = afRN[1]
    u1 = sigma*u0 + np.sqrt(1.0 -sigma**2) * v 

    if u0 >= 0:
        return u1*fScale + fLocation 
    return (-u1)*fScale + fLocation 

def randn_skew(N, skew=0.0):
    return [rand_skew_norm(skew, 0, 1) for x in range(N)]

# lets check they at least visually match the PDF:
plt.subplots(figsize=(12,4))
for alpha_skew in SKEW_PARAMS:
    p = randn_skew(NUM_SAMPLES, alpha_skew)
    sns.distplot(p)

histograms from skew normal distributions as generated

And then wrote a quick version which (without extensive testing) appears to be correct:

def randn_skew_fast(N, alpha=0.0, loc=0.0, scale=1.0):
    sigma = alpha / np.sqrt(1.0 + alpha**2) 
    u0 = np.random.randn(N)
    v = np.random.randn(N)
    u1 = (sigma*u0 + np.sqrt(1.0 - sigma**2)*v) * scale
    u1[u0 < 0] *= -1
    u1 = u1 + loc
    return u1

# lets check again
plt.subplots(figsize=(12,4))
for alpha_skew in SKEW_PARAMS:
    p = randn_skew_fast(NUM_SAMPLES, alpha_skew)
    sns.distplot(p)

histograms from skew normal distributions as generated by the faster method

How to generate random numbers with skewed normal distribution in R?

With the function cp2dp you can convert from the population mean, the population standard deviation and the population skewness to the parameters xi, omega and alpha of the skew-normal distribution.

library(sn)
params <- cp2dp(c(-3.99, 3.17, -0.71), "SN")
sims <- replicate(1000, rsn(130, dp = params))

The SN family only supports skew between -0.99527 and 0.99527. Outside of this range, the ST family is needed, which requires a fourth variable: kurtosis:

library(sn)
params <- cp2dp(c(-3.99, 3.17, -1.71, 2.37), "ST")
sims <- replicate(1000, rst(130, dp = params))

Note the use of rst instead of rsn in this case.

Create random numbers with left skewed probability distribution

Like you described, just make sure your skewed-distribution adds up to 1.0:

pers = np.arange(1,101,1)

# Make each of the last 41 elements 5x more likely
prob = [1.0]*(len(pers)-41) + [5.0]*41

# Normalising to 1.0
prob /= np.sum(prob)

number = np.random.choice(pers, 1, p=prob)

Sample from a skewed distribution in R

Choosing a log normal distribution with μ = 2.415195 and σ = 0.3355733 will give a distrubution with (approximately) your requirements.

mu <- 2.415195
sigma <- 0.3355733

N <- 10000000
nums <- rlnorm(N, mu, sigma)

Approximately 95% of values are between 5 and 20.

sum(5 < nums & nums < 20) / N
#> [1] 0.9500141

Mode is 10

ggplot(tibble(x = nums), aes(x)) +
  geom_density() +
  geom_vline(xintercept = 10, color = "red") +
  geom_vline(xintercept = c(5, 20), color = "blue")

distribution

I got these parameters using optimize.

From any σ, we can calculate what μ gives us a mode of 10 because the mode is: mode

So we want to find what σ gets us closest to 95% of values between 5 and 20. This can be calculated from the difference between cdf(20) and cdf(5). The CDF for the log normal distribution is: cdf . (plnorm in R).

f <- function(sigma) {
  mu <- log(10) + sigma^2
  
  abs(plnorm(20, mu, sigma) - plnorm(5, mu, sigma) - 0.95)
}

optimize(f, lower = 0, upper = 1)
#> $minimum
#> [1] 0.3355733
#> 
#> $objective
#> [1] 1.160349e-05

how to generate random numbers that follow skew normal distribution in matlab

Can't vouch for their performance/adequacy, but http://azzalini.stat.unipd.it/SN/ says the following, and has a link to a .zip file of MATLAB functions:

The library has been ported to Matlab by Nicola Sartori. So far, this refers to update 0.21; hence facilities for the skew-t distribution are not included. A portion of the facilities for the skew-t distribution is however available via a set of Matlab functions which have been written and made available by Enrique Batiz (Enrique.Batiz [at] postgrad.mbs.ac.uk)

Also see this code which is in visual Basic, but should be easily portable. Relevant excerpt shown below. This uses RandNorm (also in the linked webpage) which is a pair of numbers from a unit normal distribution, and in MATLAB you should be able to use randn(2,1).

Function RandSkew(fAlpha As Single, _
                  Optional fLocation As Single = 0, _
                  Optional fScale As Single = 1, _
                  Optional bVolatile As Boolean = False) As Single

    ' shg 2008-0919
    ' http://azzalini.stat.unipd.it/SN/faq.html

    ' Returns a random variable with skewed distribution
    '       fAlpha      = skew
    '       fLocation   = location
    '       fScale > 0  = scale

    Dim sigma   As Single
    Dim afRN()  As Single
    Dim u0      As Single
    Dim v       As Single
    Dim u1      As Single

    If bVolatile Then Application.Volatile
    Randomize (Timer)

    sigma = fAlpha / Sqr(1 + fAlpha ^ 2)

    afRN = RandNorm()
    u0 = afRN(1)
    v = afRN(2)
    u1 = sigma * u0 + Sqr(1 - sigma ^ 2) * v

    RandSkew = IIf(u0 >= 0, u1, -u1) * fScale + fLocation
End Function