Is There a Numpy Builtin to Reject Outliers from a List

Is there a numpy builtin to reject outliers from a list

This method is almost identical to yours, just more idiomatic numpy (and it works on numpy arrays only):

import numpy as np

def reject_outliers(data, m=2):
    return data[abs(data - np.mean(data)) < m * np.std(data)]
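As a quick illustration of the mean/standard-deviation filter above, here is a minimal sketch that also accepts plain Python lists by converting them first. The helper name `reject_outliers_any` and the `sample` values are made up for this example.

```python
import numpy as np

def reject_outliers_any(values, m=2):
    """Keep values closer than m standard deviations to the mean."""
    # np.asarray lets the same boolean-mask filter accept lists as well as arrays.
    arr = np.asarray(values, dtype=float)
    return arr[np.abs(arr - arr.mean()) < m * arr.std()]

# Hypothetical data: five small values and one extreme reading.
sample = [1, 2, 3, 2, 1, 100]
kept = reject_outliers_any(sample).tolist()
print(kept)  # [1.0, 2.0, 3.0, 2.0, 1.0]
```

Keep in mind that the mean and standard deviation are themselves pulled by extreme values, so a very large outlier can widen the accepted band.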

Python: Find outliers inside a list

Assuming your list is l:

  • If you know you want to filter at a certain percentile/quantile, you can
    use the snippet below.

    It removes the bottom 10% and the top 10% (everything above the 90th
    percentile). You can change either cut-off; for example, drop the lower filter to remove only the values above the 90th percentile:

    import numpy as np
    l = np.array(l)
    l = l[(l>np.quantile(l,0.1)) & (l<np.quantile(l,0.9))].tolist()

    output:

    [ 3  2 14  2  8  4  3  5]
  • If you are not sure of the percentile cut-off and are looking to
    remove outliers:

    You can adjust the cut-off for outliers with the argument m in the
    function call. The larger it is, the fewer outliers are removed. Because it relies on the median and the median absolute deviation rather than the mean and standard deviation, this function seems to be more robust to various types of outliers than other outlier removal techniques.

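    import numpy as np
    l = np.array(l)
    def reject_outliers(data, m=6.):
        # absolute distance of each value from the median
        d = np.abs(data - np.median(data))
        # median absolute deviation (MAD)
        mdev = np.median(d)
        # scale distances by the MAD; fall back to 1 if the MAD is zero
        s = d / (mdev if mdev else 1.)
        return data[s < m].tolist()
    print(reject_outliers(l))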

    output:

    [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
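To compare the two filters above side by side, here is a minimal sketch on a made-up list. The `sample` values and the helper names `quantile_filter` and `mad_filter` are assumptions for illustration, not the asker's data.

```python
import numpy as np

# Hypothetical data: small values plus one extreme reading (150).
sample = np.array([1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5, 150])

def quantile_filter(data, low=0.1, high=0.9):
    """Keep values strictly between the low and high quantiles."""
    lo, hi = np.quantile(data, [low, high])
    return data[(data > lo) & (data < hi)].tolist()

def mad_filter(data, m=6.0):
    """Keep values closer to the median than m median absolute deviations."""
    d = np.abs(data - np.median(data))
    mdev = np.median(d)
    s = d / (mdev if mdev else 1.0)
    return data[s < m].tolist()

print(quantile_filter(sample))  # [3, 2, 2, 8, 4, 3, 5]
print(mad_filter(sample))       # [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
```

The quantile filter always trims a fixed share of the data, including ordinary values such as the 1s here, while the median-based filter only drops points that sit far from the bulk of the data.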

Python remove outliers from data

I believe the method you're referring to is to remove values that lie more than 1.5 times the interquartile range away from the median. So first, calculate your initial statistics per group:

grouped = df.groupby('ID')  # assumed; the grouping step is not shown in the answer
statBefore = pd.DataFrame({'q1': grouped['Value'].quantile(.25),
                           'median': grouped['Value'].median(),
                           'q3': grouped['Value'].quantile(.75)})

And then determine whether values in the original DF are outliers:

def is_outlier(row):
    iq_range = statBefore.loc[row.ID]['q3'] - statBefore.loc[row.ID]['q1']
    median = statBefore.loc[row.ID]['median']
    # outlier if more than 1.5 * IQR above or below the group median
    return row.Value > (median + 1.5 * iq_range) or row.Value < (median - 1.5 * iq_range)

# apply the function to the original df:
df.loc[:, 'outlier'] = df.apply(is_outlier, axis=1)
# filter to only non-outliers:
df_no_outliers = df[~(df.outlier)]
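The row-by-row `apply` above can be slow on large frames. The same median ± 1.5 × IQR rule can be sketched in vectorized form with `groupby(...).transform`. The sample frame below is hypothetical; only the column names `ID` and `Value` come from the answer. Note that the more common Tukey rule measures the 1.5 × IQR distance from the quartiles rather than from the median; this sketch keeps the median-based rule described here.

```python
import pandas as pd

# Hypothetical sample data with one obvious outlier per group.
df = pd.DataFrame({
    "ID": ["a"] * 6 + ["b"] * 6,
    "Value": [10, 11, 12, 13, 14, 100, 1, 2, 3, 4, 5, -50],
})

grouped = df.groupby("ID")["Value"]
# transform broadcasts each group's statistic back onto its rows.
median = grouped.transform("median")
iqr = grouped.transform(lambda s: s.quantile(0.75) - s.quantile(0.25))

# Flag rows more than 1.5 * IQR away from their group's median.
df["outlier"] = (df["Value"] - median).abs() > 1.5 * iqr
df_no_outliers = df[~df["outlier"]]
print(df_no_outliers["Value"].tolist())  # [10, 11, 12, 13, 14, 1, 2, 3, 4, 5]
```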

How to detect outliers from a list

Since std_1 = 33.413, any element in list_of_values divided by std_1 will be smaller than the threshold and hence not yielded.

Can scipy.stats identify and mask obvious outliers?

The statsmodels package has what you need. Look at this little code snippet and its output:

# Imports #
import statsmodels.api as smapi
import statsmodels.graphics as smgraphics
# Make data (list() so that insert() works on Python 3) #
x = list(range(30))
y = [y*10 for y in x]
# Add outlier #
x.insert(6,15)
y.insert(6,220)
# Make graph #
regression = smapi.OLS(x, y).fit()
figure = smgraphics.regressionplots.plot_fit(regression, 0)
# Find outliers (third column holds the Bonferroni-corrected p-value) #
test = regression.outlier_test()
outliers = ((x[i],y[i]) for i,t in enumerate(test) if t[2] < 0.5)
print('Outliers: ', list(outliers))

Example figure 1

Outliers: [(15, 220)]

Edit

With the newer version of statsmodels, things have changed a bit. Here is a new code snippet that shows the same type of outlier detection.

# Imports #
from random import random
import statsmodels.api as smapi
from statsmodels.formula.api import ols
import statsmodels.graphics as smgraphics
# Make data (list() so that insert() works on Python 3) #
x = list(range(30))
y = [y*(10+random())+200 for y in x]
# Add outlier #
x.insert(6,15)
y.insert(6,220)
# Make fit #
regression = ols("data ~ x", data=dict(data=y, x=x)).fit()
# Find outliers (icol() was removed from pandas; use iloc instead) #
test = regression.outlier_test()
outliers = ((x[i],y[i]) for i,t in enumerate(test.iloc[:, 2]) if t < 0.5)
print('Outliers: ', list(outliers))
# Figure #
figure = smgraphics.regressionplots.plot_fit(regression, 1)
# Add line #
smgraphics.regressionplots.abline_plot(model_results=regression, ax=figure.axes[0])

Example figure 2

Outliers: [(15, 220)]

Please explain (python): example_list[a < b]

The < comparison returns either True or False, which are equal to 1 and 0 in Python. Hence indexing with the result gives you the second item (for True) or the first item (for False).

>>> True == 1
True
>>> False == 0
True
>>> 'ab'[True]
'b'
>>> 'ab'[False]
'a'

This trick was useful in older versions of Python, before conditional expressions were introduced:

>>> a = 'aa'
>>> b = 'bb'
>>> [10, 20][a>b]
10
>>> 20 if a > b else 10
10
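One practical difference between the two forms is worth a short sketch: the indexing trick builds the whole list first, so both candidates are always evaluated, whereas a conditional expression evaluates only the branch it selects. The helper `tracked` and the variable names below are made up for illustration.

```python
# bool is a subclass of int, so True and False are valid sequence indices.
print(issubclass(bool, int))  # True

a, b = 3, 7
by_index = [b, a][a < b]          # a < b is True -> index 1 -> a
by_ternary = a if a < b else b    # same selection with a conditional expression
print(by_index, by_ternary)       # 3 3

calls = []

def tracked(value):
    """Record that this candidate was evaluated, then return it."""
    calls.append(value)
    return value

# Indexing trick: both candidates are evaluated before one is picked.
result = [tracked("no"), tracked("yes")][True]
print(result, calls)              # yes ['no', 'yes']
calls_index = list(calls)

# Conditional expression: only the chosen branch is evaluated.
calls.clear()
result = tracked("yes") if True else tracked("no")
print(result, calls)              # yes ['yes']
```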

Related:

  • Is False == 0 and True == 1 in Python an implementation detail or is it guaranteed by the language?
  • Is it guaranteed that False “is 0” and True “is 1”?

