Remove Outliers in Pandas DataFrame using Percentiles
The initial dataset.
print(df.head())
Col0 Col1 Col2 Col3 Col4 User_id
0 49 31 93 53 39 44
1 69 13 84 58 24 47
2 41 71 2 43 58 64
3 35 56 69 55 36 67
4 64 24 12 18 99 67
First, removing the User_id column.
filt_df = df.loc[:, df.columns != 'User_id']
Then, computing percentiles.
low = .05
high = .95
quant_df = filt_df.quantile([low, high])
print(quant_df)
Col0 Col1 Col2 Col3 Col4
0.05 2.00 3.00 6.9 3.95 4.00
0.95 95.05 89.05 93.0 94.00 97.05
Next, filtering values based on the computed percentiles. To do that, I use an apply by column, and that's it!
filt_df = filt_df.apply(lambda x: x[(x > quant_df.loc[low, x.name]) &
                                    (x < quant_df.loc[high, x.name])], axis=0)
Bringing the User_id column back.
filt_df = pd.concat([df.loc[:,'User_id'], filt_df], axis=1)
Last, rows with NaN values can be dropped simply like this.
filt_df.dropna(inplace=True)
print(filt_df.head())
User_id Col0 Col1 Col2 Col3 Col4
1 47 69 13 84 58 24
3 67 35 56 69 55 36
5 9 95 79 44 45 69
6 83 69 41 66 87 6
9 87 50 54 39 53 40
For comparison, checking the intermediate result before the NaN rows were dropped.
print(filt_df.head())
User_id Col0 Col1 Col2 Col3 Col4
0 44 49 31 NaN 53 39
1 47 69 13 84 58 24
2 64 41 71 NaN 43 58
3 67 35 56 69 55 36
4 67 64 24 12 18 NaN
print(filt_df.describe())
User_id Col0 Col1 Col2 Col3 Col4
count 100.000000 89.000000 88.000000 88.000000 89.000000 89.000000
mean 48.230000 49.573034 45.659091 52.727273 47.460674 57.157303
std 28.372292 25.672274 23.537149 26.509477 25.823728 26.231876
min 0.000000 3.000000 5.000000 7.000000 4.000000 5.000000
25% 23.000000 29.000000 29.000000 29.500000 24.000000 36.000000
50% 47.000000 50.000000 40.500000 52.500000 49.000000 59.000000
75% 74.250000 69.000000 67.000000 75.000000 70.000000 79.000000
max 99.000000 95.000000 89.000000 92.000000 91.000000 97.000000
How to generate the test dataset
import numpy as np
import pandas as pd

np.random.seed(0)
nb_sample = 100
num_sample = (0, 100)
d = dict()
d['User_id'] = np.random.randint(num_sample[0], num_sample[1], nb_sample)
for i in range(5):
    d['Col' + str(i)] = np.random.randint(num_sample[0], num_sample[1], nb_sample)
df = pd.DataFrame.from_dict(d)
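Putting the steps above together, the whole walkthrough can be sketched as one self-contained script (same column names, seed, and .05/.95 thresholds as in the example):

```python
import numpy as np
import pandas as pd

# Generate the test dataset, as above
np.random.seed(0)
d = {'User_id': np.random.randint(0, 100, 100)}
for i in range(5):
    d['Col' + str(i)] = np.random.randint(0, 100, 100)
df = pd.DataFrame.from_dict(d)

# Percentile-based filtering on every column except User_id
low, high = 0.05, 0.95
filt_df = df.loc[:, df.columns != 'User_id']
quant_df = filt_df.quantile([low, high])

# apply() reindexes each shortened column back to the full index,
# so out-of-range values become NaN
filt_df = filt_df.apply(
    lambda x: x[(x > quant_df.loc[low, x.name]) & (x < quant_df.loc[high, x.name])],
    axis=0,
)

# Reattach User_id and drop rows where any column was an outlier
filt_df = pd.concat([df.loc[:, 'User_id'], filt_df], axis=1)
filt_df.dropna(inplace=True)
```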
Removing outliers in a df containing mixed dtype
I would break the problem into stages.
First, identify the (numeric) columns on which you want to perform the outlier removal.
newdf = df.select_dtypes(include=np.number)
Now perform whatever filtering/outlier removal you want on the rows of newdf. Afterwards, newdf should contain only the rows you wish to retain.
Then keep only the rows of df whose index is in newdf.
df = df[df.index.isin(newdf.index)]
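A minimal sketch of the three stages, using a hypothetical mixed-dtype frame and a simple percentile cut as the numeric filtering step:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': list('abcdefghij'),                       # non-numeric column
    'value': [1, 2, 3, 2, 1, 3, 2, 100, 1, 2],        # 100 is the outlier
})

# Stage 1: numeric columns only
newdf = df.select_dtypes(include=np.number)

# Stage 2: any outlier removal you like on newdf
# (here: keep values below the 95th percentile)
newdf = newdf[newdf['value'] < newdf['value'].quantile(0.95)]

# Stage 3: keep only the rows of df whose index survived
df = df[df.index.isin(newdf.index)]
```

The non-numeric 'name' column is untouched; only the row containing 100 is dropped.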
Automating removing outliers from a pandas dataframe using IQR as the parameter and putting the variables in a list
Based on comments on the original post, I suggest you revamp your solution as follows. This removes all rows where one (or more) of the wanted column values is an outlier.
cols = ['pdays', 'campaign', 'previous'] # The columns you want to search for outliers in
# Calculate quantiles and IQR
Q1 = dummy_df[cols].quantile(0.25) # Same as np.percentile but maps (0,1) and not (0,100)
Q3 = dummy_df[cols].quantile(0.75)
IQR = Q3 - Q1
# Return a boolean array of the rows with (any) non-outlier column values
condition = ~((dummy_df[cols] < (Q1 - 1.5 * IQR)) | (dummy_df[cols] > (Q3 + 1.5 * IQR))).any(axis=1)
# Filter our dataframe based on condition
filtered_df = dummy_df[condition]
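As a runnable sketch (dummy_df and its values here are made-up stand-ins; the column names follow the question):

```python
import pandas as pd

dummy_df = pd.DataFrame({
    'pdays': [1, 2, 3, 2, 1, 2, 3, 2, 1, 999],   # 999 is an outlier
    'campaign': [1, 1, 2, 2, 1, 2, 1, 2, 1, 2],
    'previous': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
cols = ['pdays', 'campaign', 'previous']

# Quantiles and IQR per column
Q1 = dummy_df[cols].quantile(0.25)
Q3 = dummy_df[cols].quantile(0.75)
IQR = Q3 - Q1

# True for rows where no column falls outside the 1.5*IQR fences
condition = ~((dummy_df[cols] < (Q1 - 1.5 * IQR)) |
              (dummy_df[cols] > (Q3 + 1.5 * IQR))).any(axis=1)
filtered_df = dummy_df[condition]
```

Only the row with pdays = 999 falls outside its column's fences, so nine rows remain.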
How do I remove outliers from a dataframe that contains floating integers in Y-axis and dates in X-axis?
You can remove the outliers by comparing each value to the mean or median (I suggest using the median). Divide the distance between each value and the median by the distance between the maximum value and the median; if that ratio is greater than a threshold (e.g. 0.98; it depends on your data and only you can select it), delete that data point.
For example, if you set your threshold to 1, only the farthest data point would be deleted.
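A sketch of that idea, assuming a hypothetical numeric Series indexed by date (the 0.9 threshold is illustrative):

```python
import pandas as pd

s = pd.Series([10, 11, 12, 11, 10, 12, 11, 200],
              index=pd.date_range('2023-01-01', periods=8))

median = s.median()
# Ratio of each point's distance from the median
# to the maximum distance from the median
ratio = (s - median).abs() / (s.max() - median)
cleaned = s[ratio <= 0.9]
```

Note that the maximum always has a ratio of exactly 1, so any threshold below 1 removes at least that point; this heuristic assumes the extreme values really are outliers.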
Eliminate outliers in a dataframe with different dtypes - Pandas
Try using select_dtypes to get all columns from df of a particular type. To select all numeric types, use np.number or 'number'.
from scipy import stats
import numpy as np

new_df = df[
    (np.abs(stats.zscore(df.select_dtypes(include=np.number))) < 3).all(axis=1)
]
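For example, on a hypothetical frame mixing a string column with a numeric one (scipy is required):

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    'label': [f'row{i}' for i in range(100)],   # non-numeric, ignored by the filter
    'x': [1.0] * 99 + [100.0],                  # last value is a clear outlier
})

# Keep rows whose numeric columns all have |z-score| < 3
new_df = df[
    (np.abs(stats.zscore(df.select_dtypes(include=np.number))) < 3).all(axis=1)
]
```

The 'label' column survives untouched; only the row with x = 100.0 is removed.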
Checking a Pandas Dataframe for Outliers
Scatter plots or distribution plots are good for spotting outliers, but in the context of pandas data frames, here's how I would do it.
df.describe()
will give you a good summary of the mean, max, and all percentiles. Look at the max of a column to spot an outlier, if it is far above the 75th percentile.
Then df['Sensor Value'].value_counts() should give you the frequency of the values. The outliers will show up right here as extreme values with low frequency.
Get their indexes and just drop them using df.drop(indexes_list, inplace=True)
EDIT:
You could also check for outliers with mean +/- 3 * standard deviation.
Example code:
outliers = df[(df[col] - df[col].mean()).abs() > 3 * df[col].std()]
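As a quick self-contained check (the 'Sensor Value' column name follows the answer; the data here are made up):

```python
import pandas as pd

df = pd.DataFrame({'Sensor Value': [20.0] * 50 + [21.0] * 49 + [500.0]})
col = 'Sensor Value'

# Rows more than 3 standard deviations from the mean, in either direction
outliers = df[(df[col] - df[col].mean()).abs() > 3 * df[col].std()]
```

Only the 500.0 reading exceeds the 3-sigma band, so it is the single flagged row.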