Pandas Dataframe: Replace Nan Values with Average of Columns

You can simply use DataFrame.fillna to fill the NaNs directly:

In [27]: df
Out[27]:
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3       NaN -2.027325  1.533582
4       NaN       NaN  0.461821
5 -0.788073       NaN       NaN
6 -0.916080 -0.612343       NaN
7 -0.887858  1.033826       NaN
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

In [28]: df.mean()
Out[28]:
A -0.151121
B -0.231291
C -0.530307
dtype: float64

In [29]: df.fillna(df.mean())
Out[29]:
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325  1.533582
4 -0.151121 -0.231291  0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858  1.033826 -0.530307
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

The docstring of fillna said that value should be a scalar or a dict; however, it works with a Series as well, and more recent pandas documentation lists Series and DataFrame among the accepted types. If you want to pass a dict, you can use df.mean().to_dict().
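As a self-contained illustration of the two call styles mentioned above, here is a minimal sketch. The frame and its column names are hypothetical (they are not the data from the session above), and the string column is included only to show that columns without a mean are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the frame from the session above.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, 4.0, 8.0],
    "label": ["x", "y", None],  # non-numeric column, left untouched
})

# numeric_only=True keeps mean() from choking on the string column.
col_means = df.mean(numeric_only=True)

# Passing the Series of means fills each column with its own mean.
filled_from_series = df.fillna(col_means)

# The same means passed as a plain dict {column name: mean}.
filled_from_dict = df.fillna(col_means.to_dict())
```

Both calls return a new DataFrame, so the result has to be assigned if it should be kept.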

Pandas DataFrame : Using Pandas Replace NaN Values with Average of above 3 rows

You can do it like this:

s = df['order_values'].copy()
for i in range(3, len(s)):
    # fill a NaN with the mean of the three values above it (earlier fills are reused)
    s.iloc[i] = s.iloc[i-3:i].mean() if pd.isna(s.iloc[i]) else s.iloc[i]

df['order_values'] = s

print(df):

      Id  order_values
0   1002     45.000000
1   1002     36.000000
2   1002     18.000000
3   1002     33.000000
4   1002     29.000000
5   1002     72.000000
6   1003     68.000000
7   1003     54.000000
8   1003     45.000000
9   1003     55.666667
10  1003     51.555556
11  1004     14.000000
12  1004     50.000000
13  1004     27.000000
14  1004     30.333333

If you want to do this separately for each Id, you can wrap the lines above in a function and use groupby with transform:

def fill_na_in_order_values(s):
    s = s.copy()  # work on a copy so the group is not modified in place
    for i in range(3, len(s)):
        s.iloc[i] = s.iloc[i-3:i].mean() if pd.isna(s.iloc[i]) else s.iloc[i]
    return s

df['order_values'] = df.groupby('Id')['order_values'].transform(fill_na_in_order_values)
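To make the grouped version easy to try, here is a small runnable sketch. The input frame is an assumption: it uses only the Id 1003 and 1004 rows, with NaN placed where the printed output above shows filled values. The function name is hypothetical as well.

```python
import numpy as np
import pandas as pd

def fill_with_mean_of_previous_three(s):
    # Work on a copy so the caller's data is not modified in place.
    s = s.copy()
    for i in range(3, len(s)):
        if pd.isna(s.iloc[i]):
            # Earlier fills are reused, so consecutive NaNs build on each other.
            s.iloc[i] = s.iloc[i - 3:i].mean()
    return s

# Assumed input: NaN wherever the output above shows a computed value.
df = pd.DataFrame({
    "Id": [1003] * 5 + [1004] * 4,
    "order_values": [68.0, 54.0, 45.0, np.nan, np.nan, 14.0, 50.0, 27.0, np.nan],
})

df["order_values"] = (
    df.groupby("Id")["order_values"].transform(fill_with_mean_of_previous_three)
)
```

Because the loop starts at position 3, a NaN in the first three rows of a group has fewer than three predecessors and is left as NaN.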

Replace value with the average of its column with Pandas

The first thing to recognize is that a column containing 'x' does not have an integer dtype. It has the object dtype.

df = pd.read_csv('file.csv')

df

   Col1  Col2
0     1    22
1     2    44
2     3     x
3     4    88
4     5   110
5     6   132
6     7     x
7     8   176
8     9   198
9    10     x

df.dtypes

Col1 int64
Col2 object
dtype: object

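To get the mean of Col2, the column first has to be converted to a numeric dtype. With errors='coerce', every value that cannot be parsed (here, 'x') becomes a missing value, and the nullable Int64 dtype lets the column hold integers alongside those missing values.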

df['Col2'] = pd.to_numeric(df['Col2'], errors='coerce').astype('Int64')

df.dtypes
Col1 int64
Col2 Int64
dtype: object

The df now looks like so:

df 

   Col1  Col2
0     1    22
1     2    44
2     3  <NA>
3     4    88
4     5   110
5     6   132
6     7  <NA>
7     8   176
8     9   198
9    10  <NA>

Now we can use fillna() with df['Col2'].mean():

df['Col2'] = df['Col2'].fillna(df['Col2'].mean())

df
   Col1  Col2
0     1    22
1     2    44
2     3   110
3     4    88
4     5   110
5     6   132
6     7   110
7     8   176
8     9   198
9    10   110
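In the example above the mean happens to be a whole number (110), so it fits into the Int64 column. When the mean is fractional, filling a nullable integer column with it may raise an error, so one option is to stay with floats. The sketch below assumes a tiny inline CSV in place of file.csv; the values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for file.csv, with 'x' marking a bad value.
csv_text = "Col1,Col2\n1,22\n2,x\n3,45\n"
df = pd.read_csv(io.StringIO(csv_text))

# 'x' cannot be parsed, so errors='coerce' turns it into NaN (float64 column).
col2 = pd.to_numeric(df["Col2"], errors="coerce")

# The mean of 22 and 45 is 33.5, which a float column can hold without trouble.
df["Col2"] = col2.fillna(col2.mean())
```

If integers are required afterwards, the filled column can be rounded first and then cast.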

Delete and replace Nan values with mean of the rows in pandas dataframe

You could create a dictionary that maps every column name to the Series of row means and pass it to fillna to fill the NaN values. Then drop the rows that still contain NaN (rows made up entirely of NaN are not filled, because their mean is also NaN).

out = df.fillna(dict.fromkeys(df.columns, df.mean(axis=1))).dropna()

Another possibility is to transpose the DataFrame and use fillna to fill, then transpose back:

df_T = df.T
df_T.fillna(df_T.mean()).T.dropna()

Output:

    c1   c2    c3
0  1.0  1.0   1.0
2  3.0  6.0   9.0
3  8.5  7.0  10.0
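The input frame for this answer is not shown, so the sketch below uses a hypothetical one chosen to give the output above: row 1 is entirely NaN, and rows 0 and 3 each have one gap. It runs both variants side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical input; the original frame is not shown in the answer.
df = pd.DataFrame({
    "c1": [1.0, np.nan, 3.0, np.nan],
    "c2": [np.nan, np.nan, 6.0, 7.0],
    "c3": [1.0, np.nan, 9.0, 10.0],
})

# Row means; the all-NaN row has a NaN mean and therefore stays unfilled.
row_means = df.mean(axis=1)

# Variant 1: every column gets the same Series of row means as its fill value.
via_dict = df.fillna(dict.fromkeys(df.columns, row_means)).dropna()

# Variant 2: transpose so rows become columns, fill per column, transpose back.
df_T = df.T
via_transpose = df_T.fillna(df_T.mean()).T.dropna()
```

Row 3 ends up with 8.5 in c1, the mean of 7.0 and 10.0, and row 1 is removed by dropna in both variants.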

Replacing NaN values with column mean value does not change pandas dataframe NaN values

I think you need DataFrame.fillna with the per-column means. DataFrame.mean works column-wise by default (axis=0), so the axis argument can be omitted:

df = df.fillna(value=df.mean())
print (df)
A B
1: 2.0 3
2: 2.0 1
3: 2.0 4

I think inplace is not good practice, which is why the result is assigned back to df here.

Replace nan in column with the mean between two values python dynamically

I think you can simply use pandas.Series.interpolate.

Since you want to fill each NaN with the average of the values directly above and below it, linear interpolation (the default method) does exactly that for an isolated NaN:

df.Speed.interpolate()

This will return a series of all the speed measures and an interpolation for nan values.
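A short sketch of what this looks like in practice, using hypothetical speed values. It also shows the case of two NaNs in a row, where linear interpolation produces evenly spaced steps and not the same midpoint twice.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: one isolated gap and one gap of two values.
df = pd.DataFrame({"Speed": [10.0, np.nan, 20.0, np.nan, np.nan, 50.0]})

# interpolate() returns a new Series, so assign it back to keep the result.
# Isolated NaN: midpoint of 10 and 20. Double gap: evenly spaced between 20 and 50.
df["Speed"] = df["Speed"].interpolate()
```

The isolated gap becomes 15, the average of its neighbours, while the double gap is filled with approximately 30 and 40.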

Function to replace NaN values in a dataframe with mean of the related column

To fill NaN of each column with its respective mean use:

df.apply(lambda x: x.fillna(x.mean())) 

Pandas: Replace NaN with Average by Multi Level Index

Got it.

df_countries = df_countries.reset_index().set_index(original_index)

Forgot to keep the answer with the correct index... With this change, it works.
However, if anyone has a more pythonic way to do it, please add your answer!
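One possible alternative that keeps the original index throughout is to compute group means with transform and hand them to fillna. This is only a sketch under assumptions: the frame, the level names region and country, and the column names are all hypothetical, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the original df_countries is not shown.
index = pd.MultiIndex.from_tuples(
    [("Europe", "France"), ("Europe", "Spain"), ("Asia", "Japan"), ("Asia", "India")],
    names=["region", "country"],
)
df_countries = pd.DataFrame(
    {
        "gdp": [3.0, np.nan, 5.0, np.nan],
        "population": [np.nan, 47.0, 125.0, 1400.0],
    },
    index=index,
)

# transform("mean") returns a frame shaped and indexed like the input,
# holding the mean of each row's group in every column.
level_means = df_countries.groupby(level="region").transform("mean")

# fillna aligns on index and columns, so only the NaN cells are replaced.
filled = df_countries.fillna(level_means)
```

Because transform preserves the index, no reset_index or set_index step is needed afterwards.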


