pandas DataFrame: replace nan values with average of columns
You can simply use DataFrame.fillna to fill the NaNs directly:
In [27]: df
Out[27]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 NaN -2.027325 1.533582
4 NaN NaN 0.461821
5 -0.788073 NaN NaN
6 -0.916080 -0.612343 NaN
7 -0.887858 1.033826 NaN
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
In [28]: df.mean()
Out[28]:
A -0.151121
B -0.231291
C -0.530307
dtype: float64
In [29]: df.fillna(df.mean())
Out[29]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325 1.533582
4 -0.151121 -0.231291 0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858 1.033826 -0.530307
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
The value argument of fillna accepts a scalar, a dict, a Series, or a DataFrame. Passing the Series returned by df.mean() works because each column is filled with the entry whose label matches the column name. If you prefer to pass a dict, you could use df.mean().to_dict().
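As a minimal sketch of the point above, the following compares the Series form and the dict form on a small made-up frame (the column names and values are assumptions for illustration, not the data shown earlier):

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with one missing value per column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 4.0, 8.0]})

col_means = df.mean()                           # Series indexed by column name
filled_series = df.fillna(col_means)            # Series: matched to columns by label
filled_dict = df.fillna(col_means.to_dict())    # dict: keys are column names

print(filled_series)
print(filled_series.equals(filled_dict))
```

Both calls return a new DataFrame and leave df itself unchanged.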
Pandas DataFrame : Using Pandas Replace NaN Values with Average of above 3 rows
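You can do it like this:
s = df['order_values'].copy()
for i in range(3, len(s)):
    # Replace a NaN with the mean of the three rows above it
    # (rows filled earlier in the loop count towards later means).
    if pd.isna(s.iloc[i]):
        s.iloc[i] = s.iloc[i-3:i].mean()
df['order_values'] = s
print(df)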
Id order_values
0 1002 45.000000
1 1002 36.000000
2 1002 18.000000
3 1002 33.000000
4 1002 29.000000
5 1002 72.000000
6 1003 68.000000
7 1003 54.000000
8 1003 45.000000
9 1003 55.666667
10 1003 51.555556
11 1004 14.000000
12 1004 50.000000
13 1004 27.000000
14 1004 30.333333
If you want to do it per Id, you could put the lines above in a function and use groupby and transform, like:
def fill_na_in_order_values(s):
    s = s.copy()  # work on a copy so the group passed in is not mutated
    for i in range(3, len(s)):
        if pd.isna(s.iloc[i]):
            s.iloc[i] = s.iloc[i-3:i].mean()
    return s
df['order_values'] = df.groupby('Id')['order_values'].transform(fill_na_in_order_values)
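A self-contained sketch of the grouped version follows. The frame is a made-up subset resembling the data above, and the function name fill_with_previous_three is only illustrative. Note that the first three rows of each group are never filled, since fewer than three rows precede them.

```python
import numpy as np
import pandas as pd

def fill_with_previous_three(s):
    """Fill each NaN with the mean of the three preceding values in the group."""
    s = s.copy()
    for i in range(3, len(s)):
        if pd.isna(s.iloc[i]):
            s.iloc[i] = s.iloc[i - 3:i].mean()
    return s

# Hypothetical data: two Ids with missing values at the end of each group.
df = pd.DataFrame({
    "Id": [1003] * 5 + [1004] * 4,
    "order_values": [68, 54, 45, np.nan, np.nan, 14, 50, 27, np.nan],
})

df["order_values"] = (
    df.groupby("Id")["order_values"].transform(fill_with_previous_three)
)
print(df)
```

Because the loop is sequential, a value filled at one row feeds into the mean for the next missing row, which is why a plain rolling mean would not give the same result.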
Replace value with the average of its column with Pandas
The first thing to recognize is the columns that have 'x' in them are not integer datatypes. They are object datatypes.
df = pd.read_csv('file.csv')
df
Col1 Col2
0 1 22
1 2 44
2 3 x
3 4 88
4 5 110
5 6 132
6 7 x
7 8 176
8 9 198
9 10 x
df.dtypes
Col1 int64
Col2 object
dtype: object
In order to get the mean of Col2, it needs to be converted to a numeric value.
df['Col2'] = pd.to_numeric(df['Col2'], errors='coerce').astype('Int64')
df.dtypes
Col1 int64
Col2 Int64
dtype: object
The df now looks like so:
df
Col1 Col2
0 1 22
1 2 44
2 3 <NA>
3 4 88
4 5 110
5 6 132
6 7 <NA>
7 8 176
8 9 198
9 10 <NA>
Now we can use fillna() with df['Col2'].mean():
df['Col2'] = df['Col2'].fillna(df['Col2'].mean())
df
Col1 Col2
0 1 22
1 2 44
2 3 110
3 4 88
4 5 110
5 6 132
6 7 110
7 8 176
8 9 198
9 10 110
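The example above stays in the Int64 dtype because the mean happens to be a whole number (110). A fractional mean cannot be stored in an integer column, and pandas may refuse the fill in that case. A minimal sketch that avoids the issue by keeping the column as float64 is shown below; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical column where the mean is not a whole number.
col = pd.Series(["22", "44", "x", "89"], name="Col2")

# errors="coerce" turns non-numeric entries such as "x" into NaN;
# the result is float64, so a fractional mean fits without casting.
numeric = pd.to_numeric(col, errors="coerce")
filled = numeric.fillna(numeric.mean())

print(filled)
```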
Delete and replace Nan values with mean of the rows in pandas dataframe
You could create a dictionary from the column names and row means and pass it to fillna to fill the NaN values. Then drop the NaN rows (which won't get filled in because all-NaN rows have mean NaN).
out = df.fillna(dict.fromkeys(df.columns, df.mean(axis=1))).dropna()
Another possibility is to transpose the DataFrame, use fillna to fill, then transpose back:
df_T = df.T
df_T.fillna(df_T.mean()).T.dropna()
Output:
c1 c2 c3
0 1.0 1.0 1.0
2 3.0 6.0 9.0
3 8.5 7.0 10.0
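For a runnable version of the transpose approach, here is a sketch with an input frame that is assumed (the original input is not shown) and chosen so that it reproduces the output above:

```python
import numpy as np
import pandas as pd

# Assumed input: row 1 is entirely NaN, rows 0 and 3 are partially missing.
df = pd.DataFrame({
    "c1": [1.0, np.nan, 3.0, np.nan],
    "c2": [np.nan, np.nan, 6.0, 7.0],
    "c3": [1.0, np.nan, 9.0, 10.0],
})

row_means = df.mean(axis=1)   # NaN for the all-NaN row

# After transposing, the original rows are columns, so fillna can match
# each of them to its row mean by label. Transpose back and drop rows
# that are still NaN (the all-NaN row).
out = df.T.fillna(row_means).T.dropna()
print(out)
```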
Replacing NaN values with column mean value does not change pandas dataframe NaN values
I think you need DataFrame.fillna with the column means. df.mean() computes the mean per column (axis=0) by default, so the axis argument can be omitted:
df = df.fillna(value=df.mean())
print (df)
A B
1: 2.0 3
2: 2.0 1
3: 2.0 4
I think inplace is not good practice; assign the result back to df instead, as shown above.
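A likely cause of "the NaN values do not change" is that fillna returns a new DataFrame by default, so the result has to be assigned. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column A has missing values, column B does not.
df = pd.DataFrame({"A": [np.nan, 2.0, np.nan], "B": [3, 1, 4]})

df.fillna(df.mean())                        # result is discarded
still_missing = int(df["A"].isna().sum())   # df itself is unchanged

df = df.fillna(df.mean())                   # assign the result back
print(still_missing)
print(df)
```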
Replace nan in column with the mean between two values python dynamically
I think you can simply use pandas.Series.interpolate. Since you want to fill the NaN values with the average of the filled values above and below them, it should look like:
df.Speed.interpolate()
This will return a series of all the speed measures and an interpolation for nan values.
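As a quick illustration with made-up speed values, the default linear interpolation gives the midpoint for a single gap and evenly spaced values for a run of consecutive NaNs:

```python
import numpy as np
import pandas as pd

# Hypothetical speed measurements with a single gap and a double gap.
df = pd.DataFrame({"Speed": [10.0, np.nan, 20.0, np.nan, np.nan, 50.0]})

# interpolate() returns a new Series, so assign it back to keep the result.
df["Speed"] = df["Speed"].interpolate()
print(df)
```

For the double gap, the two filled values are spaced evenly between 20 and 50 rather than both being set to the average of the neighbours.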
Function to replace NaN values in a dataframe with mean of the related column
To fill the NaN values of each column with its respective mean, use:
df.apply(lambda x: x.fillna(x.mean()))
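If the frame also contains non-numeric columns, computing a mean on them may fail. One possible variant, sketched here with hypothetical column names, restricts the fill to numeric columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a text column with two numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, 10.0, 20.0],
})

# Select only the numeric columns and fill each with its own mean.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
print(df)
```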
Pandas: Replace NaN with Average by Multi Level Index
Got it.
df_countries = df_countries.reset_index().set_index(original_index)
Forgot to keep the answer with the correct index... With this change, it works.
However, if anyone has a more pythonic way to do it, please add your answer!
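One possible alternative that never resets the index is to compute the group mean per index level with groupby and transform, then pass it to fillna. This is only a sketch: the level names (country, year), the column name (value), and the data are all assumptions, since the original frame is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical MultiIndex frame; level and column names are illustrative.
index = pd.MultiIndex.from_tuples(
    [("FR", 2019), ("FR", 2020), ("FR", 2021), ("DE", 2019), ("DE", 2020)],
    names=["country", "year"],
)
df_countries = pd.DataFrame(
    {"value": [1.0, np.nan, 3.0, 10.0, np.nan]}, index=index
)

# transform("mean") returns a Series aligned with the original index,
# so it can be handed to fillna without touching the MultiIndex.
group_means = df_countries.groupby(level="country")["value"].transform("mean")
df_countries["value"] = df_countries["value"].fillna(group_means)
print(df_countries)
```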