Replace missing values with column mean
A relatively simple modification of your code should solve the issue:
for(i in 1:ncol(data)){
data[is.na(data[,i]), i] <- mean(data[,i], na.rm = TRUE)
}
Replace missing values with the mean of each variable Python
The last few columns are strings, not floats.
Try converting to float before taking the mean:
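import pandas as pd

# Make sure everything is numeric; strings that cannot be parsed become NaN
num_df = num_df.apply(pd.to_numeric, errors='coerce')
# Fill each column's NaN values with that column's mean
num_df = num_df.fillna(num_df.mean())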
print(num_df)
id sod pot hemo pcv wc rc
0 0 111.0 2.5 15.4 44 7800 5.20
1 1 111.0 2.5 11.3 38 6000 4.55
2 2 111.0 2.5 9.6 31 7500 4.55
3 3 111.0 2.5 11.2 32 6700 3.90
pandas DataFrame: replace nan values with average of columns
You can simply use DataFrame.fillna to fill the NaN values directly:
In [27]: df
Out[27]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 NaN -2.027325 1.533582
4 NaN NaN 0.461821
5 -0.788073 NaN NaN
6 -0.916080 -0.612343 NaN
7 -0.887858 1.033826 NaN
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
In [28]: df.mean()
Out[28]:
A -0.151121
B -0.231291
C -0.530307
dtype: float64
In [29]: df.fillna(df.mean())
Out[29]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325 1.533582
4 -0.151121 -0.231291 0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858 1.033826 -0.530307
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
The docstring of fillna used to say that value should be a scalar or a dict, but a Series works as well (the current pandas documentation lists Series and DataFrame as accepted values). If you want to pass a dict, you could use df.mean().to_dict().
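As a small check of the point above, the following sketch fills a toy frame once with the Series returned by df.mean() and once with the equivalent dict, then compares the results. The frame and its values are made up for illustration and are not taken from the session above.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values only)
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 4.0, 8.0]})

# Fill with a Series of column means (matched to columns by label)
filled_series = df.fillna(df.mean())

# Fill with the same means passed as a plain dict
filled_dict = df.fillna(df.mean().to_dict())

print(filled_series.equals(filled_dict))
```

Both calls match the fill values to columns by label, so either form should be interchangeable here.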
Replace missing value with mean of class within column
Using dplyr, you could group_by class and apply NA2mean to every column.
library(dplyr)
DF %>% group_by(class) %>% mutate_all(NA2mean)
In newer versions of dplyr, you can do this with across():
DF %>% group_by(class) %>% mutate(across(everything(), NA2mean))
Replace value with the average of its column with Pandas
The first thing to recognize is the columns that have 'x' in them are not integer datatypes. They are object datatypes.
df = pd.read_csv('file.csv')
df
Col1 Col2
0 1 22
1 2 44
2 3 x
3 4 88
4 5 110
5 6 132
6 7 x
7 8 176
8 9 198
9 10 x
df.dtypes
Col1 int64
Col2 object
dtype: object
In order to get the mean of Col2, it needs to be converted to a numeric value.
df['Col2'] = pd.to_numeric(df['Col2'], errors='coerce').astype('Int64')
df.dtypes
Col1 int64
Col2 Int64
dtype: object
The df now looks like so:
df
Col1 Col2
0 1 22
1 2 44
2 3 <NA>
3 4 88
4 5 110
5 6 132
6 7 <NA>
7 8 176
8 9 198
9 10 <NA>
Now we can use fillna() with df['Col2'].mean():
df['Col2'] = df['Col2'].fillna(df['Col2'].mean())
df
Col1 Col2
0 1 22
1 2 44
2 3 110
3 4 88
4 5 110
5 6 132
6 7 110
7 8 176
8 9 198
9 10 110
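In this example the mean happens to be a whole number (110), so filling an Int64 column works cleanly. A column mean is usually fractional, so one possible variation, sketched below, is to fill while the column is still float and only then round and cast. The rounding step is an assumption on my part, not something the steps above require, and the frame is built inline instead of being read from file.csv.

```python
import pandas as pd

# Same data as above, built inline instead of read from file.csv
df = pd.DataFrame({
    "Col1": range(1, 11),
    "Col2": ["22", "44", "x", "88", "110", "132", "x", "176", "198", "x"],
})

# Coerce to numeric: 'x' becomes NaN and the column becomes float64
col2 = pd.to_numeric(df["Col2"], errors="coerce")

# Fill while still float, so a fractional mean would not be a problem
col2 = col2.fillna(col2.mean())

# Assumption: integer output is wanted, so round before casting to Int64
df["Col2"] = col2.round().astype("Int64")
print(df)
```

If integer output is not needed, dropping the last step and keeping the float column avoids the rounding altogether.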
Replace NA in DataFrame for multiple columns with mean per country
Note: the example is based on the additional data source from your comments.
To replace the NA values in multiple columns with mean(), you can combine the following three methods:
fillna() (iterating per column; axis should be 0, which is the default value of fillna())
groupby()
transform()
Create data frame from your example:
df = pd.read_excel('https://happiness-report.s3.amazonaws.com/2021/DataPanelWHR2021C2.xls')
Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
---|---|---|---|---|---|---|---|---|---|---|
Canada | 2005 | 7.41805 | 10.6518 | 0.961552 | 71.3 | 0.957306 | 0.25623 | 0.502681 | 0.838544 | 0.233278 |
Canada | 2007 | 7.48175 | 10.7392 | nan | 71.66 | 0.930341 | 0.249479 | 0.405608 | 0.871604 | 0.25681 |
Canada | 2008 | 7.4856 | 10.7384 | 0.938707 | 71.84 | 0.926315 | 0.261585 | 0.369588 | 0.89022 | 0.202175 |
Canada | 2009 | 7.48782 | 10.6972 | 0.942845 | 72.02 | 0.915058 | 0.246217 | 0.412622 | 0.867433 | 0.247633 |
Canada | 2010 | 7.65035 | 10.7165 | 0.953765 | 72.2 | 0.933949 | 0.230451 | 0.41266 | 0.878868 | 0.233113 |
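One way the three methods could be combined is sketched below. It uses a small inline frame instead of the Excel download: the Canada values are copied from the table above, while the second country ("Exampleland") and its numbers are hypothetical, added only to show that each country gets its own mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # "Exampleland" is a made-up group for illustration
    "Country name": ["Canada"] * 5 + ["Exampleland"] * 3,
    "year": [2005, 2007, 2008, 2009, 2010, 2005, 2006, 2007],
    "Social support": [0.961552, np.nan, 0.938707, 0.942845, 0.953765,
                       0.5, np.nan, 0.7],
    "Generosity": [0.25623, 0.249479, 0.261585, 0.246217, 0.230451,
                   np.nan, 0.1, 0.3],
})

value_cols = ["Social support", "Generosity"]

# transform('mean') returns a frame shaped like the input, holding the
# per-country mean of each column on every row of that country
country_means = df.groupby("Country name")[value_cols].transform("mean")

# fillna aligns on index and columns, so each NaN gets its country's mean
df[value_cols] = df[value_cols].fillna(country_means)
print(df)
```

Because transform() keeps the original index, the means line up row by row with the frame being filled, which is what lets fillna() be applied to several columns at once.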
How do I replace all NA with mean in R?
We can use na.aggregate from zoo. Loop through the columns of the dataset (assuming all the columns are numeric), apply na.aggregate to replace the NA with mean values (by default) and assign it back to the dataset.
library(zoo)
df[] <- lapply(df, na.aggregate)
By default, the FUN argument of na.aggregate is mean:
Default S3 method:
na.aggregate(object, by = 1, ..., FUN = mean,
na.rm = FALSE, maxgap = Inf)
To do this nondestructively:
df2 <- df
df2[] <- lapply(df2, na.aggregate)
or in one line:
df2 <- replace(df, TRUE, lapply(df, na.aggregate))
If there are non-numeric columns, do this only for the numeric columns by creating a logical index first:
ok <- sapply(df, is.numeric)
df[ok] <- lapply(df[ok], na.aggregate)
R: Replace all values in column (NA and values) with mean of values
We can just subset the non-NA elements to replace it
library(dplyr)
df %>%
group_by(Day, Plate) %>%
mutate(Value = mean(Value[!is.na(Value)]))
Or use the na.rm argument of mean:
df %>%
group_by(Day, Plate) %>%
mutate(Value = mean(Value, na.rm = TRUE))