Replace Missing Values With Column Mean

Replace missing values with column mean

A relatively simple modification of your code should solve the issue:

for (i in 1:ncol(data)) {
  data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
}

Replace missing values with the mean of each variable in Python

The last few columns are strings, not floats.

Try converting to float before taking the mean:

# Make sure everything is numeric; values that cannot be parsed become NaN
num_df = num_df.apply(pd.to_numeric, errors='coerce')
# Fill each NaN with the mean of its column
num_df = num_df.fillna(num_df.mean())

print(num_df)
   id    sod  pot  hemo  pcv    wc    rc
0   0  111.0  2.5  15.4   44  7800  5.20
1   1  111.0  2.5  11.3   38  6000  4.55
2   2  111.0  2.5   9.6   31  7500  4.55
3   3  111.0  2.5  11.2   32  6700  3.90
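
As a self-contained sketch of the two steps above, here is the same idea on a small made-up frame. The values and the "?" and "n/a" placeholders are assumptions for illustration, not the asker's data:

```python
import pandas as pd

# Hypothetical data: numbers mixed with non-numeric placeholders,
# so both columns start out with the object dtype.
num_df = pd.DataFrame({
    "sod": [111.0, "?", 111.0, "n/a"],
    "hemo": [10.0, 12.0, "?", 14.0],
})

# Coerce every column to numeric; unparseable entries become NaN.
num_df = num_df.apply(pd.to_numeric, errors="coerce")

# Replace each NaN with the mean of its own column.
num_df = num_df.fillna(num_df.mean())

print(num_df)
```

Without the coercion step, the mean of an object column is either unavailable or not what you expect, which is why the conversion has to come first.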

pandas DataFrame: replace nan values with average of columns

You can simply use DataFrame.fillna to fill the NaNs directly:

In [27]: df
Out[27]:
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3       NaN -2.027325  1.533582
4       NaN       NaN  0.461821
5 -0.788073       NaN       NaN
6 -0.916080 -0.612343       NaN
7 -0.887858  1.033826       NaN
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

In [28]: df.mean()
Out[28]:
A   -0.151121
B   -0.231291
C   -0.530307
dtype: float64

In [29]: df.fillna(df.mean())
Out[29]:
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.151121 -2.027325  1.533582
4 -0.151121 -0.231291  0.461821
5 -0.788073 -0.231291 -0.530307
6 -0.916080 -0.612343 -0.530307
7 -0.887858  1.033826 -0.530307
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

Older versions of the fillna docstring say that value should be a scalar or a dict, but a Series works as well (more recent documentation lists Series and DataFrame among the accepted types). If you want to pass a dict, you could use df.mean().to_dict().
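
A quick check, on a made-up three-row frame (the values are assumptions for illustration), that the Series form and the dict form of the fill value give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one gap in each column.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 4.0, 8.0]})

means = df.mean()                          # Series indexed by column name
filled_series = df.fillna(means)           # pass the Series directly
filled_dict = df.fillna(means.to_dict())   # or an equivalent dict

# Both calls return new frames; df itself is left unchanged.
print(filled_series.equals(filled_dict))
```

Either way, each fill value is matched to its column by name, so column order does not matter.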

Replace missing value with mean of class within column

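Using dplyr, you could group_by class and apply NA2mean to every column. NA2mean is not shown here; it is taken to be a helper that replaces the NAs in a vector with that vector's mean.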

library(dplyr)
DF %>% group_by(class) %>% mutate_all(NA2mean)

In newer versions of dplyr (1.0.0 and later), you can do this with across():

DF %>% group_by(class) %>% mutate(across(everything(), NA2mean))

Replace value with the average of its column with Pandas

The first thing to recognize is that the columns containing 'x' are not integer columns; they have the object dtype.

import pandas as pd
df = pd.read_csv('file.csv')

df

   Col1  Col2
0     1    22
1     2    44
2     3     x
3     4    88
4     5   110
5     6   132
6     7     x
7     8   176
8     9   198
9    10     x

df.dtypes

Col1     int64
Col2    object
dtype: object

To get the mean of Col2, the column first has to be converted to a numeric dtype. With errors='coerce', the 'x' entries become missing values:

df['Col2'] = pd.to_numeric(df['Col2'], errors='coerce').astype('Int64')

df.dtypes
Col1    int64
Col2    Int64
dtype: object

The DataFrame now looks like this:

df

   Col1  Col2
0     1    22
1     2    44
2     3  <NA>
3     4    88
4     5   110
5     6   132
6     7  <NA>
7     8   176
8     9   198
9    10  <NA>

Now we can use fillna() with df['Col2'].mean():

df['Col2'] = df['Col2'].fillna(df['Col2'].mean())

df
   Col1  Col2
0     1    22
1     2    44
2     3   110
3     4    88
4     5   110
5     6   132
6     7   110
7     8   176
8     9   198
9    10   110
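
Casting to Int64 works here because the mean of the valid entries happens to be a whole number (110). As far as I know, filling a nullable integer column with a fractional mean raises an error, so a more general variant keeps the column as float and rounds only if integers are really required. A minimal sketch with made-up values (not the contents of file.csv):

```python
import pandas as pd

# Hypothetical column: integers mixed with the placeholder 'x'.
df = pd.DataFrame({"Col1": [1, 2, 3, 4], "Col2": [22, 44, "x", 89]})

# Leave the coerced column as float64 so a fractional mean can be stored.
col = pd.to_numeric(df["Col2"], errors="coerce")
col = col.fillna(col.mean())  # mean of 22, 44 and 89 is 155 / 3

# Optional: round and cast back if integer values are required.
df["Col2"] = col.round().astype("Int64")

print(df)
```

Whether to round is a judgment call: rounding keeps the column integer-typed, at the cost of storing a value that is no longer exactly the mean.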

Replace NA in DataFrame for multiple columns with mean per country

Note: This example is based on the additional data source you linked in the comments.

To replace the NA values in multiple columns with the mean, you can combine the following three methods:

  • fillna() (to fill column by column, axis should be 0, which is the default for fillna())
  • groupby()
  • transform()


Create the data frame from your example:

df = pd.read_excel('https://happiness-report.s3.amazonaws.com/2021/DataPanelWHR2021C2.xls')

Country name  year  Life Ladder  Log GDP per capita  Social support  Healthy life expectancy at birth  Freedom to make life choices  Generosity  Perceptions of corruption  Positive affect  Negative affect
Canada        2005  7.41805      10.6518             0.961552        71.3                              0.957306                      0.25623     0.502681                   0.838544         0.233278
Canada        2007  7.48175      10.7392             nan             71.66                             0.930341                      0.249479    0.405608                   0.871604         0.25681
Canada        2008  7.4856       10.7384             0.938707        71.84                             0.926315                      0.261585    0.369588                   0.89022          0.202175
Canada        2009  7.48782      10.6972             0.942845        72.02                             0.915058                      0.246217    0.412622                   0.867433         0.247633
Canada        2010  7.65035      10.7165             0.953765        72.2                              0.933949                      0.230451    0.41266                    0.878868         0.233113
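
One way the three methods listed above can be combined is sketched below. To keep it runnable without the download, it uses a tiny made-up frame whose column names follow the table; the numbers, the "Chile" rows and the cols list are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the happiness data (values are invented).
df = pd.DataFrame({
    "Country name": ["Canada", "Canada", "Canada", "Chile", "Chile"],
    "year": [2005, 2007, 2008, 2006, 2007],
    "Social support": [0.96, np.nan, 0.94, 0.80, np.nan],
    "Generosity": [0.25, 0.25, np.nan, np.nan, 0.10],
})

# Columns in which missing values should be filled.
cols = ["Social support", "Generosity"]

# transform("mean") returns the per-country mean aligned to the original
# rows, so fillna can pick the matching value for every NaN.
country_means = df.groupby("Country name")[cols].transform("mean")
df[cols] = df[cols].fillna(country_means)

print(df)
```

Using transform rather than a plain aggregation keeps the result the same shape as the input, which is what lets fillna line the means up row by row.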

How do I replace all NA with mean in R?

We can use na.aggregate from zoo. Loop through the columns of the dataset (assuming all the columns are numeric), apply na.aggregate to replace each NA with the column mean (the default), and assign the result back to the dataset.

library(zoo)
df[] <- lapply(df, na.aggregate)

By default, the FUN argument of na.aggregate is mean:

Default S3 method:

na.aggregate(object, by = 1, ..., FUN = mean,
             na.rm = FALSE, maxgap = Inf)

To do this nondestructively:

df2 <- df
df2[] <- lapply(df2, na.aggregate)

or in one line:

df2 <- replace(df, TRUE, lapply(df, na.aggregate))

If there are non-numeric columns, apply this only to the numeric columns by creating a logical index first:

ok <- sapply(df, is.numeric)
df[ok] <- lapply(df[ok], na.aggregate)

R: Replace all values in column (NA and values) with mean of values

We can take the mean of just the non-NA elements in each group and assign it to every row of that group:

library(dplyr)
df %>%
  group_by(Day, Plate) %>%
  mutate(Value = mean(Value[!is.na(Value)]))

Or use the na.rm argument of mean:

df %>%
  group_by(Day, Plate) %>%
  mutate(Value = mean(Value, na.rm = TRUE))

