How to Replace Na With Mean by Group/Subset

How to replace NA with mean by group / subset?

Not my own technique I saw it on the boards a while back:

dat <- read.table(text = "id    taxa        length  width
101 collembola 2.1 0.9
102 mite 0.9 0.7
103 mite 1.1 0.8
104 collembola NA NA
105 collembola 1.5 0.5
106 mite NA NA", header=TRUE)


library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
width = impute.mean(width))

dat2[order(dat2$id), ] #plyr orders by group so we have to reorder

Edit A non plyr approach with a for loop:

for (i in which(sapply(dat, is.numeric))) {
for (j in which(is.na(dat[, i]))) {
dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i], na.rm = TRUE)
}
}

Edit many moons later here is a data.table & dplyr approach:

data.table

library(data.table)
setDT(dat)

dat[, length := impute.mean(length), by = taxa][,
width := impute.mean(width), by = taxa]

dplyr

library(dplyr)

dat %>%
group_by(taxa) %>%
mutate(
length = impute.mean(length),
width = impute.mean(width)
)

Replace NA with grouped means in R?

I slightly changed your example, because the data frame you provided had columns of different lengths, but this should solve your problem:

First, I loaded the packages in tidyverse. Then I grouped data by month. The second pipe runs a mutate_all function so it automatically changes all columns.

library(tidyverse)

df <- tibble(x1 = c(13, NA, 16, 17, 16, 12), x2 = c(1, 4, 3, 5, NA, 4),
month = c(1, 1, 1, 2, 2, 2))


new_df <- df %>% group_by(month) %>%
mutate_all(funs(ifelse(is.na(.), mean(., na.rm = TRUE),.)))

Let me know if this is of any help.

Use tidyverse to replace NA with mean of data, by group

Use mutate and dplyr::case_when instead of summarise :

dat1 %>%
group_by(id, operator) %>%
mutate(nummos2 = case_when(is.na(nummos) ~ mean(nummos, na.rm=TRUE),
TRUE ~ as.numeric(nummos)
)
)

Replace NA with mean based on row subset matching another column?

You could use ave() with replace() (or standard manual replacement).

df$weight <- with(df, ave(weight, gender,
FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))

which gives

  gender   weight
1 FEMALE 110.0000
2 FEMALE 120.0000
3 FEMALE 112.0000
4 FEMALE 114.0000
5 FEMALE 114.0000
6 MALE 190.0000
7 MALE 202.0000
8 MALE 195.0000
9 MALE 195.6667

replace NA with groups mean in a non specified number of columns

If you don't mind using dplyr:



library(dplyr)

dat %>%
group_by(ID) %>%
mutate_if(is.numeric, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
#> # A tibble: 7 x 5
#> # Groups: ID [2]
#> id ID length width extra
#> <int> <fctr> <dbl> <dbl> <dbl>
#> 1 101 collembola 2.1 0.90 1
#> 2 102 mite 1.5 0.70 3
#> 3 103 mite 1.1 0.80 2
#> 4 104 collembola 1.0 0.70 3
#> 5 105 collembola 1.5 0.50 4
#> 6 106 mite 1.5 0.75 3
#> 7 106 mite 1.9 0.75 4

Replace NA with mean of variable grouped by time and treatment

I think I would just use indexing in base R for this:

within(df, {A1[is.na(A1) & time == 0] <- mean(A1[trt == "2" & time == 0])
B1[is.na(B1) & time == 0] <- mean(B1[trt == "2" & time == 0])})
#> # A tibble: 24 x 4
#> time trt A1 B1
#> <dbl> <fct> <dbl> <dbl>
#> 1 0 2 6.30 5.73
#> 2 0 2 5.43 5.73
#> 3 0 2 5.60 5.45
#> 4 0 1 5.78 5.63
#> 5 0 1 5.78 5.63
#> 6 0 1 5.78 5.63
#> 7 14 2 6.17 6.60
#> 8 14 2 6.43 7.03
#> 9 14 2 6.82 7.12
#> 10 14 1 2.30 3.03
#> # ... with 14 more rows

Created on 2020-05-15 by the reprex package (v0.3.0)

replace NA with mean of column groups

Here is another solution using reshape from base R, an often forgotten function with amazing power.

x2 = reshape(x, direction = 'long', varying = 4:9, sep = "")
x2[,c('a', 'b')] = apply(x2[,c('a', 'b')], 2, function(y){
y[is.na(y)] = mean(y, na.rm = T)
return(y)
})
x3 = reshape(x2, direction = 'wide', idvar = names(x2)[1:3], timevar = 'time',
sep = "")

Here is how it works. First, we reshape the data to long format, where a and b become columns and the years become rows. Second, we replace NAs in columns a and b with their respective means. Finally, we reshape the data back to the wide format. reshape is a confusing function, but working through the examples on the help page will get you up to speed.

EDIT

To reorder columns, you can do

x3[,names(x)]

To replace the rownames, you can do

rownames(x3) = 1:NROW(x3)

R: Replace all values in column (NA and values) with mean of values

We can just subset the non-NA elements to replace it

library(dplyr)
df %>%
group_by(Day, Plate) %>%
mutate(Value = mean(Value[!is.na(Value)]))

Or use the na.rm in mean

df %>%
group_by(Day, Plate) %>%
mutate(Value = mean(Value, na.rm = TRUE))


Related Topics



Leave a reply



Submit