Replace Na with Groups Mean in a Non Specified Number of Columns

replace NA with groups mean in a non specified number of columns

If you don't mind using dplyr:

library(dplyr)

dat %>% 
  group_by(ID) %>% 
  mutate_if(is.numeric, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
#> # A tibble: 7 x 5
#> # Groups:   ID [2]
#>      id         ID length width extra
#>   <int>     <fctr>  <dbl> <dbl> <dbl>
#> 1   101 collembola    2.1  0.90     1
#> 2   102       mite    1.5  0.70     3
#> 3   103       mite    1.1  0.80     2
#> 4   104 collembola    1.0  0.70     3
#> 5   105 collembola    1.5  0.50     4
#> 6   106       mite    1.5  0.75     3
#> 7   106       mite    1.9  0.75     4

How to replace NA with mean by group / subset?

Not my own technique I saw it on the boards a while back:

dat <- read.table(text = "id    taxa        length  width
101   collembola  2.1     0.9
102   mite        0.9     0.7
103   mite        1.1     0.8
104   collembola  NA      NA
105   collembola  1.5     0.5
106   mite        NA      NA", header=TRUE)

library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
     width = impute.mean(width))

dat2[order(dat2$id), ] #plyr orders by group so we have to reorder

Edit A non plyr approach with a for loop:

for (i in which(sapply(dat, is.numeric))) {
    for (j in which(is.na(dat[, i]))) {
        dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i],  na.rm = TRUE)
    }
}

Edit many moons later here is a data.table & dplyr approach:

data.table

library(data.table)
setDT(dat)

dat[, length := impute.mean(length), by = taxa][,
    width := impute.mean(width), by = taxa]

dplyr

library(dplyr)

dat %>%
    group_by(taxa) %>%
    mutate(
        length = impute.mean(length),
        width = impute.mean(width)  
    )

replace NA with mean of column groups

Here is another solution using reshape from base R, an often forgotten function with amazing power.

x2 = reshape(x, direction = 'long', varying = 4:9, sep = "")
x2[,c('a', 'b')] = apply(x2[,c('a', 'b')], 2, function(y){
  y[is.na(y)] = mean(y, na.rm = T)
  return(y)
})
x3 = reshape(x2, direction = 'wide', idvar = names(x2)[1:3], timevar = 'time', 
 sep = "")

Here is how it works. First, we reshape the data to long format, where a and b become columns and the years become rows. Second, we replace NAs in columns a and b with their respective means. Finally, we reshape the data back to the wide format. reshape is a confusing function, but working through the examples on the help page will get you up to speed.

EDIT

To reorder columns, you can do

x3[,names(x)]

To replace the rownames, you can do

rownames(x3) = 1:NROW(x3)

Replace NA with grouped means in R?

I slightly changed your example, because the data frame you provided had columns of different lengths, but this should solve your problem:

First, I loaded the packages in tidyverse. Then I grouped data by month. The second pipe runs a mutate_all function so it automatically changes all columns.

library(tidyverse)

df <- tibble(x1 = c(13, NA, 16, 17, 16, 12), x2 = c(1, 4, 3, 5, NA, 4),
             month = c(1, 1, 1, 2, 2, 2))

new_df <- df %>%  group_by(month) %>%
  mutate_all(funs(ifelse(is.na(.), mean(., na.rm = TRUE),.)))

Let me know if this is of any help.

How to replace NA values in a table for selected columns

You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.

R: Replace all values in column (NA and values) with mean of values

We can just subset the non-NA elements to replace it

library(dplyr)
df %>%
     group_by(Day, Plate) %>%
     mutate(Value = mean(Value[!is.na(Value)]))

Or use the na.rm in mean

df %>%
     group_by(Day, Plate) %>%
     mutate(Value = mean(Value, na.rm = TRUE))

how to replace several NA values in columns of a data frame with the mean of the values of the columns

Don't use <<-. Very rarely (never) it is useful. Try using it this way :

nar <- function(x) {x[is.na(x)] <- round(mean(x, na.rm = TRUE));x}
dfv[,col] <- sapply(dfv[,col], nar)

replace NA value with the group value

Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))

#   person_id hhold_no med_card med_card_new
#1          1        1        1            1
#2          2        1        1            1
#3          3        1       NA            1
#4          4        1       NA            1
#5          5        1       NA            1
#6          6        2        0            0
#7          7        2        0            0
#8          8        2        0            0
#9          9        2        0            0

Please note that this will only work if not all values in a household are NA and the should not differ (e.g. person 1 == 1, person 2 == 0).

Replace NA with previous or next value, by group, using dplyr

library(tidyr) #fill is part of tidyr

ps1 %>% 
  group_by(userID) %>% 
  #fill(color, age, gender) %>% #default direction down
  fill(color, age, gender, .direction = "downup")

Which gives you:

Source: local data frame [9 x 4]
Groups: userID [3]

  userID  color    age gender
   <dbl> <fctr> <fctr> <fctr>
1     21   blue   3yrs      F
2     21   blue   2yrs      F
3     21    red   2yrs      M
4     22   blue   3yrs      F
5     22   blue   3yrs      F
6     22   blue   3yrs      F
7     23    red   4yrs      F
8     23    red   4yrs      F
9     23   gold   4yrs      F

Replace Na with Groups Mean in a Non Specified Number of Columns