Replace NA with group means in an unspecified number of columns
If you don't mind using dplyr:
library(dplyr)
dat %>%
  group_by(ID) %>%
  mutate_if(is.numeric, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
#> # A tibble: 7 x 5
#> # Groups: ID [2]
#> id ID length width extra
#> <int> <fctr> <dbl> <dbl> <dbl>
#> 1 101 collembola 2.1 0.90 1
#> 2 102 mite 1.5 0.70 3
#> 3 103 mite 1.1 0.80 2
#> 4 104 collembola 1.0 0.70 3
#> 5 105 collembola 1.5 0.50 4
#> 6 106 mite 1.5 0.75 3
#> 7 106 mite 1.9 0.75 4
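mutate_if() has since been superseded by across(). As a minimal sketch, assuming dplyr 1.0.0 or later and a made-up data frame (the values are illustrative, not the asker's data), the same idea could be written as:

```r
library(dplyr)

# Hypothetical example data: two groups, NAs in both numeric columns
dat <- data.frame(
  id     = 101:106,
  ID     = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

filled <- dat %>%
  group_by(ID) %>%
  # every numeric column: replace NA with the mean of its own group
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()
```

The grouping column is skipped by across() automatically, so only id, length and width are touched, and id has no NA to fill.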
How to replace NA with mean by group / subset?
Not my own technique; I saw it on the boards a while back:
dat <- read.table(text = "id taxa length width
101 collembola 2.1 0.9
102 mite 0.9 0.7
103 mite 1.1 0.8
104 collembola NA NA
105 collembola 1.5 0.5
106 mite NA NA", header=TRUE)
library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
width = impute.mean(width))
dat2[order(dat2$id), ] #plyr orders by group so we have to reorder
Edit: A non-plyr approach with a for loop:
for (i in which(sapply(dat, is.numeric))) {
  for (j in which(is.na(dat[, i]))) {
    dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i], na.rm = TRUE)
  }
}
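The nested loop can also be written without explicit iteration. A hedged base R sketch using ave(), which applies a function within groups; the data mirrors the example above, and impute_mean and num_cols are just local names chosen for illustration:

```r
# Same example data as above, rebuilt so the block runs on its own
dat <- data.frame(
  id     = 101:106,
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

# Helper: replace NA in a vector with the mean of its non-NA values
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Columns to impute (named explicitly so that id is left alone)
num_cols <- c("length", "width")

# ave() splits each column by taxa, applies impute_mean, and keeps row order
dat[num_cols] <- lapply(dat[num_cols], function(col) ave(col, dat$taxa, FUN = impute_mean))
```

Because ave() returns values in the original row order, no reordering step is needed afterwards.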
Edit: Many moons later, here is a data.table and dplyr approach:
data.table
library(data.table)
setDT(dat)
dat[, length := impute.mean(length), by = taxa][,
width := impute.mean(width), by = taxa]
dplyr
library(dplyr)
dat %>%
  group_by(taxa) %>%
  mutate(
    length = impute.mean(length),
    width = impute.mean(width)
  )
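When there are many columns to impute, chaining one := per column gets tedious. A minimal data.table sketch, assuming the same example data; cols is a hypothetical name for the vector of columns to fill:

```r
library(data.table)

dat <- data.table(
  id     = 101:106,
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Columns to impute; .SDcols restricts .SD to exactly these
cols <- c("length", "width")

# Update all listed columns by reference, one group (taxa) at a time
dat[, (cols) := lapply(.SD, impute_mean), by = taxa, .SDcols = cols]
```

The parentheses around cols tell data.table to use the contents of the variable as the target column names rather than creating a column called cols.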
replace NA with mean of column groups
Here is another solution using reshape from base R, an often forgotten function with amazing power.
x2 = reshape(x, direction = 'long', varying = 4:9, sep = "")
x2[,c('a', 'b')] = apply(x2[,c('a', 'b')], 2, function(y){
  y[is.na(y)] = mean(y, na.rm = TRUE)
  return(y)
})
x3 = reshape(x2, direction = 'wide', idvar = names(x2)[1:3], timevar = 'time',
             sep = "")
Here is how it works. First, we reshape the data to long format, where a and b become columns and the years become rows. Second, we replace NAs in columns a and b with their respective means. Finally, we reshape the data back to the wide format. reshape is a confusing function, but working through the examples on the help page will get you up to speed.
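The three steps can be traced on a small made-up data frame. This is only a sketch: the column names a1, b1, a2, b2 and the key column id are assumptions for illustration, not the asker's data.

```r
# Hypothetical wide data: variables a and b measured at times 1 and 2
x <- data.frame(
  id = 1:3,
  a1 = c(1, NA, 3),  b1 = c(10, 20, NA),
  a2 = c(NA, 5, 6),  b2 = c(40, NA, 60)
)

# Step 1: wide -> long; with sep = "" the names split into variable ("a") and time ("1")
long <- reshape(x, direction = "long",
                varying = c("a1", "b1", "a2", "b2"),
                sep = "", idvar = "id")

# Step 2: replace NA in a and b with each column's mean over all times
long[c("a", "b")] <- lapply(long[c("a", "b")], function(y) {
  replace(y, is.na(y), mean(y, na.rm = TRUE))
})

# Step 3: long -> wide again, one row per id
wide <- reshape(long, direction = "wide",
                idvar = "id", timevar = "time", sep = "")
```

Here the mean of a is taken over both time points together (3.75), which is what the long format makes easy.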
EDIT
To reorder columns, you can do
x3[,names(x)]
To replace the rownames, you can do
rownames(x3) = 1:NROW(x3)
Replace NA with grouped means in R?
I slightly changed your example, because the data frame you provided had columns of different lengths, but this should solve your problem:
First, I loaded the packages in tidyverse. Then I grouped data by month. The second pipe runs a mutate_all function so it automatically changes all columns.
library(tidyverse)
df <- tibble(x1 = c(13, NA, 16, 17, 16, 12), x2 = c(1, 4, 3, 5, NA, 4),
month = c(1, 1, 1, 2, 2, 2))
# funs() is deprecated as of dplyr 0.8.0; a formula lambda does the same job
new_df <- df %>% group_by(month) %>%
  mutate_all(~ ifelse(is.na(.), mean(., na.rm = TRUE), .))
Let me know if this is of any help.
How to replace NA values in a table for selected columns
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
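For instance, with a pre-defined vector of column names (a small sketch on made-up data; cols is a hypothetical variable name):

```r
# Hypothetical data: only columns a and b should have NA replaced
x <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 3), c = c(NA, NA, 9))

# Pre-defined vector of the columns to clean
cols <- c("a", "b")

# Replace NA with 0 in the selected columns only; column c is left untouched
x[cols][is.na(x[cols])] <- 0
```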
R: Replace all values in column (NA and values) with mean of values
We can just subset the non-NA elements to replace it
library(dplyr)
df %>%
group_by(Day, Plate) %>%
mutate(Value = mean(Value[!is.na(Value)]))
Or use the na.rm argument in mean:
df %>%
group_by(Day, Plate) %>%
mutate(Value = mean(Value, na.rm = TRUE))
how to replace several NA values in columns of a data frame with the mean of the values of the columns
Don't use <<-. It is very rarely (if ever) useful. Try it this way:
nar <- function(x) {x[is.na(x)] <- round(mean(x, na.rm = TRUE)); x}
# lapply() goes column by column, so this also works when col names a single column
dfv[col] <- lapply(dfv[col], nar)
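A short usage sketch on made-up data, where dfv and col are stand-ins for the asker's data frame and column selection. It uses lapply() with single-bracket indexing so that a single column name is handled the same way as several:

```r
# Helper: replace NA with the rounded mean of the column
nar <- function(x) {
  x[is.na(x)] <- round(mean(x, na.rm = TRUE))
  x
}

# Hypothetical data frame and column selection
dfv <- data.frame(
  a     = c(1, NA, 4, 4),
  b     = c(NA, 2, 5, 6),
  label = c("w", "x", "y", "z")
)
col <- c("a", "b")

# Apply nar to each selected column; the label column is not touched
dfv[col] <- lapply(dfv[col], nar)
```

No global assignment is needed: the function returns the modified vector and the result is assigned back in the calling scope.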
replace NA value with the group value
Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:
df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))
# person_id hhold_no med_card med_card_new
#1 1 1 1 1
#2 2 1 1 1
#3 3 1 NA 1
#4 4 1 NA 1
#5 5 1 NA 1
#6 6 2 0 0
#7 7 2 0 0
#8 8 2 0 0
#9 9 2 0 0
Please note that this only works if not all values in a household are NA and the non-NA values within a household do not differ (e.g. person 1 == 1, person 2 == 0).
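If those two conditions cannot be guaranteed, one option is to fill only when a household has exactly one distinct non-NA value and leave it unchanged otherwise. A minimal sketch on made-up data; fill_group_value is a hypothetical helper name, not part of base R:

```r
# Fill NA with the group's value only when that value is unambiguous
fill_group_value <- function(x) {
  known <- unique(x[!is.na(x)])
  if (length(known) == 1) rep(known, length(x)) else x
}

# Hypothetical data: household 2 is all NA, household 3 has conflicting values
df <- data.frame(
  hhold_no = c(1, 1, 1, 2, 2, 3, 3, 3),
  med_card = c(1, NA, NA, NA, NA, 1, 0, NA)
)

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN = fill_group_value)
```

Household 1 is filled with 1, while households 2 and 3 keep their NA values because there is no single value to copy.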
Replace NA with previous or next value, by group, using dplyr
library(tidyr) #fill is part of tidyr
ps1 %>%
group_by(userID) %>%
#fill(color, age, gender) %>% #default direction down
fill(color, age, gender, .direction = "downup")
Which gives you:
Source: local data frame [9 x 4]
Groups: userID [3]
userID color age gender
<dbl> <fctr> <fctr> <fctr>
1 21 blue 3yrs F
2 21 blue 2yrs F
3 21 red 2yrs M
4 22 blue 3yrs F
5 22 blue 3yrs F
6 22 blue 3yrs F
7 23 red 4yrs F
8 23 red 4yrs F
9 23 gold 4yrs F
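To see what .direction = "downup" does within each group, here is a small sketch on made-up data (assuming tidyr 1.0.0 or later, where this direction is available; toy is a hypothetical data frame):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the first row of group 1 is NA, so filling down alone would miss it
toy <- tibble(
  userID = c(1, 1, 1, 2, 2),
  color  = c(NA, "blue", NA, "red", NA)
)

filled <- toy %>%
  group_by(userID) %>%
  # fill down first, then up, so leading NAs in a group are also filled
  fill(color, .direction = "downup") %>%
  ungroup()
```

With the default direction ("down") the first row of group 1 would stay NA, because there is no earlier value in that group to carry forward.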