Replace NAs with Mean of the Same Column of a data.table

Replace NAs with mean of the same column of a data.table

na.aggregate in the zoo package replaces NAs with the mean of the non-NAs in the same column:

library(data.table)
library(zoo)

# ww is assumed to be a data.table with a numeric Sepal.Length column
ww[, Sepal.Length := na.aggregate(Sepal.Length)]
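
If adding zoo as a dependency is not desirable, the same single-column fill can be sketched with data.table alone. The example data (iris with three values blanked out) and the name col_mean are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Hypothetical example data: iris with a few Sepal.Length values set to NA
ww <- as.data.table(iris)
ww[c(1, 50, 100), Sepal.Length := NA]

# Mean of the observed (non-NA) values in the column
col_mean <- ww[, mean(Sepal.Length, na.rm = TRUE)]

# Overwrite only the NA rows, by reference
ww[is.na(Sepal.Length), Sepal.Length := col_mean]
```

Computing the mean before the assignment keeps the subset in i from affecting the value that gets filled in.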

Replace NAs in a column of a data.table with means of the same column grouped by a factor

We can use na.aggregate from zoo to replace each NA in 'steps' with the mean of 'steps' within its 'interval' group:

library(zoo)
steps.dt[, steps := na.aggregate(steps), by = interval]
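
A zoo-free version of the grouped fill might look like the sketch below. The small steps.dt table is hypothetical. The conversion to numeric is an assumption about the data: if 'steps' is stored as integer, a fractional group mean may not fit the column type in a grouped assignment, so the column is converted first.

```r
library(data.table)

# Hypothetical data: integer step counts with one NA per interval
steps.dt <- data.table(
  interval = c(0L, 0L, 0L, 5L, 5L, 5L),
  steps    = c(NA, 2L, 5L, 10L, NA, 20L)
)

# Convert to double so fractional means can be stored
steps.dt[, steps := as.numeric(steps)]

# Within each interval, replace NAs with that interval's mean
steps.dt[, steps := replace(steps, is.na(steps), mean(steps, na.rm = TRUE)),
         by = interval]
```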

Replace NAs in a Single Column of a Data Table in R

Your code is fine unless the column is not of type character, in which case you have to assign -999 as an integer/numeric value, without the quotes:

library(data.table)

data <- read.table(header=TRUE, text='
id weight size
1 20 small
2 27 large
3 24 medium
')

data <- data.table(data)

> data[size == 'small', weight := NA]
> data
   id weight   size
1:  1     NA  small
2:  2     27  large
3:  3     24 medium
> is.na(data)
        id weight  size
[1,] FALSE   TRUE FALSE
[2,] FALSE  FALSE FALSE
[3,] FALSE  FALSE FALSE
> data[is.na(weight), weight := -999]
> data
   id weight   size
1:  1   -999  small
2:  2     27  large
3:  3     24 medium
> data[size == 'small', weight := NA]
> data[is.na(weight), weight := "-999"]
Warning message:
In `[.data.table`(data, is.na(weight), `:=`(weight, "-999")) :
  Coerced 'character' RHS to 'integer' to match the column's type.

EDIT: This is, I just saw, what @dracodoc suggested in comment
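
One way to avoid the coercion warning is to check the column type first and pick a replacement of the same type. The table below is a hypothetical stand-in for the one in the answer.

```r
library(data.table)

# Hypothetical table with an integer weight column containing one NA
data <- data.table(
  id     = 1:3,
  weight = c(NA, 27L, 24L),
  size   = c("small", "large", "medium")
)

# Inspect the type so the replacement value can match it
class(data$weight)   # "integer"

# Integer column, so use an integer literal (-999L) rather than "-999"
data[is.na(weight), weight := -999L]
```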

Replace missing values with column mean

A relatively simple modification of your code should solve the issue:

# data is assumed to be a plain data.frame whose columns are all numeric
for (i in seq_len(ncol(data))) {
  data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
}
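
The same column-by-column fill can also be written without an explicit index loop. This is a base R sketch on a hypothetical two-column data frame; fill_mean is a helper name introduced here for illustration.

```r
# Hypothetical data frame with one NA in each numeric column
data <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 8))

# Helper (illustrative name): replace NAs in a vector with its own mean
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# data[] keeps the data.frame structure while every column is replaced
data[] <- lapply(data, fill_mean)
```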

data.table replace NA with mean for multiple columns and by id

Because we only have the column names as strings, we can use get() to retrieve each column, and lapply() to repeat the operation over multiple columns.

## determine the column names that contain NA values
nm <- names(dat)[colSums(is.na(dat)) != 0]
## replace with the mean - by 'id'
dat[, (nm) := lapply(nm, function(x) {
  x <- get(x)
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}), by = id]

which gives the updated dat

   id     var1     var2 var3
1:  1 1.666667 4.000000    4
2:  1 1.000000 4.000000    4
3:  1 2.000000 4.000000    4
4:  1 2.000000 4.000000    3
5:  2 1.000000 5.000000    5
6:  2 1.000000 5.000000    5
7:  2 2.000000 4.666667    5
8:  2 2.000000 4.000000    4

Update: With your updated question, to avoid running this over all columns that contain NA, don't use nm. Just use your own vector tomean.

tomean <- c("var1", "var2")
dat[, (tomean) := lapply(tomean, function(x) {
  x <- get(x)
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}), by = id]

and this gives

   id     var1     var2 var3
1:  1 1.666667 4.000000    4
2:  1 1.000000 4.000000    4
3:  1 2.000000 4.000000    4
4:  1 2.000000 4.000000   NA
5:  2 1.000000 5.000000    5
6:  2 1.000000 5.000000    5
7:  2 2.000000 4.666667    5
8:  2 2.000000 4.000000    4
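
An alternative to get() is to pass the chosen columns through .SDcols, so that lapply() iterates over the columns themselves. This is a sketch; the dat table below is a hypothetical input picked to be consistent with the output shown above, not the original question's data.

```r
library(data.table)

# Hypothetical input consistent with the printed result above
dat <- data.table(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2),
  var1 = c(NA, 1, 2, 2, 1, 1, 2, 2),
  var2 = c(4, 4, 4, 4, 5, 5, NA, 4),
  var3 = c(4, 4, 4, NA, 5, 5, 5, 4)
)

tomean <- c("var1", "var2")

# .SD holds only the columns named in .SDcols, split by id
dat[, (tomean) := lapply(.SD, function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
}), by = id, .SDcols = tomean]
```

Since var3 is not listed in tomean, its NA is left untouched.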

Replace NA and NaN with column mean across multiple columns

With dplyr:

library(dplyr)

myinput %>%
  group_by(Date) %>%
  mutate_at(vars(-group_cols()),
            ~ ifelse(is.na(.) | is.nan(.), mean(., na.rm = TRUE), .))

# A tibble: 5 x 5
# Groups:   Date [2]
  Date         A     B     C     D
  <fct>    <dbl> <dbl> <dbl> <dbl>
1 20010331     3   4     6       1
2 20010331     4   5.5   5.5     2
3 20010331     5   7     5       3
4 20010630     2   8     7       8
5 20010630     2   8     7       8
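
In more recent dplyr versions (1.0 or later), the same idea can be expressed with across(). The sketch below uses a hypothetical myinput, not the original data. Since is.na() already returns TRUE for NaN in R, the separate is.nan() check is dropped here.

```r
library(dplyr)

# Hypothetical input with both NA and NaN values
myinput <- data.frame(
  Date = factor(c("20010331", "20010331", "20010331", "20010630", "20010630")),
  A    = c(3, 4, 5, 2, NA),
  B    = c(4, NaN, 7, NA, 8)
)

filled <- myinput %>%
  group_by(Date) %>%
  # across() skips the grouping column, so only A and B are modified
  mutate(across(everything(),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()
```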

Replacing NAs in a column with the values of other column

You can use coalesce:

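library(dplyr)

# Input vectors, as implied by the output below
Letters <- c("A", "B", "C", "D", "E")
Char <- c("a", "b", NA, "d", NA)

df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)

df1 %>%
  mutate(Char1 = coalesce(Char, Letters))

  Letters Char Char1
1       A    a     a
2       B    b     b
3       C <NA>     C
4       D    d     d
5       E <NA>     E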
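
For a data.table, a similar fill-from-another-column can be sketched with fcoalesce(), which to my knowledge is available in data.table 1.12.4 and later. The table name dt1 is hypothetical; the values mirror the example above.

```r
library(data.table)

# Hypothetical data.table mirroring the data.frame example
dt1 <- data.table(
  Letters = c("A", "B", "C", "D", "E"),
  Char    = c("a", "b", NA, "d", NA)
)

# Take Char where present, otherwise fall back to Letters
dt1[, Char1 := fcoalesce(Char, Letters)]
```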

