Replace NAs with mean of the same column of a data.table
na.aggregate in the zoo package replaces NAs with the mean of the non-NAs in the same column:
library(zoo)
ww[, Sepal.Length := na.aggregate(Sepal.Length)]
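The answer does not show the contents of ww, so here is a minimal self-contained sketch. The table is a hypothetical stand-in with a few iris-like rows, and the base R lines at the end are only there to cross-check what na.aggregate does with its default settings.

```r
library(data.table)
library(zoo)

# Hypothetical stand-in for `ww`: five rows, two of them missing
ww <- data.table(Sepal.Length = c(5.1, NA, 4.7, NA, 5.0),
                 Species = "setosa")

# na.aggregate() fills each NA with the mean of the non-missing values
ww[, Sepal.Length := na.aggregate(Sepal.Length)]

# Base R cross-check on the same values, without zoo
x <- c(5.1, NA, 4.7, NA, 5.0)
x[is.na(x)] <- mean(x, na.rm = TRUE)
```

Both approaches should leave the observed values untouched and put the same mean in the two gaps.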
Replace NAs in a column of a data.table with means of the same column grouped by a factor
We can use na.aggregate from zoo to replace each NA with the mean of 'steps' after grouping by 'interval':
library(zoo)
steps.dt[, steps := na.aggregate(steps), interval]
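For readers who prefer not to load zoo, the same group-wise fill can be sketched with data.table and base R alone. The table below is a hypothetical stand-in for steps.dt. The steps column is stored as double on purpose, on the assumption that a fractional group mean cannot be written back into an integer column without a type problem.

```r
library(data.table)

# Hypothetical stand-in for `steps.dt`; `steps` is double, not integer
steps.dt <- data.table(
  interval = c(0, 0, 0, 5, 5, 5),
  steps    = c(2, NA, 4, 10, 20, NA)
)

# Within each interval, replace NAs with that interval's mean
steps.dt[, steps := replace(steps, is.na(steps), mean(steps, na.rm = TRUE)),
         by = interval]
```

If the real column is integer, converting it first with something like steps.dt[, steps := as.numeric(steps)] is one way to sidestep the mismatch.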
Replace NAs in a Single Column of a Data Table in R
Your code is fine as long as the column is of type character. If the column is integer or numeric, set the replacement as a number, i.e. -999 without the quotes:
library(data.table)
data <- read.table(header=TRUE, text='
id weight size
1 20 small
2 27 large
3 24 medium
')
data <- data.table(data)
> data[size == 'small', weight := NA]
> data
size id weight
1: small 1 NA
2: large 2 27
3: medium 3 24
> is.na(data)
size id weight
[1,] FALSE FALSE TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
> data[is.na(weight), weight := -999]
> data
size id weight
1: small 1 -999
2: large 2 27
3: medium 3 24
> data[size == 'small', weight := NA]
> data[is.na(weight), weight := "-999"]
Warning message:
In `[.data.table`(data, is.na(weight), `:=`(weight, "-999")) :
Coerced 'character' RHS to 'integer' to match the column's type.
EDIT: I just saw that this is what @dracodoc suggested in a comment.
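The type-matching point can be condensed into a short sketch. The table dt is hypothetical. The second option uses nafill(), which ships with recent data.table versions and, as far as I know, accepts type = "const" together with a fill value.

```r
library(data.table)

# Hypothetical table with an integer column containing one NA
dt <- data.table(id = 1:3, weight = c(NA, 27L, 24L))

# Option 1: assign a replacement of the column's own type (integer here)
dt[is.na(weight), weight := -999L]

# Option 2: nafill() returns a filled copy of a numeric vector
w <- nafill(c(NA, 27L, 24L), type = "const", fill = -999L)
```

Writing -999L rather than "-999" keeps the column integer and avoids the coercion warning shown above.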
Replace missing values with column mean
A relatively simple modification of your code should solve the issue:
for (i in seq_len(ncol(data))) {
  # fill the NAs of column i with the mean of its non-missing values
  data[is.na(data[, i]), i] <- mean(data[, i], na.rm = TRUE)
}
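The same replacement can also be written without an explicit index by mapping over the columns. This is a sketch on a hypothetical data frame in which every column is numeric; a non-numeric column would have to be skipped first.

```r
# Hypothetical data frame with missing values in two numeric columns
data <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 6), c = c(7, 8, 9))

# Apply the fill to every column; `data[] <-` keeps the data.frame structure
data[] <- lapply(data, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
```

Here the NA in a becomes 2 and the NA in b becomes 5, while c is unchanged.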
data.table replace NA with mean for multiple columns and by id
To evaluate the columns with only the column names, we can use get(). And we are going to need lapply() to perform this operation over multiple columns.
## determine the column names that contain NA values
nm <- names(dat)[colSums(is.na(dat)) != 0]
## replace with the mean - by 'id'
dat[, (nm) := lapply(nm, function(x) {
    x <- get(x)
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}), by = id]
which gives the updated dat
id var1 var2 var3
1: 1 1.666667 4.000000 4
2: 1 1.000000 4.000000 4
3: 1 2.000000 4.000000 4
4: 1 2.000000 4.000000 3
5: 2 1.000000 5.000000 5
6: 2 1.000000 5.000000 5
7: 2 2.000000 4.666667 5
8: 2 2.000000 4.000000 4
Update: With your updated question, to avoid running this over all columns that contain NA, don't use nm. Just use your own vector tomean.
tomean <- c("var1", "var2")
dat[, (tomean) := lapply(tomean, function(x) {
    x <- get(x)
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}), by = id]
and this gives
id var1 var2 var3
1: 1 1.666667 4.000000 4
2: 1 1.000000 4.000000 4
3: 1 2.000000 4.000000 4
4: 1 2.000000 4.000000 NA
5: 2 1.000000 5.000000 5
6: 2 1.000000 5.000000 5
7: 2 2.000000 4.666667 5
8: 2 2.000000 4.000000 4
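An alternative to get() is to hand the columns to lapply() through .SD and .SDcols, a common data.table idiom. In the sketch below, dat is reconstructed from the printed result, so the positions of the NAs in the input are an assumption.

```r
library(data.table)

# Input reconstructed from the printed output; NA positions are assumed
dat <- data.table(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2),
  var1 = c(NA, 1, 2, 2, 1, 1, 2, 2),
  var2 = c(4, 4, 4, 4, 5, 5, NA, 4),
  var3 = c(4, 4, 4, NA, 5, 5, 5, 4)
)

tomean <- c("var1", "var2")

# .SD holds only the columns named in .SDcols, one subset per id group
dat[, (tomean) := lapply(.SD, function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}), by = id, .SDcols = tomean]
```

Because var3 is not listed in tomean, its NA is left in place, as in the output above.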
Replace NA and NaN with column mean across multiple columns
With dplyr:
library(dplyr)
myinput %>%
  group_by(Date) %>%
  mutate_at(vars(-group_cols()),
            ~ifelse(is.na(.) | is.nan(.), mean(., na.rm = TRUE), .))
# A tibble: 5 x 5
# Groups: Date [2]
Date A B C D
<fct> <dbl> <dbl> <dbl> <dbl>
1 20010331 3 4 6 1
2 20010331 4 5.5 5.5 2
3 20010331 5 7 5 3
4 20010630 2 8 7 8
5 20010630 2 8 7 8
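In dplyr 1.0.0 and later, the same grouped fill can be expressed with across(). This is only a sketch: myinput below is a hypothetical two-column input, and the positions of the NA and NaN values are assumptions. Since is.na() is also TRUE for NaN, a single test covers both cases.

```r
library(dplyr)

# Hypothetical input; NA/NaN positions are assumed for illustration
myinput <- data.frame(
  Date = factor(c("20010331", "20010331", "20010331",
                  "20010630", "20010630")),
  A = c(3, 4, 5, NA, 2),
  B = c(4, NaN, 7, 8, NA)
)

# across(everything()) skips the grouping column automatically
result <- myinput %>%
  group_by(Date) %>%
  mutate(across(everything(),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))) %>%
  ungroup()
```

Each gap is filled with the mean of its own Date group, so B becomes 4, 5.5, 7, 8, 8.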
Replacing NAs in a column with the values of other column
You can use coalesce:
library(dplyr)
df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)
df1 %>%
  mutate(Char1 = coalesce(Char, Letters))
Letters Char Char1
1 A a a
2 B b b
3 C <NA> C
4 D d d
5 E <NA> E
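The snippet relies on Letters and Char from the question, which are not shown here. A self-contained sketch, with both vectors inferred from the printed result, looks like this. The last line gives a base R equivalent for comparison.

```r
library(dplyr)

# Inputs inferred from the printed result above
Letters <- c("A", "B", "C", "D", "E")
Char    <- c("a", "b", NA, "d", NA)

df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)

# coalesce() takes the first non-missing value, element by element
df1 <- df1 %>%
  mutate(Char1 = coalesce(Char, Letters))

# Base R equivalent of the same fill
char1_base <- ifelse(is.na(df1$Char), df1$Letters, df1$Char)
```

Note that coalesce() expects its arguments to have compatible types, so both columns are kept as character rather than factor.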