R: Replacing NA values by mean of hour with dplyr
Try
shop.data %>%
group_by(hour) %>%
mutate(profit= ifelse(is.na(profit), mean(profit, na.rm=TRUE), profit))
# day hour profit
#1 1 8 100
#2 1 16 200
#3 2 8 50
#4 2 16 60
#5 3 8 75
#6 3 16 130
Or you could use replace
shop.data %>%
group_by(hour) %>%
mutate(profit= replace(profit, is.na(profit), mean(profit, na.rm=TRUE)))
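For a self-contained run, here is a minimal sketch. The input data frame is an assumption: it is reconstructed to be consistent with the output printed above (the two NA values are guessed to sit in the day 3 rows) and is not taken from the original question. The name filled is also just illustrative.

```r
library(dplyr)

# Hypothetical input, reconstructed to match the output printed above
shop.data <- data.frame(
  day    = c(1, 1, 2, 2, 3, 3),
  hour   = c(8, 16, 8, 16, 8, 16),
  profit = c(100, 200, 50, 60, NA, NA)
)

filled <- shop.data %>%
  group_by(hour) %>%
  # replace() only touches the NA positions; the mean is computed per hour
  mutate(profit = replace(profit, is.na(profit), mean(profit, na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps are not computed per hour

filled$profit
```

Note that if an hour has no observed profit at all, mean(..., na.rm = TRUE) returns NaN, so those rows remain missing.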
How to replace NA data of specific dates with the average data of different years of same dates of a dataframe in R?
This might work. It groups the rows by month, day, hour and minute, so that the same time slot is pooled across years, and then replaces each NA with the mean of its group.
library(dplyr)
A <- A %>%
group_by(month, day, hour, minute) %>%
mutate(rain = ifelse(is.na(rain),
mean(rain, na.rm=TRUE), rain))
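The grouping columns are not shown in the answer, so the following is only a sketch of how they could be derived. It assumes a POSIXct column called datetime (a hypothetical name) and uses three made-up readings of the same time slot in different years.

```r
library(dplyr)

# Made-up data: the same date and time in three different years
A <- data.frame(
  datetime = as.POSIXct(c("2015-03-01 10:00", "2016-03-01 10:00",
                          "2017-03-01 10:00"), tz = "UTC"),
  rain = c(2, NA, 4)
)

A <- A %>%
  # Derive the grouping keys from the timestamp (assumed column: datetime)
  mutate(month  = format(datetime, "%m"),
         day    = format(datetime, "%d"),
         hour   = format(datetime, "%H"),
         minute = format(datetime, "%M")) %>%
  group_by(month, day, hour, minute) %>%
  mutate(rain = ifelse(is.na(rain), mean(rain, na.rm = TRUE), rain)) %>%
  ungroup()

A$rain
```

Because the year is left out of the grouping, the 2016 gap is filled from the 2015 and 2017 readings of the same slot.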
Replace missing values with corresponding day mean
A solution using dplyr. We can use mutate with ifelse to replace the missing values (NA) with the mean. The key is to use group_by on Day so that the mean is calculated within each day's group only.
library(dplyr)
dt2 <- dt %>%
group_by(Day) %>%
mutate(Sales = ifelse(is.na(Sales), mean(Sales, na.rm = TRUE), Sales)) %>%
ungroup()
dt2
# # A tibble: 9 x 2
# Day Sales
# <fctr> <dbl>
# 1 12-01-17 28.0
# 2 13-01-17 13.0
# 3 14-01-17 2.0
# 4 12-01-17 33.0
# 5 13-01-17 17.0
# 6 14-01-17 11.0
# 7 12-01-17 23.0
# 8 13-01-17 21.0
# 9 14-01-17 6.5
DATA
dt <- read.table(text = " Day Sales
12-01-17 NA
13-01-17 13
14-01-17 2
12-01-17 33
13-01-17 NA
14-01-17 11
12-01-17 23
13-01-17 21
14-01-17 NA",
header = TRUE)
Replace NA with mean of variable grouped by time and treatment
I think I would just use indexing in base R for this:
within(df, {
  A1[is.na(A1) & time == 0] <- mean(A1[trt == "2" & time == 0])
  B1[is.na(B1) & time == 0] <- mean(B1[trt == "2" & time == 0])
})
#> # A tibble: 24 x 4
#> time trt A1 B1
#> <dbl> <fct> <dbl> <dbl>
#> 1 0 2 6.30 5.73
#> 2 0 2 5.43 5.73
#> 3 0 2 5.60 5.45
#> 4 0 1 5.78 5.63
#> 5 0 1 5.78 5.63
#> 6 0 1 5.78 5.63
#> 7 14 2 6.17 6.60
#> 8 14 2 6.43 7.03
#> 9 14 2 6.82 7.12
#> 10 14 1 2.30 3.03
#> # ... with 14 more rows
Created on 2020-05-15 by the reprex package (v0.3.0)
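To see the indexing approach on something small, here is a sketch with made-up data. The values and the name df_filled are assumptions for illustration; only the column names follow the answer. na.rm = TRUE is added as a precaution in case the reference group itself contains NA.

```r
# Made-up data: treatment "1" is missing A1 at time 0
df <- data.frame(
  time = c(0, 0, 0, 0, 14, 14),
  trt  = factor(c("2", "2", "1", "1", "2", "1")),
  A1   = c(6, 4, NA, NA, 7, 3)
)

df_filled <- within(df, {
  # Fill time-0 gaps with the time-0 mean of treatment "2"
  A1[is.na(A1) & time == 0] <- mean(A1[trt == "2" & time == 0], na.rm = TRUE)
})

df_filled$A1
```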
Replace missing value with average of that month
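# Group mean of Occupancy. by ID and by Date with its leading field stripped (e.g. "2/2018"), ignoring NA
a <- with(dat, ave(Occupancy., sub(".*?\\/", "", Date), ID, FUN = function(x) mean(x, na.rm = TRUE)))
# Fill the NA positions with the matching group mean
transform(dat, b = replace(x <- Occupancy., y <- is.na(x), a[y]))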
Date ID Occupancy. b
1 1/2/2018 1 95 95.00000
2 2/2/2018 1 94 94.00000
3 3/2/2018 1 94 94.00000
4 4/2/2018 1 96 96.00000
5 5/2/2018 1 94 94.00000
6 6/2/2018 1 NA 94.71429
7 7/2/2018 1 96 96.00000
8 8/2/2018 1 94 94.00000
9 1/2/2018 2 75 75.00000
10 2/2/2018 2 NA 78.33333
11 3/2/2018 2 79 79.00000
12 4/2/2018 2 82 82.00000
13 5/2/2018 2 NA 78.33333
14 6/2/2018 2 76 76.00000
15 7/2/2018 2 78 78.00000
16 8/2/2018 2 80 80.00000
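A more explicit variant of the same idea parses the date instead of trimming the string. This is a sketch under the assumption that Date is in day/month/year format; the data and the helper names month_key and grp_mean are made up.

```r
# Made-up data in the same shape as the example above
dat <- data.frame(
  Date       = c("1/2/2018", "2/2/2018", "3/2/2018", "1/3/2018", "2/3/2018"),
  ID         = 1,
  Occupancy. = c(90, NA, 94, 80, NA)
)

# Year-month key, assuming day/month/year dates
month_key <- format(as.Date(dat$Date, format = "%d/%m/%Y"), "%Y-%m")

# Mean per month and ID, ignoring NA, repeated for every row of the group
grp_mean <- ave(dat$Occupancy., month_key, dat$ID,
                FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values, fall back to the group mean where missing
dat$b <- ifelse(is.na(dat$Occupancy.), grp_mean, dat$Occupancy.)
dat$b
```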
How to replace NA with zero in a column if the adjacent columns have values, using R
You could also do as follows:
library(dplyr)
mutate(df, X = if_else(is.na(hours) | is.na(interactions), 0, hours))
# hours interactions sales X
# 1 NA NA 1 0
# 2 3 3 1 3
# 3 NA 9 1 0
# 4 8 9 NA 8
Complete missing hour in dataframe with NA using dplyr in R
We can use complete from tidyr:
library(dplyr)
library(tidyr)
mydata %>%
complete(datex, hourx = 0:23)
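A small sketch of what this does, using made-up data. The column value and the name full are assumptions; datex and hourx follow the answer.

```r
library(tidyr)

# Made-up data: one day with readings at hours 0 and 5 only
mydata <- data.frame(
  datex = as.Date(c("2020-01-01", "2020-01-01")),
  hourx = c(0L, 5L),
  value = c(10, 20)
)

# One row per date and hour 0..23; hours without data get NA in value
full <- complete(mydata, datex, hourx = 0:23)

nrow(full)
sum(is.na(full$value))
```

The inserted rows can then be filled with any of the group-mean approaches above.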
R Replacing NA values with the next value of another column value within groups
Here is a possible dplyr solution. It combines ifelse and lead; the result has to be converted back with as.POSIXct because ifelse drops the date-time class and returns plain numbers.
library(dplyr)
tmpdf %>%
group_by(spaceNum) %>%
mutate(time.OUT = as.POSIXct(ifelse(is.na(time.OUT), lead(time.IN), time.OUT), origin = "1970-01-01"))
# Source: local data frame [7 x 3]
# Groups: spaceNum
#
# spaceNum time.IN time.OUT
# 1 1 2015-09-04 16:30:00 2015-09-04 18:00:00
# 2 1 2015-09-04 19:50:00 2015-09-04 21:00:00
# 3 1 2015-09-04 21:00:00 <NA>
# 4 2 2015-09-05 12:00:00 2015-09-05 13:00:00
# 5 2 2015-09-05 13:00:00 2015-09-05 13:21:00
# 6 2 2015-09-05 16:00:00 2015-09-05 16:48:00
# 7 2 2015-09-05 17:00:00 <NA>
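As an alternative that avoids the conversion step, dplyr's coalesce keeps the POSIXct class. This is a sketch with made-up data, not the method from the answer above; the name out is illustrative.

```r
library(dplyr)

# Made-up data: the first and last rows have no time.OUT
tmpdf <- data.frame(
  spaceNum = c(1, 1, 2),
  time.IN  = as.POSIXct(c("2015-09-04 16:30:00", "2015-09-04 19:50:00",
                          "2015-09-05 12:00:00"), tz = "UTC"),
  time.OUT = as.POSIXct(c(NA, "2015-09-04 21:00:00", NA), tz = "UTC")
)

out <- tmpdf %>%
  group_by(spaceNum) %>%
  # Take time.OUT where present, otherwise the next time.IN in the same group
  mutate(time.OUT = coalesce(time.OUT, lead(time.IN))) %>%
  ungroup()

out$time.OUT
```

The last row of each group has no following time.IN, so its NA stays in place, as in the output above.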
Impute missing values with the average of the remainder
You can use ave for such operations.
dat$Weight <-
ave(dat$Weight,dat$Hour,FUN=function(x){
mm <- mean(x,na.rm=TRUE)
ifelse(is.na(x),mm,x)
})
- The function is applied to each group of hours.
- For each group, you compute the mean without the missing values.
- You assign the mean where the value is missing; otherwise you keep the original value.
- You replace the Weight vector with the newly created vector.
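Here is the same call on made-up data, including one hour with no observed value at all, to show what happens in that edge case. The numbers are assumptions for illustration.

```r
# Made-up data: hour 3 has no observed Weight
dat <- data.frame(
  Hour   = c(1, 1, 1, 2, 2, 3),
  Weight = c(10, NA, 20, 5, NA, NA)
)

dat$Weight <- ave(dat$Weight, dat$Hour, FUN = function(x) {
  mm <- mean(x, na.rm = TRUE)  # group mean without the missing values
  ifelse(is.na(x), mm, x)      # fill only the missing positions
})

dat$Weight
```

For a group that is entirely missing, the mean of zero values is NaN, so that row cannot be imputed this way and needs a separate fallback.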