Replace missing values with row means if exactly N missing values per row
Here is the approach I mentioned in my comment, with more details:
# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there
# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))
# Replace missing values by the row mean, but only in rows with exactly N NAs
N <- 1 # the given example
ind <- which(is.na(df) & nb_NA_row == N, arr.ind = TRUE) # NA cells in the qualifying rows
df[ind] <- rowMeans(df, na.rm = TRUE)[ind[, 1]]
# check df
df
# a b c
# [1,] 1 1 1
# [2,] 2 2 2
# [3,] 3 3 3
# [4,] 4 NA NA
# [5,] 5 5 5
# [6,] 1 1 1
# [7,] 2 2 2
# [8,] 3 3 3
# [9,] 4 NA NA
#[10,] 5 5 5
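To check that only the NA cells change while the observed values in the same row stay as they are, here is a minimal sketch. The matrix `m` and its values are made up for illustration and are not the data from the question.

```r
# Hypothetical 4 x 3 matrix (illustrative values, not the question's data)
m <- cbind(a = c(1, 2, 3, 4),
           b = c(3, NA, 5, NA),
           c = c(5, 6, NA, NA))

N <- 1                        # rows with exactly one NA qualify
n_na <- rowSums(is.na(m))     # number of NAs in each row

# (row, col) positions of the NA cells that sit in qualifying rows
idx <- which(is.na(m) & n_na == N, arr.ind = TRUE)

# Fill each of those cells with the mean of its own row
m[idx] <- rowMeans(m, na.rm = TRUE)[idx[, 1]]
m
```

Rows 2 and 3 each have one NA and get it filled; row 4 has two NAs and is left alone.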
Find and replace missing values with row mean
Very similar to @baptiste's answer
> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df, na.rm = TRUE)[ind[,1]]
How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?
Here is a way to do it all in one chain using dplyr and your supplied data frame.
First create a vector of all column names of interest:
name_col <- colnames(mental)[2:16]
And now use dplyr
library(dplyr)
mental %>%
  # First create the column of row means
  mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>%
  # Now calculate the proportion of NAs
  mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>%
  # Create this column for filtering out later
  mutate(somatic_usable = ifelse(somatic_na < 0.2, "yes", "no")) %>%
  # Make the following replacement on a row basis
  rowwise() %>%
  mutate_at(name_col, # Designate eligible columns to check for NAs
            ~ replace(.,
                      is.na(.) & somatic_na < 0.2, # Both conditions need to be met
                      somatic_mean)) %>% # What we are subbing the NAs with
  ungroup() # Now ungroup the 'rowwise' in case you need to modify further
Now, if you wanted to only select the entries that have less than 20% NAs, you can pipe the above into the following:
filter(somatic_usable == "yes")
Also of note, if you instead wanted the condition to be less than or equal to 20%, you would need to replace both occurrences of somatic_na < 0.2 with somatic_na <= 0.2.
Hope this helps!
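As a cross-check that does not depend on dplyr, the same threshold logic can be sketched in base R. The data frame `scores`, its column names and its values are hypothetical stand-ins for the question's data; only the 0.2 threshold is taken from the answer above.

```r
# Hypothetical data: an id column plus six item columns (made-up values)
scores <- data.frame(id = 1:3,
                     q1 = c(1, NA, 1),
                     q2 = c(2, NA, 2),
                     q3 = c(3, 4, 3),
                     q4 = c(4, 4, NA),
                     q5 = c(5, 4, 4),
                     q6 = c(6, 4, 5))

items <- paste0("q", 1:6)               # columns eligible for replacement
x <- as.matrix(scores[items])

prop_na  <- rowMeans(is.na(x))          # proportion of NAs per row
row_mean <- rowMeans(x, na.rm = TRUE)   # row means ignoring NAs

# A cell is filled only if it is NA AND its row is below the threshold
fill <- is.na(x) & prop_na < 0.2
x[fill] <- row_mean[row(x)[fill]]

scores[items] <- x
scores
```

Row 3 has one NA out of six (about 17%) and gets it replaced by its row mean; row 2 has two out of six (about 33%) and keeps its NAs.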
R: How to replace NA with most recent value by row
There is a series of non-base solutions:
zoo::na.locf(df$Value)
data.table::nafill(df$Value, type = "locf")
naniar is also a package designed entirely around NA handling.
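The calls above work on a single vector such as df$Value. For the by-row case in the title, one option is to apply a carry-forward function to each row of a matrix. The following is a base R sketch under that assumption; the helper name `locf` and the matrix `m` are hypothetical.

```r
# Hypothetical helper: last observation carried forward for one vector
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # count of non-NA values seen so far
  # Position 1 is a placeholder NA, so leading NAs (seen == 0) stay NA
  c(NA, x[!is.na(x)])[seen + 1]
}

# Made-up example matrix
m <- rbind(c(1, NA, NA, 4),
           c(NA, 2, NA, NA))

# apply() returns one column per input row, so transpose back
filled <- t(apply(m, 1, locf))
filled
```

A leading NA has no earlier value to carry, so it stays NA in this sketch.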
Conditionally replace NA with value from other rows
Your mutate won't work because you did not assign any value to a variable. Your mutate() should look like this: mutate(value = unique(value[!is.na(value)])). That would not be my approach, though. What I did below was create a lookup table of distinct non-NA values and then join it onto the original dataset. valuedis should be the values you want.
temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)
df <- data.frame(temporal, spatial, value) # data.frame() keeps value numeric; cbind() would coerce it to character
library(dplyr)
dfdis <- df %>%
filter(!is.na(value)) %>%
distinct(temporal,spatial,value) %>%
rename(valuedis = value)
df2 <- left_join(df,dfdis, by = c("temporal","spatial"))
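If a single filled column is wanted rather than value and valuedis side by side, the two can be combined after the join. Here is a base R sketch of the same lookup-table idea on a small made-up subset; the names `d`, `lookup`, `d2` and `value_filled` are hypothetical.

```r
# Small illustrative data (made-up subset, not the full example above)
d <- data.frame(temporal = c("Monday", "Monday", "Monday", "Monday"),
                spatial  = c("North", "South", "North", "South"),
                value    = c(NA, 2, 1, NA))

# Lookup table of distinct non-NA values per (temporal, spatial) pair
lookup <- unique(d[!is.na(d$value), ])
names(lookup)[names(lookup) == "value"] <- "valuedis"

# Left join; note that merge() may reorder the rows by the key columns
d2 <- merge(d, lookup, by = c("temporal", "spatial"), all.x = TRUE)

# Keep the original value where present, otherwise take the looked-up one
d2$value_filled <- ifelse(is.na(d2$value), d2$valuedis, d2$value)
d2
```

This assumes each (temporal, spatial) pair has a single distinct non-NA value; with several, the join would duplicate rows.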
Replace NAs using mutate_at by row mean
Using the arr.ind parameter of which together with is.na(df) and rowMeans, you can do this quite easily in base R:
i <- which(is.na(df), arr.ind = TRUE)
df[i] <- rowMeans(df[,-1], na.rm = TRUE)[i[,1]]
which gives:
> df
ID Price1 Price2 Price3 Price4
1 1 2.1 3 4.0 3.033333
2 2 2.0 3 4.5 3.166667
3 3 2.0 3 4.0 3.000000
4 4 3.5 3 4.0 3.500000
What this does:
With which(is.na(df), arr.ind = TRUE) you get an array index of the row and column numbers where there is an NA value:
> which(is.na(df), arr.ind = TRUE)
row col
[1,] 4 2
[2,] 3 3
[3,] 1 5
[4,] 2 5
[5,] 3 5
[6,] 4 5
With rowMeans(df[,-1], na.rm = TRUE) you get a vector of the means by row:
> rowMeans(df[,-1], na.rm = TRUE)
[1] 3.033333 3.166667 3.000000 3.500000
By indexing that with the row column of the array index, you get a vector that is as long as the number of NA values in the dataframe:
> rowMeans(df[,-1], na.rm = TRUE)[i[,1]]
[1] 3.500000 3.000000 3.033333 3.166667 3.000000 3.500000
By indexing the dataframe df with the array index, you tell R at which spots to put those values.
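The input data frame is not shown in this answer. The sketch below rebuilds one that is consistent with the printed array index and row means above, so the steps can be run end to end; the exact input values are therefore an assumption.

```r
# Input reconstructed from the printed output above (an assumption)
df <- data.frame(ID     = 1:4,
                 Price1 = c(2.1, 2.0, 2.0, NA),
                 Price2 = c(3, 3, NA, 3),
                 Price3 = c(4.0, 4.5, 4.0, 4.0),
                 Price4 = rep(NA_real_, 4))

# Row and column positions of every NA cell
i <- which(is.na(df), arr.ind = TRUE)

# Row means over the price columns only (ID is dropped with -1),
# picked out once per NA cell via the row part of the index
df[i] <- rowMeans(df[, -1], na.rm = TRUE)[i[, 1]]
df
```

Note that is.na(df) also inspects the ID column, so an NA there would be filled with a row mean as well; restrict the index to the price columns if that is not wanted.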