Replace NA Values by Row Means

Replace missing values with row means if exactly N missing values per row

Here is the way I mentioned in the comments, with more details:

# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there

# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))

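# Replace missing values row-wise by the row mean when there are exactly N NAs in the row
N <- 1 # the given example
# Mask only the NA cells of qualifying rows (indexing with nb_NA_row==N alone would overwrite the whole row)
to_fill <- is.na(df) & (nb_NA_row == N)
df[to_fill] <- rowMeans(df, na.rm=TRUE)[row(df)[to_fill]]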

# check df

df
#      a  b  c
# [1,] 1  1  1
# [2,] 2  2  2
# [3,] 3  3  3
# [4,] 4 NA NA
# [5,] 5  5  5
# [6,] 1  1  1
# [7,] 2  2  2
# [8,] 3  3  3
# [9,] 4 NA NA
#[10,] 5  5  5
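
To see the exact-N rule end to end, here is a minimal sketch on made-up data. The vectors a, b and c below are illustrative assumptions, not the data from the question. The assignment is restricted to NA cells through a logical mask, so observed values in a qualifying row are left as they are.

```r
# Hypothetical data: rows 1-2 have exactly one NA, row 3 has none, row 4 has two
a <- c(1, 2, 3, 4)
b <- c(NA, 4, 3, NA)
c <- c(3, NA, 3, NA)
m <- cbind(a, b, c)

N <- 1
nb_NA_row <- rowSums(is.na(m))             # number of NAs in each row
to_fill <- is.na(m) & (nb_NA_row == N)     # NA cells in rows with exactly N NAs
m[to_fill] <- rowMeans(m, na.rm = TRUE)[row(m)[to_fill]]
m
```

Row 4 keeps both of its NAs because it has two missing values rather than one.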

Find and replace missing values with row mean

Very similar to @baptiste's answer

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
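
One edge case worth knowing about: a row with no observed values has no mean to fall back on. The sketch below uses a made-up matrix (an assumption for illustration) to show that such a row ends up holding NaN, which is.na() still reports as missing.

```r
# Hypothetical 3 x 3 matrix; row 2 is entirely missing
m <- matrix(c(1,  NA, 3,
              NA, NA, NA,
              NA, 8,  10), nrow = 3, byrow = TRUE)

ind <- which(is.na(m), arr.ind = TRUE)   # one (row, col) pair per NA cell
m[ind] <- rowMeans(m, na.rm = TRUE)[ind[, 1]]
m
# Rows 1 and 3 are filled with their row means.
# Row 2 has no observed values, so its mean is NaN and the cells stay missing.
```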

How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?

Here is a way to do it all in one chain with dplyr, using your supplied data frame.

First create a vector of all column names of interest:

name_col <- colnames(mental)[2:16]

And now use dplyr:

library(dplyr)

mental %>% 
  # First create the column of row means
  mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>% 
  # Now calculate the proportion of NAs
  mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>% 
  # Create this column for filtering out later
  mutate(somatic_usable = ifelse(somatic_na < 0.2,
                                 "yes", "no")) %>% 
  # Make the following replacement on a row basis 
  rowwise() %>%
  mutate(across(all_of(name_col), # Designate eligible columns to check for NAs (across() needs dplyr >= 1.0.0; funs() is deprecated)
                ~ replace(.x, 
                          is.na(.x) & somatic_na < 0.2, # Both conditions need to be met
                          somatic_mean))) %>% # What we are subbing the NAs with
  ungroup() # Now ungroup the 'rowwise' in case you need to modify further

Now, if you wanted to only select the entries that have less than 20% NAs, you can pipe the above into the following:

filter(somatic_usable == "yes")

Also of note, if you wanted to instead make the condition less than or equal to 20%, you would need to replace the two somatic_na < 0.2 with somatic_na <= 0.2.

Hope this helps!
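
For comparison, the same threshold rule can be sketched in base R without any packages. The data frame scores and its columns q1 to q6 are made up for illustration and are not the mental data from the question; the 0.2 cutoff is the one used above.

```r
# Hypothetical item scores: 6 items per respondent
scores <- data.frame(
  q1 = c(1, NA, 2),
  q2 = c(2, NA, 2),
  q3 = c(3, 4, 2),
  q4 = c(4, 4, 2),
  q5 = c(5, 4, 2),
  q6 = c(NA, 4, 2)
)

m <- as.matrix(scores)
na_share <- rowMeans(is.na(m))           # proportion of NAs in each row
fill <- is.na(m) & (na_share < 0.2)      # NA cells in rows below the threshold
m[fill] <- rowMeans(m, na.rm = TRUE)[row(m)[fill]]
result <- as.data.frame(m)
result
```

Row 1 has one NA out of six (about 17%), so it is filled with the row mean. Row 2 has two out of six (about 33%) and is left untouched.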

R: How to replace NA with most recent value by row

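There are several non-base solutions:

zoo::na.locf(df$Value, na.rm = FALSE) # na.rm = FALSE keeps leading NAs, so the length is unchanged
data.table::nafill(df$Value, type = "locf") # the default type = "const" does not carry values forward

naniar is also a package designed entirely around handling missing values.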
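
The calls above work on a single vector, while the question asks for filling by row. A package-free sketch of the row-wise case could look like the following. The helper locf() and the matrix m are hypothetical names introduced here for illustration, not part of zoo or data.table.

```r
# Hypothetical helper: carry the last observed value forward within a vector
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values have been seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # leading NAs (idx == 0) stay NA
}

# Made-up data: each row is one series
m <- matrix(c(1,  NA, NA, 4,
              NA, 2,  NA, NA), nrow = 2, byrow = TRUE)

# apply() over rows returns the results as columns, so transpose back
filled <- t(apply(m, 1, locf))
filled
```

The leading NA in the second row is kept, since there is no earlier value in that row to carry forward.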

Conditionally replace NA with value from other rows

Your mutate() won't work because you did not assign the result to a variable. It should look like this: mutate(value = unique(value[!is.na(value)])). That would not be my approach, though. What I did below was create a lookup table of distinct non-NA values and then join it onto the original dataset. valuedis should be the values you want.

temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)

df <- data.frame(temporal, spatial, value) # cbind() would coerce value to character

library(dplyr)


dfdis <- df %>% 
          filter(!is.na(value)) %>% 
          distinct(temporal,spatial,value) %>% 
          rename(valuedis = value)

df2 <- left_join(df,dfdis, by = c("temporal","spatial"))
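
If a join feels heavy for this, a different technique (not the lookup-table approach above) is to fill within groups using base R's ave(). The sketch below uses a shortened, made-up version of the data and assumes each temporal/spatial pair has a single observed value. The column name value_filled is chosen here for illustration.

```r
# Shortened illustration data: each temporal/spatial pair appears twice
df <- data.frame(
  temporal = c("Monday", "Monday", "Tuesday", "Tuesday", "Monday", "Monday"),
  spatial  = c("North", "South", "North", "South", "North", "South"),
  value    = c(NA, 2, 3, 4, 1, NA)
)

# Within each temporal/spatial group, replace NA with the group's first observed value
df$value_filled <- ave(df$value, df$temporal, df$spatial, FUN = function(v) {
  v[is.na(v)] <- v[!is.na(v)][1]
  v
})
df
```

A group with no observed value at all stays NA under this approach.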

Replace NAs using mutate_at by row mean

Using the arr.ind parameter of which() together with is.na(df) and rowMeans(), you can do this quite easily in base R:

i <- which(is.na(df), arr.ind = TRUE)
df[i] <- rowMeans(df[,-1], na.rm = TRUE)[i[,1]]

which gives:

> df
  ID Price1 Price2 Price3   Price4
1  1    2.1      3    4.0 3.033333
2  2    2.0      3    4.5 3.166667
3  3    2.0      3    4.0 3.000000
4  4    3.5      3    4.0 3.500000

What this does:

With which(is.na(df), arr.ind = TRUE) you get an array-index of the row and column numbers where there is an NA-value:

> which(is.na(df), arr.ind = TRUE)
     row col
[1,]   4   2
[2,]   3   3
[3,]   1   5
[4,]   2   5
[5,]   3   5
[6,]   4   5

With rowMeans(df[,-1], na.rm = TRUE) you get a vector of the means by row:

> rowMeans(df[,-1], na.rm = TRUE)
[1] 3.033333 3.166667 3.000000 3.500000

By indexing that with the row column of the array index, you get a vector that is as long as the number of NA values in the data frame:

> rowMeans(df[,-1], na.rm = TRUE)[i[,1]]
[1] 3.500000 3.000000 3.033333 3.166667 3.000000 3.500000

By indexing the dataframe df with the array-index, you tell R at which spots to put those values.
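
To run the walkthrough end to end, the input can be reconstructed from the printed result and the NA positions shown above. The data frame below is that reconstruction (an inference, since the original input is not shown), with Price4 entirely missing.

```r
# Input reconstructed from the output and the array index above
df <- data.frame(
  ID     = 1:4,
  Price1 = c(2.1, 2, 2, NA),
  Price2 = c(3, 3, NA, 3),
  Price3 = c(4, 4.5, 4, 4),
  Price4 = NA_real_
)

i <- which(is.na(df), arr.ind = TRUE)               # (row, col) of every NA
df[i] <- rowMeans(df[, -1], na.rm = TRUE)[i[, 1]]   # drop ID before averaging
df
```

Dropping the first column with df[, -1] keeps ID out of the mean, while the array index i still refers to positions in the full data frame.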