Replace NA Values by Row Means

Replace missing values with row means if exactly N missing values per row

Here is the approach I mentioned in my comment, with more details:

# create your matrix
df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there

# Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
nb_NA_row <- rowSums(is.na(df))

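# Replace missing values with the row mean, only in rows that contain exactly N NAs
N <- 1 # the given example
# index only the NA cells of those rows, so the existing (non-missing) values are kept
idx <- which(is.na(df) & nb_NA_row == N, arr.ind = TRUE)
df[idx] <- rowMeans(df, na.rm = TRUE)[idx[, 1]]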

# check df
df
#       a  b  c
#  [1,] 1  1  1
#  [2,] 2  2  2
#  [3,] 3  3  3
#  [4,] 4 NA NA
#  [5,] 5  5  5
#  [6,] 1  1  1
#  [7,] 2  2  2
#  [8,] 3  3  3
#  [9,] 4 NA NA
# [10,] 5  5  5
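The same idea can be wrapped in a small function and checked on a matrix where the expected result is easy to verify by hand. This is a sketch: fill_na_if_n_missing is a hypothetical name, and the matrix is made-up test data, not the a, b, c from the question. It indexes only the NA cells, so the non-missing values in a qualifying row are left as they are.

```r
# Hypothetical helper: replace NAs by the row mean, but only in rows that
# contain exactly n_missing NAs; every other cell is left untouched.
fill_na_if_n_missing <- function(m, n_missing = 1) {
  nb_na_row <- rowSums(is.na(m))
  # (row, col) of cells that are NA *and* lie in a row with exactly n_missing NAs
  target <- which(is.na(m) & nb_na_row == n_missing, arr.ind = TRUE)
  # One row mean per target cell, looked up by that cell's row number
  m[target] <- rowMeans(m, na.rm = TRUE)[target[, 1]]
  m
}

# Made-up test matrix: rows 1 and 2 have one NA each, row 3 has two
m <- cbind(a = c(1, 2, 4), b = c(NA, 2, NA), c = c(3, NA, NA))
fill_na_if_n_missing(m, n_missing = 1)
#      a  b  c
# [1,] 1  2  3
# [2,] 2  2  2
# [3,] 4 NA NA
```

Row 1 keeps its original 1 and 3; only the missing cell receives the mean 2.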

Find and replace missing values with row mean

This is very similar to @baptiste's answer:

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df, na.rm = TRUE)[ind[,1]]

How to replace NAs with row means if proportion of row-wise NAs is below a certain threshold?

Here is a way to do it all in one dplyr chain, using your supplied data frame.

First create a vector of all column names of interest:

name_col <- colnames(mental)[2:16]

Then use dplyr:

library(dplyr)

mental %>%
  # First create the column of row means
  mutate(somatic_mean = rowMeans(.[name_col], na.rm = TRUE)) %>%
  # Now calculate the proportion of NAs
  mutate(somatic_na = rowMeans(is.na(.[name_col]))) %>%
  # Create this column for filtering out later
  mutate(somatic_usable = ifelse(somatic_na < 0.2, "yes", "no")) %>%
  # Make the following replacement on a row basis
  rowwise() %>%
  mutate_at(vars(all_of(name_col)),                # Designate eligible columns to check for NAs
            ~ replace(.,                           # a formula lambda replaces the deprecated funs()
                      is.na(.) & somatic_na < 0.2, # Both conditions need to be met
                      somatic_mean)) %>%           # What we are subbing the NAs with
  ungroup() # Now ungroup the 'rowwise' in case you need to modify further

If you only want to keep the rows with less than 20% NAs, you can pipe the above into the following:

filter(somatic_usable == "yes")

Also note that if you want the condition to be less than or equal to 20% instead, you need to replace both occurrences of somatic_na < 0.2 with somatic_na <= 0.2.

Hope this helps!
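With dplyr 1.1.0 or later, the same logic can be written without rowwise(), because if_else() is already vectorised over whole columns. The sketch below swaps mutate_at()/funs() for across() and pick(), and uses a made-up three-row data frame in place of mental (its column names q1 to q6 are assumptions, not the question's data).

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Made-up stand-in for `mental`: an id column plus six item columns
mental <- data.frame(
  id = 1:3,
  q1 = c(1, NA, NA),
  q2 = c(2, 2, NA),
  q3 = c(3, 4, 1),
  q4 = c(4, 4, 2),
  q5 = c(5, 5, 3),
  q6 = c(6, 3, 4)
)
name_col <- colnames(mental)[2:7]

res <- mental %>%
  mutate(
    # Row mean and proportion of NAs over the item columns
    somatic_mean = rowMeans(pick(all_of(name_col)), na.rm = TRUE),
    somatic_na   = rowMeans(is.na(pick(all_of(name_col)))),
    # Fill an NA with the row mean only when the row has < 20% NAs;
    # if_else() works element by element, so no rowwise() is needed
    across(all_of(name_col),
           ~ if_else(is.na(.x) & somatic_na < 0.2, somatic_mean, .x))
  )
res
```

Row 2 has one NA out of six items (about 17%), so its q1 becomes the mean of the other five values, 3.6; row 3 has two NAs (about 33%) and is left unchanged.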

R: How to replace NA with most recent value by row

There are several non-base solutions:

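zoo::na.locf(df$Value, na.rm = FALSE)       # na.rm = FALSE keeps leading NAs, so the result has the same length
data.table::nafill(df$Value, type = "locf") # numeric columns only; the default type = "const" fills nothing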

naniar is another package designed entirely around handling NAs.
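If you would rather avoid the extra dependencies, last observation carried forward can also be done in base R. This is a minimal sketch (locf is a hypothetical helper name, not from any package) that is meant to behave like zoo::na.locf(x, na.rm = FALSE) on a plain vector: each NA takes the most recent non-NA value before it, and leading NAs stay NA.

```r
# Base R last-observation-carried-forward for a vector
locf <- function(x) {
  # Position of each non-NA element, 0 for NA elements
  pos <- seq_along(x) * !is.na(x)
  # Running maximum = position of the last non-NA value seen so far
  last <- cummax(pos)
  last[last == 0] <- NA  # no earlier value exists, so keep NA
  x[last]                # an NA index returns NA
}

locf(c(NA, 1, NA, NA, 3, NA))
# [1] NA  1  1  1  3  3
```

Applied to the data frame from the question, this would be df$Value <- locf(df$Value).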

Conditionally replace NA with value from other rows

Your mutate won't work because you did not assign the result to a variable (column): your mutate() should look like mutate(value = unique(value[!is.na(value)])). That said, I would not take this approach. What I did below was create a lookup table of the distinct non-NA values and then join it onto the original dataset. The valuedis column should contain the values you want.

temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)

df <- data.frame(temporal, spatial, value) # keeps value numeric; as.data.frame(cbind(...)) would turn every column into character

library(dplyr)


dfdis <- df %>%
  filter(!is.na(value)) %>%
  distinct(temporal, spatial, value) %>%
  rename(valuedis = value)

df2 <- left_join(df, dfdis, by = c("temporal", "spatial"))
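The join leaves value and valuedis side by side. As a sketch of a possible final step (not part of the original answer), dplyr::coalesce() keeps value where it is present and falls back to valuedis otherwise. The data is the same as above, written more compactly with rep() and built with data.frame() so value stays numeric.

```r
library(dplyr)

days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
df <- data.frame(
  temporal = rep(rep(days, each = 2), times = 2),
  spatial  = rep(c("North", "South"), times = 10),
  value    = c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)
)

# Lookup table: the non-NA value for each temporal/spatial combination
dfdis <- df %>%
  filter(!is.na(value)) %>%
  distinct(temporal, spatial, value) %>%
  rename(valuedis = value)

# Join, keep the original value where present, otherwise use the lookup
df2 <- df %>%
  left_join(dfdis, by = c("temporal", "spatial")) %>%
  mutate(value = coalesce(value, valuedis)) %>%
  select(-valuedis)
df2
```

This relies on each temporal/spatial pair having a single distinct non-NA value, which holds for this data; if a pair had two different values, the join would duplicate rows.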

Replace NAs using mutate_at by row mean

Using the arr.ind parameter of which() together with is.na(df) and rowMeans(), you can do this quite easily in base R:

i <- which(is.na(df), arr.ind = TRUE)
df[i] <- rowMeans(df[,-1], na.rm = TRUE)[i[,1]]

which gives:

> df
  ID Price1 Price2 Price3   Price4
1  1    2.1      3    4.0 3.033333
2  2    2.0      3    4.5 3.166667
3  3    2.0      3    4.0 3.000000
4  4    3.5      3    4.0 3.500000

What this does:

With which(is.na(df), arr.ind = TRUE) you get an array index: a two-column matrix with the row and column number of every NA value:

> which(is.na(df), arr.ind = TRUE)
     row col
[1,]   4   2
[2,]   3   3
[3,]   1   5
[4,]   2   5
[5,]   3   5
[6,]   4   5

With rowMeans(df[,-1], na.rm = TRUE) you get a vector of the means by row:

> rowMeans(df[,-1], na.rm = TRUE)
[1] 3.033333 3.166667 3.000000 3.500000

By indexing that vector with the row column of the array index (i[,1]), you get a vector that is as long as the number of NA values in the data frame:

> rowMeans(df[,-1], na.rm = TRUE)[i[,1]]
[1] 3.500000 3.000000 3.033333 3.166667 3.000000 3.500000

Finally, indexing the data frame df with the array index tells R where to put those values.
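The walkthrough can be checked end to end. The data frame below is reconstructed from the printed output above (an assumption: the NA cells are exactly the six positions listed by which()), so running it should reproduce the result shown for df. Note that which(is.na(df)) scans every column, including ID, so an NA in ID would also be filled with a price mean.

```r
# Data frame reconstructed from the printed output (not the asker's original code)
df <- data.frame(
  ID     = 1:4,
  Price1 = c(2.1, 2.0, 2.0, NA),
  Price2 = c(3, 3, NA, 3),
  Price3 = c(4.0, 4.5, 4.0, 4.0),
  Price4 = NA_real_
)

# (row, col) positions of every NA
i <- which(is.na(df), arr.ind = TRUE)

# Mean of the price columns for each row (column 1 is the ID),
# repeated once for every NA in that row, then written into the NA cells
df[i] <- rowMeans(df[, -1], na.rm = TRUE)[i[, 1]]
df
```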


