R Change NA Values

How do I replace NA values with zeros in an R dataframe?

See my comment on @gsk3's answer. A simple example:

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5 NA  9  7  2  5   5

> d[is.na(d)] <- 0

> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3  0  3  7  6  6 10  6   5
2   9  8  9  5 10  0  2  1  7   2
3   1  1  6  3  6  0  1  4  1   6
4   0  4  0  7 10  2  0  4  1   8
5   1  2  4  0  2  6  2  6  7   4
6   0  3  0  0 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10   0
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5  0  9  7  2  5   5

There's no need to apply apply. =)
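
For anyone wondering why this works without a loop: is.na(d) returns a logical matrix with the same shape as d, and a data frame can be indexed by that matrix. A minimal sketch on a tiny made-up data frame (the columns a and b are only for illustration, they are not from the example above):

```r
# Small illustrative data frame (not the random one shown above)
d <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na() on a data frame gives a logical matrix with the same dimensions
na_mask <- is.na(d)

# Assigning through the mask only touches the cells where the mask is TRUE
d[na_mask] <- 0
d
```

Every column here is numeric, so putting a 0 in is harmless. With factor or character columns you would probably want to restrict the replacement to the numeric ones.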

EDIT

You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)

Replace NA values by - in R

In base R, we can do:

output[is.na(output)] <- "-"

Output:

> output
        date ABC CDE FGH SUM
1 2021-06-30   4   1   6  11
2 2021-07-02   1   -   -   1
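
One thing to keep in mind: a numeric column cannot hold "-", so any column that receives the replacement is converted to character. A small sketch, rebuilding a data frame shaped like the one above (the NA positions are inferred from the printed result, so treat the input as an assumption):

```r
# Input reconstructed from the printed output; the NA positions are assumed
output <- data.frame(
  date = c("2021-06-30", "2021-07-02"),
  ABC  = c(4, 1),
  CDE  = c(1, NA),
  FGH  = c(6, NA),
  SUM  = c(11, 1)
)

output[is.na(output)] <- "-"

# CDE held numbers before; it is character now
class(output$CDE)
```

That is fine for display, but you would not want to do arithmetic on those columns afterwards.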

Replace NA values in data frame with the column mean

library(tidyverse)
df1 <- tibble(x = seq(3), y = c(1, NA, 2))
df1 %>% mutate(y = replace_na(y, mean(y, na.rm = TRUE)))
#> # A tibble: 3 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1   1  
#> 2     2   1.5
#> 3     3   2

Created on 2022-03-10 by the reprex package (v2.0.0)

How to only replace NA with specific values based on a condition in another variable

I changed the range check to use %in% and added a fallback branch (TRUE ~ Udd) so that values that are not missing are kept as they are.

library(dplyr)

d %>%
  mutate(Udd = case_when(is.na(Udd) & Edu < 8 ~ 1,
                         is.na(Udd) & Edu %in% 8:11 ~ 2,
                         is.na(Udd) & Edu > 11 ~ 3,
                         TRUE ~ Udd))
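
If you would rather not depend on dplyr, the same rule can be sketched in base R with logical indexing. The data below is made up for illustration (only the column names Udd and Edu come from the question), and this is an alternative to case_when, not what it does internally:

```r
# Made-up sample data; only the column names come from the question
d <- data.frame(Edu = c(5, 8, 11, 12, 9),
                Udd = c(NA, NA, 3, NA, 1))

# Compute the mask once, before any value is filled in
miss <- is.na(d$Udd)

# Fill only the missing entries, depending on Edu
d$Udd[miss & d$Edu < 8]       <- 1
d$Udd[miss & d$Edu %in% 8:11] <- 2
d$Udd[miss & d$Edu > 11]      <- 3
d
```

Rows 3 and 5 already had a value, so they are left alone, which is what the TRUE ~ Udd branch does in the dplyr version.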

Replace NA in a dataframe, keeping the column value distribution

In base R you could do:

set.seed(5)
data.frame(lapply(df, \(x) replace(x, is.na(x), sample(na.omit(x), sum(is.na(x))))))

   Person_ID Var1 Var2
1          A    1    3
2          B    1    2
3          C    2    1
4          D    1    4
5          E    1    3
6          F    1    1
7          G    1    3
8          H    1    1
9          I    2    2
10         J    1    1
11         K    1    3
12         L    2    4
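
Two caveats with the one-liner: sample() without replace = TRUE fails when a column has more NAs than observed values, and sample(x, n) treats a single number x as 1:x. A slightly more defensive sketch is below. Note that impute_by_sampling is a hypothetical helper name, the data is made up, and sampling with replacement is my change, not part of the answer above:

```r
# Hypothetical helper: fill NAs by drawing from the observed values of x
impute_by_sampling <- function(x) {
  miss <- is.na(x)
  observed <- x[!miss]
  # Nothing to fill, or nothing to draw from: return x unchanged
  if (!any(miss) || length(observed) == 0) return(x)
  # Sample positions rather than values, so a single observed number is safe
  picks <- sample.int(length(observed), sum(miss), replace = TRUE)
  x[miss] <- observed[picks]
  x
}

# Made-up data with the same column names as the output above
df <- data.frame(Person_ID = LETTERS[1:6],
                 Var1 = c(1, NA, 2, 1, NA, 1),
                 Var2 = c(3, 2, NA, 4, NA, 1))

set.seed(5)
df[] <- lapply(df, impute_by_sampling)
df
```

Drawing from the observed values means common values are picked more often, so the column distribution is roughly preserved.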

How to replace all NA values in numerical columns only with median values and update the dataframe

Based on your screenshots, it looks like you're just going back to the RStudio viewer window to look at the data frame again. If so, the issue is this:

When you write test2 %>% mutate_if(...), you're telling R to change something in test2 and return the result (roughly meaning, in this context, to just print the result and show it to you). What you're not telling it to do is to save that result anywhere.

You would want something like test2 <- test2 %>% mutate_if(...) to overwrite the existing test2 data frame in your global environment, or something like test3 <- test2 %>% mutate_if(...) to give it a new name and store the modified thing as a separate object while retaining the old one.
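
To make the difference concrete, here is a minimal base R sketch. The data is made up, and fill_median is a hypothetical helper standing in for the mutate_if(...) call, so it is not the OP's actual code:

```r
# Made-up data frame; only the name test2 comes from the question
test2 <- data.frame(id = c("a", "b", "c", "d"),
                    x  = c(1, NA, 3, 10),
                    y  = c(NA, 2, 4, 6))

# Hypothetical helper: replace NAs in numeric columns with the column median
fill_median <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  df[num_cols] <- lapply(df[num_cols], function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

# Without assignment the result is computed and then thrown away
fill_median(test2)
still_missing <- anyNA(test2)   # TRUE: test2 itself is unchanged

# With assignment the data frame is actually updated
test2 <- fill_median(test2)
anyNA(test2)                    # FALSE
```

The same applies to any pipeline: unless the result is assigned to a name, the object in your environment stays as it was.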

Lastly, I would echo Andrea M's concern that you might not want to do this at all. Imputing missing data with averages is, on a good day, risky.

Replace only some NA values for selected rows and for only a column in R

df$type[!df$Asked & is.na(df$type)] <- "Replies" gets you to your desired table:

> type <-
+   c(NA, rep("Question", 3), NA, NA, rep("Answer", 4), rep(NA, 3), rep("Answer", 2),
+     NA, "Question", NA, rep("Answer", 2), NA, NA)
> Asked <- c(
+   TRUE, rep(FALSE, 9), TRUE, rep(FALSE, 4), TRUE, rep(FALSE, 4), TRUE, FALSE
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
   title comments     type Asked
1      1        1     <NA>  TRUE
2      2        2 Question FALSE
3      3        3 Question FALSE
4      4        4 Question FALSE
5      5        5  Replies FALSE
6      6        6  Replies FALSE
7      7        7   Answer FALSE
8      8        8   Answer FALSE
9      9        9   Answer FALSE
10    10       10   Answer FALSE
11    11       11     <NA>  TRUE
12    12       12  Replies FALSE
13    13       13  Replies FALSE
14    14       14   Answer FALSE
15    15       15   Answer FALSE
16    16       16     <NA>  TRUE
17    17       17 Question FALSE
18    18       18  Replies FALSE
19    19       19   Answer FALSE
20    20       20   Answer FALSE
21    21       21     <NA>  TRUE
22    22       22  Replies FALSE

Replace NA with interpolated value for specific column fields in R

This is documented in ?na.approx:

An object of similar structure as object with NAs replaced by interpolation. For na.approx only the internal NAs are replaced and leading or trailing NAs are omitted if na.rm = TRUE or not replaced if na.rm = FALSE.

By default, na.approx uses na.rm = TRUE:

na.approx(object, x = index(object), xout, ..., na.rm = TRUE, maxgap = Inf, along)

Thus, we can change the code to

my_data[, 42] <- na.approx(my_data[, 42], na.rm = FALSE)

In a large dataset, leading or trailing NAs are quite likely. With the default na.rm = TRUE those NAs are dropped, so the OP's code returns a vector with fewer elements than the column has rows, and that triggers the length mismatch error in the replacement.
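
The length issue can be illustrated without zoo. The sketch below uses base R's approx() as a stand-in (it is not how na.approx is implemented, and the vector x is made up): interior NAs are interpolated, while leading and trailing NAs stay NA because they fall outside the range of observed points.

```r
# Made-up column with a leading, an interior and a trailing NA
x <- c(NA, 1, NA, 3, NA)
idx <- seq_along(x)
ok <- !is.na(x)

# Linear interpolation at every position; approx() returns NA outside
# the range of the observed points (its default rule = 1)
filled <- approx(idx[ok], x[ok], xout = idx)$y

# Keeping the edge NAs preserves the length (like na.rm = FALSE)
length(filled) == length(x)

# Dropping them gives a shorter vector (like na.rm = TRUE),
# which cannot be assigned back to a column of the original length
trimmed <- filled[!is.na(filled)]
length(trimmed)
```

Assigning trimmed back into a five-row column is exactly the situation that produces the replacement length error.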