R: Change NA Values

How do I replace NA values with zeros in an R dataframe?

See my comment on @gsk3's answer. A simple example:

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 NA 3 7 6 6 10 6 5
2 9 8 9 5 10 NA 2 1 7 2
3 1 1 6 3 6 NA 1 4 1 6
4 NA 4 NA 7 10 2 NA 4 1 8
5 1 2 4 NA 2 6 2 6 7 4
6 NA 3 NA NA 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 NA
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 NA 9 7 2 5 5

> d[is.na(d)] <- 0

> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 0 3 7 6 6 10 6 5
2 9 8 9 5 10 0 2 1 7 2
3 1 1 6 3 6 0 1 4 1 6
4 0 4 0 7 10 2 0 4 1 8
5 1 2 4 0 2 6 2 6 7 4
6 0 3 0 0 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 0
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 0 9 7 2 5 5

There's no need to apply apply. =)
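One caveat with d[is.na(d)] <- 0: it touches every column. If the data frame also has character columns, their NAs become the string "0". A minimal sketch that restricts the replacement to numeric columns (the mixed-type data frame below is hypothetical):

```r
# Hypothetical mixed-type data frame (not from the original answer)
d <- data.frame(
  id = c("a", NA, "c"),
  x  = c(1, NA, 3),
  y  = c(NA, 5, 6),
  stringsAsFactors = FALSE
)

# Logical vector marking which columns are numeric
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NAs with 0 in the numeric columns only; id keeps its NA
d[num_cols] <- lapply(d[num_cols], function(col) replace(col, is.na(col), 0))
d
```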

EDIT

You should also take a look at the norm package. It has a lot of nice features for missing-data analysis. =)

Replace NA values with "-" in R

In base R, we can do:

output[is.na(output)] <- "-"

Output:

> output
date ABC CDE FGH SUM
1 2021-06-30 4 1 6 11
2 2021-07-02 1 - - 1
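A side effect worth knowing: "-" is not a number, so every column that contained an NA is coerced to character, while columns without NAs keep their type. The sketch below rebuilds output from the printed result to show this; the input values and the Date class of the date column are assumptions, since the original data isn't shown.

```r
# Input reconstructed from the printed output (assumed; not shown in the answer)
output <- data.frame(
  date = as.Date(c("2021-06-30", "2021-07-02")),
  ABC  = c(4, 1),
  CDE  = c(1, NA),
  FGH  = c(6, NA),
  SUM  = c(11, 1)
)

output[is.na(output)] <- "-"

# Only the columns that contained an NA (CDE, FGH) become character;
# date, ABC and SUM keep their original classes
sapply(output, class)
```

If the dashes are only needed for display, doing the replacement on a copy keeps the numeric columns available for calculations.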

Replace NA values in data frame with the column mean

library(tidyverse)
df1 <- tibble(x = seq(3), y = c(1, NA, 2))
df1 %>% mutate(y = y %>% replace_na(mean(df1$y, na.rm = TRUE)))
#> # A tibble: 3 × 2
#> x y
#> <int> <dbl>
#> 1 1 1
#> 2 2 1.5
#> 3 3 2

Created on 2022-03-10 by the reprex package (v2.0.0)
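For comparison, a base R sketch of the same idea without the tidyverse, on the same toy data:

```r
# Same toy data as df1 above, built as a plain data.frame
df1 <- data.frame(x = 1:3, y = c(1, NA, 2))

# mean() skips the NA because of na.rm = TRUE, giving (1 + 2) / 2 = 1.5,
# which is then written into the missing slot only
df1$y[is.na(df1$y)] <- mean(df1$y, na.rm = TRUE)

print(df1)
```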

How to only replace NA with specific values based on a condition in another variable

I changed the vector test to use %in% and added a catch-all TRUE ~ Udd branch (the equivalent of an else), so rows that already have a value keep it.

library(dplyr)

d %>%
  mutate(Udd = case_when(
    is.na(Udd) & Edu < 8 ~ 1,
    is.na(Udd) & Edu %in% 8:11 ~ 2,
    is.na(Udd) & Edu > 11 ~ 3,
    TRUE ~ Udd
  ))
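A small worked example; the Edu and Udd values are made up for illustration. Note that Edu %in% 8:11 only matches whole numbers, so a non-integer such as 8.5 falls through every branch and keeps its NA.

```r
library(dplyr)

# Hypothetical data: three missing Udd values, one present, one edge case
d <- data.frame(
  Edu = c(5, 9, 12, 10, 8.5),
  Udd = c(NA, NA, NA, 7, NA)
)

d <- d %>%
  mutate(Udd = case_when(
    is.na(Udd) & Edu < 8 ~ 1,
    is.na(Udd) & Edu %in% 8:11 ~ 2,
    is.na(Udd) & Edu > 11 ~ 3,
    TRUE ~ Udd  # keep existing values (and the unmatched 8.5 row's NA)
  ))
d
```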

Replace NA in a dataframe, keeping the column value distribution

In base R (4.1 or later, which is needed for the \(x) lambda shorthand) you could do:

set.seed(5)
data.frame(lapply(df, \(x) replace(x, is.na(x), sample(na.omit(x), sum(is.na(x))))))

Person_ID Var1 Var2
1 A 1 3
2 B 1 2
3 C 2 1
4 D 1 4
5 E 1 3
6 F 1 1
7 G 1 3
8 H 1 1
9 I 2 2
10 J 1 1
11 K 1 3
12 L 2 4
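Two caveats with sample(na.omit(x), ...): if a column has exactly one observed value and it is a number of at least 1, sample() treats it as 1:x and draws from that range instead; and it errors when a column has more NAs than observed values, because it samples without replacement. A hedged alternative samples positions with replacement (a deliberate change from the original); impute_random is a hypothetical helper name:

```r
# Hypothetical helper: fill NAs by drawing from the column's observed values
impute_random <- function(x) {
  is_missing <- is.na(x)
  observed <- x[!is_missing]
  n_missing <- sum(is_missing)
  # Nothing to fill, or nothing to draw from
  if (n_missing == 0 || length(observed) == 0) return(x)
  # Sample positions, not values, so a single observed number is not
  # mistaken for 1:n; replace = TRUE allows more NAs than observed values
  picks <- sample.int(length(observed), n_missing, replace = TRUE)
  x[is_missing] <- observed[picks]
  x
}

# Usage on a small hypothetical data frame; df[] <- keeps it a data.frame
set.seed(5)
df <- data.frame(Person_ID = c("A", "B", "C", "D"), Var1 = c(7, NA, NA, NA))
df[] <- lapply(df, impute_random)
df
```

Sampling with replacement changes the draws compared with the original code, so results for a given seed will not match the table above.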

How to replace all NA values in numerical columns only with median values and update the dataframe

Based on your screenshots, it looks like you're just going back to the RStudio viewer window to look at the data frame again. If so, the issue is this:

When you write test2 %>% mutate_if(...), you're telling R to take test2, compute a modified version of it, and return the result (which, in this context, roughly means printing it for you to see). What you're not telling it to do is save that result anywhere; test2 itself is left unchanged.

You would want something like test2 <- test2 %>% mutate_if(...) to overwrite the existing test2 data frame in your global environment, or something like test3 <- test2 %>% mutate_if(...) to give it a new name and store the modified thing as a separate object while retaining the old one.

Lastly, I would echo Andrea M's concern that you might not want to do this at all. Imputing missing data with averages is, on a good day, risky.
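Putting it together, a minimal sketch that assigns the result back to test2. It uses across(where(is.numeric), ...), which current dplyr recommends over the superseded mutate_if(); the contents of test2 here are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the question's test2
test2 <- data.frame(
  age   = c(21, NA, 35, 40),
  name  = c("a", NA, "c", "d"),
  score = c(NA, 2, 4, 6)
)

# Assign the result back; without "test2 <-" the change is only printed.
# Each numeric column's NAs get that column's own median.
test2 <- test2 %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), median(.x, na.rm = TRUE))))
test2
```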

Replace only some NA values for selected rows and for only a column in R

df$type[!df$Asked & is.na(df$type)] <- "Replies" gets you to your desired table:

> type <-
+ c(NA, rep("Question",3), NA, NA, rep("Answer",4), rep(NA, 3), rep("Answer",2),
+ NA, "Question", NA, rep("Answer",2), NA,NA)
> Asked <- c(
+ T, rep(F, 9), T, rep(F, 4), T, rep(F, 4), T,F
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
title comments type Asked
1 1 1 <NA> TRUE
2 2 2 Question FALSE
3 3 3 Question FALSE
4 4 4 Question FALSE
5 5 5 Replies FALSE
6 6 6 Replies FALSE
7 7 7 Answer FALSE
8 8 8 Answer FALSE
9 9 9 Answer FALSE
10 10 10 Answer FALSE
11 11 11 <NA> TRUE
12 12 12 Replies FALSE
13 13 13 Replies FALSE
14 14 14 Answer FALSE
15 15 15 Answer FALSE
16 16 16 <NA> TRUE
17 17 17 Question FALSE
18 18 18 Replies FALSE
19 19 19 Answer FALSE
20 20 20 Answer FALSE
21 21 21 <NA> TRUE
22 22 22 Replies FALSE
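If you would rather not modify df in place, the same logical condition can be passed to replace(), for example inside base R's transform(). A sketch on a shortened, hypothetical version of the data:

```r
df <- data.frame(
  type  = c(NA, "Question", NA, "Answer"),
  Asked = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Rows that were not Asked and have a missing type become "Replies";
# transform() returns a new data frame and leaves df unchanged
df2 <- transform(df, type = replace(type, !Asked & is.na(type), "Replies"))
df2
```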

Replace NA with interpolated value for specific column fields in R

This is documented in ?na.approx (from the zoo package):

An object of similar structure as object with NAs replaced by interpolation. For na.approx only the internal NAs are replaced and leading or trailing NAs are omitted if na.rm = TRUE or not replaced if na.rm = FALSE.

By default, na.approx uses na.rm = TRUE:

na.approx(object, x = index(object), xout, ..., na.rm = TRUE, maxgap = Inf, along)

Thus, we can change the code to:

library(zoo)
my_data[, 42] <- na.approx(my_data[, 42], na.rm = FALSE)

In a large dataset there may be leading or trailing NAs. Because the OP's code relies on the default na.rm = TRUE, those NAs are dropped, so the output vector has fewer elements than the column, which triggers the length-mismatch error on replacement.
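A small illustration of the difference, using a made-up vector with a leading and a trailing NA:

```r
library(zoo)

x <- c(NA, 1, NA, 3, NA)

# Default na.rm = TRUE: the interior NA is interpolated, but the leading
# and trailing NAs are dropped, so the result is shorter than x
trimmed <- na.approx(x)

# na.rm = FALSE: same interpolation, length of x preserved, so the
# result can be assigned back into a data frame column
kept <- na.approx(x, na.rm = FALSE)

length(trimmed)
length(kept)
```

If you would rather fill the ends as well, the extra arguments are passed on to approx(), so rule = 2 should carry the nearest observed value outward; check ?na.approx for your zoo version.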


