How do I replace NA values with zeros in an R dataframe?
See my comment on @gsk3's answer. A simple example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5 NA  9  7  2  5   5
> d[is.na(d)] <- 0
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3  0  3  7  6  6 10  6   5
2   9  8  9  5 10  0  2  1  7   2
3   1  1  6  3  6  0  1  4  1   6
4   0  4  0  7 10  2  0  4  1   8
5   1  2  4  0  2  6  2  6  7   4
6   0  3  0  0 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10   0
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5  0  9  7  2  5   5
There's no need to apply apply. =)
EDIT
You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)
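One caveat to the one-liner: it touches every column, so on a data frame that also contains factor columns with NAs, assigning 0 raises an "invalid factor level" warning. A minimal sketch that limits the replacement to numeric columns, using a made-up data frame (the column names id, x and y are assumptions, not from the question):

```r
# Hypothetical data: one factor column and two numeric columns with NAs
d <- data.frame(id = factor(c("a", NA, "c")),
                x  = c(1, NA, 3),
                y  = c(NA, 5, 6))

# Logical vector marking the numeric columns
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NAs with 0 in the numeric columns only; the factor column is left alone
d[num_cols][is.na(d[num_cols])] <- 0
d
```

The NA in id survives, which is usually what you want for a non-numeric column.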
Replace NA values by - in R
In base R, we may do
output[is.na(output)] <- "-"
Output:
> output
        date ABC CDE FGH SUM
1 2021-06-30   4   1   6  11
2 2021-07-02   1   -   -   1
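Note that "-" is a character string, so every column in which an NA gets replaced becomes a character column and can no longer be used for arithmetic. A minimal sketch with a made-up output data frame that mirrors the printed result (the input values are assumptions reconstructed from that printout):

```r
# Hypothetical input reconstructed from the printed result
output <- data.frame(date = c("2021-06-30", "2021-07-02"),
                     ABC  = c(4, 1),
                     CDE  = c(1, NA),
                     FGH  = c(6, NA),
                     SUM  = c(11, 1))

output[is.na(output)] <- "-"

# Columns that contained an NA are now character; the others keep their type
sapply(output, class)
```

If the dash is only needed for display, it may be safer to keep the NAs in the data and substitute them when printing or exporting.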
Replace NA values in data frame with the column mean
library(tidyverse)
df1 <- tibble(x = seq(3), y = c(1, NA, 2))
df1 %>% mutate(y = replace_na(y, mean(y, na.rm = TRUE)))
#> # A tibble: 3 × 2
#>       x     y
#>   <int> <dbl>
#> 1     1   1
#> 2     2   1.5
#> 3     3   2
Created on 2022-03-10 by the reprex package (v2.0.0)
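To apply the same idea to every numeric column at once without extra packages, a base R sketch could look like this (the data frame df2 and its columns are made up for illustration):

```r
# Hypothetical data frame with NAs in two numeric columns and one character column
df2 <- data.frame(x = c(1, NA, 3), y = c(1, NA, 2), label = c("a", NA, "c"))

# Replace NAs by the column mean in numeric columns; leave other columns untouched
df2[] <- lapply(df2, function(col) {
  if (is.numeric(col)) replace(col, is.na(col), mean(col, na.rm = TRUE)) else col
})
df2
```

Assigning into df2[] keeps the object a data frame while replacing all of its columns.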
How to only replace NA with specific values based on a condition in another variable
I changed the range condition to an %in% test and added a fallback branch (TRUE ~ Udd), so rows where Udd is not missing keep their existing value.
d %>%
  mutate(Udd = case_when(is.na(Udd) & Edu < 8 ~ 1,
                         is.na(Udd) & Edu %in% c(8:11) ~ 2,
                         is.na(Udd) & Edu > 11 ~ 3,
                         TRUE ~ Udd))
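To see the effect on concrete values, here is a small sketch with a made-up data frame (the values of Edu and Udd are assumptions). Udd is stored as double here so that it matches the replacement values 1, 2 and 3; older dplyr versions are strict about all branches of case_when having the same type.

```r
library(dplyr)

# Hypothetical data: Udd is missing in the first three rows
d <- data.frame(Edu = c(5, 9, 12, 10), Udd = c(NA, NA, NA, 7))

res <- d %>%
  mutate(Udd = case_when(is.na(Udd) & Edu < 8          ~ 1,
                         is.na(Udd) & Edu %in% c(8:11) ~ 2,
                         is.na(Udd) & Edu > 11         ~ 3,
                         TRUE                          ~ Udd))
res
```

The fourth row keeps its original value 7 because only the fallback branch matches it.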
Replace NA in a dataframe, keeping the column value distribution
In base R you could do:
set.seed(5)
# Fill each column's NAs with values sampled from its observed entries (\(x) needs R >= 4.1)
data.frame(lapply(df, \(x) replace(x, is.na(x), sample(na.omit(x), sum(is.na(x))))))
   Person_ID Var1 Var2
1          A    1    3
2          B    1    2
3          C    2    1
4          D    1    4
5          E    1    3
6          F    1    1
7          G    1    3
8          H    1    1
9          I    2    2
10         J    1    1
11         K    1    3
12         L    2    4
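Two caveats with sample() are worth keeping in mind: when its first argument is a single number n, it samples from 1:n instead of returning that number, and without replace = TRUE it fails when a column has more NAs than observed values. A sketch of a more defensive variant, using a hypothetical helper fill_from_observed and made-up data; note that it samples with replacement, which differs from the one-liner:

```r
# Hypothetical helper: fill NAs by resampling the observed values of the same vector
fill_from_observed <- function(x) {
  miss <- is.na(x)
  obs  <- x[!miss]
  if (!any(miss) || length(obs) == 0) return(x)
  # Sample positions instead of values so a single observed number is not read as 1:n
  x[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  x
}

set.seed(5)
# Made-up data: Var2 has only one observed value and five NAs
df <- data.frame(Person_ID = LETTERS[1:6],
                 Var1 = c(1, NA, 2, 1, NA, NA),
                 Var2 = c(7, NA, NA, NA, NA, NA))
df[] <- lapply(df, fill_from_observed)
df
```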
How to replace all NA values in numerical columns only with median values and update the dataframe
Based on your screenshots, it looks like you're just going back to the RStudio viewer window to look at the data frame again. If so, the issue is this:
When you write test2 %>% mutate_if(...), you're telling R to change something in test2 and return the result (roughly meaning, in this context, to just print the result and show it to you). What you're not telling it to do is to save that result anywhere.
You would want something like test2 <- test2 %>% mutate_if(...) to overwrite the existing test2 data frame in your global environment, or something like test3 <- test2 %>% mutate_if(...) to give it a new name and store the modified thing as a separate object while retaining the old one.
Lastly, I would echo Andrea M's concern that you might not want to do this at all. Imputing missing data with averages is, on a good day, risky.
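The difference between printing and storing can be sketched as follows. The data frame and the helper fill_median are made up for illustration; the question's actual mutate_if call is not shown, so this is only one plausible form of it.

```r
library(dplyr)

# Hypothetical stand-in for test2
test2 <- data.frame(a = c(1, NA, 3), b = c("x", NA, "z"))

# Hypothetical helper: replace NAs with the median of the observed values
fill_median <- function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x)

test2 %>% mutate_if(is.numeric, fill_median)  # result is printed, test2 is unchanged
anyNA(test2$a)                                # still TRUE

test2 <- test2 %>% mutate_if(is.numeric, fill_median)  # result is stored
anyNA(test2$a)                                # now FALSE
```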
Replace only some NA values for selected rows and for only a column in R
df$type[!df$Asked & is.na(df$type)] <- "Replies"
gets you to your desired table:
> type <-
+ c(NA, rep("Question",3), NA, NA, rep("Answer",4), rep(NA, 3), rep("Answer",2),
+ NA, "Question", NA, rep("Answer",2), NA,NA)
> Asked <- c(
+ T, rep(F, 9), T, rep(F, 4), T, rep(F, 4), T,F
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
   title comments     type Asked
1      1        1     <NA>  TRUE
2      2        2 Question FALSE
3      3        3 Question FALSE
4      4        4 Question FALSE
5      5        5  Replies FALSE
6      6        6  Replies FALSE
7      7        7   Answer FALSE
8      8        8   Answer FALSE
9      9        9   Answer FALSE
10    10       10   Answer FALSE
11    11       11     <NA>  TRUE
12    12       12  Replies FALSE
13    13       13  Replies FALSE
14    14       14   Answer FALSE
15    15       15   Answer FALSE
16    16       16     <NA>  TRUE
17    17       17 Question FALSE
18    18       18  Replies FALSE
19    19       19   Answer FALSE
20    20       20   Answer FALSE
21    21       21     <NA>  TRUE
22    22       22  Replies FALSE
Replace NA with interpolated value for specific column fields in r
This is documented in ?na.approx:
An object of similar structure as object with NAs replaced by interpolation. For na.approx only the internal NAs are replaced and leading or trailing NAs are omitted if na.rm = TRUE or not replaced if na.rm = FALSE.
By default, na.approx uses na.rm = TRUE:
na.approx(object, x = index(object), xout, ..., na.rm = TRUE, maxgap = Inf, along)
Thus, we can change the code to
my_data[, 42] <- na.approx(my_data[, 42], na.rm = FALSE)
In a large dataset, the column is likely to have leading or trailing NAs. With the default na.rm = TRUE, the OP's code then returns a vector with fewer elements than the column, which triggers the length-mismatch error in the replacement.
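The length difference is easy to reproduce on a short vector. This sketch uses a made-up numeric vector in place of my_data[, 42]:

```r
library(zoo)

# Hypothetical column with a leading and a trailing NA
x <- c(NA, 1, NA, 3, NA)

short <- na.approx(x)                 # default na.rm = TRUE drops the outer NAs
full  <- na.approx(x, na.rm = FALSE)  # outer NAs are kept, length is preserved

length(short)  # 3
length(full)   # 5
full           # NA 1 2 3 NA
```

Only the version with na.rm = FALSE can be assigned back to a column of the original length.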