R: Adding NAs into Data Frame

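You can use the reshape2 package: cast the data to wide format, so that every Name/Position combination without a row becomes NA, then melt it back to long format.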

# make sample data frame
df <- read.table(text = "Name Position Value
a 1 0.2
a 3 0.4
a 4 0.3
b 1 0.5
b 2 0.4
b 5 0.3
c 2 0.3
c 3 0.4
c 5 0.1
d 1 0.2
d 2 0.4
d 3 0.5", header = TRUE, stringsAsFactors = FALSE)

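library(reshape2)
# cast to wide format: Name/Position pairs with no row become NA
df2 <- dcast(df, Name ~ Position, value.var = "Value")
# melt back to long format, keeping the NA cells
df3 <- melt(df2, id.vars = "Name", value.name = "Value", variable.name = "Position")
df3[order(df3$Name), ]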
# Name Position Value
# 1 a 1 0.2
# 5 a 2 NA
# 9 a 3 0.4
# 13 a 4 0.3
# 17 a 5 NA
# 2 b 1 0.5
# 6 b 2 0.4
# 10 b 3 NA
# 14 b 4 NA
# 18 b 5 0.3
# 3 c 1 NA
# 7 c 2 0.3
# 11 c 3 0.4
# 15 c 4 NA
# 19 c 5 0.1
# 4 d 1 0.2
# 8 d 2 0.4
# 12 d 3 0.5
# 16 d 4 NA
# 20 d 5 NA
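
If you would rather not depend on reshape2, the same idea can be sketched in base R: build every Name/Position combination with expand.grid() and left-join the data onto it with merge(). This is only a sketch on a small made-up subset of the data above, and it assumes you want every integer position between the smallest and largest observed one.

```r
# Hypothetical sample data, a small subset of the example above
df <- data.frame(
  Name = c("a", "a", "b"),
  Position = c(1L, 3L, 2L),
  Value = c(0.2, 0.4, 0.5),
  stringsAsFactors = FALSE
)

# Every Name/Position combination over the observed range of positions
# (assumption: positions are consecutive integers)
grid <- expand.grid(
  Name = unique(df$Name),
  Position = seq(min(df$Position), max(df$Position)),
  stringsAsFactors = FALSE
)

# Left join onto the full grid: combinations without a row get NA in Value
filled <- merge(grid, df, by = c("Name", "Position"), all.x = TRUE)
filled <- filled[order(filled$Name, filled$Position), ]
filled
```

Unlike the dcast()/melt() round trip, Position stays an integer column here instead of becoming a factor.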

Assigning NAs into a dataframe in R

We can first copy the Value column to Desired_output, then find the indices (inds) where Value is greater than 1 and set each of those rows, plus the two rows that follow it, to NA.

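A$Desired_output <- A$Value
inds <- which(A$Value > 1)
# rows to blank out: each match plus the two rows after it
rows <- unique(c(inds, inds + 1, inds + 2))
rows <- rows[rows <= nrow(A)]  # drop indices past the last row
A$Desired_output[rows] <- NA
A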

# Event Value Desired_output
#1 1 5 NA
#2 2 3 NA
#3 2 0 NA
#4 2 0 NA
#5 2 0 0
#6 2 2 NA
#7 2 0 NA
#8 3 1 NA
#9 3 10 NA
#10 4 0 NA
#11 4 0 NA
#12 4 NA NA
#13 4 NA NA
#14 4 NA NA
#15 5 1 1
#16 6 0 0
#17 6 8 NA
#18 6 0 NA
#19 7 0 NA
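
The same logic can be wrapped in a small function if you need a different threshold or window. This is a sketch only; na_after_exceed(), threshold and ahead are names made up for illustration, not part of the original answer.

```r
# Hypothetical helper: wherever x exceeds `threshold`, set that element
# and the `ahead` elements after it to NA.
na_after_exceed <- function(x, threshold = 1, ahead = 2) {
  inds <- which(x > threshold)  # which() drops positions where x is NA
  # every index in the window [i, i + ahead] for each match i
  hit <- unique(as.vector(outer(inds, 0:ahead, `+`)))
  hit <- hit[hit <= length(x)]  # stay inside the vector
  x[hit] <- NA
  x
}

result <- na_after_exceed(c(0, 5, 0, 0, 0, 1, 9))
result
# [1]  0 NA NA NA  0  1 NA
```

On the data frame above it would be used as A$Desired_output <- na_after_exceed(A$Value).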

How can I add several NAs as rows to a data frame?

Similar to GKi's solution but without rbind():

df[c(rep(NA, 4L), seq_len(nrow(df))), ]
# X1.3 X4.6 X7.9
# NA NA NA NA
# NA.1 NA NA NA
# NA.2 NA NA NA
# NA.3 NA NA NA
# 1 1 4 7
# 2 2 5 8
# 3 3 6 9
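
This works because indexing a data frame row with NA returns a row filled with NA. Here is a self-contained version; the input df is an assumption reconstructed from the column names in the output above, and n_blank is just an illustrative name.

```r
# Assumed input, reconstructed from the column names shown above
df <- data.frame(1:3, 4:6, 7:9)

n_blank <- 4L  # hypothetical number of NA rows to add

# NA row indices produce all-NA rows; the real rows follow
padded <- df[c(rep(NA, n_blank), seq_len(nrow(df))), ]

# Swapping the order appends the NA rows at the end instead
appended <- df[c(seq_len(nrow(df)), rep(NA, n_blank)), ]
```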

Replace NAs in dataframe with values from second dataframe based on multiple criteria

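You can build a unique key from columns A and B in each data frame, then use match() to fill the NA values in df2$C with the corresponding values from df1$C.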

unique_key1 <- paste(df1$A, df1$B)
unique_key2 <- paste(df2$A, df2$B)
inds <- is.na(df2$C)
df2$C[inds] <- df1$C[match(unique_key2[inds], unique_key1)]
df2

# A B C E
#1 20210901 15:00 74 A 74
#2 20210903 17:00 27 C 27
#3 20210904 18:00 60 D 60
#4 20210906 20:00 7 F 7
#5 20210907 21:00 96 G 96
#6 20210908 22:00 98 H 98
#7 20210909 23:00 38 I 38
#8 20210910 00:00 89 J 89
#9 20210912 02:00 69 L 69
#10 20210913 03:00 72 M 72
#11 20210914 04:00 76 N 76
#12 20210915 05:00 63 O 63
#13 20210916 06:00 13 P 13
#14 20210918 08:00 25 R 25
#15 20210919 09:00 92 S 92
#16 20210920 10:00 21 T 21
#17 20210921 11:00 79 U 79
#18 20210922 12:00 41 V 41
#19 20210924 14:00 97 X 97
#20 20210925 15:00 16 Y 16

data

cbind() creates a matrix; use data.frame() to create data frames.

df1 <- data.frame(A, B, C, D)
df2 <- data.frame(A, B, C, E)
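
To see the key-matching step in isolation, here is a minimal run on made-up data (the values below are hypothetical, not the data from the question). Note that a key in df2 with no partner in df1 simply stays NA.

```r
# Hypothetical data: df1 is complete, df2 has gaps in column C
df1 <- data.frame(A = c(1, 2, 3), B = c("x", "y", "z"), C = c(10, 20, 30))
df2 <- data.frame(A = c(3, 1, 2), B = c("z", "x", "q"), C = c(NA, 11, NA))

key1 <- paste(df1$A, df1$B)
key2 <- paste(df2$A, df2$B)
inds <- is.na(df2$C)

# match() gives the row of df1 with the same key, or NA if there is none
df2$C[inds] <- df1$C[match(key2[inds], key1)]
df2$C
# [1] 30 11 NA
```

The existing value 11 is left alone because only the rows flagged in inds are overwritten.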

How do I add random `NA`s into a data frame

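Return x within your function. Without the trailing x, the function returns the value of the assignment (NA) rather than the modified column: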

> df <- apply(df, 2, function(x) { x[sample(seq_len(n), floor(n / 10))] <- NA; x })
> tail(df)
id age sex
[45,] "45" "41" NA
[46,] "46" NA "f"
[47,] "47" "38" "f"
[48,] "48" "32" "f"
[49,] "49" "53" NA
[50,] "50" "74" "f"
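
As the quoted values in that output show, apply() turns the data frame into a character matrix. If you want to keep a data frame with its column types, one possible variation is to assign the result of lapply() back into df[]. The data below is made up to resemble the columns in the output; only the column names come from the original.

```r
set.seed(1)  # only to make the sketch reproducible
n <- 50

# Hypothetical data with the same column names as the output above
df <- data.frame(
  id = seq_len(n),
  age = sample(18:80, n, replace = TRUE),
  sex = sample(c("f", "m"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# df[] <- keeps the data.frame structure; lapply() works column by column
df[] <- lapply(df, function(x) {
  x[sample(seq_len(n), floor(n / 10))] <- NA
  x  # return the modified column, not the value of the assignment
})

str(df)
```

Each column ends up with exactly floor(n / 10) NAs, and age stays an integer column.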

Randomly insert NAs into a data frame proportionally

df <- data.frame(A = 1:10, B = 11:20, c = 21:30)
head(df)
## A B c
## 1 1 11 21
## 2 2 12 22
## 3 3 13 23
## 4 4 14 24
## 5 5 15 25
## 6 6 16 26

as.data.frame(lapply(df, function(cc) {
  # logical index: TRUE keeps the value, NA turns it into NA
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]
}))
## A B c
## 1 1 11 21
## 2 2 12 22
## 3 3 13 23
## 4 4 14 24
## 5 5 NA 25
## 6 6 16 26
## 7 NA 17 27
## 8 8 18 28
## 9 9 19 29
## 10 10 20 30

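It's a random process: each value is replaced independently with probability 0.15, so the actual share of NAs varies from run to run and will not be exactly 15% every time.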
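
If you need an exact share rather than an expected one, one option is to draw a fixed number of positions per column without replacement. This is a sketch of that alternative, not the approach above; prop and with_na are illustrative names.

```r
set.seed(42)  # only for reproducibility of the sketch
df <- data.frame(A = 1:20, B = 21:40, c = 41:60)
prop <- 0.15  # target share of NAs per column

with_na <- as.data.frame(lapply(df, function(cc) {
  k <- round(prop * length(cc))    # exact number of NAs for this column
  cc[sample(length(cc), k)] <- NA  # k distinct positions, drawn without replacement
  cc
}))

colSums(is.na(with_na))  # same count in every column
```

The positions still differ between columns and between runs; only the count per column is fixed.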

Filling data across a data frame when there are NAs that should be left NAs

Here's a way with dplyr:

library(dplyr)

df %>%
  mutate_at(-1, ~replace(., is.na(.) & cumsum(!is.na(.)) > 0, 0))

dates item_1 item_2 item_3 item_4
1 2019-01-01 NA NA NA NA
2 2019-01-02 1 NA NA NA
3 2019-01-03 2 NA NA 1
4 2019-01-04 3 NA 8 2
5 2019-01-05 4 1 9 3
6 2019-01-06 5 2 10 4
7 2019-01-07 6 3 11 5
8 2019-01-08 7 0 0 6
9 2019-01-09 0 0 0 0
10 2019-01-10 8 9 2 0

A slightly shorter version of the replace condition, thanks to @Frank: is.na(.) & cummax(!is.na(.))
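
To see why the condition leaves the leading NAs alone, it helps to run it on a single vector. This is a base R sketch with made-up values; seen_value and fill are illustrative names.

```r
x <- c(NA, NA, 1, 2, NA, 3, NA)

seen_value <- cumsum(!is.na(x)) > 0  # TRUE from the first non-NA element onward
fill <- is.na(x) & seen_value        # only NAs that come after a real value

result <- replace(x, fill, 0)
result
# [1] NA NA  1  2  0  3  0
```

The running count is still 0 for the two leading NAs, so they are not replaced.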


