Replace Missing Values (Na) in One Data Set with Values from Another Where Columns Match

Replace missing values (NA) in one data set with values from another where columns match

I would do this:

library(data.table)
setDT(DF1); setDT(DF2)

DF1[DF2, x := ifelse(is.na(x), i.x, x), on=c("y","z")]

which gives

     x y z
1: 153 a 1
2: 163 b 1
3: 184 d 1
4: 123 a 2
5: 145 e 2
6: 176 c 2
7: 124 b 1
8: 199 a 2

Comments. This approach isn't so great, since it merges the whole of DF1, while we only need to merge the subset where is.na(x). Here, the improvement looks like (thanks, @Arun):

DF1[is.na(x), x := DF2[.SD, x, on=c("y", "z")]]

This way is analogous to @RHertel's answer.


From @Jakob's comment:

does this work for more than one x variable? If I want to fill up entire datasets with several columns?

You can enumerate the desired columns:

DF1[DF2, `:=`(
x = ifelse(is.na(x), i.x, x),
w = ifelse(is.na(w), i.w, w)
), on=c("y","z")]

The expression could be constructed using lapply and substitute, probably, but if the set of columns is fixed, it might be cleanest just to write it out as above.

Replace NAs in dataframe with values from second dataframe based on multiple criteria

You can create a unique key to update df2.

unique_key1 <- paste(df1$A, df1$B)
unique_key2 <- paste(df2$A, df2$B)
inds <- is.na(df2$C)
df2$C[inds] <- df1$C[match(unique_key2[inds], unique_key1)]
df2

# A B C E
#1 20210901 15:00 74 A 74
#2 20210903 17:00 27 C 27
#3 20210904 18:00 60 D 60
#4 20210906 20:00 7 F 7
#5 20210907 21:00 96 G 96
#6 20210908 22:00 98 H 98
#7 20210909 23:00 38 I 38
#8 20210910 00:00 89 J 89
#9 20210912 02:00 69 L 69
#10 20210913 03:00 72 M 72
#11 20210914 04:00 76 N 76
#12 20210915 05:00 63 O 63
#13 20210916 06:00 13 P 13
#14 20210918 08:00 25 R 25
#15 20210919 09:00 92 S 92
#16 20210920 10:00 21 T 21
#17 20210921 11:00 79 U 79
#18 20210922 12:00 41 V 41
#19 20210924 14:00 97 X 97
#20 20210925 15:00 16 Y 16

data

cbind creates a matrix, use data.frame to create dataframes.

df1 <- data.frame(A, B, C, D)
df2 <- data.frame(A, B, C, E)

Manually replace missing value in a column based on another column

Does this work. Not sure if in your data you'd have NA for all values of geo == ny. Hence I've added & is.na(mark).

library(dplyr)
df %>% mutate(mark = case_when(geo == 'ny' & is.na(mark) ~ 'toyota', TRUE ~ mark))
# A tibble: 5 x 3
geo mark value
<chr> <chr> <dbl>
1 texas nissan 2
2 texas nissan 78
3 ny toyota 65
4 ny toyota 15
5 ca audi 22

Is there any way to replace a missing value based on another columns' value to match the column name

Here is one way to do this.

The first part is easy to match day column with the corresponding day.x.time column. We can do this using matrix subsetting.

cols <- grep('day\\.\\d+\\.time', names(dat))
dat$a <- dat[cols][cbind(1:nrow(dat), dat$day)]
dat
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1 3 2 4 3 3 3 4
#2 NA 5 4 4 10 2 NA
#3 1 3 7 8 1 8 4
#4 4 3 6 6 4 5 5
#5 6 3 5 10 6 7 6
#6 8 3 87 5 8 9 78
#7 NA 1 NA 1 7 10 54
#8 3 5 6 7 9 1 3
#9 2 2 5 2 5 6 3
#10 2 3 9 9 2 4 3

To fill values where day.x.time column is NA we can select the closest non-NA value in that row.

inds <- which(is.na(dat$a))
dat$a[inds] <- mapply(function(x, y)
na.omit(unlist(dat[x, cols[order(abs(y- seq_along(cols)))]])[1:4])[1],
inds, dat$day[inds])

dat
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1 3 2 4 3 3 3 4
#2 2 5 4 4 10 2 NA
#3 1 3 7 8 1 8 4
#4 4 3 6 6 4 5 5
#5 6 3 5 10 6 7 6
#6 8 3 87 5 8 9 78
#7 1 1 NA 1 7 10 54
#8 3 5 6 7 9 1 3
#9 2 2 5 2 5 6 3
#10 2 3 9 9 2 4 3

How to replace NA values by matching on ID, with two different data frames

One way to do this using the tidyverse package

library(tidyverse)
# left join to keep all DF1 and add only matching loc_id in DF2
DF1 %>% left_join(DF2,by="loc_id") %>%
# replace missing lat and long if we have a match in DF2
mutate(lat=ifelse(is.na(lat),loc_lat,lat),
long=ifelse(is.na(long),loc_long,long)) %>%
# remove loc_lat and loc_long columns from dataframe
select(-loc_lat,-loc_long)

Fill missing values of dataframe based on value of matching other column with another dataframe while keeping non-matching values

We can do a left_join by 'z' and coalesce the 'val' columns

library(dplyr)
left_join(dat2, dat1, by = 'z') %>%
transmute(z, val = coalesce(val.x, val.y))
# z val
#1 0 15
#2 2 20
#3 8 40
#4 10 50
#5 12 65
#6 15 80

Replace a value NA with the value from another column in R

Perhaps the easiest to read/understand answer in R lexicon is to use ifelse. So borrowing Richard's dataframe we could do:

df <- structure(list(A = c(56L, NA, NA, 67L, NA),
B = c(75L, 45L, 77L, 41L, 65L),
Year = c(1921L, 1921L, 1922L, 1923L, 1923L)),.Names = c("A",
"B", "Year"), class = "data.frame", row.names = c(NA, -5L))
df$A <- ifelse(is.na(df$A), df$B, df$A)

Replace missing data with values from matching rows in another dataframe

You can simply match the IDs in two dataframes and replace them in the original one:

mydf[,c("V3","V4")] <- newdf[match(mydf$ID, newdf$ID),c("V3","V4")]

mydf
# ID V1 V2 V3 V4
# 1 1 value value NA NA
# 2 2 value value 5 4
# 3 3 value value NA NA
# 4 4 value value 8 5
# 5 5 value value NA NA
# 6 6 value value 9 6

Later you can replace NAs with 0s.

Update:

Instead of doing a for-loop concatenate all the newdfs together and then run the code on that; look at the pseudo-code below:

newdf_concat <- rbind(newdf1, newdf2)

mydf[,c("V3","V4")] <- newdf_concat[match(mydf$ID, newdf_concat$ID),c("V3","V4")]

Python Pandas replace NaN in one column with value from corresponding row of second column

Assuming your DataFrame is in df:

df.Temp_Rating.fillna(df.Farheit, inplace=True)
del df['Farheit']
df.columns = 'File heat Observations'.split()

First replace any NaN values with the corresponding value of df.Farheit. Delete the 'Farheit' column. Then rename the columns. Here's the resulting DataFrame:

resulting DataFrame



Related Topics



Leave a reply



Submit