Replace missing values (NA) in one data set with values from another where columns match
I would do this:
library(data.table)
setDT(DF1); setDT(DF2)
DF1[DF2, x := ifelse(is.na(x), i.x, x), on=c("y","z")]
which gives
x y z
1: 153 a 1
2: 163 b 1
3: 184 d 1
4: 123 a 2
5: 145 e 2
6: 176 c 2
7: 124 b 1
8: 199 a 2
Comments. This approach isn't so great, since it merges the whole of DF1
, while we only need to merge the subset where is.na(x)
. Here, the improvement looks like (thanks, @Arun):
DF1[is.na(x), x := DF2[.SD, x, on=c("y", "z")]]
This way is analogous to @RHertel's answer.
From @Jakob's comment:
does this work for more than one x variable? If I want to fill up entire datasets with several columns?
You can enumerate the desired columns:
DF1[DF2, `:=`(
x = ifelse(is.na(x), i.x, x),
w = ifelse(is.na(w), i.w, w)
), on=c("y","z")]
The expression could be constructed using lapply
and substitute
, probably, but if the set of columns is fixed, it might be cleanest just to write it out as above.
Replace NAs in dataframe with values from second dataframe based on multiple criteria
You can create a unique key to update df2
.
unique_key1 <- paste(df1$A, df1$B)
unique_key2 <- paste(df2$A, df2$B)
inds <- is.na(df2$C)
df2$C[inds] <- df1$C[match(unique_key2[inds], unique_key1)]
df2
# A B C E
#1 20210901 15:00 74 A 74
#2 20210903 17:00 27 C 27
#3 20210904 18:00 60 D 60
#4 20210906 20:00 7 F 7
#5 20210907 21:00 96 G 96
#6 20210908 22:00 98 H 98
#7 20210909 23:00 38 I 38
#8 20210910 00:00 89 J 89
#9 20210912 02:00 69 L 69
#10 20210913 03:00 72 M 72
#11 20210914 04:00 76 N 76
#12 20210915 05:00 63 O 63
#13 20210916 06:00 13 P 13
#14 20210918 08:00 25 R 25
#15 20210919 09:00 92 S 92
#16 20210920 10:00 21 T 21
#17 20210921 11:00 79 U 79
#18 20210922 12:00 41 V 41
#19 20210924 14:00 97 X 97
#20 20210925 15:00 16 Y 16
data
cbind
creates a matrix, use data.frame
to create dataframes.
df1 <- data.frame(A, B, C, D)
df2 <- data.frame(A, B, C, E)
Manually replace missing value in a column based on another column
Does this work. Not sure if in your data you'd have NA
for all values of geo == ny
. Hence I've added & is.na(mark)
.
library(dplyr)
df %>% mutate(mark = case_when(geo == 'ny' & is.na(mark) ~ 'toyota', TRUE ~ mark))
# A tibble: 5 x 3
geo mark value
<chr> <chr> <dbl>
1 texas nissan 2
2 texas nissan 78
3 ny toyota 65
4 ny toyota 15
5 ca audi 22
Is there any way to replace a missing value based on another columns' value to match the column name
Here is one way to do this.
The first part is easy to match day
column with the corresponding day.x.time
column. We can do this using matrix subsetting.
cols <- grep('day\\.\\d+\\.time', names(dat))
dat$a <- dat[cols][cbind(1:nrow(dat), dat$day)]
dat
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1 3 2 4 3 3 3 4
#2 NA 5 4 4 10 2 NA
#3 1 3 7 8 1 8 4
#4 4 3 6 6 4 5 5
#5 6 3 5 10 6 7 6
#6 8 3 87 5 8 9 78
#7 NA 1 NA 1 7 10 54
#8 3 5 6 7 9 1 3
#9 2 2 5 2 5 6 3
#10 2 3 9 9 2 4 3
To fill values where day.x.time
column is NA
we can select the closest non-NA value in that row.
inds <- which(is.na(dat$a))
dat$a[inds] <- mapply(function(x, y)
na.omit(unlist(dat[x, cols[order(abs(y- seq_along(cols)))]])[1:4])[1],
inds, dat$day[inds])
dat
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1 3 2 4 3 3 3 4
#2 2 5 4 4 10 2 NA
#3 1 3 7 8 1 8 4
#4 4 3 6 6 4 5 5
#5 6 3 5 10 6 7 6
#6 8 3 87 5 8 9 78
#7 1 1 NA 1 7 10 54
#8 3 5 6 7 9 1 3
#9 2 2 5 2 5 6 3
#10 2 3 9 9 2 4 3
How to replace NA values by matching on ID, with two different data frames
One way to do this using the tidyverse
package
library(tidyverse)
# left join to keep all DF1 and add only matching loc_id in DF2
DF1 %>% left_join(DF2,by="loc_id") %>%
# replace missing lat and long if we have a match in DF2
mutate(lat=ifelse(is.na(lat),loc_lat,lat),
long=ifelse(is.na(long),loc_long,long)) %>%
# remove loc_lat and loc_long columns from dataframe
select(-loc_lat,-loc_long)
Fill missing values of dataframe based on value of matching other column with another dataframe while keeping non-matching values
We can do a left_join
by 'z' and coalesce
the 'val' columns
library(dplyr)
left_join(dat2, dat1, by = 'z') %>%
transmute(z, val = coalesce(val.x, val.y))
# z val
#1 0 15
#2 2 20
#3 8 40
#4 10 50
#5 12 65
#6 15 80
Replace a value NA with the value from another column in R
Perhaps the easiest to read/understand answer in R lexicon is to use ifelse. So borrowing Richard's dataframe we could do:
df <- structure(list(A = c(56L, NA, NA, 67L, NA),
B = c(75L, 45L, 77L, 41L, 65L),
Year = c(1921L, 1921L, 1922L, 1923L, 1923L)),.Names = c("A",
"B", "Year"), class = "data.frame", row.names = c(NA, -5L))
df$A <- ifelse(is.na(df$A), df$B, df$A)
Replace missing data with values from matching rows in another dataframe
You can simply match the IDs in two dataframes and replace them in the original one:
mydf[,c("V3","V4")] <- newdf[match(mydf$ID, newdf$ID),c("V3","V4")]
mydf
# ID V1 V2 V3 V4
# 1 1 value value NA NA
# 2 2 value value 5 4
# 3 3 value value NA NA
# 4 4 value value 8 5
# 5 5 value value NA NA
# 6 6 value value 9 6
Later you can replace NA
s with 0
s.
Update:
Instead of doing a for-loop
concatenate all the newdf
s together and then run the code on that; look at the pseudo-code below:
newdf_concat <- rbind(newdf1, newdf2)
mydf[,c("V3","V4")] <- newdf_concat[match(mydf$ID, newdf_concat$ID),c("V3","V4")]
Python Pandas replace NaN in one column with value from corresponding row of second column
Assuming your DataFrame is in df
:
df.Temp_Rating.fillna(df.Farheit, inplace=True)
del df['Farheit']
df.columns = 'File heat Observations'.split()
First replace any NaN
values with the corresponding value of df.Farheit
. Delete the 'Farheit'
column. Then rename the columns. Here's the resulting DataFrame
:
Related Topics
Possible to Create Latex Multicolumns in Xtable
Ggplot Scale Color Gradient to Range Outside of Data Range
Use Outer Instead of Expand.Grid
How to Group by Two Columns in R
Relocating Alaska and Hawaii on Thematic Map of the Usa with Ggplot2
Add Values to a Reactive Table in Shiny
Change the Position of the Strip Label in Ggplot from the Top to the Bottom
Select Columns Based on Multiple Strings with Dplyr Contains()
Creating (And Accessing) a Sparse Matrix with Na Default Entries
Buffer (Geo)Spatial Points in R with Gbuffer
Categorize Continuous Variable with Dplyr
Specifying Ggplot2 Panel Width
What Is the Benefit of Import in a Namespace in R
R - What Algorithm Does Geom_Density() Use and How to Extract Points/Equation of Curves