R: Updating a Data Frame with Another Data Frame

Merge the two data frames, then aggregate by Index, dropping the NAs within each group:

aggregate(. ~ Index, data = merge(df1, df2, all = TRUE), na.omit, na.action = na.pass)

# Index B C A
#1 1 1 1 1
#2 2 2 2 2
#3 3 3 3 3
#4 4 4 5 4
#5 5 4 5 5
#6 6 4 5 6
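
To make this reproducible, here is a minimal sketch with made-up inputs. The original df1 and df2 are not shown, so the two data frames below are an assumption, reconstructed only to be consistent with the output above.

```r
# Hypothetical inputs, reconstructed to be consistent with the output above
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA),
                  C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = 4:6, B = 4, C = 5)

# merge() joins on all shared columns (Index, B, C); all = TRUE keeps unmatched rows
merged <- merge(df1, df2, all = TRUE)

# Within each Index, drop the NAs so that one value per column remains
res <- aggregate(. ~ Index, data = merged, na.omit, na.action = na.pass)
res
```

This relies on every Index having exactly one non-missing value per column; if a group had two, aggregate would no longer return a plain column for it.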

Or in dplyr speak:

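library(dplyr)

df1 %>%
  full_join(df2) %>%
  group_by(Index) %>%
  # summarise_each() and funs() are deprecated; keep the first non-NA value per column
  summarise(across(everything(), ~ .x[!is.na(.x)][1]))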

#Joining by: c("Index", "B", "C")
#Source: local data frame [6 x 4]
#
# Index A B C
# (dbl) (int) (dbl) (dbl)
#1 1 1 1 1
#2 2 2 2 2
#3 3 3 3 3
#4 4 4 4 5
#5 5 5 4 5
#6 6 6 4 5

How to update dataframe column using information from another dataframe

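You can use match in base R. It returns, for each df2$Bird_ID, the row position of that ID in df1, so the assignment overwrites Sex only in those rows: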

df1$Sex[match(df2$Bird_ID, df1$Bird_ID)] <- df2$Seen_sex
df1

# Bird_ID Sex
#1 1 Male
#2 2 Female
#3 3 Male
#4 4 Male
#5 5 Male
#6 6 Female
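
One caveat: if df2 contains an ID that is missing from df1, match returns NA for it and the assignment fails with an NA-subscript error. Here is a small sketch of a guarded version; the data is made up for illustration (ID 9 is deliberately absent from df1).

```r
# Hypothetical data: Bird_ID 9 exists in df2 but not in df1
df1 <- data.frame(Bird_ID = 1:4,
                  Sex = c("Male", NA, "Female", NA),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Bird_ID = c(2, 4, 9),
                  Seen_sex = c("Female", "Male", "Male"),
                  stringsAsFactors = FALSE)

# Row position in df1 for each df2 ID; NA where the ID is not found
idx <- match(df2$Bird_ID, df1$Bird_ID)
found <- !is.na(idx)

# Update only the rows that actually matched
df1$Sex[idx[found]] <- df2$Seen_sex[found]
df1
```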

Updating dataframe column value by referring to another dataframe

You can use %in% to count the number of days in n1 that fall between each EntryDate and ExitDate (both inclusive).

df$dayCount <- colSums(mapply(function(x, y) n1 %in% seq(x, y, by = '1 day'),
                              df$EntryDate, df$ExitDate))

df
# EntryDate ExitDate dayCount
#22 2001-02-02 2001-02-07 4
#65 2001-04-06 2001-04-10 3
#76 2001-04-24 2001-04-26 3
#84 2001-05-07 2001-05-15 7
#135 2001-07-17 2001-07-19 3
#138 2001-07-20 2001-07-23 2
#155 2001-08-14 2001-08-20 4
#204 2001-10-25 2001-10-30 3
#305 2002-03-22 2002-03-26 2
#307 2002-03-27 2002-04-01 3
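
If building a full date sequence per row feels heavy, a plain range comparison should give the same counts. This is a sketch of that alternative, not the code above; n1 and df here are made-up stand-ins, and it assumes n1 is a vector of Date values.

```r
# Hypothetical stand-ins for n1 and df
n1 <- as.Date(c("2001-02-03", "2001-02-05", "2001-04-07", "2001-05-01"))
df <- data.frame(
  EntryDate = as.Date(c("2001-02-02", "2001-04-06")),
  ExitDate  = as.Date(c("2001-02-07", "2001-04-10"))
)

# For each row, count the dates in n1 lying within [EntryDate, ExitDate]
df$dayCount <- vapply(
  seq_len(nrow(df)),
  function(i) sum(n1 >= df$EntryDate[i] & n1 <= df$ExitDate[i]),
  integer(1)
)
df
```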

Modify data.frame column with data and condition from another data.frame

You can use a combination of left_join and mutate from dplyr.

Edit

library(dplyr)

df3 <- df %>%
  left_join(df2, by = "addr") %>%
  # num.x comes from df, num.y from df2; prefer num.y where a match exists
  mutate(num = coalesce(num.y, num.x)) %>%
  select(addr, num)

df3
# addr num
#1 a 100
#2 b 200
#3 c 3
#4 d 500
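
Another option, if you have dplyr 1.0.0 or later, might be rows_update, which does this kind of keyed overwrite directly. A minimal sketch, with df and df2 made up to match the output above (the real inputs are not shown, so treat them as an assumption):

```r
library(dplyr)

# Hypothetical data, chosen to be consistent with the output shown above
df  <- data.frame(addr = c("a", "b", "c", "d"), num = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(addr = c("a", "b", "d"), num = c(100, 200, 500),
                  stringsAsFactors = FALSE)

# rows_update() overwrites num in df wherever addr matches a row of df2;
# each addr in df2 must be unique and must already exist in df
df3 <- rows_update(df, df2, by = "addr")
df3
```

Unlike the join, this keeps the column names as they are, so there is no num.x / num.y to clean up afterwards.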

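Old answer (gives the wrong value for d, because ifelse takes df2$num by position and recycles it instead of matching on addr):

df3 <- df %>%
  mutate(num = ifelse(addr %in% df2$addr, df2$num, num))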

df3
# addr num
#1 a 100
#2 b 200
#3 c 3
#4 d 100

Updating rows of dataframe with other dataframe column value for each group in R

Here's one approach. First, I make a version of df2 with the dates stored as dates, which'll make it simpler to use them for calculations, and call that date_limits. (It's not strictly necessary here since your date strings' alphabetical sorting will also be chronological, but I think it's good practice.) I don't need the x/y values since they're in df1 already.

library(tidyverse); library(lubridate)
date_limits <- df2 %>%
  mutate(max_date = ymd(date)) %>%
  select(max_date, location, location_id)

Then we can join those dates onto df1 using dplyr::left_join, sort of like VLOOKUP in Excel, or merge in base R. By default it uses all the common variables (in this case location and location_id) to bring in the max_date for that location.

Then I change y and x using mutate(across(... so that if the date is later than the max_date we pulled in, the value becomes NA; otherwise it's left as is.

df1 %>%
  mutate(date = ymd(date)) %>%
  left_join(date_limits) %>%
  mutate(across(y:x, ~if_else(date > max_date, NA_integer_, .)))

Result

Joining, by = c("location", "location_id")
date location location_id y x max_date
1 2022-02-02 A 1 NA NA 2022-02-01
2 2022-02-02 B 2 45 67 2022-02-02
3 2022-02-02 C 3 NA NA 2022-01-30
4 2022-02-02 D 4 NA NA 2022-01-31
5 2022-02-01 A 1 37 67 2022-02-01
6 2022-02-01 B 2 82 23 2022-02-02
7 2022-02-01 C 3 NA NA 2022-01-30
8 2022-02-01 D 4 NA NA 2022-01-31
9 2022-01-31 A 1 61 37 2022-02-01
10 2022-01-31 B 2 90 65 2022-02-02
11 2022-01-31 C 3 NA NA 2022-01-30
12 2022-01-31 D 4 12 23 2022-01-31
13 2022-01-30 A 1 38 48 2022-02-01
14 2022-01-30 B 2 57 53 2022-02-02
15 2022-01-30 C 3 75 95 2022-01-30
16 2022-01-30 D 4 76 19 2022-01-31
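
For comparison, the same idea can be sketched in base R with match and a logical mask. The data frames below are small made-up stand-ins (only location is used as the key here, which is a simplification of the join above).

```r
# Hypothetical stand-ins for df1 and date_limits
df1 <- data.frame(
  date = as.Date(c("2022-02-02", "2022-02-02", "2022-02-01", "2022-02-01")),
  location = c("A", "B", "A", "B"),
  y = c(10L, 45L, 37L, 82L),
  x = c(20L, 67L, 67L, 23L),
  stringsAsFactors = FALSE
)
date_limits <- data.frame(
  location = c("A", "B"),
  max_date = as.Date(c("2022-02-01", "2022-02-02")),
  stringsAsFactors = FALSE
)

# Look up each row's max_date by location
max_date <- date_limits$max_date[match(df1$location, date_limits$location)]

# Blank out y and x in rows dated after their location's max_date
too_late <- df1$date > max_date
df1[too_late, c("y", "x")] <- NA
df1
```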

Updating column in one dataframe with value from another dataframe based on matching values

You don't need z$color in the first place if it's just a placeholder. Create it directly from the lookup; you can replace any NA with 0 afterwards.

z$color <- y[match(z$letter, y$letter), 2]
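
Spelled out with the NA replacement included, it could look like this. The contents of y and z are made up, and I'm assuming column 2 of y is named color.

```r
# Hypothetical data; "d" has no entry in y
y <- data.frame(letter = c("a", "b", "c"), color = c(1, 2, 3),
                stringsAsFactors = FALSE)
z <- data.frame(letter = c("b", "d", "a"), stringsAsFactors = FALSE)

# Look up each letter's color in y; unmatched letters come back as NA
z$color <- y$color[match(z$letter, y$letter)]

# Replace the unmatched entries with 0
z$color[is.na(z$color)] <- 0
z
```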

