Merging Two Dataframes in R

How to join (merge) data frames (inner, outer, left, right)

Use the merge function and its optional parameters:

Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.

Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)

Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)

Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)

Cross join: merge(x = df1, y = df2, by = NULL)

Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable. I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.
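As a rough illustration of the join types listed above, here is a minimal sketch in base R. The contents of df1 and df2 (and the Product and State columns) are made up for the example; only the CustomerId key comes from the answer.

```r
# Hypothetical data: customers 1-3 appear in df1, customers 2-4 in df2.
df1 <- data.frame(CustomerId = c(1, 2, 3),
                  Product = c("Toaster", "Radio", "Kettle"))
df2 <- data.frame(CustomerId = c(2, 3, 4),
                  State = c("Ohio", "Texas", "Utah"))

inner <- merge(df1, df2, by = "CustomerId")                # only ids in both
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # ids from either
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # every row of df1
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)  # every row of df2
cross <- merge(df1, df2, by = NULL)                        # every pairing

# Row counts for each join type
sapply(list(inner = inner, outer = outer, left = left,
            right = right, cross = cross), nrow)
```

Rows without a partner on the other side get NA in the columns that come from that side, so left has NA in State for customer 1.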

You can merge on multiple columns by passing a vector to by, e.g., by = c("CustomerId", "OrderId").

If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2" where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
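A small sketch of by.x and by.y, assuming two made-up data frames whose key columns are named differently. The customers, orders, a and b objects and their non-key columns are hypothetical.

```r
# Hypothetical data: the key has a different name in each data frame.
customers <- data.frame(CustomerId_in_df1 = c(1, 2, 3),
                        Name = c("Ann", "Bob", "Cy"))
orders <- data.frame(CustomerId_in_df2 = c(2, 3, 3),
                     Amount = c(10, 20, 30))

# Single key: the result keeps the key column name from the first data frame.
joined <- merge(customers, orders,
                by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2")

# Several keys: by.x and by.y are matched position by position.
a <- data.frame(cust = c(1, 1, 2), ord = c(10, 11, 10), qty = c(5, 6, 7))
b <- data.frame(CustomerId = c(1, 2), OrderId = c(11, 10), price = c(2.5, 4))
multi <- merge(a, b,
               by.x = c("cust", "ord"), by.y = c("CustomerId", "OrderId"))
```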

Merging two dataframes by keeping certain column values in R

We may use rows_update

library(dplyr)
rows_update(df2, df1, by = c("id", "item", "score"))

Output:

  id item score cat.a cat.b
1  1   11     1     A     a
2  2   22     0     B     a
3  3   33     1     C     b
4  4   44     1     D     b
5  5   55     1     E     c
6  6   66     0     F     f
7  7   77     1  <NA>  <NA>
8  8   88     1  <NA>  <NA>
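For readers who have not used rows_update before, here is a minimal sketch of what it does, assuming dplyr 1.0.0 or later is installed. The scores and fixes data frames are made up, and this example uses a single key column rather than the three used in the answer.

```r
library(dplyr)

# Hypothetical data: `scores` is the table to update, `fixes` holds new values.
scores <- data.frame(id = c(1, 2, 3), item = c(11, 22, 33), score = c(1, 0, 1))
fixes  <- data.frame(id = c(2, 3), score = c(5, 6))

# Rows of `scores` whose id appears in `fixes` take the new score;
# columns that are absent from `fixes` (here, item) are left untouched.
updated <- rows_update(scores, fixes, by = "id")
updated
```

By default rows_update stops with an error if the second data frame contains key values that are missing from the first, so it only modifies existing rows and never adds new ones.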

R merge two dataframes with same columns without replacing values

Solution:

Thanks to @GregorThomas for providing the answer.

This problem was solved with the following command:

merge(data1, data2, all = TRUE)
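To see why this keeps values from both sides, here is a small sketch with made-up data1 and data2 that share the same columns. Because no by is given, merge matches on all common columns, so rows that differ in any column are kept as separate rows instead of overwriting each other.

```r
# Hypothetical data: both data frames have the same two columns.
data1 <- data.frame(id = c(1, 2), value = c("a", "b"))
data2 <- data.frame(id = c(2, 2, 3), value = c("b", "z", "c"))

# With no `by`, merge() matches on every shared column (id and value).
combined <- merge(data1, data2, all = TRUE)

# The row (2, "b") exists in both inputs and appears once;
# the row (2, "z") differs in value, so it is kept alongside it.
combined
```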

Use dplyr package to merge two dataframes into one but add values to specific columns of the merged dataset

Is this the result that you are looking for?

You can use dplyr::bind_rows in combination with tidyr::fill to get it. The column names need some cleaning up though.

library(dplyr)

# Clean up column names to add second dataset to the first using rename() to remove the numbers
original2 %>%
  rename(id = id1,
         type = type1,
         city = city1,
         state = state1,
         zip = zip1) %>%
  # Add dataset 1 and cleaned up dataset 2 together
  bind_rows(original_1, .) %>%
  # Fill NAs with data from dataset 2 using tidyr::fill()
  tidyr::fill(data_1, .direction = "up") %>%
  tidyr::fill(data_2, .direction = "up") %>%
  # Remove "type" column
  select(-type) %>%
  # Artificially replace values "data1" and "data2" in "data" to row 9 and 11 respectively
  mutate(data = case_when(row_number() == 9 ~ "data1",
                          row_number() == 11 ~ "data2",
                          TRUE ~ NA_character_)) %>%
  # Remove rows that do not contain a value for "id"
  filter(!is.na(id))

# A tibble: 20 x 7
      id city   state   zip   data  data_1            data_2
   <dbl> <chr>  <chr>   <chr> <chr> <chr>             <chr>
 1     1 city1  state1  zip1  NA    Non_changing_data Non_changing_data
 2     2 city2  state2  zip2  NA    Non_changing_data Non_changing_data
 3     3 city3  state3  zip3  NA    Non_changing_data Non_changing_data
 4     4 city4  state4  zip4  NA    Non_changing_data Non_changing_data
 5     5 city5  state5  zip5  NA    Non_changing_data Non_changing_data
 6     6 city6  state6  zip6  NA    Non_changing_data Non_changing_data
 7     7 city7  state7  zip7  NA    Non_changing_data Non_changing_data
 8     8 city8  state8  zip8  NA    Non_changing_data Non_changing_data
 9     9 city9  state9  zip9  data1 Non_changing_data Non_changing_data
10    10 city10 state10 zip10 NA    Non_changing_data Non_changing_data
11    11 city11 state11 zip11 data2 Non_changing_data Non_changing_data
12    12 city12 state12 zip12 NA    Non_changing_data Non_changing_data
13    13 city13 state13 zip13 NA    Non_changing_data Non_changing_data
14    14 city14 state14 zip14 NA    Non_changing_data Non_changing_data
15    15 city15 state15 zip15 NA    Non_changing_data Non_changing_data
16    16 city16 state16 zip16 NA    Non_changing_data Non_changing_data
17    17 city17 state17 zip17 NA    Non_changing_data Non_changing_data
18    18 city18 state18 zip18 NA    Non_changing_data Non_changing_data
19    19 city19 state19 zip19 NA    Non_changing_data Non_changing_data
20    20 city20 state20 zip20 NA    Non_changing_data Non_changing_data

Merging two DataFrames matching rows/columns

You can subset y to the dimensions of x and assign x into that block:

y[1:nrow(x), 1:ncol(x)] <- x
y
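A minimal sketch of that assignment with made-up data, assuming x has no more rows or columns than y and that the two line up by position (first rows and first columns), not by name.

```r
# Hypothetical data: x is 2 x 2, y is 3 x 3.
x <- data.frame(a = c(10, 20), b = c(30, 40))
y <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Overwrite the top-left block of y with x; the rest of y is unchanged.
# seq_len() behaves like 1:n but is also safe when n is 0.
y[seq_len(nrow(x)), seq_len(ncol(x))] <- x
y
```

Because the match is purely positional, this only gives a sensible result when the rows and columns of x are in the same order as the corresponding ones in y.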

Merging two dataframes by multiple columns without losing data

merge has an argument, all, that specifies whether to keep every row from both sides (i.e. all rows from x and all rows from y). Rows without a match get NA in the columns from the other data frame:

total <- merge(df1, df2, by = c("id", "year"), all = TRUE)
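Here is a small sketch of that call on made-up data. The id and year keys come from the answer; the sales and cost columns and all values are hypothetical.

```r
# Hypothetical data: each data frame is missing one (id, year) combination.
df1 <- data.frame(id = c(1, 1, 2), year = c(2020, 2021, 2020),
                  sales = c(100, 110, 200))
df2 <- data.frame(id = c(1, 2, 2), year = c(2021, 2020, 2021),
                  cost = c(60, 150, 160))

# Full outer join on both keys: no (id, year) pair is dropped.
total <- merge(df1, df2, by = c("id", "year"), all = TRUE)
total
```

The pair (1, 2020) exists only in df1, so its cost is NA, and (2, 2021) exists only in df2, so its sales is NA.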

How to merge two dataframes with two matching columns in R

We could use left_join

library(dplyr)
df1 %>%
  left_join(df2, by = c("year", "companyID"))

Output:

   year companyID salary Turnover
  <dbl>     <dbl>  <dbl>    <dbl>
1  2009         1   1000    10000
2  2009         2   2000    20000
3  2010         1   1200    12000
4  2010         2   2200    22000
5  2011         3   1500    15000
6  2012         4   1100       NA

How can I merge two dataframes together with some conditional requirements?

Does this work for you?

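library(dplyr)       # provides %>% and filter()
library(data.table)  # provides %between%
# Merge on the columns the two data frames share, then keep only the
# rows whose TestDate falls between Date1 and Date2
merge(x = df1,
      y = df2) %>%
  filter(TestDate %between% list(Date1, Date2))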
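If you would rather avoid the extra packages, the same merge-then-filter idea can be sketched in base R. The data below is made up, the ID column is a hypothetical shared key, and the comparison assumes both bounds are inclusive, which is the default for %between%.

```r
# Hypothetical data: one test date per row in df1, one date window per ID in df2.
df1 <- data.frame(ID = c(1, 1, 2),
                  TestDate = as.Date(c("2021-01-15", "2021-03-01", "2021-02-10")))
df2 <- data.frame(ID = c(1, 2),
                  Date1 = as.Date(c("2021-01-01", "2021-03-01")),
                  Date2 = as.Date(c("2021-01-31", "2021-03-31")))

# Join on the shared column (ID), then keep rows inside the window.
merged <- merge(df1, df2)
in_window <- merged$TestDate >= merged$Date1 & merged$TestDate <= merged$Date2
kept <- merged[in_window, ]
kept
```

Only the first test for ID 1 falls inside its window, so kept has a single row.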

How to merge two dataframes specifying specific columns? (R)

We can use dplyr::left_join to merge df1 with a version of df2 that contains only "ID" and "var3". Then mutate the "var" columns to replace NA (missing) values with 0.

library(dplyr)
library(tidyr)  # provides replace_na()
df3 <- df1 %>%
  left_join(select(df2, ID, var3), by = 'ID') %>%
  mutate(across(-ID, ~replace_na(., 0)))

     ID  var1  var2  var3
  <dbl> <dbl> <dbl> <dbl>
1  1001     1     0     1
2  1002     0     1     1
3  1003     1     1     0
4  1004     0     0     0

There are several valid ways to select the "var" columns within across. Here I've used -ID. One could also use starts_with('var') or even everything(), though the latter assumes no NA values in "ID".
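For comparison, a base R sketch of the same idea using merge instead of left_join. This is an alternative technique, not the code from the answer, and the input data, including the extra "other" column in df2, is hypothetical.

```r
# Hypothetical data: df2 covers only two IDs and has a column we do not want.
df1 <- data.frame(ID = c(1001, 1002, 1003, 1004),
                  var1 = c(1, 0, 1, 0),
                  var2 = c(0, 1, 1, 0))
df2 <- data.frame(ID = c(1001, 1002),
                  var3 = c(1, 1),
                  other = c("x", "y"))

# Left join against only the ID and var3 columns of df2.
df3 <- merge(df1, df2[, c("ID", "var3")], by = "ID", all.x = TRUE)

# Replace NA with 0 in every column except ID (the analogue of across(-ID, ...)).
var_cols <- setdiff(names(df3), "ID")
df3[var_cols][is.na(df3[var_cols])] <- 0
df3
```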


