How to join (merge) data frames (inner, outer, left, right)
By using the merge function and its optional parameters:
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you are matching on only the fields you want. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
Cross join: merge(x = df1, y = df2, by = NULL)
Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable. I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.
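To make the join types above concrete, here is a minimal sketch. The contents of df1 and df2 (and the Product and State columns) are made up for illustration; only merge() and its arguments come from the answer itself.

```r
# Hypothetical example data: customers 2, 4 and 6 appear in both frames.
df1 <- data.frame(CustomerId = 1:6,
                  Product = c("Toaster", "Toaster", "Toaster",
                              "Radio", "Radio", "Radio"))
df2 <- data.frame(CustomerId = c(2, 4, 6, 7),
                  State = c("Alabama", "Alabama", "Ohio", "Texas"))

inner <- merge(df1, df2, by = "CustomerId")                # matching ids only
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # all ids from both
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # every row of df1
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)  # every row of df2
cross <- merge(df1, df2, by = NULL)                        # every combination

# Row counts show how each join treats unmatched rows:
# inner 3, outer 7, left 6, right 4, cross 24
sapply(list(inner = inner, outer = outer, left = left,
            right = right, cross = cross), nrow)
```

Unmatched rows are padded with NA in the columns that come from the other data frame.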
You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").
If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2", where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
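As a small sketch of the by.x / by.y case, assume two hypothetical data frames, customers and orders, whose key columns carry the different names used in the text. The Name and Amount columns are invented for the example.

```r
# Hypothetical data: the key column has a different name in each frame.
customers <- data.frame(CustomerId_in_df1 = c(1, 2, 3),
                        Name = c("Ann", "Bob", "Cy"))
orders <- data.frame(CustomerId_in_df2 = c(2, 3, 3),
                     Amount = c(10, 20, 30))

# Inner join on differently named keys; the result keeps the name from by.x.
merged <- merge(customers, orders,
                by.x = "CustomerId_in_df1",
                by.y = "CustomerId_in_df2")
names(merged)
```

The merged frame has one key column, named after the by.x column, followed by the remaining columns of each input.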
Merging two dataframes by keeping certain column values in R
We may use rows_update from dplyr, which overwrites values in the rows of df2 whose key columns match rows in df1:
library(dplyr)
rows_update(df2, df1, by = c("id", "item", "score"))
Output:
  id item score cat.a cat.b
1  1   11     1     A     a
2  2   22     0     B     a
3  3   33     1     C     b
4  4   44     1     D     b
5  5   55     1     E     c
6  6   66     0     F     f
7  7   77     1  <NA>  <NA>
8  8   88     1  <NA>  <NA>
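Since the input data for this answer is not shown, here is a smaller self-contained sketch of what rows_update does. The frames base and updates are hypothetical, and the example assumes a dplyr version that provides rows_update (1.0.0 or later).

```r
library(dplyr)

# Hypothetical data: `base` has gaps in `cat`; `updates` supplies values
# for some of the same keys.
base <- data.frame(id = 1:4,
                   score = c(1, 0, 1, 1),
                   cat = c("A", NA, NA, NA))
updates <- data.frame(id = c(2L, 3L),
                      cat = c("B", "C"))

# Rows of `base` whose id appears in `updates` get the new `cat` values;
# all other rows and columns are left untouched.
result <- rows_update(base, updates, by = "id")
result
```

By default every key in the second frame must already exist in the first; rows_update modifies existing rows and does not add new ones.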
R merge two dataframes with same columns without replacing values
Solution:
Thanks to @GregorThomas for providing the answer.
This problem was solved with the following command:
merge(data1, data2, all = TRUE)
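A small sketch of why this works, using made-up contents for data1 and data2: when no by is given, merge matches on all shared columns, so with all = TRUE identical rows collapse into one and rows that differ in any column are both kept.

```r
# Hypothetical data with identical column names in both frames.
data1 <- data.frame(id = c(1, 2), value = c("a", "b"))
data2 <- data.frame(id = c(2, 2, 3), value = c("b", "x", "c"))

# All shared columns act as the key, so nothing is overwritten:
# (2, "b") appears once, and the conflicting (2, "x") is kept as its own row.
combined <- merge(data1, data2, all = TRUE)
combined
```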
Use dplyr package to merge two dataframes into one but add values to specific columns of the merged dataset
Is this the result that you are looking for?
You can use dplyr::bind_rows in combination with tidyr::fill to get it. The column names need some cleaning up though.
library(dplyr)
# Clean up the column names of the second dataset with rename() (drop the trailing numbers)
original2 %>%
  rename(id = id1,
         type = type1,
         city = city1,
         state = state1,
         zip = zip1) %>%
  # Stack dataset 1 on top of the cleaned-up dataset 2
  bind_rows(original_1, .) %>%
  # Fill NAs with data from dataset 2 using tidyr::fill()
  tidyr::fill(data_1, .direction = "up") %>%
  tidyr::fill(data_2, .direction = "up") %>%
  # Remove the "type" column
  select(-type) %>%
  # Artificially set "data" to "data1" in row 9 and "data2" in row 11
  mutate(data = case_when(row_number() == 9 ~ "data1",
                          row_number() == 11 ~ "data2",
                          TRUE ~ NA_character_)) %>%
  # Remove rows that do not contain a value for "id"
  filter(!is.na(id))
# A tibble: 20 x 7
      id city   state   zip   data  data_1            data_2
   <dbl> <chr>  <chr>   <chr> <chr> <chr>             <chr>
 1     1 city1  state1  zip1  NA    Non_changing_data Non_changing_data
 2     2 city2  state2  zip2  NA    Non_changing_data Non_changing_data
 3     3 city3  state3  zip3  NA    Non_changing_data Non_changing_data
 4     4 city4  state4  zip4  NA    Non_changing_data Non_changing_data
 5     5 city5  state5  zip5  NA    Non_changing_data Non_changing_data
 6     6 city6  state6  zip6  NA    Non_changing_data Non_changing_data
 7     7 city7  state7  zip7  NA    Non_changing_data Non_changing_data
 8     8 city8  state8  zip8  NA    Non_changing_data Non_changing_data
 9     9 city9  state9  zip9  data1 Non_changing_data Non_changing_data
10    10 city10 state10 zip10 NA    Non_changing_data Non_changing_data
11    11 city11 state11 zip11 data2 Non_changing_data Non_changing_data
12    12 city12 state12 zip12 NA    Non_changing_data Non_changing_data
13    13 city13 state13 zip13 NA    Non_changing_data Non_changing_data
14    14 city14 state14 zip14 NA    Non_changing_data Non_changing_data
15    15 city15 state15 zip15 NA    Non_changing_data Non_changing_data
16    16 city16 state16 zip16 NA    Non_changing_data Non_changing_data
17    17 city17 state17 zip17 NA    Non_changing_data Non_changing_data
18    18 city18 state18 zip18 NA    Non_changing_data Non_changing_data
19    19 city19 state19 zip19 NA    Non_changing_data Non_changing_data
20    20 city20 state20 zip20 NA    Non_changing_data Non_changing_data
Merging two DataFrames matching rows/columns
You can subset y with the dimensions of x and assign:
y[1:nrow(x), 1:ncol(x)] <- x
y
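A minimal sketch of this assignment, with hypothetical x and y: x is assumed to be no larger than y in either dimension, and its values overwrite the top-left block of y by position.

```r
# Hypothetical data: x is 2 x 2, y is 3 x 3.
x <- data.frame(a = c(1, 2), b = c(3, 4))
y <- data.frame(a = c(0, 0, 0), b = c(0, 0, 0), c = c(9, 9, 9))

# Overwrite the top-left nrow(x) x ncol(x) block of y with x.
y[1:nrow(x), 1:ncol(x)] <- x
y
```

Note that the match is purely positional: column names and row keys are not compared.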
Merging two dataframes by multiple columns without losing data
merge has an argument all that specifies whether you want to keep all rows from both the left and right sides (i.e., all rows from x and all rows from y):
total <- merge(df1,df2,by=c("id","year"), all=TRUE)
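To see that no rows are lost, here is a sketch with invented data. The id and year keys come from the answer above; the sales and cost columns are assumptions made for the example.

```r
# Hypothetical data: each frame has one (id, year) pair the other lacks.
df1 <- data.frame(id = c(1, 1, 2),
                  year = c(2020, 2021, 2020),
                  sales = c(10, 11, 20))
df2 <- data.frame(id = c(1, 2, 3),
                  year = c(2021, 2020, 2020),
                  cost = c(5, 8, 9))

# Full outer join on both key columns; unmatched pairs are padded with NA.
total <- merge(df1, df2, by = c("id", "year"), all = TRUE)
total
```

The pair (1, 2020) exists only in df1 and so gets an NA cost, while (3, 2020) exists only in df2 and gets an NA sales value.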
How to merge two dataframes with two matching columns in R
We could use left_join from dplyr:
library(dplyr)
df1 %>%
  left_join(df2, by = c("year", "companyID"))
Output:
   year companyID salary Turnover
  <dbl>     <dbl>  <dbl>    <dbl>
1  2009         1   1000    10000
2  2009         2   2000    20000
3  2010         1   1200    12000
4  2010         2   2200    22000
5  2011         3   1500    15000
6  2012         4   1100       NA
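The input frames are not shown in this answer, so the following reproducible sketch rebuilds df1 and df2 from the printed output. Their exact contents are an assumption, chosen so that the join gives the same values.

```r
library(dplyr)

# Assumed inputs, reconstructed from the output above.
df1 <- data.frame(year = c(2009, 2009, 2010, 2010, 2011, 2012),
                  companyID = c(1, 2, 1, 2, 3, 4),
                  salary = c(1000, 2000, 1200, 2200, 1500, 1100))
df2 <- data.frame(year = c(2009, 2009, 2010, 2010, 2011),
                  companyID = c(1, 2, 1, 2, 3),
                  Turnover = c(10000, 20000, 12000, 22000, 15000))

# Keep every row of df1; Turnover is NA where no (year, companyID) match exists.
result <- df1 %>%
  left_join(df2, by = c("year", "companyID"))
result
```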
How can I merge two dataframes together with some conditional requirements?
Does this work for you?
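library(dplyr)
library(data.table)  # provides %between%
# Merge on the shared columns, then keep rows whose TestDate lies within [Date1, Date2]
merge(x = df1,
      y = df2) %>%
  filter(TestDate %between% list(Date1, Date2))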
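For readers who prefer to avoid the extra packages, the same idea can be sketched in base R. The data below is hypothetical: it assumes df1 holds an ID and a TestDate, and df2 holds the same ID with a Date1 to Date2 window.

```r
# Hypothetical data: one test date per row in df1, one date window per ID in df2.
df1 <- data.frame(ID = c(1, 1, 2),
                  TestDate = as.Date(c("2020-01-15", "2020-03-01", "2020-02-10")))
df2 <- data.frame(ID = c(1, 2),
                  Date1 = as.Date(c("2020-01-01", "2020-03-01")),
                  Date2 = as.Date(c("2020-01-31", "2020-03-31")))

# Join on ID first, then keep only rows whose TestDate falls inside the window
# (bounds included).
merged <- merge(df1, df2, by = "ID")
in_range <- merged[merged$TestDate >= merged$Date1 &
                     merged$TestDate <= merged$Date2, ]
in_range
```

Only the first test for ID 1 survives, because it is the only TestDate that lies between its Date1 and Date2.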
How to merge two dataframes specifying specific columns? (R)
We can use dplyr::left_join to merge df1 with a version of df2 that contains only "ID" and "var3". Then mutate the "var" columns to replace NA (missing) values with 0. Note that replace_na comes from tidyr.
library(dplyr)
library(tidyr)
df3 <- df1 %>%
  left_join(select(df2, ID, var3), by = 'ID') %>%
  mutate(across(-ID, ~replace_na(., 0)))
     ID  var1  var2  var3
  <dbl> <dbl> <dbl> <dbl>
1  1001     1     0     1
2  1002     0     1     1
3  1003     1     1     0
4  1004     0     0     0
There are several valid ways to select the "var" columns within across. Here I've used -ID. One could also use starts_with('var') or even everything(), though the latter assumes no NA values in "ID".
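As a sketch of the starts_with('var') alternative, the following uses invented inputs. df1 and df2 are reconstructed to be consistent with the output above, and the extra var4 column in df2 is hypothetical, added only to show that it is left out of the join.

```r
library(dplyr)
library(tidyr)

# Assumed inputs; var4 stands in for any df2 column we do not want to bring over.
df1 <- data.frame(ID = c(1001, 1002, 1003, 1004),
                  var1 = c(1, 0, 1, 0),
                  var2 = c(0, 1, 1, 0))
df2 <- data.frame(ID = c(1001, 1002),
                  var3 = c(1, 1),
                  var4 = c(5, 6))

# Join only ID and var3, then zero-fill missing values in the "var" columns.
df3 <- df1 %>%
  left_join(select(df2, ID, var3), by = "ID") %>%
  mutate(across(starts_with("var"), ~ replace_na(.x, 0)))
df3
```

Selecting by name prefix leaves ID untouched even if it contains missing values, which is the caveat raised for everything().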