Left join only selected columns in R with the merge() function
You can do this by subsetting the data you pass into your merge:
merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)
Or you can simply delete the column after your current merge :)
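As a minimal sketch of the subsetting approach: the data frames below are hypothetical and made up for illustration (only the Client and LO column names come from the answer above; Amount and Extra are assumptions).

```r
# Hypothetical sample data; the question's real data is not shown.
DF1 <- data.frame(Client = c("A", "B", "C"), Amount = c(10, 20, 30))
DF2 <- data.frame(Client = c("A", "C"), LO = c("x", "y"), Extra = c(1, 2))

# Keep only the join key and the wanted column from DF2 before merging,
# so the unwanted Extra column never reaches the result.
result <- merge(x = DF1, y = DF2[, c("Client", "LO")],
                by = "Client", all.x = TRUE)
print(result)
```

Client "B" has no match in DF2, so its LO value comes back as NA, which is what all.x = TRUE is for.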
How to left_join() two datasets but only select specific columns from one of the datasets?
Try this. You can combine select() with contains(), passing contains() the name fragments you want to match, so there is no need to list each column individually by name or by number. Here is the code:
library(dplyr)
#Code
newdf <- left_join(myfruit, fruit_info, by = "fruit_name") %>%
select(contains(c('fruit','number','type')))
Output:
# A tibble: 4 x 4
fruit_name number batch_number type
<chr> <dbl> <dbl> <chr>
1 apple 2 4 gala
2 pear 4 4 conference
3 banana 6 4 cavendish
4 cherry 8 4 bing
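Note that contains('number') matches any column whose name includes "number", which is why batch_number also appears in the output above. If you want exact control, one alternative is to select the columns from the right-hand table before joining. This is only a sketch: the table contents below are hypothetical (the question's data is not shown), and the colour column is an invented extra used to show that it gets dropped.

```r
library(dplyr)

# Hypothetical data loosely modelled on the output above.
myfruit <- tibble(fruit_name = c("apple", "pear", "banana", "cherry"),
                  number = c(2, 4, 6, 8))
fruit_info <- tibble(fruit_name = c("apple", "pear", "banana", "cherry"),
                     type = c("gala", "conference", "cavendish", "bing"),
                     colour = c("red", "green", "yellow", "red"))

# Select the key plus the wanted column from fruit_info, then join.
newdf <- left_join(myfruit,
                   select(fruit_info, fruit_name, type),
                   by = "fruit_name")
print(newdf)
```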
Can you left join in R (all.x = TRUE) and keep non-matching cases in their original positions, rather than appending?
library(dplyr)
df1 = data.frame(id = c(4,3,2,1),
color = c('purple','red','blue','green'))
df2 = data.frame(id = c(1,2,4),
weight = c(4.1, 5.3, 1.8))
df1 %>%
left_join(df2,
by = "id")
Result:
id color weight
1 4 purple 1.8
2 3 red NA
3 2 blue 5.3
4 1 green 4.1
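left_join() keeps the rows of the left table in their original order, which is why the result above is not re-sorted. If you would rather stay in base R, a lookup with match() is one possible sketch. It assumes the id values in df2 are unique, because match() only returns the first matching position.

```r
df1 <- data.frame(id = c(4, 3, 2, 1),
                  color = c("purple", "red", "blue", "green"))
df2 <- data.frame(id = c(1, 2, 4),
                  weight = c(4.1, 5.3, 1.8))

# For each id in df1, find its row position in df2 (NA when absent),
# then use those positions to pull the weights in df1's row order.
df1$weight <- df2$weight[match(df1$id, df2$id)]
print(df1)
```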
R: import and merge only specific columns
Since you only need those seven variables, you can read in just those columns using fread to avoid the issue with the BbAH variable.
library(data.table)
library(dplyr)
library(purrr)
files <- c("https://www.football-data.co.uk/mmz4281/1920/EC.csv",
"https://www.football-data.co.uk/mmz4281/1819/EC.csv",
"https://www.football-data.co.uk/mmz4281/1718/EC.csv",
"https://www.football-data.co.uk/mmz4281/1617/EC.csv",
"https://www.football-data.co.uk/mmz4281/1516/EC.csv",
"https://www.football-data.co.uk/mmz4281/1415/EC.csv",
"https://www.football-data.co.uk/mmz4281/1314/EC.csv",
"https://www.football-data.co.uk/mmz4281/1213/EC.csv",
"https://www.football-data.co.uk/mmz4281/1112/EC.csv",
"https://www.football-data.co.uk/mmz4281/1011/EC.csv")
# Identify columns you need
myColumns = c("Date","Time","HomeTeam","AwayTeam","FTHG","FTAG","FTR")
# Modified function found in https://stackoverflow.com/a/51348578/8535855
# takes a filename and a vector of columns as input
fread_allfiles <- function(file, columns){
# select = columns makes fread read only the named columns from the file
fread(file, select = columns)
}
df_all <- files %>%
map_df(~ fread_allfiles(.,myColumns))
head(df_all)
which produces the following format:
Date Time HomeTeam AwayTeam FTHG FTAG FTR
1: 03/08/2019 12:30 Stockport Maidenhead 0 1 A
2: 03/08/2019 15:00 Aldershot Fylde 1 2 A
3: 03/08/2019 15:00 Barnet Yeovil 1 0 H
4: 03/08/2019 15:00 Chesterfield Dover Athletic 1 2 A
5: 03/08/2019 15:00 Chorley Bromley 0 0 D
6: 03/08/2019 15:00 Dag and Red Woking 0 2 A
You can then reformat the Date and Time columns if needed. It looks like only the first file has any values for Time, so the rest are filled in as NA.
> str(df_all)
Classes ‘data.table’ and 'data.frame': 5429 obs. of 7 variables:
$ Date : chr "03/08/2019" "03/08/2019" "03/08/2019" "03/08/2019" ...
$ Time : chr "12:30" "15:00" "15:00" "15:00" ...
$ HomeTeam: chr "Stockport" "Aldershot" "Barnet" "Chesterfield" ...
$ AwayTeam: chr "Maidenhead" "Fylde" "Yeovil" "Dover Athletic" ...
$ FTHG : int 0 1 1 1 0 0 1 1 2 1 ...
$ FTAG : int 1 2 0 2 0 2 0 4 2 3 ...
$ FTR : chr "A" "A" "H" "A" ...
- attr(*, ".internal.selfref")=<externalptr>
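As a rough sketch of reformatting the Date and Time columns, the snippet below parses a few values copied from the output above. It assumes the dates are day/month/year, and the UTC time zone is an arbitrary choice for illustration. Other files may use a different date format, so check before applying this to df_all.

```r
# A few values copied from the head(df_all) output above.
dates <- c("03/08/2019", "03/08/2019")
times <- c("12:30", "15:00")

# Assumption: dates are day/month/four-digit-year.
parsed_dates <- as.Date(dates, format = "%d/%m/%Y")

# Combine date and time into one date-time value.
# tz = "UTC" is an assumption; use the zone that fits your data.
kickoff <- as.POSIXct(paste(dates, times),
                      format = "%d/%m/%Y %H:%M", tz = "UTC")
print(parsed_dates)
print(kickoff)
```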
How to join (merge) data frames (inner, outer, left, right)
By using the merge function and its optional parameters:
Inner join: merge(df1, df2) will work for these examples because R automatically joins the frames by common variable names, but you would most likely want to specify merge(df1, df2, by = "CustomerId") to make sure that you were matching on only the fields you desired. You can also use the by.x and by.y parameters if the matching variables have different names in the different data frames.
Outer join: merge(x = df1, y = df2, by = "CustomerId", all = TRUE)
Left outer: merge(x = df1, y = df2, by = "CustomerId", all.x = TRUE)
Right outer: merge(x = df1, y = df2, by = "CustomerId", all.y = TRUE)
Cross join: merge(x = df1, y = df2, by = NULL)
Just as with the inner join, you would probably want to explicitly pass "CustomerId" to R as the matching variable. I think it's almost always best to explicitly state the identifiers on which you want to merge; it's safer if the input data.frames change unexpectedly and easier to read later on.
You can merge on multiple columns by giving by a vector, e.g., by = c("CustomerId", "OrderId").
If the column names to merge on are not the same, you can specify, e.g., by.x = "CustomerId_in_df1", by.y = "CustomerId_in_df2", where CustomerId_in_df1 is the name of the column in the first data frame and CustomerId_in_df2 is the name of the column in the second data frame. (These can also be vectors if you need to merge on multiple columns.)
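To see the join types side by side, here is a small sketch. The contents of df1 and df2 are hypothetical sample data (the answer's original frames are not shown here), and df2_renamed and its CustId column are made up to illustrate by.x / by.y.

```r
# Hypothetical sample data sharing a CustomerId key.
df1 <- data.frame(CustomerId = 1:4,
                  Product = c("Toaster", "Radio", "Kettle", "Lamp"))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c("Alabama", "Ohio", "Texas"))

inner <- merge(df1, df2, by = "CustomerId")                # matches only
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)    # all rows from both
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)  # all rows of df1
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)  # all rows of df2
cross <- merge(df1, df2, by = NULL)                        # every combination

# Row counts per join type: 2, 5, 4, 3, 12
print(sapply(list(inner = inner, outer = outer, left = left,
                  right = right, cross = cross), nrow))

# Differently named key columns: use by.x and by.y.
df2_renamed <- setNames(df2, c("CustId", "State"))
left_renamed <- merge(df1, df2_renamed,
                      by.x = "CustomerId", by.y = "CustId", all.x = TRUE)
print(left_renamed)
```

With by.x and by.y, the key column in the result takes its name from by.x.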
Join two columns with one column
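Get the data in long format so that you can join multiple columns in a single left join:
library(dplyr)
library(tidyr)
# Assumed sample data, reconstructed from the output below (the question's data is not shown)
df <- data.frame(a = 1:5, b = 6:10)
df2 <- data.frame(c = 1:10, d = letters[1:10], stringsAsFactors = FALSE)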
df %>%
mutate(row = row_number()) %>%
pivot_longer(cols = -row) %>%
left_join(df2, by = c('value' = 'c')) %>%
pivot_wider(names_from = name, values_from = c(value, d)) %>%
select(-row)
# value_a value_b d_a d_b
# <int> <int> <chr> <chr>
#1 1 6 a f
#2 2 7 b g
#3 3 8 c h
#4 4 9 d i
#5 5 10 e j