Merge Multiple Data.Frames in R with Varying Row Length

Merge dataframes, different lengths

You could add a join variable to dat2 and then use merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
variable ID value concreteness familiarity typicality
1 amoeba 1 0 3.60 1.30 1.71
2 amoeba 2 0 3.60 1.30 1.71
3 amoeba 3 NA 3.60 1.30 1.71
4 bacterium 1 0 3.82 3.48 2.13
5 bacterium 2 0 3.82 3.48 2.13
6 bacterium 3 0 3.82 3.48 2.13
7 leech 1 1 5.71 1.83 4.50
8 leech 2 1 5.71 1.83 4.50
9 leech 3 0 5.71 1.83 4.50
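
The inputs dat1 and dat2 are not shown in this answer, so here is a minimal sketch with hypothetical data reconstructed from the printed result above. It assumes dat2 keeps the item names as row names, which is why they have to be copied into a real column before merging.

```r
# Hypothetical inputs rebuilt from the output above (not the asker's actual data)
dat1 <- data.frame(
  variable = rep(c("amoeba", "bacterium", "leech"), each = 3),
  ID       = rep(1:3, times = 3),
  value    = c(0, 0, NA, 0, 0, 0, 1, 1, 0)
)
dat2 <- data.frame(
  concreteness = c(3.60, 3.82, 5.71),
  familiarity  = c(1.30, 3.48, 1.83),
  typicality   = c(1.71, 2.13, 4.50),
  row.names    = c("amoeba", "bacterium", "leech")
)

# Copy the row names into a column so both data frames share a key
dat2$variable <- rownames(dat2)

# With no 'by' given, merge() joins on the only common column, "variable"
res <- merge(dat1, dat2)
```

Each row of dat2 is repeated for every matching row of dat1, so the shorter data frame is stretched to the length of the longer one.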

Merge multiple data.frames in R with varying row length

You want to merge the result with df3, i.e.:

merge(df3, merge(df1, df2, by="Year", all.x=TRUE, all.y=TRUE), by = "Year", all.x = TRUE, all.y = TRUE)
# Year Site3 Site1 Site2
#1 2006 <NA> 2.3 <NA>
#2 2007 <NA> 1 2.7
#3 2008 1.3 3.1 4.1
#4 2009 2 2.9 1.1
#5 2010 3.6 1.4 2.6
#6 2011 1.7 3 3.1

Or, if you have your data.frames in a list, use Reduce to generalize the above:

Reduce(function(x,y) merge(x, y, by = "Year", all.x = TRUE, all.y = TRUE),
list(df1, df2, df3))
# Year Site1 Site2 Site3
#1 2006 2.3 <NA> <NA>
#2 2007 1 2.7 <NA>
#3 2008 3.1 4.1 1.3
#4 2009 2.9 1.1 2
#5 2010 1.4 2.6 3.6
#6 2011 3 3.1 1.7
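
For a version that runs on its own, the sketch below rebuilds df1, df2 and df3 from the output above as numeric columns (an assumption, since the original inputs are not shown) and uses all = TRUE, which is shorthand for setting both all.x and all.y to TRUE.

```r
# Hypothetical inputs rebuilt from the printed result; the original
# columns may have been factors or characters rather than numerics
df1 <- data.frame(Year = 2006:2011, Site1 = c(2.3, 1, 3.1, 2.9, 1.4, 3))
df2 <- data.frame(Year = 2007:2011, Site2 = c(2.7, 4.1, 1.1, 2.6, 3.1))
df3 <- data.frame(Year = 2008:2011, Site3 = c(1.3, 2, 3.6, 1.7))

# Reduce() folds merge() over the list: merge(merge(df1, df2), df3)
merged <- Reduce(
  function(x, y) merge(x, y, by = "Year", all = TRUE),
  list(df1, df2, df3)
)
```

The same call works for any number of data frames in the list, as long as each one has a Year column.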

How can I combine two dataframes with different lengths in R?

Try using left_join in the dplyr package.

library(dplyr)

# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))

# use left_join
df3 <- left_join(df1, df2, by = "id")

# rename with distinct names (duplicate column names are ambiguous) and set NAs to 0
names(df3) <- c("id", "val1", "val2")
df3[is.na(df3)] <- 0
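
For comparison, the same left join can be sketched in base R with merge. This is an alternative to the dplyr answer, not part of it; the fixed values below are hypothetical stand-ins for the random rpois() draws so that the result is reproducible.

```r
# Fixed, hypothetical values in place of the random draws
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = c(4, 6, 5, 3, 7))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = c(21, 18, 25, 19))

# all.x = TRUE keeps every row of df1 (a left join);
# suffixes gives the two clashing "val" columns distinct names
res <- merge(df1, df2, by = "id", all.x = TRUE, suffixes = c("1", "2"))

# "D" has no match in df2, so its val2 is NA; replace it with 0
res$val2[is.na(res$val2)] <- 0
```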

How to combine two data frames of different lengths?

This is too long for a comment, but I just need to demonstrate that the solution I gave in the comments does work. If you are having problems getting merge to work, then there must be some other issue with your data, which we cannot diagnose because you did not provide a dput of your data.frames.

df1 = read.table(text = 
"Date Duration
6/27/2014 10.00
6/30/2014 20.00
7/11/2014 15.00",
header = TRUE)

df2 = read.table(text =
"Date Percent_Removal
6/27/2014 20.39
6/30/2014 27.01
7/7/2014 49.84
7/11/2014 59.48
7/17/2014 99.04",
header = TRUE)

df1$Date <- as.Date(df1$Date, format = "%m/%d/%Y")
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")

df3 = merge(df1,df2)
# Date Duration Percent_Removal
# 1 2014-06-27 10 20.39
# 2 2014-06-30 20 27.01
# 3 2014-07-11 15 59.48

Note that no additional options need to be specified in the merge call because:

  1. The default value of by is the set of column names common to both data frames. In this case, only Date is shared.
  2. The default values of all.x, all.y and all (each FALSE) give the desired behaviour, where only the rows present in both data frames are kept.
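
To make the effect of those defaults concrete, here is a small sketch that builds the same two data frames directly (dates written in ISO format for brevity) and compares the default merge with the all.x, all.y and all variants.

```r
df1 <- data.frame(
  Date     = as.Date(c("2014-06-27", "2014-06-30", "2014-07-11")),
  Duration = c(10, 20, 15)
)
df2 <- data.frame(
  Date = as.Date(c("2014-06-27", "2014-06-30", "2014-07-07",
                   "2014-07-11", "2014-07-17")),
  Percent_Removal = c(20.39, 27.01, 49.84, 59.48, 99.04)
)

inner <- merge(df1, df2)                # only dates present in both
left  <- merge(df1, df2, all.x = TRUE)  # every row of df1
right <- merge(df1, df2, all.y = TRUE)  # every row of df2, NA where df1 has no match
full  <- merge(df1, df2, all = TRUE)    # every date from either data frame
```

Here inner and left both have three rows because every date in df1 also appears in df2, while right and full have five rows with Duration set to NA for the two dates missing from df1.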

Merge many R data frames by row.names with differing lengths

We could get all the datasets into a list and use merge with Reduce, specifying by as a new column created from the row names:

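# collect df1, df2, ... into a list and copy the row names into a column 'rn'
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), \(x)
  transform(x, rn = row.names(x)))
# full outer join of all list elements on 'rn'
out <- Reduce(function(...) merge(..., by = 'rn', all = TRUE),
  lst1)
# move 'rn' back into the row names and drop the column
row.names(out) <- out[[1]]
out <- out[-1]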

Output:

 out
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70

Or use the tidyverse with full_join, after creating a row names column with rownames_to_column (from tibble):

library(dplyr)
library(tibble)
library(purrr)
mget(ls(pattern = '^df\\d+$')) %>%
  map(~ .x %>%
    rownames_to_column('rn')) %>%
  reduce(full_join, by = 'rn') %>%
  column_to_rownames("rn")
v1 v2 v3
chr1 10 6 NA
chr2 43 64 20
chr3 1 NA 30
chr4 44 21 40
chr5 598 98 50
chr6 NA 10 60
chr7 NA 20 70

Merging data frames of different row length in R

You can achieve the desired result by using merge:

merge(df.A, df.B, by = 'Category', all = TRUE)

which will produce the following output:

#  Category Number.x Number.y
#1 A 1 5
#2 B 2 6
#3 C 3 7
#4 D 4 NA
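
The .x and .y suffixes appear because both data frames have a non-key column called Number. As a small sketch, with df.A and df.B rebuilt from the output above (hypothetical, since the inputs are not shown), the suffixes argument of merge can give those columns more descriptive names:

```r
# Hypothetical inputs rebuilt from the printed result
df.A <- data.frame(Category = c("A", "B", "C", "D"), Number = 1:4)
df.B <- data.frame(Category = c("A", "B", "C"), Number = 5:7)

# all = TRUE keeps category D even though df.B has no row for it;
# suffixes replaces the default ".x" / ".y" labels
res <- merge(df.A, df.B, by = "Category", all = TRUE,
             suffixes = c(".A", ".B"))
```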

Merging data frames with different number of rows and different columns

If A and B are the two input data frames, here are some solutions:

1) merge This solution works regardless of whether A or B has more rows.

merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL), 
by = 0, all = TRUE)[-1]

The first two arguments could be replaced with just A and B respectively if A and B have default rownames, i.e. 1, 2, ..., or if they have consistent rownames. That is, merge(A, B, by = 0, all = TRUE)[-1].

For example, if we have this input:

# test inputs
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

then:

merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL), 
by = 0, all = TRUE)[-1]

gives:

  Time demand  X    Y
1    1    8.3  2 16.6
2    2   10.3  4 20.6
3    3   19.0 NA   NA
4    4   16.0 NA   NA
5    5   15.6 NA   NA
6    7   19.8 NA   NA

1a) An equivalent variation is:

do.call("merge", c(lapply(list(A, B), data.frame, row.names=NULL), 
by = 0, all = TRUE))[-1]
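
To see what by = 0 and the trailing [-1] are doing in 1) and 1a), the sketch below splits the call into steps using the same test inputs. The intermediate names A0, B0 and m are only for illustration.

```r
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))

# row.names = NULL resets the row names to 1, 2, ..., so that
# row i of A lines up with row i of B
A0 <- data.frame(A, row.names = NULL)
B0 <- data.frame(B, row.names = NULL)

# by = 0 tells merge() to match on row names; the result gains a
# leading key column called "Row.names"
m <- merge(A0, B0, by = 0, all = TRUE)
names(m)[1]

# [-1] drops that key column again, leaving only the data columns
res <- m[-1]
```

One thing to watch for: Row.names is a character column, so with ten or more rows merge may sort it as text (1, 10, 2, ...). The rows are still matched correctly, but you may want to reorder the result afterwards.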

2) cbind.zoo This solution assumes that A has more rows and that B's entries are all of the same type, e.g. all numeric. A is not restricted. These conditions hold in the data of the question.

library(zoo)
data.frame(A, cbind(zoo(, 1:nrow(A)), as.zoo(B)))
