Merge Dataframes, Different Lengths

Merge dataframes, different lengths

You could add a join variable to dat2 and then use merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50
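For readers working in pandas rather than R, the same idea can be sketched as follows. The input frames are an assumption, reconstructed from the merged output above (the original dat1 and dat2 are not shown). Instead of copying the row names into a column, the sketch matches dat1's variable column against dat2's index directly.

```python
import pandas as pd

# dat1: several rows per organism (values reconstructed from the output above)
dat1 = pd.DataFrame({
    "variable": ["amoeba"] * 3 + ["bacterium"] * 3 + ["leech"] * 3,
    "ID": [1, 2, 3] * 3,
    "value": [0, 0, None, 0, 0, 0, 1, 1, 0],
})

# dat2: one row per organism, keyed by its index (the analogue of R row names)
dat2 = pd.DataFrame(
    {
        "concreteness": [3.60, 3.82, 5.71],
        "familiarity": [1.30, 3.48, 1.83],
        "typicality": [1.71, 2.13, 4.50],
    },
    index=["amoeba", "bacterium", "leech"],
)

# Match dat1's "variable" column against dat2's index
merged = dat1.merge(dat2, left_on="variable", right_index=True)
print(merged)
```

Each of the nine rows of dat1 picks up the three measurements of its organism, so the shorter frame is repeated as needed.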

How to merge two dataframes with different lengths in Python

I assume your output should look somewhat like this:

Week     Coeff1    Coeff2
   1  -0.456662  0.571707
   1  -0.533774  0.086152
   1  -0.432871  0.824832
   2   3         3
   2   NaN       3

Merge dataframe with different lengths

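import pandas as pd

offering_id_dfs = []
for offering_id in df1.OFFERING_ID.unique():
    # Rows for this ID from each frame, re-indexed from 0 so they line up side by side
    sub_df1 = df1.loc[df1.OFFERING_ID == offering_id, :].reset_index(drop=True)
    sub_df2 = df2.loc[df2.OFFERING_ID == offering_id, :].reset_index(drop=True)
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    # Fill the ID for rows where one of the two frames ran out of data
    concat_df["OFFERING_ID"] = offering_id
    offering_id_dfs.append(concat_df)
df3 = pd.concat(offering_id_dfs).reset_index(drop=True)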

That should work as long as each DataFrame contains only one column besides OFFERING_ID and every value in df2.OFFERING_ID.unique() also appears in df1.OFFERING_ID.unique().
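A possible alternative to the explicit loop, offered as a sketch rather than a drop-in replacement: number the rows within each OFFERING_ID and do an outer merge on the ID plus that position. The sample frames and the column names A, B and pos are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs: the two frames have different numbers of rows per OFFERING_ID
df1 = pd.DataFrame({"OFFERING_ID": [1, 1, 2], "A": ["a1", "a2", "a3"]})
df2 = pd.DataFrame({"OFFERING_ID": [1, 2, 2], "B": ["b1", "b2", "b3"]})

# Position of each row within its OFFERING_ID group (0, 1, 2, ...)
left = df1.assign(pos=df1.groupby("OFFERING_ID").cumcount())
right = df2.assign(pos=df2.groupby("OFFERING_ID").cumcount())

# Outer merge keeps positions that exist in only one frame and fills the gap with NaN
df3 = (
    left.merge(right, on=["OFFERING_ID", "pos"], how="outer")
    .sort_values(["OFFERING_ID", "pos"])
    .drop(columns="pos")
    .reset_index(drop=True)
)
print(df3)
```

Because the merge is an outer one, this variant also keeps IDs that occur only in df2, which the loop above would skip.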

How can I combine two dataframes with different lengths in R?

Try using left_join in the dplyr package.

library(dplyr)

# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))

# use left_join
df3 <- left_join(df1, df2, by = "id")

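# rename (left_join names the clashing columns val.x and val.y) and set NAs to 0
# distinct names avoid ending up with two columns both called "val"
names(df3) <- c("id", "val1", "val2")
df3[is.na(df3)] <- 0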

How to combine two data frames of different lengths?

This is too long for a comment, but I just need to demonstrate that the solution I gave in the comments does work. If you are having problems getting merge to work, there must be some other issue with your data, which we cannot diagnose because you did not provide a dput of your data.frames.

df1 <- read.table(text =
"Date Duration
6/27/2014 10.00
6/30/2014 20.00
7/11/2014 15.00",
header = TRUE)

df2 <- read.table(text =
"Date Percent_Removal
6/27/2014 20.39
6/30/2014 27.01
7/7/2014 49.84
7/11/2014 59.48
7/17/2014 99.04",
header = TRUE)

df1$Date <- as.Date(df1$Date, format = "%m/%d/%Y")
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")

df3 <- merge(df1, df2)
#         Date Duration Percent_Removal
# 1 2014-06-27       10           20.39
# 2 2014-06-30       20           27.01
# 3 2014-07-11       15           59.48

Note that no additional options need to be specified in the merge call because:

  1. The default value of by is the set of column names common to both data frames. In this case, only Date is shared.
  2. The default values of all.x, all.y and all (all FALSE) give the desired behaviour, where only the rows present in both data frames are kept.
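The two defaults listed above have close counterparts in pandas, which may help readers who arrive here from the Python side. The sketch below reuses the data from the R example; with no on= or how= arguments, pd.merge joins on the columns the two frames share and performs an inner join.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["6/27/2014", "6/30/2014", "7/11/2014"],
    "Duration": [10.0, 20.0, 15.0],
})
df2 = pd.DataFrame({
    "Date": ["6/27/2014", "6/30/2014", "7/7/2014", "7/11/2014", "7/17/2014"],
    "Percent_Removal": [20.39, 27.01, 49.84, 59.48, 99.04],
})

# Parse the dates so both key columns have the same dtype
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%m/%d/%Y")

# No on= or how= given: join on the shared column ("Date"), keep only matching rows
df3 = pd.merge(df1, df2)
print(df3)
```

Only the three dates present in both frames survive, matching the R result.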

Merging two dataframes of different length, on a particular column with different number of instances

Option 1: join

This solution requires that you set the index of D2 and use the on parameter

D1.join(D2.set_index('ID'), on='ID')

   ID val1 val2  Target
0   1    x    y       0
1   1    x    y       0
2   2    a    b       1
3   2    a    c       1

Note: if D2 doesn't include all values in D1.ID, the rows of D1 without a match get a null Target. join already defaults to how='left', so passing the option explicitly only makes that intent visible:

D1.join(D2.set_index('ID'), on='ID', how='left')

from comments:

Why does this require setting the index of D2? The other answers don't do that. – ErikE

@ErikE That is the difference between merge and join. pandas.DataFrame.merge performs its merging on column values by default, while join looks at the index by default. I can override join's behavior by specifying a column to join on with on='ID'. However, that override is limited to the left object only, so I have to set the index of the right object for the join to work as intended. – piRSquared
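To make the merge/join difference from the comment concrete, here is a small sketch that builds the same table both ways. D1 and D2 are assumptions, reconstructed from the output shown under Option 1, since the original frames are not given.

```python
import pandas as pd

# Assumed inputs, reconstructed from the output shown above
D1 = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "val1": ["x", "x", "a", "a"],
    "val2": ["y", "y", "b", "c"],
})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# join matches D1's "ID" column against the index of the right frame
via_join = D1.join(D2.set_index("ID"), on="ID")

# merge matches column against column, so no set_index is needed
via_merge = D1.merge(D2, on="ID", how="left")

print(via_join)
print(via_merge)
```

Both calls attach Target to every row of D1; which one to use is mostly a matter of whether the key already lives in the index of the right frame.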

Option 2: map + assign

This solution turns D2 into something dict-like: a pd.Series whose index is the 'ID' values and whose values are 'Target'. map converts the 'ID' column of D1 into new values, and assign stores them in a new column.

D1.assign(Target=D1.ID.map(D2.set_index('ID').Target))

   ID val1 val2  Target
0   1    x    y       0
1   1    x    y       0
2   2    a    b       1
3   2    a    c       1
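A short sketch of what map does when an ID in D1 has no counterpart in D2. The frames are made up for illustration (ID 3 is deliberately unmatched), and the sketch assumes each ID appears only once in D2 so the lookup Series has a unique index.

```python
import pandas as pd

# Hypothetical inputs; ID 3 has no entry in D2
D1 = pd.DataFrame({"ID": [1, 2, 3], "val1": ["x", "a", "q"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# Lookup Series: index = ID, values = Target
lookup = D2.set_index("ID")["Target"]

# IDs missing from the lookup come back as NaN
result = D1.assign(Target=D1["ID"].map(lookup))
print(result)
```

The unmatched row is kept with a missing Target, so the result behaves like a left join on a single column.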

Merging two dataframes with different lengths

pandas.DataFrame.join can "join" dataframes based on overlap in column data (or index). Something like this will likely work for you:

df1.join(df2.set_index('block_id'), on='block_id')

Merge two Python pandas data frames of different length but keep all rows in output data frame

You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

What you are looking for is a left join. The default option is an inner join. You can change this behavior by passing a different how argument:

df1.merge(df2, how='left', left_on='Column1', right_on='ColumnA')
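A minimal sketch of that left join on made-up data. The frames and the x and y columns are hypothetical; Column1 and ColumnA are the key names from the call above. The indicator=True argument adds a _merge column showing which rows found a match.

```python
import pandas as pd

# Hypothetical inputs: df2 is shorter and covers only some of df1's keys
df1 = pd.DataFrame({"Column1": ["a", "b", "c", "d"], "x": [1, 2, 3, 4]})
df2 = pd.DataFrame({"ColumnA": ["a", "c"], "y": [10, 30]})

# Left join: every row of df1 is kept; unmatched rows get NaN in df2's columns
out = df1.merge(
    df2,
    how="left",
    left_on="Column1",
    right_on="ColumnA",
    indicator=True,
)
print(out)
```

As long as the keys in df2 are unique, the output has exactly as many rows as df1.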

