Merging Two Dataframes With Different Lengths

Merge dataframes, different lengths

You could add a join variable to dat2 and then use merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50
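
For comparison, the same idea can be sketched in pandas. The two frames below are rebuilt from the output shown above, so their exact layout is an assumption: dat1 is taken to be the long table and dat2 a lookup table whose index holds the organism names. Instead of copying the row names into a column, the sketch matches a column on the left against the index on the right.

```python
import numpy as np
import pandas as pd

# Long table: three rows per organism (values copied from the output above).
dat1 = pd.DataFrame({
    "variable": ["amoeba"] * 3 + ["bacterium"] * 3 + ["leech"] * 3,
    "ID": [1, 2, 3] * 3,
    "value": [0, 0, np.nan, 0, 0, 0, 1, 1, 0],
})

# Lookup table: one row per organism, with the organism name as the index
# (assumed layout, mirroring rownames(dat2) in the R answer).
dat2 = pd.DataFrame(
    {
        "concreteness": [3.60, 3.82, 5.71],
        "familiarity": [1.30, 3.48, 1.83],
        "typicality": [1.71, 2.13, 4.50],
    },
    index=["amoeba", "bacterium", "leech"],
)

# Match the "variable" column of dat1 against the index of dat2.
merged = dat1.merge(dat2, left_on="variable", right_index=True, how="left")
print(merged)
```

Each row of the shorter frame is repeated for every matching row of the longer one, which is what makes the length difference harmless.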

How to merge two dataframes with different lengths in Python

I assume your output should look somewhat like this:

Week     Coeff1    Coeff2
   1  -0.456662  0.571707
   1  -0.533774  0.086152
   1  -0.432871  0.824832
   2          3         3
   2        NaN         3

Merge dataframe with different lengths

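import pandas as pd

offering_id_dfs = []
for offering_id in df1.OFFERING_ID.unique():
    # Select the rows for this ID from each frame and restart their indexes at 0
    sub_df1 = df1.loc[df1.OFFERING_ID == offering_id, :].reset_index(drop=True)
    sub_df2 = df2.loc[df2.OFFERING_ID == offering_id, :].reset_index(drop=True)
    # Place the two pieces side by side; the shorter one is padded with NaN
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    # Overwrite OFFERING_ID so the padded rows carry the ID as well
    concat_df["OFFERING_ID"] = offering_id
    offering_id_dfs.append(concat_df)
# Stack the per-ID pieces into a single frame
df3 = pd.concat(offering_id_dfs).reset_index(drop=True)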

That should work as long as each DataFrame contains only one column besides OFFERING_ID and every value in df2.OFFERING_ID.unique() also appears in df1.OFFERING_ID.unique().
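
If those restrictions are a problem, one alternative (not the method used in the answer above) is to number the rows within each OFFERING_ID group and then do an outer merge on the ID plus that position. The sketch below uses made-up data; the column names A and B and the helper column pos are hypothetical.

```python
import pandas as pd

# Made-up frames with a different number of rows per OFFERING_ID.
df1 = pd.DataFrame({"OFFERING_ID": [1, 1, 1, 2], "A": ["a1", "a2", "a3", "a4"]})
df2 = pd.DataFrame({"OFFERING_ID": [1, 2, 2], "B": ["b1", "b2", "b3"]})

# Number the rows inside each OFFERING_ID group: 0, 1, 2, ...
left = df1.assign(pos=df1.groupby("OFFERING_ID").cumcount())
right = df2.assign(pos=df2.groupby("OFFERING_ID").cumcount())

# An outer merge on (OFFERING_ID, pos) pairs rows by position within each
# group and keeps unpaired rows from either side, filled with NaN.
df3 = (
    left.merge(right, on=["OFFERING_ID", "pos"], how="outer")
    .sort_values(["OFFERING_ID", "pos"])
    .drop(columns="pos")
    .reset_index(drop=True)
)
print(df3)
```

Because the merge is an outer one, IDs that occur only in df2 are kept too, and either frame may carry more than one value column.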

How can I combine two dataframes with different lengths in R?

Try using left_join in the dplyr package.

library(dplyr)

# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))

# use left_join
df3 <- left_join(df1, df2, by = "id")

# rename the suffixed columns (val.x, val.y) and set NAs to 0
names(df3) <- c("id", "val1", "val2")
df3[is.na(df3)] <- 0
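
A rough pandas counterpart of the same left join is sketched below. The numbers are fixed, made-up values standing in for the random rpois draws, and the suffixes "1" and "2" are an arbitrary choice for telling the two val columns apart.

```python
import pandas as pd

# Made-up data: df2 has no row for id "D".
df1 = pd.DataFrame({"id": list("ABCDE"), "val": [4, 6, 5, 3, 7]})
df2 = pd.DataFrame({"id": list("ABCE"), "val": [18, 22, 20, 19]})

# Left join keeps every row of df1; the clashing "val" columns get suffixes.
df3 = df1.merge(df2, on="id", how="left", suffixes=("1", "2"))

# The unmatched row holds NaN, which turns the column into floats.
# Fill with 0 and convert back to integers.
df3["val2"] = df3["val2"].fillna(0).astype(int)
print(df3)
```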

How can I merge two data frames with different length and with two conditions in R?

left_join accepts a named vector in by, so you can join on several columns at once even when their names differ between the two data frames:

library(dplyr)
df3 <- left_join(df2, df1, by = c("From" = "Country", "Year" = "Year"))

Output:

  Year From Vote  To Score
1    1   NE    1 Ger   0.8
2    1   NE    2   I   0.8
3    1   UK    2 Ger   0.9
4    1   UK    3   I   0.9
5    2   NE    2 Ger   0.7
6    2   NE    2   I   0.7
7    2   UK    4 Ger   1.0
8    2   UK    2   I   1.0
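
The same two-condition join can be sketched in pandas with list-valued left_on and right_on. The input frames below are reconstructed from the output above, so their exact shape is an assumption.

```python
import pandas as pd

# Reconstructed inputs (assumed from the output shown above).
df1 = pd.DataFrame({
    "Country": ["NE", "UK", "NE", "UK"],
    "Year": [1, 1, 2, 2],
    "Score": [0.8, 0.9, 0.7, 1.0],
})
df2 = pd.DataFrame({
    "Year": [1, 1, 1, 1, 2, 2, 2, 2],
    "From": ["NE", "NE", "UK", "UK", "NE", "NE", "UK", "UK"],
    "Vote": [1, 2, 2, 3, 2, 2, 4, 2],
    "To": ["Ger", "I", "Ger", "I", "Ger", "I", "Ger", "I"],
})

# Pair the keys position by position: From <-> Country, Year <-> Year.
df3 = df2.merge(
    df1,
    how="left",
    left_on=["From", "Year"],
    right_on=["Country", "Year"],
)

# "Country" duplicates "From" after the merge, so drop it.
df3 = df3.drop(columns="Country")
print(df3)
```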

Merge two Python pandas data frames of different length but keep all rows in output data frame

You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

What you are looking for is a left join. The default option is an inner join. You can change this behavior by passing a different how argument:

df1.merge(df2, how='left', left_on='Column1', right_on='ColumnA')
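
To see the difference between the default inner join and a left join, here is a small sketch with made-up data. Only the column names Column1 and ColumnA come from the call above; the columns x and y are hypothetical. Passing indicator=True adds a _merge column that records which rows found a match.

```python
import pandas as pd

# Made-up frames of different lengths.
df1 = pd.DataFrame({"Column1": [1, 2, 3, 4], "x": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"ColumnA": [1, 3], "y": [10.0, 30.0]})

# Default (inner) join: only keys present in both frames survive.
inner = df1.merge(df2, left_on="Column1", right_on="ColumnA")

# Left join: every row of df1 is kept; unmatched rows get NaN on the right.
out = df1.merge(
    df2, how="left", left_on="Column1", right_on="ColumnA", indicator=True
)

# Rows of df1 that had no partner in df2.
unmatched = out[out["_merge"] == "left_only"]
print(len(inner), len(out))
print(unmatched)
```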

Merging two dataframes of different length, on a particular column with different number of instances

Option 1: join

This solution requires that you set the index of D2 and use the on parameter:

D1.join(D2.set_index('ID'), on='ID')

   ID val1 val2  Target
0   1    x    y       0
1   1    x    y       0
2   2    a    b       1
3   2    a    c       1

Note: if D2 doesn't include all values in D1.ID, the rows of D1 without a match get a null value. DataFrame.join already defaults to how='left', so passing it explicitly only makes that intent visible.

D1.join(D2.set_index('ID'), on='ID', how='left')

From the comments:

Why does this require setting the index of D2? The other answers don't do that. – ErikE

@ErikE That is the difference between merge and join. pandas.DataFrame.merge performs its merging on column values by default, while join looks at the index by default. I can override join's behavior by specifying a column to join on with on='ID'. However, that override is limited to the left object only, so I have to set the index of the right object for the join to work as intended. – piRSquared
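
The point made in the comments can be checked with a short sketch. D1 and D2 below are rebuilt from the output shown for Option 1, so their contents are an assumption. join needs the right-hand key in the index, while merge can match on the ID column directly.

```python
import pandas as pd

# Frames reconstructed from the output above (assumed contents).
D1 = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "val1": ["x", "x", "a", "a"],
    "val2": ["y", "y", "b", "c"],
})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# join: the left key is a column (on='ID'), the right key must be the index.
via_join = D1.join(D2.set_index("ID"), on="ID")

# merge: both keys are ordinary columns; how='left' mirrors join's default.
via_merge = D1.merge(D2, on="ID", how="left")

print(via_join)
print(via_merge)
```

Both calls attach the same Target values to the rows of D1, so the choice between them is mostly a matter of which key layout you already have.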

Option 2: map + assign

This solution turns D2 into something dict-like: a pd.Series whose index holds the 'ID' values and whose values are the 'Target' values. map converts the 'ID' column of D1 into those new values, and assign stores them in a new column.

D1.assign(Target=D1.ID.map(D2.set_index('ID').Target))


   ID val1 val2  Target
0   1    x    y       0
1   1    x    y       0
2   2    a    b       1
3   2    a    c       1
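
As a small sketch of how the map approach behaves when D1 contains an ID that D2 lacks, the made-up data below adds an ID of 3 (the name lookup is hypothetical). The sketch assumes each ID appears only once in D2, since the lookup Series is used like a dictionary.

```python
import pandas as pd

# Made-up data: ID 3 has no entry in D2.
D1 = pd.DataFrame({"ID": [1, 1, 2, 3], "val1": ["x", "x", "a", "q"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# Series indexed by ID: behaves like a dict from ID to Target.
lookup = D2.set_index("ID")["Target"]

# IDs missing from the lookup map to NaN, so Target becomes a float column.
result = D1.assign(Target=D1["ID"].map(lookup))
print(result)
```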

