Merging Two Dataframes With Different Lengths

Merge dataframes, different lengths

You could add a join variable to dat2 (built from its row names) and then use merge:

# copy dat2's row names into a "variable" column, the key shared with dat1
dat2$variable <- rownames(dat2)
merge(dat1, dat2, by = "variable")
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50
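For comparison, here is a minimal pandas sketch of the same idea. It assumes dat2 is keyed by its index the way the R data frame is keyed by its row names. The frames are hypothetical and were filled in from the printed output above.

```python
import pandas as pd

# Hypothetical data matching the printed R output above
dat1 = pd.DataFrame({
    "variable": ["amoeba"] * 3 + ["bacterium"] * 3 + ["leech"] * 3,
    "ID": [1, 2, 3] * 3,
    "value": [0, 0, None, 0, 0, 0, 1, 1, 0],
})
# dat2 carries the join key only in its row labels (its index)
dat2 = pd.DataFrame(
    {"concreteness": [3.60, 3.82, 5.71],
     "familiarity": [1.30, 3.48, 1.83],
     "typicality": [1.71, 2.13, 4.50]},
    index=["amoeba", "bacterium", "leech"],
)

# Counterpart of dat2$variable <- rownames(dat2): move the index into a "variable" column
dat2_keyed = dat2.rename_axis("variable").reset_index()

# Inner join on the shared key, like merge(dat1, dat2, by = "variable") in R
merged = dat1.merge(dat2_keyed, on="variable")
print(merged)
```

Each row of dat2 is repeated for every matching row of dat1, which is how frames of different lengths line up.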

How to merge two dataframes with different lengths in Python

I assume your output should look something like this:

Week     Coeff1    Coeff2
   1  -0.456662  0.571707
   1  -0.533774  0.086152
   1  -0.432871  0.824832
   2          3         3
   2        NaN         3
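The inputs are not shown, so this is only a sketch under an assumption. Suppose there are two hypothetical frames, one with Week/Coeff1 and one with Week/Coeff2, whose rows should be paired by position within each week. Output of the shape above can then be produced by numbering the rows inside each week with groupby().cumcount() and doing an outer merge on Week plus that counter. The helper column n is made up for illustration.

```python
import pandas as pd

# Hypothetical inputs: week 2 has one Coeff1 row but two Coeff2 rows
df1 = pd.DataFrame({"Week": [1, 1, 1, 2],
                    "Coeff1": [-0.456662, -0.533774, -0.432871, 3]})
df2 = pd.DataFrame({"Week": [1, 1, 1, 2, 2],
                    "Coeff2": [0.571707, 0.086152, 0.824832, 3, 3]})

# Position of each row within its week: 0, 1, 2, ...
a = df1.assign(n=df1.groupby("Week").cumcount())
b = df2.assign(n=df2.groupby("Week").cumcount())

# Outer join on (Week, n): positions present on only one side get NaN on the other
out = (a.merge(b, on=["Week", "n"], how="outer")
        .sort_values(["Week", "n"])
        .drop(columns="n")
        .reset_index(drop=True))
print(out)
```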

Merge dataframe with different lengths

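import pandas as pd

offering_id_dfs = []
for oid in df1.OFFERING_ID.unique():
    # rows for this ID, renumbered 0..n-1 so the two frames line up by position
    sub_df1 = df1.loc[df1.OFFERING_ID == oid].reset_index(drop=True)
    # drop df2's copy of the key so the result has a single OFFERING_ID column
    sub_df2 = df2.loc[df2.OFFERING_ID == oid].drop(columns="OFFERING_ID").reset_index(drop=True)
    # side-by-side concat; the shorter block is padded with NaN
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    # refill the ID on rows that only came from df2
    concat_df["OFFERING_ID"] = oid
    offering_id_dfs.append(concat_df)
df3 = pd.concat(offering_id_dfs).reset_index(drop=True)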

That should work as long as each DataFrame contains only one column besides OFFERING_ID and every value in df2.OFFERING_ID.unique() also appears in df1.OFFERING_ID.unique(). IDs that occur only in df2 are silently dropped, because the loop iterates over df1's IDs.
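A small helper that checks both conditions before running the loop. It is not part of the original answer; the name check_preconditions and the sample frames are hypothetical.

```python
import pandas as pd

def check_preconditions(df1, df2, key="OFFERING_ID"):
    """List reasons the per-ID concat loop might misbehave (empty list = OK)."""
    problems = []
    for name, df in (("df1", df1), ("df2", df2)):
        extra = [c for c in df.columns if c != key]
        if len(extra) != 1:
            problems.append(f"{name} has {len(extra)} columns besides {key}")
    # The loop iterates over df1's IDs, so IDs that exist only in df2 are skipped
    only_in_df2 = set(df2[key].unique().tolist()) - set(df1[key].unique().tolist())
    if only_in_df2:
        problems.append(f"IDs only in df2 (dropped): {sorted(only_in_df2)}")
    return problems

# Hypothetical example: ID 30 appears only in df2
df1 = pd.DataFrame({"OFFERING_ID": [10, 10, 20], "A": [1, 2, 3]})
df2 = pd.DataFrame({"OFFERING_ID": [10, 20, 30], "B": [4, 5, 6]})
print(check_preconditions(df1, df2))
```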

How can I combine two dataframes with different lengths in R?

Try using left_join() from the dplyr package. It keeps every row of df1 and fills the df2 columns with NA where there is no matching id.

library(dplyr)

# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))

# use left_join: every row of df1 is kept; "D" has no match, so it gets NA
df3 <- left_join(df1, df2, by = "id")

# give val.x / val.y unique names, then set NAs to 0
names(df3) <- c("id", "val1", "val2")
df3[is.na(df3)] <- 0
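A rough pandas counterpart, assuming the same two frames. Fixed made-up values stand in for rpois() so the result is reproducible, and the suffixes "_1"/"_2" are an arbitrary choice that plays the role of the rename step.

```python
import pandas as pd

# Made-up values in place of rpois(), so the output is reproducible
df1 = pd.DataFrame({"id": ["A", "B", "C", "D", "E"], "val": [4, 6, 5, 3, 7]})
df2 = pd.DataFrame({"id": ["A", "B", "C", "E"], "val": [19, 22, 18, 21]})

# how="left" keeps every id of df1; suffixes give the two val columns distinct names
df3 = df1.merge(df2, on="id", how="left", suffixes=("_1", "_2"))

# "D" has no match in df2, so its val_2 is NaN; set it to 0 like the R code does
df3 = df3.fillna(0)
print(df3)
```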

How can I merge two data frames with different length and with two conditions in R?

You can use left_join() with a by argument that lists several columns, so that rows are matched on all of them. Here From in df2 is matched to Country in df1, and Year to Year:

library(dplyr)
df3 <- left_join(df2, df1, by = c("From" = "Country", "Year" = "Year"))

Output:

  Year From Vote  To Score
1    1   NE    1 Ger   0.8
2    1   NE    2   I   0.8
3    1   UK    2 Ger   0.9
4    1   UK    3   I   0.9
5    2   NE    2 Ger   0.7
6    2   NE    2   I   0.7
7    2   UK    4 Ger   1.0
8    2   UK    2   I   1.0
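The same two-condition left join in pandas, as a sketch. The input frames are hypothetical, chosen only so the result reproduces the table above. Renaming Country to From first lets a single on= list name both keys.

```python
import pandas as pd

# Hypothetical inputs chosen to reproduce the output table above
df1 = pd.DataFrame({"Country": ["NE", "UK", "NE", "UK"],
                    "Year": [1, 1, 2, 2],
                    "Score": [0.8, 0.9, 0.7, 1.0]})
df2 = pd.DataFrame({"Year": [1, 1, 1, 1, 2, 2, 2, 2],
                    "From": ["NE", "NE", "UK", "UK"] * 2,
                    "Vote": [1, 2, 2, 3, 2, 2, 4, 2],
                    "To": ["Ger", "I"] * 4})

# Give the key the same name on both sides, then match on both columns at once
df3 = df2.merge(df1.rename(columns={"Country": "From"}),
                on=["From", "Year"], how="left")
print(df3)
```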

Merge two python pandas data frames of different length but keep all rows in output data frame

You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

What you are looking for is a left join. merge performs an inner join by default, which drops rows without a match; pass how='left' to keep every row of the left frame:

df1.merge(df2, how='left', left_on='Column1', right_on='ColumnA')
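A minimal sketch contrasting the two, with hypothetical frames whose key columns have different names. indicator=True is an optional merge parameter that adds a _merge column showing whether each row matched.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
df1 = pd.DataFrame({"Column1": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"ColumnA": ["a", "c"], "y": [10, 30]})

inner = df1.merge(df2, left_on="Column1", right_on="ColumnA")  # default how="inner"
left = df1.merge(df2, how="left", left_on="Column1", right_on="ColumnA",
                 indicator=True)  # _merge column records where each row matched

print(len(inner), len(left))  # 2 3
print(left)
```

Row "b" survives only in the left join, with NaN in ColumnA and y.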

Merging two dataframes of different length, on a particular column with different number of instances

Option 1: join

This solution requires setting the index of D2 to 'ID' and using the on parameter, so that the ID column of D1 is matched against that index.

D1.join(D2.set_index('ID'), on='ID')

   ID val1 val2  Target
0   1    x    y       0
1   1    x    y       0
2   2    a    b       1
3   2    a    c       1

Note: DataFrame.join already defaults to how='left', so rows of D1 whose ID is missing from D2 are kept, with NaN in the joined columns. Passing how='left' explicitly does the same thing and just makes the intent visible:

D1.join(D2.set_index('ID'), on='ID', how='left')
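A quick check of that default, using hypothetical frames shaped like the output above plus an ID 3 that D2 lacks:

```python
import pandas as pd

# Hypothetical frames shaped like the output above, plus ID 3, which D2 lacks
D1 = pd.DataFrame({"ID": [1, 1, 2, 2, 3],
                   "val1": ["x", "x", "a", "a", "z"],
                   "val2": ["y", "y", "b", "c", "z"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

default = D1.join(D2.set_index("ID"), on="ID")                # how not given
explicit = D1.join(D2.set_index("ID"), on="ID", how="left")
print(default.equals(explicit))  # True: join already defaults to how="left"
print(default)
```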

From the comments:

Why does this require setting the index of D2? The other answers don't do that. – ErikE
@ErikE that is the difference between merge and join. pandas.DataFrame.merge merges on column values by default, while join looks at the index by default. I can override join's behavior by specifying a column to join on with on='ID'. However, that override only applies to the left object. So I have to set the index of the right object for the join to work as intended. – piRSquared
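To illustrate the comment, here is a sketch with hypothetical frames showing that merge on the ID column gives the same data as join against D2's index:

```python
import pandas as pd

# Hypothetical frames shaped like the output above
D1 = pd.DataFrame({"ID": [1, 1, 2, 2],
                   "val1": ["x", "x", "a", "a"],
                   "val2": ["y", "y", "b", "c"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

via_join = D1.join(D2.set_index("ID"), on="ID")   # right side matched on its index
via_merge = D1.merge(D2, on="ID", how="left")     # both sides matched on the ID column

# Same rows and columns either way (indexes reset to compare only the data)
pd.testing.assert_frame_equal(via_join.reset_index(drop=True),
                              via_merge.reset_index(drop=True))
print(via_merge)
```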

Option 2: map + assign

This solution turns D2 into something dict-like: a pd.Series whose index is the 'ID' values and whose values are the 'Target' values. map then translates D1's 'ID' column into those values, and assign stores the result in a new column.

D1.assign(Target=D1.ID.map(D2.set_index('ID').Target))


   ID val1 val2  Target
0   1    x    y       0
1   1    x    y       0
2   2    a    b       1
3   2    a    c       1
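A sketch of the map approach with hypothetical frames, including an ID 3 that D2 lacks. It assumes each ID appears only once in D2; the assert makes that assumption explicit, since the lookup Series needs a unique index.

```python
import pandas as pd

# Hypothetical frames shaped like the output above, plus ID 3, which D2 lacks
D1 = pd.DataFrame({"ID": [1, 1, 2, 2, 3],
                   "val1": ["x", "x", "a", "a", "z"],
                   "val2": ["y", "y", "b", "c", "z"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# Series used as a lookup table: index = ID, values = Target
lookup = D2.set_index("ID")["Target"]
# Assumption: one Target per ID, so the lookup index must be unique
assert lookup.index.is_unique

out = D1.assign(Target=D1["ID"].map(lookup))  # IDs missing from D2 become NaN
print(out)
```

Unlike join, this adds only the single Target column, which keeps it short when that is all you need from D2.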