Merge dataframes, different lengths
You could add a join variable to dat2 and then use merge:
dat2$variable <- rownames(dat2)
merge(dat1, dat2)
variable ID value concreteness familiarity typicality
1 amoeba 1 0 3.60 1.30 1.71
2 amoeba 2 0 3.60 1.30 1.71
3 amoeba 3 NA 3.60 1.30 1.71
4 bacterium 1 0 3.82 3.48 2.13
5 bacterium 2 0 3.82 3.48 2.13
6 bacterium 3 0 3.82 3.48 2.13
7 leech 1 1 5.71 1.83 4.50
8 leech 2 1 5.71 1.83 4.50
9 leech 3 0 5.71 1.83 4.50
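For comparison, here is a minimal pandas sketch of the same idea: turn dat2's row labels into an ordinary join column, then merge. The frames are rebuilt from the printed output above, so their exact shape is an assumption. The column name variable is taken from that output.

```python
import pandas as pd

# Data rebuilt from the printed merge output above (assumed shapes of dat1/dat2)
dat1 = pd.DataFrame({
    "variable": ["amoeba"] * 3 + ["bacterium"] * 3 + ["leech"] * 3,
    "ID": [1, 2, 3] * 3,
    "value": [0, 0, float("nan"), 0, 0, 0, 1, 1, 0],
})
dat2 = pd.DataFrame(
    {"concreteness": [3.60, 3.82, 5.71],
     "familiarity": [1.30, 3.48, 1.83],
     "typicality": [1.71, 2.13, 4.50]},
    index=["amoeba", "bacterium", "leech"],
)

# Equivalent of dat2$variable <- rownames(dat2): move the row labels into a column
dat2 = dat2.rename_axis("variable").reset_index()

# Join on the shared "variable" column; each dat2 row is repeated
# for every dat1 row with the same variable
merged = dat1.merge(dat2, on="variable")
print(merged)
```

Because dat2 has one row per word, the result has as many rows as dat1, which is why the two inputs can have different lengths.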
How to merge two dataframes with different lengths in Python
I assume your output should look somewhat like this:
Week | Coeff1 | Coeff2 |
---|---|---|
1 | -0.456662 | 0.571707 |
1 | -0.533774 | 0.086152 |
1 | -0.432871 | 0.824832 |
2 | 3 | 3 |
2 | NaN | 3 |
Merge dataframes with different lengths
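import pandas as pd

offering_id_dfs = []
for offering_id in df1.OFFERING_ID.unique():
    # rows for this ID in each frame, renumbered from 0 so they line up side by side
    sub_df1 = df1.loc[df1.OFFERING_ID == offering_id, :].reset_index(drop=True)
    # drop df2's copy of the ID column so the result has no duplicate column names
    sub_df2 = (df2.loc[df2.OFFERING_ID == offering_id, :]
               .drop(columns="OFFERING_ID")
               .reset_index(drop=True))
    # column-wise concat pads the shorter block with NaN
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    # fill the ID back in on rows that only came from df2
    concat_df["OFFERING_ID"] = offering_id
    offering_id_dfs.append(concat_df)
df3 = pd.concat(offering_id_dfs).reset_index(drop=True)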
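That might work as long as each DataFrame contains only one column besides OFFERING_ID and every value in df2.OFFERING_ID.unique() also appears in df1.OFFERING_ID.unique(). Because the loop only iterates over df1's IDs, rows of df2 whose ID never occurs in df1 are silently dropped.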
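A loop-free variant, offered as a sketch rather than as the answer's own method: number the rows within each OFFERING_ID with groupby().cumcount(), then outer-merge on the ID plus that position. The column names A and B, the helper column pos and the data are all hypothetical. Unlike the loop, the outer merge also keeps IDs that appear only in df2.

```python
import pandas as pd

# Hypothetical example data: df2 has three rows for ID 1, df1 only two
df1 = pd.DataFrame({"OFFERING_ID": [1, 1, 2], "A": [10, 11, 20]})
df2 = pd.DataFrame({"OFFERING_ID": [1, 1, 1, 2], "B": ["x", "y", "z", "w"]})

# Number the rows within each ID (0, 1, 2, ...) so the n-th df1 row for an ID
# is paired with the n-th df2 row for the same ID
left = df1.assign(pos=df1.groupby("OFFERING_ID").cumcount())
right = df2.assign(pos=df2.groupby("OFFERING_ID").cumcount())

# An outer merge on (ID, position) pads the shorter side with NaN and also
# keeps IDs that occur in only one of the frames
df3 = (left.merge(right, on=["OFFERING_ID", "pos"], how="outer")
           .sort_values(["OFFERING_ID", "pos"])
           .drop(columns="pos")
           .reset_index(drop=True))
print(df3)
```

The third ID-1 row comes only from df2, so its A value is NaN, matching the padding the concat-based loop produces.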
How can I combine two dataframes with different lengths in R?
Try using left_join in the dplyr package.
library(dplyr)
# make fake data
df1 <- data.frame(id = c("A", "B", "C", "D", "E"), val = rpois(5, 5))
df2 <- data.frame(id = c("A", "B", "C", "E"), val = rpois(4, 20))
# use left_join
df3 <- left_join(df1, df2, by = "id")
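# give the two value columns distinct names (left_join calls them val.x and val.y)
names(df3) <- c("id", "val1", "val2")
# IDs missing from df2 (here "D") get NA in val2; set those to 0
df3[is.na(df3)] <- 0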
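The same left join can be sketched in pandas with DataFrame.merge. Fixed, hypothetical values stand in for the rpois() draws, and the suffixes _1/_2 are an arbitrary choice made here to keep the two val columns apart.

```python
import pandas as pd

# Hypothetical fixed values standing in for the rpois() draws
df1 = pd.DataFrame({"id": ["A", "B", "C", "D", "E"], "val": [4, 6, 5, 3, 7]})
df2 = pd.DataFrame({"id": ["A", "B", "C", "E"], "val": [19, 22, 18, 21]})

# Left join keeps every row of df1; suffixes give the two val columns distinct names
df3 = df1.merge(df2, on="id", how="left", suffixes=("_1", "_2"))

# "D" has no match in df2, so its val_2 is NaN; replace missing values with 0
df3 = df3.fillna(0)
print(df3)
```

Note that val_2 becomes a float column, because pandas needs a float dtype to hold the NaN before fillna replaces it.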
How to combine two data frames of different lengths?
This is too long for a comment, but I really just need to demonstrate that the solution I gave in the comments does work. If you are having problems getting merge to work, then there must be some other issue with your data, which we cannot diagnose because you did not provide a dput() of your data.frames.
df1 = read.table(text =
"Date Duration
6/27/2014 10.00
6/30/2014 20.00
7/11/2014 15.00",
header = T)
df2 = read.table(text =
"Date Percent_Removal
6/27/2014 20.39
6/30/2014 27.01
7/7/2014 49.84
7/11/2014 59.48
7/17/2014 99.04",
header = T)
df1$Date <- as.Date(df1$Date, format = "%m/%d/%Y")
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")
df3 = merge(df1,df2)
# Date Duration Percent_Removal
# 1 2014-06-27 10 20.39
# 2 2014-06-30 20 27.01
# 3 2014-07-11 15 59.48
Note that no additional options need to be specified in the merge statement, because:
- the default value of by is the set of column names common to both data frames; in this case, only Date is shared;
- the default values of all.x, all.y and all give the desired behaviour, where only the rows that are in both data frames are kept.
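pandas' DataFrame.merge has the same defaults: with no on= it joins on the columns the two frames share, and how defaults to "inner". A minimal sketch with the data above:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["6/27/2014", "6/30/2014", "7/11/2014"],
    "Duration": [10.00, 20.00, 15.00],
})
df2 = pd.DataFrame({
    "Date": ["6/27/2014", "6/30/2014", "7/7/2014", "7/11/2014", "7/17/2014"],
    "Percent_Removal": [20.39, 27.01, 49.84, 59.48, 99.04],
})

# Parse the dates so both Date columns have the same dtype
df1["Date"] = pd.to_datetime(df1["Date"], format="%m/%d/%Y")
df2["Date"] = pd.to_datetime(df2["Date"], format="%m/%d/%Y")

# With no on= and no how=, merge joins on the shared column (only Date)
# and performs an inner join, keeping only dates present in both frames
df3 = df1.merge(df2)
print(df3)
```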
Merging two dataframes of different length, on a particular column with different number of instances
Option 1: join
This solution requires that you set the index of D2 and use the on parameter:
D1.join(D2.set_index('ID'), on='ID')
ID val1 val2 Target
0 1 x y 0
1 1 x y 0
2 2 a b 1
3 2 a c 1
Note: DataFrame.join already uses how='left' by default, so if D2 doesn't include every value in D1.ID, the rows of D1 without a match get a null value. Passing how='left' just makes that explicit:
D1.join(D2.set_index('ID'), on='ID', how='left')
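A small sketch of the missing-ID case, using hypothetical frames in which D2 has no row for ID 3. Since how='left' is the default for DataFrame.join, the call below behaves the same with or without that argument.

```python
import pandas as pd

# Hypothetical frames: D2 has no row for ID 3
D1 = pd.DataFrame({"ID": [1, 2, 3], "val1": ["x", "a", "q"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# join() matches D1's ID column against D2's index; with the default how="left"
# every row of D1 is kept and ID 3 gets NaN in Target
out = D1.join(D2.set_index("ID"), on="ID")
print(out)
```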
From the comments:
Why does this require setting the index of D2? The other answers don't do that. – ErikE
@ErikE that is the difference between merge and join. pandas.DataFrame.merge performs its merging on column values by default, while join looks at the index by default. I can override join's behavior by specifying a column to join on with on='ID'. However, that override ability is limited to the left object only, so I have to set the index of the right object in order to execute appropriately. – piRSquared
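To make the merge/join difference concrete, here is a sketch that runs both calls side by side. D1 and D2 are reconstructed from the printed output above, which is an assumption about the original data. merge matches the ID columns directly, while join needs D2 indexed by ID because on= only applies to the left frame.

```python
import pandas as pd

# D1 and D2 reconstructed from the printed output above (an assumption)
D1 = pd.DataFrame({"ID": [1, 1, 2, 2],
                   "val1": ["x", "x", "a", "a"],
                   "val2": ["y", "y", "b", "c"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# join: the right frame must be indexed by the key; on= refers to the left frame only
via_join = D1.join(D2.set_index("ID"), on="ID")

# merge: both frames are matched on their ID columns; how="left" mirrors join's default
via_merge = D1.merge(D2, on="ID", how="left")

print(via_join)
print(via_merge)
```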
Option 2: map + assign
This solution turns D2 into something dict-like: a pd.Series whose index is the 'ID's and whose values are the 'Target's. map converts the 'ID' column of D1 into the corresponding target values, and assign stores them in a new column.
D1.assign(Target=D1.ID.map(D2.set_index('ID').Target))
ID val1 val2 Target
0 1 x y 0
1 1 x y 0
2 2 a b 1
3 2 a c 1
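The "dict-like" reading can be checked directly. In this sketch, using D1 and D2 reconstructed from the printed output (an assumption), mapping through the ID-indexed Series and through a plain dict built from the same columns gives the same Target column. Either way the IDs in D2 need to be unique, since each ID must map to a single value.

```python
import pandas as pd

# D1 and D2 reconstructed from the printed output above (an assumption)
D1 = pd.DataFrame({"ID": [1, 1, 2, 2],
                   "val1": ["x", "x", "a", "a"],
                   "val2": ["y", "y", "b", "c"]})
D2 = pd.DataFrame({"ID": [1, 2], "Target": [0, 1]})

# The lookup table: a Series indexed by ID, holding the Target values
lookup = D2.set_index("ID")["Target"]

# map() replaces each ID in D1 with lookup[ID]; assign() adds it as a new column
with_series = D1.assign(Target=D1["ID"].map(lookup))

# A plain dict behaves the same way, which is why the Series can be read as dict-like
with_dict = D1.assign(Target=D1["ID"].map(dict(zip(D2["ID"], D2["Target"]))))
print(with_series)
```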
Merging two dataframes with different lengths
pandas.DataFrame.join can "join" dataframes by matching a column of the calling frame (or its index) against the index of the other frame. Something like this will likely work for you:
df1.join(df2.set_index('block_id'), on='block_id')
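A sketch of that call on hypothetical data, where df1 has several rows per block and df2 has one row per block. The column names reading and site are made up for illustration.

```python
import pandas as pd

# Hypothetical data: df1 has several rows per block, df2 one row per block
df1 = pd.DataFrame({"block_id": [1, 1, 2, 3], "reading": [0.5, 0.7, 0.2, 0.9]})
df2 = pd.DataFrame({"block_id": [1, 2], "site": ["north", "south"]})

# Index df2 by block_id so join() can match df1's block_id column against it;
# each df2 row is repeated for every df1 row with the same block_id,
# and block 3 (absent from df2) gets NaN
out = df1.join(df2.set_index("block_id"), on="block_id")
print(out)
```

The result keeps df1's four rows as long as block_id is unique in df2; duplicate block_ids in df2 would multiply the matching df1 rows.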
Merge two python pandas data frames of different length but keep all rows in output data frame
You can read the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
What you are looking for is a left join. The default option is an inner join. You can change this behavior by passing a different how argument:
df1.merge(df2, how='left', left_on='Column1', right_on='ColumnA')
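A sketch on hypothetical frames whose key columns have different names. Adding indicator=True, an existing DataFrame.merge option, shows which df1 rows found a match. Both key columns survive the merge, so ColumnA can be dropped afterwards if it is not needed.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
df1 = pd.DataFrame({"Column1": ["a", "b", "c", "d"], "x": [1, 2, 3, 4]})
df2 = pd.DataFrame({"ColumnA": ["a", "c"], "y": [10, 30]})

# how="left" keeps all four df1 rows; indicator=True adds a _merge column
# saying whether each row matched ("both") or not ("left_only")
out = df1.merge(df2, how="left", left_on="Column1", right_on="ColumnA", indicator=True)
print(out)
```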