Concatenate Rows of Two Dataframes in Pandas

Call pd.concat and pass axis=1 to concatenate column-wise:

In [5]:

pd.concat([df_a,df_b], axis=1)
Out[5]:
        AAseq  Biorep  Techrep  Treatment     mz      AAseq1  Biorep1  Techrep1  \
0  ELVISLIVES       A        1          C  500.0  ELVISLIVES        A         1
1  ELVISLIVES       A        1          C  500.5  ELVISLIVES        A         1
2  ELVISLIVES       A        1          C  501.0  ELVISLIVES        A         1

   Treatment1  inte1
0           C   1100
1           C   1050
2           C   1010
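
One caveat worth illustrating: with axis=1, concat matches rows by index label rather than by position. The following is a minimal sketch with made-up data (left and right are hypothetical names, not the frames from the question) showing what happens when the indices differ and how resetting the index gives a purely positional result.

```python
import pandas as pd

# Hypothetical frames; 'right' has a shifted index on purpose.
left = pd.DataFrame({"mz": [500.0, 500.5, 501.0]})
right = pd.DataFrame({"inte1": [1100, 1050, 1010]}, index=[1, 2, 3])

# Rows are matched on index labels, so labels 0 and 3 each get a NaN.
misaligned = pd.concat([left, right], axis=1)

# Resetting the index first lines the rows up by position instead.
aligned = pd.concat([left, right.reset_index(drop=True)], axis=1)

print(misaligned)
print(aligned)
```

If the two frames already share the same index, as in the answer above, no reset is needed.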

There is a useful guide to the various methods of merging, joining and concatenating online.

For example, because the two frames have no clashing column names and share the same index, you can merge on the indices:

In [6]:

df_a.merge(df_b, left_index=True, right_index=True)
Out[6]:
        AAseq  Biorep  Techrep  Treatment     mz      AAseq1  Biorep1  Techrep1  \
0  ELVISLIVES       A        1          C  500.0  ELVISLIVES        A         1
1  ELVISLIVES       A        1          C  500.5  ELVISLIVES        A         1
2  ELVISLIVES       A        1          C  501.0  ELVISLIVES        A         1

   Treatment1  inte1
0           C   1100
1           C   1050
2           C   1010

For the same reasons, a simple join works too:

In [7]:

df_a.join(df_b)
Out[7]:
        AAseq  Biorep  Techrep  Treatment     mz      AAseq1  Biorep1  Techrep1  \
0  ELVISLIVES       A        1          C  500.0  ELVISLIVES        A         1
1  ELVISLIVES       A        1          C  500.5  ELVISLIVES        A         1
2  ELVISLIVES       A        1          C  501.0  ELVISLIVES        A         1

   Treatment1  inte1
0           C   1100
1           C   1050
2           C   1010
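
The three approaches can be checked against each other. A small sketch, assuming trimmed-down stand-ins for df_a and df_b (the full frames are not reproduced here, so the column subset is an assumption):

```python
import pandas as pd

# Hypothetical, reduced versions of the two frames from the question.
df_a = pd.DataFrame({"AAseq": ["ELVISLIVES"] * 3, "mz": [500.0, 500.5, 501.0]})
df_b = pd.DataFrame({"AAseq1": ["ELVISLIVES"] * 3, "inte1": [1100, 1050, 1010]})

via_concat = pd.concat([df_a, df_b], axis=1)
via_merge = df_a.merge(df_b, left_index=True, right_index=True)
via_join = df_a.join(df_b)

# assert_frame_equal raises AssertionError if the frames differ.
pd.testing.assert_frame_equal(via_concat, via_merge)
pd.testing.assert_frame_equal(via_concat, via_join)
```

The results only coincide because the indices match and no column names clash; with overlapping names, join raises an error unless lsuffix or rsuffix is given, while merge adds _x and _y suffixes.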

How to concatenate combinations of rows from two different dataframes?

Use itertools.product():

import itertools
pd.DataFrame(list(itertools.product(df1.A,df2.B)),columns=['A','B'])

   A  B
0  1  a
1  1  b
2  1  c
3  2  a
4  2  b
5  2  c
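
For a version that can be run as-is, here is a sketch with hypothetical df1 and df2 chosen to reproduce the output above. It also shows merge with how="cross", which gives the same pairs but requires pandas 1.2 or newer.

```python
import itertools
import pandas as pd

# Hypothetical inputs; the original question's frames are not shown.
df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"B": ["a", "b", "c"]})

# Every value of A paired with every value of B.
pairs = pd.DataFrame(list(itertools.product(df1["A"], df2["B"])), columns=["A", "B"])

# Equivalent cross join (pandas >= 1.2).
crossed = df1.merge(df2, how="cross")

print(pairs)
```

The cross join keeps all columns of both frames, which is convenient when each frame has more than one column.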

How to concatenate two Dataframe rows using a mapping index

One option is to perform a double merge:

(df1.merge(df2.merge(MAP, left_on='C', right_on='C_index'),
           left_on='A', right_on='A_index')
    .filter(regex=r'^((?!_index).)*$')  # remove the "X_index" columns
    .drop(columns='C')
)

NB. I used MAP as the name of the mapping DataFrame because map is a Python builtin.

An alternative, more linear syntax:

(df1.merge(MAP, left_on='A', right_on='A_index')
    .merge(df2, left_on='C_index', right_on='C')
    .filter(regex=r'^((?!_index).)*$')  # remove the "X_index" columns
    .drop(columns='C')
)

output:

   A           B     D
0  2        bike  blue
1  3  pedestrian   red
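
The input frames are not shown above, so the following is a sketch with hypothetical df1, df2, and MAP contents picked to be consistent with that output. It uses the linear form and drops the helper columns by name rather than by regex, which is a stylistic choice and not part of the original answer.

```python
import pandas as pd

# Hypothetical inputs consistent with the output shown above.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["car", "bike", "pedestrian"]})
df2 = pd.DataFrame({"C": [10, 20, 30], "D": ["green", "blue", "red"]})
MAP = pd.DataFrame({"A_index": [2, 3], "C_index": [20, 30]})

result = (
    df1.merge(MAP, left_on="A", right_on="A_index")   # attach the mapping to df1
       .merge(df2, left_on="C_index", right_on="C")   # follow the mapping into df2
       .drop(columns=["A_index", "C_index", "C"])     # keep only the payload columns
)

print(result)
```

Rows of df1 whose A value has no entry in MAP are dropped, because merge defaults to an inner join.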

In Python pandas, how do I concatenate rows of a df based on two columns, in the order of a third one?

Setup:

Here is a short example and some code that moves the 'Sales' data into a separate column for each hour. The example covers three hours; for a full day of hourly data, change range(3) to range(24).

import pandas as pd

df = pd.DataFrame([['Dave', 1, 0, 10], ['Dave', 1, 1, 20], ['Dave', 1, 2, 30],
                   ['Dave', 2, 0, 40], ['Dave', 2, 1, 50], ['Dave', 2, 2, 60],
                   ['Carl', 1, 0, 15], ['Carl', 1, 1, 25], ['Carl', 1, 2, 35],
                   ['Carl', 2, 0, 45], ['Carl', 2, 1, 55], ['Carl', 2, 2, 65]],
                  columns=['ID', 'Date', 'Hour', 'Sales'])

Output (df):

      ID  Date  Hour  Sales
0   Dave     1     0     10
1   Dave     1     1     20
2   Dave     1     2     30
3   Dave     2     0     40
4   Dave     2     1     50
5   Dave     2     2     60
6   Carl     1     0     15
7   Carl     1     1     25
8   Carl     1     2     35
9   Carl     2     0     45
10  Carl     2     1     55
11  Carl     2     2     65

'Where' and 'Merge':

The key here is using merge with the on argument to choose which columns act as the keys for merging.

df.where, df.merge, and df.dropna are very versatile pieces of pandas that are good to learn.

new = pd.DataFrame(columns=['ID', 'Date'])
for hour in range(3):
    # where() blanks out rows for other hours; dropna() removes those all-NaN rows
    tmp = df.where(df.Hour == hour).dropna(axis=0, how='all')
    tmp[hour] = tmp['Sales']  # copy 'Sales' into a column named after the hour
    tmp.drop(['Hour', 'Sales'], axis=1, inplace=True)
    new = new.merge(tmp, how='outer', on=['ID', 'Date'])
new.set_index(['ID', 'Date'], inplace=True)

Output (new):

              0     1     2
ID   Date
Dave 1.0   10.0  20.0  30.0
     2.0   40.0  50.0  60.0
Carl 1.0   15.0  25.0  35.0
     2.0   45.0  55.0  65.0
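
The float values in that output come from where(), which inserts NaN placeholders before dropna() removes them. As an alternative sketch (not the answer's original code), selecting rows with a boolean mask avoids the placeholders, so the integer columns should keep their integer dtype. The names pieces and wide are made up for this illustration.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame(
    [["Dave", 1, 0, 10], ["Dave", 1, 1, 20], ["Dave", 1, 2, 30],
     ["Dave", 2, 0, 40], ["Dave", 2, 1, 50], ["Dave", 2, 2, 60],
     ["Carl", 1, 0, 15], ["Carl", 1, 1, 25], ["Carl", 1, 2, 35],
     ["Carl", 2, 0, 45], ["Carl", 2, 1, 55], ["Carl", 2, 2, 65]],
    columns=["ID", "Date", "Hour", "Sales"],
)

# One narrow frame per hour, with 'Sales' renamed to the hour number.
pieces = [
    df.loc[df["Hour"] == hour, ["ID", "Date", "Sales"]].rename(columns={"Sales": hour})
    for hour in range(3)
]

# Outer-merge the pieces pairwise on the shared key columns.
wide = reduce(lambda left, right: left.merge(right, how="outer", on=["ID", "Date"]), pieces)
wide = wide.set_index(["ID", "Date"])

print(wide)
```

The row order may differ from the output above, since an outer merge can reorder the keys.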

Pivot Tables:

For this specific problem, DataFrame.pivot does all of that work in one call (passing a list as index requires pandas 1.1 or later):

dfp = df.pivot(index=['ID','Date'], columns='Hour', values='Sales')

Output (dfp):

Hour        0   1   2
ID   Date
Carl 1     15  25  35
     2     45  55  65
Dave 1     10  20  30
     2     40  50  60
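
As a runnable sketch of the same reshaping, the block below repeats the setup frame and adds a set_index plus unstack variant, which is an equivalent spelling that I am including as an alternative, not something from the original answer. The name alt is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["Dave", 1, 0, 10], ["Dave", 1, 1, 20], ["Dave", 1, 2, 30],
     ["Dave", 2, 0, 40], ["Dave", 2, 1, 50], ["Dave", 2, 2, 60],
     ["Carl", 1, 0, 15], ["Carl", 1, 1, 25], ["Carl", 1, 2, 35],
     ["Carl", 2, 0, 45], ["Carl", 2, 1, 55], ["Carl", 2, 2, 65]],
    columns=["ID", "Date", "Hour", "Sales"],
)

# A list-valued index in pivot needs pandas 1.1 or newer.
dfp = df.pivot(index=["ID", "Date"], columns="Hour", values="Sales")

# Same reshaping spelled with set_index and unstack.
alt = df.set_index(["ID", "Date", "Hour"])["Sales"].unstack("Hour")

print(dfp)
```

Both forms raise an error if an (ID, Date, Hour) combination appears more than once; in that case pivot_table with an aggregation function is the usual choice.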

Pandas: Combining Two DataFrames Horizontally

concat is indeed what you're looking for; you just have to pass a non-default value for the axis argument (axis=1 instead of the default axis=0). Code sample below:

import pandas as pd

df1 = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [1, 2, 3, 4, 5]
})

df2 = pd.DataFrame({
    'C': [1, 2, 3, 4, 5],
    'D': [1, 2, 3, 4, 5]
})

df_concat = pd.concat([df1, df2], axis=1)

print(df_concat)

With the result being:

   A  B  C  D
0  1  1  1  1
1  2  2  2  2
2  3  3  3  3
3  4  4  4  4
4  5  5  5  5

Concatenate row values in Pandas DataFrame

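merge does not concatenate the DataFrames the way you want. On older pandas versions you could use append, but it was deprecated in pandas 1.4 and removed in 2.0:

ndf = df1.append(df2).sort_values('name')  # pandas < 2.0 only

Use concat instead, which works on all versions:

ndf = pd.concat([df1, df2]).sort_values('name')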
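
A self-contained sketch of the concat route, using hypothetical frames (only the 'name' column comes from the answer; the 'score' column and the values are made up). Passing ignore_index=True gives the result a clean 0..n-1 index instead of repeating the labels of the inputs.

```python
import pandas as pd

# Hypothetical frames that share the same columns.
df1 = pd.DataFrame({"name": ["bob", "alice"], "score": [1, 2]})
df2 = pd.DataFrame({"name": ["carol", "alice"], "score": [3, 4]})

# Stack the rows, then sort by name and renumber the index.
ndf = pd.concat([df1, df2], ignore_index=True).sort_values("name", ignore_index=True)

print(ndf)
```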

Concatenate rows of dataframes one by one

One way is to change the indices of your input dataframes. Then concatenate and sort by index. This will also handle situations where your dataframes have mismatched lengths.

df1.index = df1.index*2
df2.index = df2.index*2 + 1

res = pd.concat([df1, df2]).sort_index()

print(res)

   data  type
0     a     1
1     v     2
2     b     1
3     w     2
4     c     1
5     x     2
6     d     1
7     y     2
8     e     1
9     z     2

If you need to normalize your index when your dataframes have inconsistent lengths, you can use reset_index as a final step:

res = res.reset_index(drop=True)
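
To see the mismatched-length case end to end, here is a sketch with hypothetical inputs of five and three rows. It works on copies (left and right are names made up for this example) so the original frames keep their indices.

```python
import pandas as pd

# Hypothetical inputs of different lengths.
df1 = pd.DataFrame({"data": list("abcde"), "type": [1] * 5})
df2 = pd.DataFrame({"data": list("vwx"), "type": [2] * 3})

# Work on copies so df1 and df2 are left untouched.
left, right = df1.copy(), df2.copy()
left.index = left.index * 2        # even slots: 0, 2, 4, ...
right.index = right.index * 2 + 1  # odd slots: 1, 3, 5, ...

# Interleave by sorting on the new labels, then renumber.
res = pd.concat([left, right]).sort_index().reset_index(drop=True)

print(res)
```

Once the shorter frame runs out, the remaining rows of the longer one simply follow in order.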

Joining two dataframes then combining data in fields with same name using Pandas

Instead of merging, concatenate the two frames and aggregate per group:

# concatenate, then group by state and join the non-missing strings
df = (pd.concat([data1, data2])
        .groupby('State', as_index=False)
        .agg(lambda x: '; '.join(el for el in x if pd.notna(el))))
print(df)
State Product Cashier Type
0 CA Banana; Shirt Sally;
1 MN Apple; Shoe Gretta; Trish
2 NM Socks Paula Hourly
3 NV Orange Samantha
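
The input frames are not shown above, so this is a sketch with hypothetical data1 and data2 (the values loosely follow the output, and the Type column is left out for brevity). The helper join_present is a name made up here; it does the same job as the lambda.

```python
import pandas as pd

# Hypothetical inputs; None marks a missing cashier.
data1 = pd.DataFrame({
    "State": ["CA", "MN", "NM"],
    "Product": ["Banana", "Apple", "Socks"],
    "Cashier": ["Sally", "Gretta", "Paula"],
})
data2 = pd.DataFrame({
    "State": ["CA", "MN", "NV"],
    "Product": ["Shirt", "Shoe", "Orange"],
    "Cashier": [None, "Trish", "Samantha"],
})


def join_present(values):
    """Join the non-missing entries of one group with '; '."""
    return "; ".join(v for v in values if pd.notna(v))


combined = (
    pd.concat([data1, data2])
      .groupby("State", as_index=False)
      .agg(join_present)
)

print(combined)
```

Filtering with pd.notna matters because str.join fails on None or NaN entries.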

