Pandas Dataframe Stack Multiple Column Values into Single Column

Pandas: Multiple columns into one column

Update

pandas has a built-in method for this, stack, which does what you want; see the other answer.

This was my first answer, written many years ago before I knew about stack:

In [227]:

df = pd.DataFrame({'Column 1':['A', 'B', 'C', 'D'],'Column 2':['E', 'F', 'G', 'H']})
df
Out[227]:
Column 1 Column 2
0 A E
1 B F
2 C G
3 D H

[4 rows x 2 columns]

In [228]:

pd.concat([df['Column 1'], df['Column 2']]).reset_index(drop=True)
Out[228]:
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
dtype: object
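
For comparison, here is a minimal sketch of the stack approach mentioned in the update, run on the same frame. It assumes you only care about the values, not the original index. Note that stack walks the frame row by row, so the order differs from the column-by-column result above.

```python
import pandas as pd

df = pd.DataFrame({'Column 1': ['A', 'B', 'C', 'D'],
                   'Column 2': ['E', 'F', 'G', 'H']})

# stack() goes row by row, so the values interleave: A, E, B, F, ...
stacked = df.stack().reset_index(drop=True)

# Concatenating the columns keeps column order: A, B, C, D, E, ...
by_column = pd.concat([df['Column 1'], df['Column 2']], ignore_index=True)

print(stacked.tolist())
print(by_column.tolist())
```

Which order you want depends on whether each row or each column is the natural unit in your data.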

Pandas DataFrame stack multiple column values into single column

You can melt your dataframe:

>>> keys = [c for c in df if c.startswith('key.')]
>>> pd.melt(df, id_vars='topic', value_vars=keys, value_name='key')

topic variable key
0 8 key.0 abc
1 9 key.0 xab
2 8 key.1 def
3 9 key.1 xcd
4 8 key.2 ghi
5 9 key.2 xef

The variable column also tells you which source column each key came from.


From v0.20, melt is also available as a DataFrame method:

>>> df.melt('topic', value_name='key').drop(columns='variable')

topic key
0 8 abc
1 9 xab
2 8 def
3 9 xcd
4 8 ghi
5 9 xef
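
The question's input frame is not shown here, so the sketch below reconstructs one from the output above; the reconstruction and the var_name label 'source' are assumptions made for illustration.

```python
import pandas as pd

# Input reconstructed from the melted output shown above (an assumption).
df = pd.DataFrame({
    'topic': [8, 9],
    'key.0': ['abc', 'xab'],
    'key.1': ['def', 'xcd'],
    'key.2': ['ghi', 'xef'],
})

keys = [c for c in df.columns if c.startswith('key.')]

# var_name renames the default 'variable' column to something descriptive.
long_df = df.melt(id_vars='topic', value_vars=keys,
                  var_name='source', value_name='key')

# Drop the source column when only the stacked values matter.
result = long_df.drop(columns='source')
print(result)
```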

How to stack/append all columns into one column in Pandas?

Very simply with melt:

import pandas as pd
df.melt().drop('variable',axis=1).rename({'value':'A'},axis=1)


   A
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9

Pandas stack multiple columns to a single column

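Use a custom function with pd.concat (DataFrame.append was removed in pandas 2.0): first build a helper DataFrame that is filled with NaN by default and holds the two header rows, then put the group's own rows beneath it:

def f(x):
    # x.name is the (level 0, level 1) key of the current group;
    # its two labels become the all-NaN header rows
    names = pd.DataFrame(index=x.name, columns=x.columns).assign(Level=[0,1])
    #print (names)
    return pd.concat([names, x.reset_index(level=[0,1], drop=True).assign(Level=2)])

out = out.groupby(level=[0,1], group_keys=False).apply(f)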

Then remove the duplicated level-0 header rows:

out = out[~out.index.duplicated() | out['Level'].isin([1,2])]


print (out)
TRT01A Placebo Treatment A Treatment B Level
HISPANIC OR LATINO NaN NaN NaN 0
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1
mean NaN NaN 42.910335 2
n NaN NaN 1.000000 2
deviation NaN NaN NaN 2
Q1 NaN NaN 42.910335 2
WHITE NaN NaN NaN 1
mean 35.724846 45.522245 45.226557 2
n 2.000000 1.000000 1.000000 2
deviation 5.108979 NaN NaN 2
Q1 32.130315 45.522245 45.226557 2
NOT HISPANIC OR LATINO NaN NaN NaN 0
BLACK OR AFRICAN AMERICAN NaN NaN NaN 1
mean 22.926762 NaN NaN 2
n 1.000000 NaN NaN 2
deviation NaN NaN NaN 2
Q1 22.926762 NaN NaN 2
WHITE NaN NaN NaN 1
mean 36.627881 38.203970 34.934976 2
n 3.000000 1.000000 2.000000 2
deviation 9.087438 NaN 4.398485 2
Q1 31.381246 38.203970 31.840329 2

Merge multiple column values into one column in python pandas

You can call apply with axis=1 to work row-wise, then convert each row's values to str and join them:

In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
    lambda x: ','.join(x.dropna().astype(int).astype(str)),
    axis=1
)
df

Out[153]:
Column1 Column2 Column3 Column4 Column5 ColumnA
0 a 1 2 3 4 1,2,3,4
1 a 3 4 5 NaN 3,4,5
2 b 6 7 8 NaN 6,7,8
3 c 7 7 NaN NaN 7,7

Here dropna gets rid of the NaN values. Because NaN forces each row to float, the remaining values have to be cast back to int before converting to str; otherwise the joined strings would contain floats such as 1.0.
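
A self-contained sketch of the row-wise join follows. The input frame is reconstructed from the output shown above, and the helper name join_row is hypothetical; both are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the output table above (an assumption).
df = pd.DataFrame({
    'Column1': ['a', 'a', 'b', 'c'],
    'Column2': [1, 3, 6, 7],
    'Column3': [2, 4, 7, 7],
    'Column4': [3, 5, 8, np.nan],
    'Column5': [4, np.nan, np.nan, np.nan],
})

def join_row(row):
    # Drop missing values, undo the float upcast caused by NaN, then join.
    return ','.join(row.dropna().astype(int).astype(str))

df['ColumnA'] = df[df.columns[1:]].apply(join_row, axis=1)
print(df)
```

The int cast only makes sense when the values are whole numbers; for genuinely fractional data you would leave it out.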

Append multiple columns to single column

Try:

single_column_frame = pd.concat([df[col] for col in df.columns])

If you want to create a single column and get rid of month names:

df_new = df.melt()['value'].to_frame()

Or, to discard the old index of the concatenated result above:

single_column_frame = single_column_frame.reset_index().drop(columns=['index'])

You can also do:

single_column_frame = df.stack().reset_index().loc[:,0]
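
These variants do not give identical results, so here is a small comparison. The two-column frame with month names is hypothetical, made up only to show the differences in index and ordering.

```python
import pandas as pd

# Hypothetical frame with month names as columns.
df = pd.DataFrame({'Jan': [1, 2], 'Feb': [3, 4]})

# concat keeps each column's original index, so labels repeat: 0, 1, 0, 1
by_concat = pd.concat([df[col] for col in df.columns])

# melt goes column by column and returns a fresh 0..n-1 index
by_melt = df.melt()['value'].to_frame()

# stack goes row by row, so the values interleave across columns
by_stack = df.stack().reset_index().loc[:, 0]

print(by_concat.index.tolist(), by_concat.tolist())
print(by_melt['value'].tolist())
print(by_stack.tolist())
```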

Combine Multiple Pandas columns into a Single Column

You can do this with df.melt().

df.melt(
id_vars = ['x1','x2','x3','x4','x5'],
value_vars = ['y1','y2','y3','y4','y5'],
value_name = 'y'
).drop(columns='variable')

df.melt() adds a column called variable that records which original column each row came from (y1, y2, and so on). Drop it, as shown above, if you don't need it.

How to convert multiple set of column to single column in pandas?

You are essentially asking how to coalesce the values of certain columns into one column. You can do it like this:

from random import choice
import pandas as pd

# all azimuth names
azi_names = [f"Azi_{i}" for i in range(5)]

# all distance names
dist_names = [f"Dist_{i}" for i in range(5)]

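# put some values in: each row fills one random Azi_k/Dist_k pair
# (rows are collected in a list because DataFrame.append was removed in pandas 2.0)
rows = []
for i in range(20):
    k = choice(range(5))
    rows.append({f"Azi_{k}": i, f"Dist_{k}": i})

# keys missing from a row become NaN
df = pd.DataFrame(rows, columns=azi_names + dist_names)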

print(df)

which randomly creates:

    Azi_0  Azi_1  Azi_2  Azi_3  Azi_4  Dist_0  Dist_1  Dist_2  Dist_3  Dist_4
0 NaN NaN NaN 0.0 NaN NaN NaN NaN 0.0 NaN
1 NaN 1.0 NaN NaN NaN NaN 1.0 NaN NaN NaN
2 2.0 NaN NaN NaN NaN 2.0 NaN NaN NaN NaN
3 NaN NaN 3.0 NaN NaN NaN NaN 3.0 NaN NaN
4 NaN 4.0 NaN NaN NaN NaN 4.0 NaN NaN NaN
5 NaN NaN NaN NaN 5.0 NaN NaN NaN NaN 5.0
6 6.0 NaN NaN NaN NaN 6.0 NaN NaN NaN NaN
7 NaN 7.0 NaN NaN NaN NaN 7.0 NaN NaN NaN
8 NaN 8.0 NaN NaN NaN NaN 8.0 NaN NaN NaN
9 9.0 NaN NaN NaN NaN 9.0 NaN NaN NaN NaN
10 NaN NaN 10.0 NaN NaN NaN NaN 10.0 NaN NaN
11 11.0 NaN NaN NaN NaN 11.0 NaN NaN NaN NaN
12 12.0 NaN NaN NaN NaN 12.0 NaN NaN NaN NaN
13 NaN NaN 13.0 NaN NaN NaN NaN 13.0 NaN NaN
14 NaN 14.0 NaN NaN NaN NaN 14.0 NaN NaN NaN
15 NaN NaN NaN 15.0 NaN NaN NaN NaN 15.0 NaN
16 NaN NaN NaN NaN 16.0 NaN NaN NaN NaN 16.0
17 NaN NaN 17.0 NaN NaN NaN NaN 17.0 NaN NaN
18 NaN NaN NaN NaN 18.0 NaN NaN NaN NaN 18.0
19 NaN NaN NaN 19.0 NaN NaN NaN NaN 19.0 NaN

To coalesce this and keep only the filled values, use:

df2 = pd.DataFrame()

# back-fill along each row so the first column holds the first non-NaN value, then take that column
df2["AZI"] = df[azi_names].bfill(axis=1).iloc[:, 0]
df2["DIS"] = df[dist_names].bfill(axis=1).iloc[:, 0]

print(df2)

to get a coalesced new df:

     AZI   DIS
0 0.0 0.0
1 1.0 1.0
2 2.0 2.0
3 3.0 3.0
4 4.0 4.0
5 5.0 5.0
6 6.0 6.0
7 7.0 7.0
8 8.0 8.0
9 9.0 9.0
10 10.0 10.0
11 11.0 11.0
12 12.0 12.0
13 13.0 13.0
14 14.0 14.0
15 15.0 15.0
16 16.0 16.0
17 17.0 17.0
18 18.0 18.0
19 19.0 19.0

Attribution: inspired by Erfan's answer to "Coalesce values from 2 columns into a single column in a pandas dataframe".

For the data you showed, you may first need to replace blank values (white space) with NaN; see "Replacing blank values (white space) with NaN in pandas".
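
A minimal sketch of that clean-up step combined with the coalescing above. The tiny two-column frame of strings is hypothetical, and the regex treats empty or whitespace-only cells as blank.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where "empty" cells are blank strings, not NaN.
df = pd.DataFrame({'Azi_0': ['1', ' ', ''],
                   'Azi_1': ['', '2', '  ']})

# Turn empty or whitespace-only cells into real missing values.
cleaned = df.replace(r'^\s*$', np.nan, regex=True)

# Coalesce: back-fill along each row, then take the first column.
coalesced = cleaned.bfill(axis=1).iloc[:, 0]
print(coalesced)
```

Without the replace step, bfill would treat the blank strings as valid values and the coalesced column would still contain blanks.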

How to concatenate multiple column values into a single column in Panda dataframe based on start and end time

Let's do this in a few steps.

First, let's make sure your Timestamp is a datetime.

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

Then we can create a new dataframe spanning the minimum and maximum values of your timestamp.

df1 = pd.DataFrame({'start_time' : pd.date_range(df['Timestamp'].min(), df['Timestamp'].max())})

df1['end_time'] = df1['start_time'] + pd.DateOffset(days=1)

start_time end_time
0 2013-02-01 2013-02-02
1 2013-02-02 2013-02-03
2 2013-02-03 2013-02-04
3 2013-02-04 2013-02-05
4 2013-02-05 2013-02-06
5 2013-02-06 2013-02-07
6 2013-02-07 2013-02-08
7 2013-02-08 2013-02-09

Now we need to create a dataframe to merge onto your start_time column.

Let's mask any values that are less than or equal to 0 and create a list of active appliances:

df = df.set_index('Timestamp')
# The remaining columns MUST be integers for this to work,
# or you'll need to subselect them.
df2 = df.mask(df.le(0)).stack().reset_index(1).groupby(level=0)\
.agg(active_appliances=('level_1',list)).reset_index(0)

# change .agg(active_appliances=('level_1', list))
# to     .agg(active_appliances=('level_1', ','.join))
# if you prefer strings.

Timestamp active_appliances
0 2013-02-01 [A]
1 2013-02-02 [A, B, C]
2 2013-02-03 [A, C]
3 2013-02-04 [A, B, C]
4 2013-02-05 [B, C]
5 2013-02-06 [A, B, C]
6 2013-02-07 [A, B, C]

Then we can merge:

final = pd.merge(df1,df2,left_on='start_time',right_on='Timestamp',how='left').drop(columns='Timestamp')

start_time end_time active_appliances
0 2013-02-01 2013-02-02 [A]
1 2013-02-02 2013-02-03 [A, B, C]
2 2013-02-03 2013-02-04 [A, C]
3 2013-02-04 2013-02-05 [A, B, C]
4 2013-02-05 2013-02-06 [B, C]
5 2013-02-06 2013-02-07 [A, B, C]
6 2013-02-07 2013-02-08 [A, B, C]
7 2013-02-08 2013-02-09 NaN
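
The steps above can be sketched end to end. The question's data is not shown here, so the four-day input frame is hypothetical, as are the names windows, active, appliance and reading. This variant uses melt and an explicit filter rather than mask and stack, so it does not rely on stack dropping missing values, and it joins the names into strings.

```python
import pandas as pd

# Hypothetical readings: one row per day, one column per appliance.
df = pd.DataFrame({
    'Timestamp': ['2013-02-01', '2013-02-02', '2013-02-03', '2013-02-04'],
    'A': [1, 1, 0, 0],
    'B': [0, 1, 0, 0],
    'C': [0, 1, 1, 0],
})
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# One row per day between the first and last timestamp.
windows = pd.DataFrame(
    {'start_time': pd.date_range(df['Timestamp'].min(), df['Timestamp'].max())}
)
windows['end_time'] = windows['start_time'] + pd.DateOffset(days=1)

# Long format, keep positive readings, join appliance names per day.
active = (
    df.set_index('Timestamp')
      .melt(ignore_index=False, var_name='appliance', value_name='reading')
      .query('reading > 0')
      .groupby(level=0)['appliance']
      .agg(','.join)
      .rename('active_appliances')
      .rename_axis('Timestamp')
      .reset_index()
)

# Days with no active appliance end up as NaN after the left merge.
final = (
    windows.merge(active, left_on='start_time', right_on='Timestamp', how='left')
           .drop(columns='Timestamp')
)
print(final)
```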

