Pandas Long to Wide Reshape, by Two Variables

Pandas long to wide reshape, by two variables

A simple pivot might be sufficient for your needs but this is what I did to reproduce your desired output:

df['idx'] = df.groupby('Salesman').cumcount()

Just adding a within group counter/index will get you most of the way there but the column labels will not be as you desired:

print df.pivot(index='Salesman',columns='idx')[['product','price']]

product price
idx 0 1 2 0 1 2
Salesman
Knut bat ball wand 5 1 3
Steve pen NaN NaN 2 NaN NaN

To get closer to your desired output I added the following:

df['prod_idx'] = 'product_' + df.idx.astype(str)
df['prc_idx'] = 'price_' + df.idx.astype(str)

product = df.pivot(index='Salesman',columns='prod_idx',values='product')
prc = df.pivot(index='Salesman',columns='prc_idx',values='price')

reshape = pd.concat([product,prc],axis=1)
reshape['Height'] = df.set_index('Salesman')['Height'].drop_duplicates()
print reshape

product_0 product_1 product_2 price_0 price_1 price_2 Height
Salesman
Knut bat ball wand 5 1 3 6
Steve pen NaN NaN 2 NaN NaN 5

Edit: if you want to generalize the procedure to more variables I think you could do something like the following (although it might not be efficient enough):

df['idx'] = df.groupby('Salesman').cumcount()

tmp = []
for var in ['product','price']:
df['tmp_idx'] = var + '_' + df.idx.astype(str)
tmp.append(df.pivot(index='Salesman',columns='tmp_idx',values=var))

reshape = pd.concat(tmp,axis=1)

@Luke said:

I think Stata can do something like this with the reshape command.

You can but I think you also need a within group counter to get the reshape in stata to get your desired output:

     +-------------------------------------------+
| salesman idx height product price |
|-------------------------------------------|
1. | Knut 0 6 bat 5 |
2. | Knut 1 6 ball 1 |
3. | Knut 2 6 wand 3 |
4. | Steve 0 5 pen 2 |
+-------------------------------------------+

If you add idx then you could do reshape in stata:

reshape wide product price, i(salesman) j(idx)

Reshaping Long Data to Wide in Python (Pandas)

You can pivot your dataframe:

df.pivot(index='TICKER', columns='date', values='RET')

date 20050131 20050231
TICKER
AAPL 0.02 0.01
GOOG 0.05 0.03

Pandas long to wide, WHILE preserving existing columns?

You were actually almost all the way there with pivot(). Specifying the index will take you almost all the way there:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

df = frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales",
)

df

The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

df = (
frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales"
)
.reset_index() # collapses multi-index
.rename_axis(None, axis=1) # renames index
)

df

Reshape pandas dataframe wide to long, with some variables to stack, other variables to repeat

>>> id['id'] = df.index
>>> pd.wide_to_long(df, ["wage", "hours"], i="id", j="year")
edu race wage hours
id year
0 1985 1 3 8 5
1 1985 1 4 7 5
2 1985 7 3 7 3
3 1985 0 1 8 8
4 1985 2 7 2 1
5 1985 6 1 8 3
6 1985 7 1 4 6
7 1985 8 6 0 6
8 1985 7 0 8 4
9 1985 6 3 2 7
0 1986 1 3 2 0
1 1986 1 4 1 8
2 1986 7 3 5 7
3 1986 0 1 3 3
4 1986 2 7 7 6
5 1986 6 1 1 8
6 1986 7 1 7 6
7 1986 8 6 6 2
8 1986 7 0 7 2
9 1986 6 3 2 2

Pandas long reshape for several variables

You could try to pivot your dataframe, after building a custom index per session:

df2 = df.assign(index=df.groupby(['Session']).cumcount()).pivot(
'index', 'Session', ['Tube', 'Window', 'Counts', 'Length']).rename_axis(index=None)

With you sample data it would give:

        Tube       Window       Counts      Length     
Session 1 10 1 10 1 10 1 10
0 1.0 53.0 1.0 36.0 0.0 0.0 0.0 0.0
1 1.0 53.0 2.0 37.0 0.0 0.0 0.0 0.0
2 1.0 53.0 3.0 38.0 0.0 0.0 0.0 0.0
3 1.0 53.0 4.0 39.0 0.0 0.0 0.0 0.0
4 1.0 53.0 5.0 40.0 0.0 0.0 0.0 0.0

Not that bad but we have a MultiIndex for the columns and in a wrong order. Let us go further:

df2.columns = df2.columns.to_flat_index()
df2 = df2.reindex(columns=sorted(df2.columns, key=lambda x: x[1]))

We now have:

   (Tube, 1)  (Window, 1)  ...  (Counts, 10)  (Length, 10)
0 1.0 1.0 ... 0.0 0.0
1 1.0 2.0 ... 0.0 0.0
2 1.0 3.0 ... 0.0 0.0
3 1.0 4.0 ... 0.0 0.0
4 1.0 5.0 ... 0.0 0.0

Last step:

df2 = df2.rename(columns=lambda x: '_'.join(str(i) for i in x))

to finaly get:

   Tube_1  Window_1  Counts_1  ...  Window_10  Counts_10  Length_10
0 1.0 1.0 0.0 ... 36.0 0.0 0.0
1 1.0 2.0 0.0 ... 37.0 0.0 0.0
2 1.0 3.0 0.0 ... 38.0 0.0 0.0
3 1.0 4.0 0.0 ... 39.0 0.0 0.0
4 1.0 5.0 0.0 ... 40.0 0.0 0.0

How to reshape dataframe, wide to long, for several variables in one or several calls?

You can use wide_to_long; the issue is just that your column names need to be changed a bit, so that the stubnames are ['pct', 'valid', 'value'], and not t#.

import pandas as pd
import numpy as np

# Reverse order of words around '_'
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns]
# Add prefix for other stubs
df = df.rename(columns= dict((f't{i}', f'value_t{i}') for i in np.arange(1,4,1)))

pd.wide_to_long(df, stubnames=['pct', 'valid', 'value'],
i='id', j='test', suffix='.*', sep='_').reset_index()

Output:

            id test   pct  valid  value
0 66602088802 t1 0.46 True car
1 85002620928 t1 0.51 True house
2 66602088802 t2 0.15 True bike
3 85002620928 t2 0.07 True car
4 66602088802 t3 0.06 False car
5 85002620928 t3 0.07 False toy


Related Topics



Leave a reply



Submit