Pandas Long to Wide Reshape, by Two Variables

Pandas long to wide reshape, by two variables

A simple pivot might be sufficient for your needs but this is what I did to reproduce your desired output:

df['idx'] = df.groupby('Salesman').cumcount()

Just adding a within group counter/index will get you most of the way there but the column labels will not be as you desired:

print df.pivot(index='Salesman',columns='idx')[['product','price']]

        product              price        
idx            0     1     2      0   1   2
Salesman                                   
Knut         bat  ball  wand      5   1   3
Steve        pen   NaN   NaN      2 NaN NaN

To get closer to your desired output I added the following:

df['prod_idx'] = 'product_' + df.idx.astype(str)
df['prc_idx'] = 'price_' + df.idx.astype(str)

product = df.pivot(index='Salesman',columns='prod_idx',values='product')
prc = df.pivot(index='Salesman',columns='prc_idx',values='price')

reshape = pd.concat([product,prc],axis=1)
reshape['Height'] = df.set_index('Salesman')['Height'].drop_duplicates()
print reshape

         product_0 product_1 product_2  price_0  price_1  price_2  Height
Salesman                                                                 
Knut           bat      ball      wand        5        1        3       6
Steve          pen       NaN       NaN        2      NaN      NaN       5

Edit: if you want to generalize the procedure to more variables I think you could do something like the following (although it might not be efficient enough):

df['idx'] = df.groupby('Salesman').cumcount()

tmp = []
for var in ['product','price']:
    df['tmp_idx'] = var + '_' + df.idx.astype(str)
    tmp.append(df.pivot(index='Salesman',columns='tmp_idx',values=var))

reshape = pd.concat(tmp,axis=1)

@Luke said:

I think Stata can do something like this with the reshape command.

You can but I think you also need a within group counter to get the reshape in stata to get your desired output:

     +-------------------------------------------+
     | salesman   idx   height   product   price |
     |-------------------------------------------|
  1. |     Knut     0        6       bat       5 |
  2. |     Knut     1        6      ball       1 |
  3. |     Knut     2        6      wand       3 |
  4. |    Steve     0        5       pen       2 |
     +-------------------------------------------+

If you add idx then you could do reshape in stata:

reshape wide product price, i(salesman) j(idx)

Reshaping Long Data to Wide in Python (Pandas)

You can pivot your dataframe:

df.pivot(index='TICKER', columns='date', values='RET')

date    20050131  20050231
TICKER                    
AAPL        0.02      0.01
GOOG        0.05      0.03

Pandas long to wide, WHILE preserving existing columns?

You were actually almost all the way there with pivot(). Specifying the index will take you almost all the way there:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

df = frame.pivot(
    index=["FirstName", "LastName", "Building"],
    columns="Month",
    values="Sales",
)

df

The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

df = (
    frame.pivot(
        index=["FirstName", "LastName", "Building"],
        columns="Month",
        values="Sales"
    )
    .reset_index()              # collapses multi-index
    .rename_axis(None, axis=1)  # renames index
)

df

Reshape pandas dataframe wide to long, with some variables to stack, other variables to repeat

>>> id['id'] = df.index
>>> pd.wide_to_long(df, ["wage", "hours"], i="id", j="year")
         edu  race  wage  hours
id year                        
0  1985    1     3     8      5
1  1985    1     4     7      5
2  1985    7     3     7      3
3  1985    0     1     8      8
4  1985    2     7     2      1
5  1985    6     1     8      3
6  1985    7     1     4      6
7  1985    8     6     0      6
8  1985    7     0     8      4
9  1985    6     3     2      7
0  1986    1     3     2      0
1  1986    1     4     1      8
2  1986    7     3     5      7
3  1986    0     1     3      3
4  1986    2     7     7      6
5  1986    6     1     1      8
6  1986    7     1     7      6
7  1986    8     6     6      2
8  1986    7     0     7      2
9  1986    6     3     2      2

Pandas long reshape for several variables

You could try to pivot your dataframe, after building a custom index per session:

df2 = df.assign(index=df.groupby(['Session']).cumcount()).pivot(
    'index', 'Session', ['Tube',  'Window', 'Counts', 'Length']).rename_axis(index=None)

With you sample data it would give:

        Tube       Window       Counts      Length     
Session   1     10     1     10     1    10     1    10
0        1.0  53.0    1.0  36.0    0.0  0.0    0.0  0.0
1        1.0  53.0    2.0  37.0    0.0  0.0    0.0  0.0
2        1.0  53.0    3.0  38.0    0.0  0.0    0.0  0.0
3        1.0  53.0    4.0  39.0    0.0  0.0    0.0  0.0
4        1.0  53.0    5.0  40.0    0.0  0.0    0.0  0.0

Not that bad but we have a MultiIndex for the columns and in a wrong order. Let us go further:

df2.columns = df2.columns.to_flat_index()
df2 = df2.reindex(columns=sorted(df2.columns, key=lambda x: x[1]))

We now have:

   (Tube, 1)  (Window, 1)  ...  (Counts, 10)  (Length, 10)
0        1.0          1.0  ...           0.0           0.0
1        1.0          2.0  ...           0.0           0.0
2        1.0          3.0  ...           0.0           0.0
3        1.0          4.0  ...           0.0           0.0
4        1.0          5.0  ...           0.0           0.0

Last step:

df2 = df2.rename(columns=lambda x: '_'.join(str(i) for i in x))

to finaly get:

   Tube_1  Window_1  Counts_1  ...  Window_10  Counts_10  Length_10
0     1.0       1.0       0.0  ...       36.0        0.0        0.0
1     1.0       2.0       0.0  ...       37.0        0.0        0.0
2     1.0       3.0       0.0  ...       38.0        0.0        0.0
3     1.0       4.0       0.0  ...       39.0        0.0        0.0
4     1.0       5.0       0.0  ...       40.0        0.0        0.0

How to reshape dataframe, wide to long, for several variables in one or several calls?

You can use wide_to_long; the issue is just that your column names need to be changed a bit, so that the stubnames are ['pct', 'valid', 'value'], and not t#.

import pandas as pd
import numpy as np

# Reverse order of words around '_'
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns]
# Add prefix for other stubs
df = df.rename(columns= dict((f't{i}', f'value_t{i}') for i in np.arange(1,4,1)))

pd.wide_to_long(df, stubnames=['pct', 'valid', 'value'], 
                i='id', j='test', suffix='.*', sep='_').reset_index()

Output:

            id test   pct  valid  value
0  66602088802   t1  0.46   True    car
1  85002620928   t1  0.51   True  house
2  66602088802   t2  0.15   True   bike
3  85002620928   t2  0.07   True    car
4  66602088802   t3  0.06  False    car
5  85002620928   t3  0.07  False    toy

Pandas Long to Wide Reshape, by Two Variables