Reshape Wide to Long in Pandas

Reshape wide to long in pandas

Update

As George Liu has shown in another answer, pd.melt is the idiomatic, flexible and fast solution to this problem. Do not use unstack for this.

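For reference, a minimal melt sketch (assuming the same wide frame used below, with the dates in the index and one column per label such as AA, BB, CC; the names 'id' and 'value' are purely illustrative):

long_df = (df.reset_index()                # move the dates out of the index
             .melt(id_vars='date',         # keep 'date' as an identifier column
                   var_name='id',          # former column labels (AA, BB, CC)
                   value_name='value'))    # the measurements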

unstack returns a Series with a MultiIndex:

In [38]: df.unstack()
Out[38]:
    date
AA  05/03    1
    06/03    4
    07/03    7
    08/03    5
BB  05/03    2
    06/03    5
    07/03    8
    08/03    7
CC  05/03    3
    06/03    6
    07/03    9
    08/03    1
dtype: int64

You can call reset_index on the resulting Series:

In [39]: df.unstack().reset_index()
Out[39]:
   level_0   date  0
0       AA  05/03  1
1       AA  06/03  4
2       AA  07/03  7
3       AA  08/03  5
4       BB  05/03  2
5       BB  06/03  5
6       BB  07/03  8
7       BB  08/03  7
8       CC  05/03  3
9       CC  06/03  6
10      CC  07/03  9
11      CC  08/03  1
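
The default labels produced by reset_index (level_0, date, 0) can be renamed in the same chain; a small sketch with illustrative names:

(df.unstack()
   .reset_index(name='value')               # name the values column
   .rename(columns={'level_0': 'id'}))      # name the former column level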

Or construct a DataFrame with a MultiIndex:

In [40]: pd.DataFrame(df.unstack())
Out[40]:
          0
   date
AA 05/03  1
   06/03  4
   07/03  7
   08/03  5
BB 05/03  2
   06/03  5
   07/03  8
   08/03  7
CC 05/03  3
   06/03  6
   07/03  9
   08/03  1

Pandas long to wide reshape, by two variables

A simple pivot might be sufficient for your needs, but this is what I did to reproduce your desired output:

df['idx'] = df.groupby('Salesman').cumcount()

Just adding a within-group counter/index will get you most of the way there, but the column labels will not be as you desired:

print(df.pivot(index='Salesman',columns='idx')[['product','price']])

         product             price
idx            0     1     2      0    1    2
Salesman
Knut         bat  ball  wand      5    1    3
Steve        pen   NaN   NaN      2  NaN  NaN

To get closer to your desired output I added the following:

df['prod_idx'] = 'product_' + df.idx.astype(str)
df['prc_idx'] = 'price_' + df.idx.astype(str)

product = df.pivot(index='Salesman',columns='prod_idx',values='product')
prc = df.pivot(index='Salesman',columns='prc_idx',values='price')

reshape = pd.concat([product,prc],axis=1)
reshape['Height'] = df.set_index('Salesman')['Height'].drop_duplicates()
print(reshape)

         product_0 product_1 product_2  price_0  price_1  price_2  Height
Salesman
Knut           bat      ball      wand        5        1        3       6
Steve          pen       NaN       NaN        2      NaN      NaN       5

Edit: if you want to generalize the procedure to more variables, I think you could do something like the following (although it might not be efficient enough):

df['idx'] = df.groupby('Salesman').cumcount()

tmp = []
for var in ['product','price']:
    df['tmp_idx'] = var + '_' + df.idx.astype(str)
    tmp.append(df.pivot(index='Salesman',columns='tmp_idx',values=var))

reshape = pd.concat(tmp,axis=1)

@Luke said:

I think Stata can do something like this with the reshape command.

You can, but I think you also need a within-group counter for Stata's reshape to produce your desired output:

     +-------------------------------------------+
     | salesman   idx   height   product   price |
     |-------------------------------------------|
  1. |     Knut     0        6       bat       5 |
  2. |     Knut     1        6      ball       1 |
  3. |     Knut     2        6      wand       3 |
  4. |    Steve     0        5       pen       2 |
     +-------------------------------------------+

If you add idx then you can do the reshape in Stata:

reshape wide product price, i(salesman) j(idx)

Reshape wide to long for many columns with a common prefix

A couple of options to do this:

With pd.wide_to_long, you need to reorder the name positions around the delimiter; in this case we move the a, b, ... to the front and the p1, p2 to the back before reshaping:

temp = df.copy()
temp = temp.rename(columns=lambda col: ".".join(col.split(".")[::-1]))
(pd.wide_to_long(temp.reset_index(),
                 stubnames = ["a", "b", "c", "d", "e", "f", "g"],
                 sep = ".",
                 suffix = ".+",
                 i = "index",
                 j = "side")
   .droplevel('index')
   .reset_index())

  side  a   b   c   d   e   f   g
0   p1  4   1   2   3   4   5   6
1   p1  0   4   8  12  16  20  24
2   p2  0   3   6   9  12  15  18
3   p2  0  12  24  36  48  60  72

One limitation of pd.wide_to_long is the need to reorder the name positions; another is that the stubnames have to be explicitly specified.
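
If listing the stubnames by hand is impractical, they can usually be derived from the column labels instead; a small sketch, assuming the reordered temp frame from the snippet above (columns like 'a.p1'):

stubs = sorted({col.split(".")[0] for col in temp.columns})   # ['a', 'b', ..., 'g']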

Another option is via stack, where the columns are split based on the delimiter and reshaped:

temp = df.copy()
temp.columns = temp.columns.str.split(".", expand = True)

temp.stack(0).droplevel(0).rename_axis('side').reset_index()

  side  a   b   c   d   e   f   g
0   p1  4   1   2   3   4   5   6
1   p2  0   3   6   9  12  15  18
2   p1  0   4   8  12  16  20  24
3   p2  0  12  24  36  48  60  72

stack is quite flexible, and did not require us to list the column names. The limitation of stack is that it fails if the index is not unique.

Another option is pivot_longer from pyjanitor, which abstracts the process:

# pip install pyjanitor
import janitor
df.pivot_longer(index = None,
                names_to = ("side", ".value"),
                names_sep = ".")

  side  a   b   c   d   e   f   g
0   p1  4   1   2   3   4   5   6
1   p1  0   4   8  12  16  20  24
2   p2  0   3   6   9  12  15  18
3   p2  0  12  24  36  48  60  72

The key here is .value. It tells the function that anything after the . should remain as a column name, while anything before the . should be collected into a new column (side). Note that, unlike wide_to_long, the stubnames do not need to be stated; pivot_longer abstracts that away. It can also handle duplicate indices, since it uses pd.melt under the hood.
One limitation of pivot_longer is that you have to install the pyjanitor library.

For the other example, I'll use stack and pivot_longer; you can still use pd.wide_to_long to solve it.

With stack:

First, split the columns and convert them into a MultiIndex:

temp = df.copy()
temp.columns = temp.columns.str.split(".", expand = True)

Reshape the data:

temp = temp.stack(1).droplevel(0).rename_axis('side')

Merge the column names:

temp.columns = temp.columns.map(".".join)

Reset the index:

temp.reset_index()

  side  foo.a  foo.b  foo.c  foo.d  foo.e  foo.f  foo.g
0   p1      4      1      2      3      4      5      6
1   p2      0      3      6      9     12     15     18
2   p1      0      4      8     12     16     20     24
3   p2      0     12     24     36     48     60     72

With pivot_longer, one option is to reorder the columns, before reshaping:

temp = df.copy()

temp.columns = ["".join([first, last, middle])
                for first, middle, last in
                temp.columns.str.split(r'(\.p\d)')]

(
    temp
    .pivot_longer(
        index = None,
        names_to = ('.value', 'side'),
        names_pattern = r"(.+)\.(p\d)")
)

  side  foo.a  foo.b  foo.c  foo.d  foo.e  foo.f  foo.g
0   p1      4      1      2      3      4      5      6
1   p1      0      4      8     12     16     20     24
2   p2      0      3      6      9     12     15     18
3   p2      0     12     24     36     48     60     72

In the dev version, however, the column reorder is not necessary; we can simply use multiple .value entries to reshape the dataframe. Note that you'll have to install from the repo to get the latest dev version:

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
(df
 .pivot_longer(
     index = None,
     names_to = ('.value', 'side', '.value'),
     names_pattern = r"(.+)\.(.\d)(.+)")
)

  side  foo.a  foo.b  foo.c  foo.d  foo.e  foo.f  foo.g
0   p1      4      1      2      3      4      5      6
1   p1      0      4      8     12     16     20     24
2   p2      0      3      6      9     12     15     18
3   p2      0     12     24     36     48     60     72

Another option with names_sep:

(df
 .pivot_longer(
     index = None,
     names_to = ('.value', 'side', '.value'),
     names_sep = r'\.(p\d)')
)

  side  foo.a  foo.b  foo.c  foo.d  foo.e  foo.f  foo.g
0   p1      4      1      2      3      4      5      6
1   p1      0      4      8     12     16     20     24
2   p2      0      3      6      9     12     15     18
3   p2      0     12     24     36     48     60     72

Pandas long to wide (unmelt or similar?)

  1. Make a helper column that records which rows are associated with name-1/val-1 and which with name-2/val-2
  2. Using this new column we can now pivot the data without encountering the "duplicate entries" error

Clean the output:


  1. Sort the columns to match your output
  2. The pivot will return a dataframe with a column MultiIndex; we'll need to flatten it by combining levels

pandas > 1.0.5

id_vars = ["varA", "varB"]

# Create an id column that tracks the id_vars
# Pivot using the id column
# Sort the columns to match OP expected output
pivoted_df = (
    df.assign(
        id=df.groupby(id_vars).cumcount().add(1).astype(str)
    )
    .pivot(index=id_vars, columns="id", values=["name", "val"])
    .sort_index(level=1, axis=1)
)

# flatten the column multiindex, insert the row index as values
flattened_columns = pivoted_df.columns.map("-".join)
pivoted_df = (
    pivoted_df.set_axis(flattened_columns, axis=1)
    .reset_index()
)

print(pivoted_df)
   varA  varB name-1  val-1 name-2  val-2
0     1   200      A      4      B      1
1     2   250      A      8      B      0

pandas <= 1.0.5

  • pivot in these versions did not support passing lists as arguments. A workaround is to use pivot_table instead, with a non-aggregating aggfunc to prevent accidental aggregation.

id_vars = ["varA", "varB"]

# Create an id column that tracks the id_vars
# Pivot using the id column
# Sort the columns to match OP expected output
pivoted_df = (
    df.assign(
        id=df.groupby(id_vars).cumcount().add(1).astype(str)
    )
    .pivot_table(index=id_vars, columns="id", values=["name", "val"], aggfunc=lambda x: x)
    .sort_index(level=1, axis=1)
)

# flatten the column multiindex, insert the row index as values
flattened_columns = pivoted_df.columns.map("-".join)
pivoted_df = (
    pivoted_df.set_axis(flattened_columns, axis=1)
    .reset_index()
)

Wide to long data transform in pandas

You can use melt for reshaping, then split the variable column, drop the helper columns, and sort_values. You can cast the year column to int with astype and, finally, change the order of the columns by subsetting:

df1 = (pd.melt(df, id_vars=['county','area'], value_name='pop'))
df1[['tmp','year']] = df1.variable.str.split('_', expand=True)
df1 = df1.drop(['variable', 'tmp'],axis=1).sort_values(['county','year'])
df1['year'] = df1.year.astype(int)
df1 = df1[['county','year','pop','area']]
print (df1)
   county  year   pop  area
0    1001  2006  1037   275
3    1001  2007  1052   275
6    1001  2008  1102   275
1    1003  2006  2399   394
4    1003  2007  2424   394
7    1003  2008  2438   394
2    1005  2006  1638   312
5    1005  2007  1647   312
8    1005  2008  1660   312

print (df1.dtypes)
county    int64
year      int32
pop       int64
area      int64
dtype: object

Another solution with set_index, stack and reset_index:

df2 = df.set_index(['county','area']).stack().reset_index(name='pop')
df2[['tmp','year']] = df2.level_2.str.split('_', expand=True)
df2 = df2.drop(['level_2', 'tmp'],axis=1)
df2['year'] = df2.year.astype(int)
df2 = df2[['county','year','pop','area']]

print (df2)
   county  year   pop  area
0    1001  2006  1037   275
1    1001  2007  1052   275
2    1001  2008  1102   275
3    1003  2006  2399   394
4    1003  2007  2424   394
5    1003  2008  2438   394
6    1005  2006  1638   312
7    1005  2007  1647   312
8    1005  2008  1660   312

print (df2.dtypes)
county    int64
year      int32
pop       int64
area      int64
dtype: object

python pandas reshape data from long to wide

Group the dataframe by id and status columns, then take the first values for value column, finally unstack the resulting series:

>>> df.groupby(['id', 'status']).value.first().unstack().reset_index()

status  id item_ordered item_received item_setup
0        1     complete      complete   complete
1        2     complete           NaN   complete

Complicated (for me) reshaping from wide to long in Pandas

In fact, pandas has a wide_to_long command that can conveniently do what you intend to do.

import numpy as np
import pandas as pd

df = pd.DataFrame({'location': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'dist_to_A': [0, 0, 0, 50, 50, 50],
                   'dist_to_B': [50, 50, 50, 0, 0, 0],
                   'location_var': [10, 10, 10, 14, 14, 14],
                   'ind_var': [3, 8, 10, 1, 3, 4]})

df['ind'] = df.index

#The `location` and `location_var` corresponds to the choices,
#record them as dictionaries and drop them
#(Just realized you had a cleaner way; copied from yours).

ind_to_loc = dict(df['location'])
loc_dict = dict(df.groupby('location').agg(lambda x : int(np.mean(x)))['location_var'])
df.drop(['location_var', 'location'], axis = 1, inplace = True)
# now reshape
df_long = pd.wide_to_long(df, ['dist_to_'], i = 'ind', j = 'location')

# use the dictionaries to get variables `choice` and `location_var` back.

df_long['choice'] = df_long.index.map(lambda x: ind_to_loc[x[0]])
df_long['location_var'] = df_long.index.map(lambda x : loc_dict[x[1]])
print(df_long.sort_index())

This gives you the table you asked for:

              ind_var  dist_to_ choice  location_var
ind location
0   A               3         0      A            10
    B               3        50      A            14
1   A               8         0      A            10
    B               8        50      A            14
2   A              10         0      A            10
    B              10        50      A            14
3   A               1        50      B            10
    B               1         0      B            14
4   A               3        50      B            10
    B               3         0      B            14
5   A               4        50      B            10
    B               4         0      B            14

Of course you can generate a choice variable that takes 0 and 1 if that's what you want.
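
A minimal sketch of that, assuming df_long from above (the column name 'chosen' is only for illustration): mark a row as 1 when its location level matches the recorded choice.

df_long['chosen'] = (df_long['choice'] == df_long.index.get_level_values('location')).astype(int)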

Convert dataframe from wide to long - pandas

There is pandas.wide_to_long, which is nice when the columns have stubs like that.

import pandas as pd

df.reset_index(inplace=True,drop=True)
df['id'] = df.index
df = pd.wide_to_long(df, stubnames='S_', i='id', j='num').reset_index().rename(columns={'S_':'S'})

#    id  num  index    S
# 0   0    1      0  1.0
# 1   1    1      1  1.0
# 2   0    2      0  0.0
# 3   1    2      1  1.0
# 4   0    3      0  0.0
# 5   1    3      1  NaN
# 6   0    4      0  1.0
# 7   1    4      1  NaN

Reshaping long format dataframe to wide format according to the value of the elements in columns

Reshape the dataframe using pivot, then subtract the values from 5 with rsub and add the prefix n to the column names:

df.pivot(index='group', columns='ID', values='rank').rsub(5).add_prefix('n')


ID       n1   n2   n3   n4
group
1       3.0  2.0  4.0  1.0
2       3.0  4.0  NaN  NaN
3       4.0  1.0  2.0  3.0
4       4.0  NaN  NaN  NaN
5       2.0  3.0  4.0  NaN
6       4.0  NaN  NaN  NaN

