Pandas long to wide reshape, by two variables
A simple pivot might be sufficient for your needs but this is what I did to reproduce your desired output:
df['idx'] = df.groupby('Salesman').cumcount()
Just adding a within group counter/index will get you most of the way there but the column labels will not be as you desired:
print df.pivot(index='Salesman',columns='idx')[['product','price']]
product price
idx 0 1 2 0 1 2
Salesman
Knut bat ball wand 5 1 3
Steve pen NaN NaN 2 NaN NaN
To get closer to your desired output I added the following:
df['prod_idx'] = 'product_' + df.idx.astype(str)
df['prc_idx'] = 'price_' + df.idx.astype(str)
product = df.pivot(index='Salesman',columns='prod_idx',values='product')
prc = df.pivot(index='Salesman',columns='prc_idx',values='price')
reshape = pd.concat([product,prc],axis=1)
reshape['Height'] = df.set_index('Salesman')['Height'].drop_duplicates()
print reshape
product_0 product_1 product_2 price_0 price_1 price_2 Height
Salesman
Knut bat ball wand 5 1 3 6
Steve pen NaN NaN 2 NaN NaN 5
Edit: if you want to generalize the procedure to more variables I think you could do something like the following (although it might not be efficient enough):
df['idx'] = df.groupby('Salesman').cumcount()
tmp = []
for var in ['product','price']:
df['tmp_idx'] = var + '_' + df.idx.astype(str)
tmp.append(df.pivot(index='Salesman',columns='tmp_idx',values=var))
reshape = pd.concat(tmp,axis=1)
@Luke said:
I think Stata can do something like this with the reshape command.
You can but I think you also need a within group counter to get the reshape in stata to get your desired output:
+-------------------------------------------+
| salesman idx height product price |
|-------------------------------------------|
1. | Knut 0 6 bat 5 |
2. | Knut 1 6 ball 1 |
3. | Knut 2 6 wand 3 |
4. | Steve 0 5 pen 2 |
+-------------------------------------------+
If you add idx
then you could do reshape in stata
:
reshape wide product price, i(salesman) j(idx)
Reshaping Long Data to Wide in Python (Pandas)
You can pivot your dataframe:
df.pivot(index='TICKER', columns='date', values='RET')
date 20050131 20050231
TICKER
AAPL 0.02 0.01
GOOG 0.05 0.03
Pandas long to wide, WHILE preserving existing columns?
You were actually almost all the way there with pivot()
. Specifying the index
will take you almost all the way there:
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
df = frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales",
)
df
The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
df = (
frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales"
)
.reset_index() # collapses multi-index
.rename_axis(None, axis=1) # renames index
)
df
Reshape pandas dataframe wide to long, with some variables to stack, other variables to repeat
>>> id['id'] = df.index
>>> pd.wide_to_long(df, ["wage", "hours"], i="id", j="year")
edu race wage hours
id year
0 1985 1 3 8 5
1 1985 1 4 7 5
2 1985 7 3 7 3
3 1985 0 1 8 8
4 1985 2 7 2 1
5 1985 6 1 8 3
6 1985 7 1 4 6
7 1985 8 6 0 6
8 1985 7 0 8 4
9 1985 6 3 2 7
0 1986 1 3 2 0
1 1986 1 4 1 8
2 1986 7 3 5 7
3 1986 0 1 3 3
4 1986 2 7 7 6
5 1986 6 1 1 8
6 1986 7 1 7 6
7 1986 8 6 6 2
8 1986 7 0 7 2
9 1986 6 3 2 2
Pandas long reshape for several variables
You could try to pivot your dataframe, after building a custom index per session:
df2 = df.assign(index=df.groupby(['Session']).cumcount()).pivot(
'index', 'Session', ['Tube', 'Window', 'Counts', 'Length']).rename_axis(index=None)
With you sample data it would give:
Tube Window Counts Length
Session 1 10 1 10 1 10 1 10
0 1.0 53.0 1.0 36.0 0.0 0.0 0.0 0.0
1 1.0 53.0 2.0 37.0 0.0 0.0 0.0 0.0
2 1.0 53.0 3.0 38.0 0.0 0.0 0.0 0.0
3 1.0 53.0 4.0 39.0 0.0 0.0 0.0 0.0
4 1.0 53.0 5.0 40.0 0.0 0.0 0.0 0.0
Not that bad but we have a MultiIndex for the columns and in a wrong order. Let us go further:
df2.columns = df2.columns.to_flat_index()
df2 = df2.reindex(columns=sorted(df2.columns, key=lambda x: x[1]))
We now have:
(Tube, 1) (Window, 1) ... (Counts, 10) (Length, 10)
0 1.0 1.0 ... 0.0 0.0
1 1.0 2.0 ... 0.0 0.0
2 1.0 3.0 ... 0.0 0.0
3 1.0 4.0 ... 0.0 0.0
4 1.0 5.0 ... 0.0 0.0
Last step:
df2 = df2.rename(columns=lambda x: '_'.join(str(i) for i in x))
to finaly get:
Tube_1 Window_1 Counts_1 ... Window_10 Counts_10 Length_10
0 1.0 1.0 0.0 ... 36.0 0.0 0.0
1 1.0 2.0 0.0 ... 37.0 0.0 0.0
2 1.0 3.0 0.0 ... 38.0 0.0 0.0
3 1.0 4.0 0.0 ... 39.0 0.0 0.0
4 1.0 5.0 0.0 ... 40.0 0.0 0.0
How to reshape dataframe, wide to long, for several variables in one or several calls?
You can use wide_to_long
; the issue is just that your column names need to be changed a bit, so that the stubnames are ['pct', 'valid', 'value']
, and not t#
.
import pandas as pd
import numpy as np
# Reverse order of words around '_'
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns]
# Add prefix for other stubs
df = df.rename(columns= dict((f't{i}', f'value_t{i}') for i in np.arange(1,4,1)))
pd.wide_to_long(df, stubnames=['pct', 'valid', 'value'],
i='id', j='test', suffix='.*', sep='_').reset_index()
Output:
id test pct valid value
0 66602088802 t1 0.46 True car
1 85002620928 t1 0.51 True house
2 66602088802 t2 0.15 True bike
3 85002620928 t2 0.07 True car
4 66602088802 t3 0.06 False car
5 85002620928 t3 0.07 False toy
Related Topics
How to Convert Each Item in the List to String, for the Purpose of Joining Them
Non-Alphanumeric List Order from Os.Listdir()
Importerror: Dll Load Failed: %1 Is Not a Valid Win32 Application. But the Dll's Are There
Pandas: Drop Consecutive Duplicates
Why Python 3.6.1 Throws Attributeerror: Module 'Enum' Has No Attribute 'Intflag'
Is It Pythonic: Naming Lambdas
Pandas Read_Csv: Low_Memory and Dtype Options
Python List VS. Array - When to Use
How to Create a Guid/Uuid in Python
How to Convert SQLalchemy Row Object to a Python Dict
How to Convert a Utc Datetime to a Local Datetime Using Only Standard Library
String Concatenation of Two Pandas Columns
How to Get Monitor Resolution in Python
What Is the Intended Use of the Optional "Else" Clause of the "Try" Statement in Python