How to Flatten a Hierarchical Index in Columns

How do I flatten a hierarchical column index in a pandas DataFrame?

set_axis

df.set_axis([f"{x}{y}" for x, y in df.columns], axis=1)  # returns a new DataFrame

Aa Ab Ba Bb
0 0 1 2 3
1 4 5 6 7

index.map

df.columns = df.columns.map(''.join)
df

Aa Ab Ba Bb
0 0 1 2 3
1 4 5 6 7

For non-string column values

df.columns = df.columns.map(lambda x: ''.join([*map(str, x)]))
df

Aa Ab Ba Bb
0 0 1 2 3
1 4 5 6 7
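A self-contained sketch of the three approaches above. The sample frame is an assumption, built only to match the printed output (two rows holding the values 0 to 7, and a two-level column index with levels A/B and a/b). It is not the original poster's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame matching the output above:
# columns (A, a), (A, b), (B, a), (B, b) and values 0..7.
cols = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

# 1. set_axis builds a new frame with the joined labels.
flat_set_axis = df.set_axis([f"{x}{y}" for x, y in df.columns], axis=1)

# 2. Index.map calls ''.join on each (level0, level1) tuple.
flat_map = df.copy()
flat_map.columns = flat_map.columns.map("".join)

# 3. Same idea, but each level is converted to str first, so this
#    also works when a level holds non-string labels such as ints.
flat_str = df.copy()
flat_str.columns = flat_str.columns.map(lambda x: "".join(map(str, x)))

print(flat_map)
```

Approaches 2 and 3 assign to `.columns` in place, so the sketch copies `df` first to keep the original MultiIndex available for comparison.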

Pandas flatten Hierarchical Multi-index

Symbols is your DataFrame's index, so you'll need to use reset_index to move it into the frame itself. Try this:

import pandas as pd
import pandas_datareader.data as web  # stocks and day are defined elsewhere in your code

df = (pd.DataFrame(web.DataReader(stocks, 'yahoo', day, day)
                   .iloc[0])
      .unstack(level=0)
      .droplevel(level=0, axis=1)
      .rename_axis(columns=None)  # Gets rid of the "Attributes" name
      .reset_index()              # Puts "Symbols" in an actual column, not the index
     )

My two additions:

  • rename_axis: This gets rid of your "Attributes" title. It mainly matters for display, but it can confuse people who aren't used to working with MultiIndex data. Your column labels are stored in an Index object, and an Index object can have a name: "Attributes" is the name of your column Index. Naming an Index is a strange concept that isn't very useful for a regular Index, but it is quite useful when working with a MultiIndex.
  • reset_index(): Your "Symbols" column isn't actually a column (which is why it doesn't appear in df.columns); it is the index of the DataFrame. This method inserts the "Symbols" index into the DataFrame as a column and replaces the index with a simple RangeIndex running from 0 to len(df) - 1.
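Both points can be checked without downloading anything. The sketch below assumes a frame shaped like the unstacked DataReader result (a column Index named "Attributes" and a row index named "Symbols"); the tickers and numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the unstacked DataReader result.
# Tickers and values are invented; only the index/column names matter.
df = pd.DataFrame(
    {"Close": [10.0, 20.0], "Volume": [100, 200]},
    index=pd.Index(["AAA", "BBB"], name="Symbols"),
)
df.columns.name = "Attributes"  # the name printed above the row labels

print(df)
print("Symbols" in df.columns)  # False: Symbols is the index, not a column

# Drop the column Index name, then turn the row index into a column.
tidy = df.rename_axis(columns=None).reset_index()
print(tidy)
```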

concise way of flattening multiindex columns

You can map '_'.join over the columns:

out.columns = out.columns.map('_'.join)
out
Out[23]:
B_mean B_std C_median
A
1 0.204825 0.169408 0.926347
2 0.362184 0.404272 0.224119
3 0.533502 0.380614 0.218105
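The answer does not show how out was built. A plausible origin, assumed here, is a groupby with per-column aggregations, which yields MultiIndex columns such as ('B', 'mean'). The data below is hypothetical and chosen so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical input data.
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [0.0, 2.0, 1.0, 3.0],
    "C": [5.0, 7.0, 4.0, 8.0],
})

# Lists of aggregations per column give two-level columns:
# ('B', 'mean'), ('B', 'std'), ('C', 'median').
out = df.groupby("A").agg({"B": ["mean", "std"], "C": ["median"]})
multi_cols = out.columns

# Join the two levels of each column tuple with an underscore.
out.columns = out.columns.map("_".join)
print(out)
```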

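When the column labels contain integers, I prefer this way: str.format converts each level to a string itself, whereas '_'.join raises a TypeError on non-string values.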

out.columns.map('{0[0]}_{0[1]}'.format) 
Out[27]: Index(['B_mean', 'B_std', 'C_median'], dtype='object')
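A small sketch of why the format version is more forgiving. The column tuples here are hypothetical, with integers in the second level, as you might get after pivoting on a numeric column.

```python
import pandas as pd

# Hypothetical columns whose second level holds ints.
cols = pd.MultiIndex.from_tuples([("sales", 1), ("sales", 2), ("cost", 1)])

# str.join only accepts strings, so a single label already fails.
try:
    "_".join(cols[0])
    join_failed = False
except TypeError as exc:
    join_failed = True
    print("join failed:", exc)

# '{0[0]}_{0[1]}'.format indexes into the tuple and formats each
# element itself, so int labels work without an explicit str().
flat = cols.map("{0[0]}_{0[1]}".format)
print(flat)
```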

Pandas flatten hierarchical index on non overlapping columns

You are misinterpreting what you are seeing.

     A  B
id
101 3 x
102 5 y

is not a hierarchical column index. id is the name of the row index; pandas prints it on its own line, below the column labels, to show you that name.
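You can confirm this by inspecting the axes directly. The frame below is rebuilt from the printed output in the question, on the assumption that id holds the row labels 101 and 102.

```python
import pandas as pd

# Frame rebuilt from the printed output: "id" names the row index.
df = pd.DataFrame({"A": [3, 5], "B": ["x", "y"]},
                  index=pd.Index([101, 102], name="id"))

print(df.index.name)       # prints: id
print(df.columns.nlevels)  # prints: 1 (a plain, single-level Index)
print(df)
```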

The answer to your question depends on what you really want or need.

As the df is, you can write it to CSV exactly the way you want:

print(df.to_csv(sep='\t'))

id A B
101 3 x
102 5 y

print(df.to_csv())

id,A,B
101,3,x
102,5,y

Or you can alter the df so that it displays the way you'd like:

print(df.rename_axis(None)) 

A B
101 3 x
102 5 y

Please do not do this!

I'm including it only to demonstrate how to manipulate the axis names.

I could also keep the index as it is but manipulate both column and row index names to print how you would like.

print(df.rename_axis(None).rename_axis(columns='id'))

id A B
101 3 x
102 5 y

But this names the column index id, which makes no sense.
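A quick check of what each rename_axis call changes, using the same frame rebuilt from the question's output (assumed index values 101 and 102). Only the axis names move; the index values stay the same, and the original df is not modified because rename_axis returns a new frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 5], "B": ["x", "y"]},
                  index=pd.Index([101, 102], name="id"))

# Clear the row-index name; the index values are untouched.
unnamed = df.rename_axis(None)

# The "please do not do this" variant: the row index is unnamed and
# the *column* Index is now called "id" instead.
misleading = df.rename_axis(None).rename_axis(columns="id")

print(misleading)
```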

Flatten multi-index pandas dataframe where column names become values

Use stack and reset_index

In [1260]: ndf.stack().reset_index()
Out[1260]:
City Group Sales
0 Edmonton A 4
1 Edmonton B 0
2 Edmonton C 0
3 Montreal A 6
4 Montreal B 0
5 Montreal C 0
6 Toronto A 13
7 Toronto B 0
8 Toronto C 8
9 Vancouver A 0
10 Vancouver B 16
11 Vancouver C 0
12 Windsor A 0
13 Windsor B 0
14 Windsor C 1
15 Winnipeg A 0
16 Winnipeg B 3
17 Winnipeg C 0
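The answer doesn't show ndf itself. Judging by the Sales column in the output, it is assumed here to have City as the row index and two-level columns ("Sales", Group). The sketch rebuilds it from the first three cities of the output above.

```python
import pandas as pd

# Assumed shape of ndf: a two-level column index ("Sales", Group),
# rebuilt from the first three cities shown in the output above.
cols = pd.MultiIndex.from_product([["Sales"], ["A", "B", "C"]],
                                  names=[None, "Group"])
ndf = pd.DataFrame(
    [[4, 0, 0], [6, 0, 0], [13, 0, 8]],
    index=pd.Index(["Edmonton", "Montreal", "Toronto"], name="City"),
    columns=cols,
)

# stack() moves the innermost column level (Group) into the row index,
# leaving a single "Sales" column; reset_index() then turns City and
# Group into ordinary columns.
long_df = ndf.stack().reset_index()
print(long_df)
```

Depending on your pandas version, stack() may emit a FutureWarning about its new implementation; for a frame without missing values like this one, both implementations give the same result.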

Groupby Sum and Flatten Multi-Row Index DataFrame

It seems like you can just do a stack and value_counts:

index_cols = df.filter(like='Index ').columns
flattened_series = df[index_cols].stack().value_counts(sort=False)

Output:

>>> flattened_series
A 3
a1 2
C 2
a11 1
a12 1
a2 1
a21 1
B 1
b1 1
b11 1
c1 1
c11 1
c2 1
c21 1
dtype: int64
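A runnable sketch of the counting step. The column names Index 1 to Index 3 are inferred from filter(like='Index '); the labels and the Value numbers are hypothetical, not the question's data.

```python
import pandas as pd

# Hypothetical frame: label columns matched by filter(like='Index '),
# plus a numeric Value column that filter() leaves out.
df = pd.DataFrame({
    "Index 1": ["A", "A", "B"],
    "Index 2": ["a1", "a2", "b1"],
    "Index 3": ["a11", "a21", "b11"],
    "Value": [2, 3, 5],
})

index_cols = df.filter(like="Index ").columns
# stack() lines every label up in a single Series; value_counts
# then counts how often each label occurs across all index columns.
flattened_series = df[index_cols].stack().value_counts(sort=False)
print(flattened_series)
```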

That just counts the values of the index columns; it doesn't actually sum the values of the Value column. Doing that would also be pretty simple:

flattened_summed_series = df.set_index('Value').stack().reset_index(level=0).groupby(0, sort=False)['Value'].sum().rename_axis(None)

Output:

>>> flattened_summed_series
A 3
a1 2
a11 1
a12 1
a2 1
a21 1
B 1
b1 1
b11 1
C 2
c1 1
c11 1
c2 1
c21 1
Name: Value, dtype: int64
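The same one-liner, split over several lines with a comment on each step and run on the hypothetical frame from the counting sketch (Value is 2, 3 and 5, so sums and counts differ). It assumes the frame has only the label columns plus Value; any other column would also be stacked.

```python
import pandas as pd

# Hypothetical frame: label columns plus the Value column to sum.
df = pd.DataFrame({
    "Index 1": ["A", "A", "B"],
    "Index 2": ["a1", "a2", "b1"],
    "Index 3": ["a11", "a21", "b11"],
    "Value": [2, 3, 5],
})

flattened_summed_series = (
    df.set_index("Value")             # Value travels with every label
      .stack()                        # Series: (Value, column name) -> label
      .reset_index(level=0)           # columns: 'Value' and 0 (the labels)
      .groupby(0, sort=False)["Value"]
      .sum()                          # total Value per label
      .rename_axis(None)              # drop the leftover index name 0
)
print(flattened_summed_series)
```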
