How do I flatten a hierarchical column index in a pandas DataFrame?
set_axis
df = df.set_axis([f"{x}{y}" for x, y in df.columns], axis=1)
df
Aa Ab Ba Bb
0 0 1 2 3
1 4 5 6 7
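The original data isn't shown, so this sketch rebuilds a hypothetical 2×4 frame from the printed output to run the set_axis approach end to end. pandas 2.0 and later no longer accept inplace in set_axis, so the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical frame matching the printed output: two-level columns A/B x a/b
columns = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

# Build one flat label per (outer, inner) tuple and assign the result back
df = df.set_axis([f"{x}{y}" for x, y in df.columns], axis=1)
print(df)
```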
index.map
df.columns = df.columns.map(''.join)
df
Aa Ab Ba Bb
0 0 1 2 3
1 4 5 6 7
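Here is why ''.join works. Iterating a MultiIndex yields one tuple per column, and Index.map applies str.join to each tuple. The sketch assumes the same hypothetical 2×4 frame implied by the output.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level columns rebuilt from the printed output
columns = pd.MultiIndex.from_product([["A", "B"], ["a", "b"]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

# Each column label is a tuple such as ('A', 'a')
labels = list(df.columns)

# Index.map calls ''.join on every tuple and returns a flat Index
df.columns = df.columns.map(''.join)
print(labels)
print(df)
```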
For non-string column values, convert each part to str before joining:
df.columns = df.columns.map(lambda x: ''.join(map(str, x)))
df
Aa Ab Ba Bb
0 0 1 2 3
1 4 5 6 7
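This sketch covers the non-string case, assuming a hypothetical frame whose inner column level holds integers. A plain ''.join raises TypeError on such a tuple, so each part goes through str first.

```python
import numpy as np
import pandas as pd

# Hypothetical columns with an integer inner level: ('A', 1), ('A', 2), ...
columns = pd.MultiIndex.from_product([["A", "B"], [1, 2]])
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

try:
    ''.join(("A", 1))  # str.join refuses non-string items
    join_failed = False
except TypeError:
    join_failed = True

# Convert every part to str before joining
df.columns = df.columns.map(lambda x: ''.join(map(str, x)))
print(df)
```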
Pandas flatten Hierarchical Multi-index
Symbols is your dataframe's index, so you'll need to use reset_index to put it into the frame itself. Try this:
import pandas as pd
import pandas_datareader.data as web

df = (pd.DataFrame(web.DataReader(stocks, 'yahoo', day, day)
                   .iloc[0])
      .unstack(level=0)
      .droplevel(level=0, axis=1)
      .rename_axis(columns=None)  # Gets rid of the "Attributes" name
      .reset_index()              # Puts "Symbols" as an actual column, not as the index
      )
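The DataReader call needs network access, so this sketch swaps in a small hypothetical frame shaped like its multi-ticker output. That frame has two-level columns named Attributes and Symbols and one row per date. The tickers, prices and date are made up, but the chain itself is unchanged.

```python
import pandas as pd

# Hypothetical stand-in for web.DataReader(stocks, 'yahoo', day, day)
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAPL", "MSFT"]], names=["Attributes", "Symbols"]
)
raw = pd.DataFrame(
    [[150.0, 300.0, 1000.0, 2000.0]],
    index=pd.DatetimeIndex(["2021-01-04"], name="Date"),
    columns=columns,
)

df = (pd.DataFrame(raw.iloc[0])     # one-column frame indexed by (Attributes, Symbols)
      .unstack(level=0)              # Attributes move into the columns
      .droplevel(level=0, axis=1)    # drop the leftover date level
      .rename_axis(columns=None)     # drop the "Attributes" name
      .reset_index()                 # "Symbols" becomes a regular column
      )
print(df)
```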
My two additions:
rename_axis
This should get rid of your "Attributes" title. This is mainly for visual purposes when printing, but it can throw off people who aren't used to working with MultiIndex data. Essentially, your column labels are stored in an Index object. This Index object can have a name, so "Attributes" is the name of your columns (a pretty strange concept that isn't super useful for a normal Index, but has a lot of usefulness when working with a MultiIndex).
reset_index
It seems that your "Symbols" column isn't actually a column (which is why it doesn't appear in df.columns) but rather the index of the dataframe. Adding this method will insert the "Symbols" index as a column into the dataframe and create a new index that is a simple RangeIndex running from 0 to len(df) - 1.
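This minimal sketch shows both ideas on a hypothetical two-row frame. The column Index carries the name "Attributes", and rename_axis(columns=None) clears it. reset_index then moves the "Symbols" index into a column and leaves a RangeIndex behind.

```python
import pandas as pd

# Hypothetical frame: row index named "Symbols", column Index named "Attributes"
df = pd.DataFrame(
    {"Close": [150.0, 300.0], "Volume": [1000, 2000]},
    index=pd.Index(["AAPL", "MSFT"], name="Symbols"),
)
df.columns.name = "Attributes"
print(df.columns)             # the Index repr includes name='Attributes'

no_title = df.rename_axis(columns=None)
print(no_title.columns.name)  # None

flat = no_title.reset_index()
print(flat.columns.tolist())  # ['Symbols', 'Close', 'Volume']
print(flat.index)             # RangeIndex(start=0, stop=2, step=1)
```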
concise way of flattening multiindex columns
You can do a map with join on the columns:
out.columns = out.columns.map('_'.join)
out
Out[23]:
B_mean B_std C_median
A
1 0.204825 0.169408 0.926347
2 0.362184 0.404272 0.224119
3 0.533502 0.380614 0.218105
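Columns like these typically come from a groupby with per-column aggregations, as this sketch shows. The input frame is hypothetical and filled with random numbers, so only the column names, not the values, match the output above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical input: group key A and two numeric columns
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3],
    "B": rng.random(6),
    "C": rng.random(6),
})

# A dict-of-lists aggregation yields two-level columns: ('B', 'mean'), ...
out = df.groupby("A").agg({"B": ["mean", "std"], "C": ["median"]})

# Join each (column, statistic) tuple with an underscore
out.columns = out.columns.map('_'.join)
print(out)
```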
When the columns contain ints, I prefer this way, because '_'.join raises a TypeError on non-string labels while str.format converts them:
out.columns.map('{0[0]}_{0[1]}'.format)
Out[27]: Index(['B_mean', 'B_std', 'C_median'], dtype='object')
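This sketch covers the integer case, assuming a hypothetical frame whose second column level holds ints. The template '{0[0]}_{0[1]}' indexes into each tuple and formats both parts as text, so no explicit str call is needed.

```python
import pandas as pd

# Hypothetical two-level columns whose inner level is an integer
columns = pd.MultiIndex.from_tuples([("B", 1), ("B", 2), ("C", 1)])
out = pd.DataFrame([[0.1, 0.2, 0.3]], columns=columns)

# '{0[0]}_{0[1]}' formats item 0 and item 1 of each column tuple
flat = out.columns.map('{0[0]}_{0[1]}'.format)
print(flat)
```

Note that this template assumes exactly two levels; a third level would be silently ignored.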
Pandas flatten hierarchical index on non overlapping columns
You are misinterpreting what you are seeing.
A B
id
101 3 x
102 5 y
That display is not showing you a hierarchical column index. id is the name of the row index, and pandas adds that extra header line to show you the index name.
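You can check this yourself with the following sketch, which rebuilds the frame from the printout (values assumed from the display). The columns have a single level, and id is only the name of the row index.

```python
import pandas as pd

# Frame rebuilt from the printout; "id" is the name of the row index
df = pd.DataFrame(
    {"A": [3, 5], "B": ["x", "y"]},
    index=pd.Index([101, 102], name="id"),
)
print(df.columns.nlevels)  # 1: the columns are not hierarchical
print(df.index.name)       # id
```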
The answer to your question depends on what you really want or need.
As the df is, you can dump it to a CSV just the way you want:
print(df.to_csv(sep='\t'))
id A B
101 3 x
102 5 y
print(df.to_csv())
id,A,B
101,3,x
102,5,y
Or you can alter the df so that it displays the way you'd like:
print(df.rename_axis(None))
A B
101 3 x
102 5 y
Please do not do this! I'm only including it to demonstrate how to manipulate the index names.
I could also keep the index as it is but manipulate both the column and row index names so that it prints the way you'd like:
print(df.rename_axis(None).rename_axis('id', axis=1))
id A B
101 3 x
102 5 y
But this has named the columns' index id, which makes no sense.
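The following sketch makes visible what the not-recommended trick does. It uses a frame rebuilt from the printout and shows that the row index loses its name while the column Index is now named id.

```python
import pandas as pd

# Frame rebuilt from the printout, with "id" as the row index name
df = pd.DataFrame({"A": [3, 5], "B": ["x", "y"]},
                  index=pd.Index([101, 102], name="id"))

# Move the name from the row index to the column index (not recommended)
hacked = df.rename_axis(None).rename_axis('id', axis=1)
print(hacked.index.name)    # None
print(hacked.columns.name)  # id
```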
Flatten multi-index pandas dataframe where column names become values
Use stack and reset_index:
In [1260]: ndf.stack().reset_index()
Out[1260]:
City Group Sales
0 Edmonton A 4
1 Edmonton B 0
2 Edmonton C 0
3 Montreal A 6
4 Montreal B 0
5 Montreal C 0
6 Toronto A 13
7 Toronto B 0
8 Toronto C 8
9 Vancouver A 0
10 Vancouver B 16
11 Vancouver C 0
12 Windsor A 0
13 Windsor B 0
14 Windsor C 1
15 Winnipeg A 0
16 Winnipeg B 3
17 Winnipeg C 0
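This self-contained sketch assumes ndf has two-level columns ('Sales', group) and a row index named City. The numbers are copied from the output, for the first two cities only.

```python
import pandas as pd

# Hypothetical ndf: rows indexed by City, columns ('Sales', A/B/C)
columns = pd.MultiIndex.from_product([["Sales"], ["A", "B", "C"]],
                                     names=[None, "Group"])
ndf = pd.DataFrame([[4, 0, 0], [6, 0, 0]],
                   index=pd.Index(["Edmonton", "Montreal"], name="City"),
                   columns=columns)

# stack() moves the innermost column level (Group) into the row index;
# reset_index() then turns City and Group into ordinary columns
long_df = ndf.stack().reset_index()
print(long_df)
```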
Groupby Sum and Flatten Multi-Row Index DataFrame
It seems like you can just do a stack and value_counts:
index_cols = df.filter(like='Index ').columns
flattened_series = df[index_cols].stack().value_counts(sort=False)
Output:
>>> flattened_series
A 3
a1 2
C 2
a11 1
a12 1
a2 1
a21 1
B 1
b1 1
b11 1
c1 1
c11 1
c2 1
c21 1
dtype: int64
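The question's data isn't shown, so this sketch runs on a hypothetical frame. Each "Index N" column holds one level of a path and Value holds a number. stack() lines every label up in a single Series so that value_counts can tally them.

```python
import pandas as pd

# Hypothetical input: three hierarchy columns plus a Value column
df = pd.DataFrame({
    "Index 1": ["A", "A", "B"],
    "Index 2": ["a1", "a1", "b1"],
    "Index 3": ["a11", "a12", "b11"],
    "Value": [1, 1, 1],
})

index_cols = df.filter(like='Index ').columns
# stack() flattens the selected columns into one Series of labels
flattened_series = df[index_cols].stack().value_counts(sort=False)
print(flattened_series)
```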
That just counts the values of the index columns; it doesn't actually sum the values of the Value column. Doing that would also be pretty simple:
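flattened_summed_series = (df.set_index('Value')    # keep Value next to every label
                           .stack()                  # one row per (Value, index column) label
                           .reset_index(level=0)     # Value back to a column; labels land in column 0
                           .groupby(0, sort=False)['Value'].sum()
                           .rename_axis(None))       # drop the leftover "0" index name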
Output:
>>> flattened_summed_series
A 3
a1 2
a11 1
a12 1
a2 1
a21 1
B 1
b1 1
b11 1
C 2
c1 1
c11 1
c2 1
c21 1
Name: Value, dtype: int64
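The same sum can also be computed with melt, which some readers may find easier to follow. This is a swapped-in technique rather than the answer's own. The sketch uses a hypothetical frame and checks the melt result against the stack chain.

```python
import pandas as pd

# Hypothetical input: hierarchy columns plus a Value column to be summed
df = pd.DataFrame({
    "Index 1": ["A", "A", "B"],
    "Index 2": ["a1", "a1", "b1"],
    "Index 3": ["a11", "a12", "b11"],
    "Value": [1, 2, 3],
})

# The stack-based chain from the answer
chained = (df.set_index('Value').stack().reset_index(level=0)
             .groupby(0, sort=False)['Value'].sum().rename_axis(None))

# Alternative: melt to long format (one row per label), then sum Value per label
melted = (df.melt(id_vars='Value', value_name='label')
            .groupby('label', sort=False)['Value'].sum()
            .rename_axis(None))
print(melted)
```

The totals match, but the order differs. melt walks the frame column by column, while the stacked chain walks it row by row, so compare the two after sort_index().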