Pandas Melt Function

Pandas Melt Function

melt gets you part way there.

In [29]: m = pd.melt(df, id_vars=['Year'], var_name='Name')

This has everything except Group. To get that, we need to reshape d a bit as well.

In [30]: d2 = {}

In [31]: for k, v in d.items():
for item in v:
d2[item] = k
....:

In [32]: d2
Out[32]: {'Amy': 'A', 'Ben': 'B', 'Bob': 'B', 'Carl': 'C', 'Chris': 'C'}

In [34]: m['Group'] = m['Name'].map(d2)

In [35]: m
Out[35]:
Year Name value Group
0 2013 Amy 2 A
1 2014 Amy 9 A
2 2013 Bob 4 B
3 2014 Bob 2 B
4 2013 Carl 7 C
.. ... ... ... ...
7 2014 Chris 5 C
8 2013 Ben 1 B
9 2014 Ben 5 B
10 2013 Other 3 NaN
11 2014 Other 6 NaN

[12 rows x 4 columns]

And moving 'Other' from Name to Group

In [8]: mask = m['Name'] == 'Other'

In [9]: m.loc[mask, 'Name'] = ''

In [10]: m.loc[mask, 'Group'] = 'Other'

In [11]: m
Out[11]:
Year Name value Group
0 2013 Amy 2 A
1 2014 Amy 9 A
2 2013 Bob 4 B
3 2014 Bob 2 B
4 2013 Carl 7 C
.. ... ... ... ...
7 2014 Chris 5 C
8 2013 Ben 1 B
9 2014 Ben 5 B
10 2013 3 Other
11 2014 6 Other

[12 rows x 4 columns]

How do I melt a pandas dataframe?

Note for pandas versions < 0.20.0: I will be using df.melt(...) for my examples, but you will need to use pd.melt(df, ...) instead.

Documentation references:

Most of the solutions here would be used with melt, so to know the method melt, see the documentaion explanation

Unpivot a DataFrame from wide to long format, optionally leaving
identifiers set.

This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted”
to the row axis, leaving just two non-identifier columns, ‘variable’
and ‘value’.

Parameters

  • id_vars : tuple, list, or ndarray, optional

    Column(s) to use as identifier variables.

  • value_vars : tuple, list, or ndarray, optional

    Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.

  • var_name : scalar

    Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.

  • value_name : scalar, default ‘value’

    Name to use for the ‘value’ column.

  • col_level : int or str, optional

    If columns are a MultiIndex then use this level to melt.

  • ignore_index : bool, default True

    If True, original index is ignored. If False, the original index is retained. Index labels will be repeated
    as necessary.

    New in version 1.1.0.

Logic to melting:

Melting merges multiple columns and converts the dataframe from wide to long, for the solution to Problem 1 (see below), the steps are:

  1. First we got the original dataframe.

  2. Then the melt firstly merges the Math and English columns and makes the dataframe replicated (longer).

  3. Then finally adds the column Subject which is the subject of the Grades columns value respectively.

Sample Image

This is the simple logic to what the melt function does.

Solutions:

I will solve my own questions.

Problem 1:

Problem 1 could be solve using pd.DataFrame.melt with the following code:

print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))

This code passes the id_vars argument to ['Name', 'Age'], then automatically the value_vars would be set to the other columns (['Math', 'English']), which is transposed into that format.

You could also solve Problem 1 using stack like the below:

print(
df.set_index(["Name", "Age"])
.stack()
.reset_index(name="Grade")
.rename(columns={"level_2": "Subject"})
.sort_values("Subject")
.reset_index(drop=True)
)

This code sets the Name and Age columns as the index and stacks the rest of the columns Math and English, and resets the index and assigns Grade as the column name, then renames the other column level_2 to Subject and then sorts by the Subject column, then finally resets the index again.

Both of these solutions output:

    Name  Age  Subject Grade
0 Bob 13 English C
1 John 16 English B
2 Foo 16 English B
3 Bar 15 English A+
4 Alex 17 English F
5 Tom 12 English A
6 Bob 13 Math A+
7 John 16 Math B
8 Foo 16 Math A
9 Bar 15 Math F
10 Alex 17 Math D
11 Tom 12 Math C

Problem 2:

This is similar to my first question, but this one I only one to filter in the Math columns, this time the value_vars argument can come into use, like the below:

print(
df.melt(
id_vars=["Name", "Age"],
value_vars="Math",
var_name="Subject",
value_name="Grades",
)
)

Or we can also use stack with column specification:

print(
df.set_index(["Name", "Age"])[["Math"]]
.stack()
.reset_index(name="Grade")
.rename(columns={"level_2": "Subject"})
.sort_values("Subject")
.reset_index(drop=True)
)

Both of these solutions give:

   Name  Age Subject Grade
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C

Problem 3:

Problem 3 could be solved with melt and groupby, using the agg function with ', '.join, like the below:

print(
df.melt(id_vars=["Name", "Age"])
.groupby("value", as_index=False)
.agg(", ".join)
)

It melts the dataframe then groups by the grades and aggregates them and joins them by a comma.

stack could be also used to solve this problem, with stack and groupby like the below:

print(
df.set_index(["Name", "Age"])
.stack()
.reset_index()
.rename(columns={"level_2": "Subjects", 0: "Grade"})
.groupby("Grade", as_index=False)
.agg(", ".join)
)

This stack function just transposes the dataframe in a way that is equivalent to melt, then resets the index, renames the columns and groups and aggregates.

Both solutions output:

  Grade             Name                Subjects
0 A Foo, Tom Math, English
1 A+ Bob, Bar Math, English
2 B John, John, Foo Math, English, English
3 C Bob, Tom English, Math
4 D Alex Math
5 F Bar, Alex Math, English

Problem 4:

We first melt the dataframe for the input data:

df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')



Then now we can start solving this Problem 4.

Problem 4 could be solved with pivot_table, we would have to specify to the pivot_table arguments, values, index, columns and also aggfunc.

We could solve it with the below code:

print(
df.pivot_table("Grades", ["Name", "Age"], "Subject", aggfunc="first")
.reset_index()
.rename_axis(columns=None)
)

Output:

   Name  Age English Math
0 Alex 15 F D
1 Bar 15 A+ F
2 Bob 13 C A+
3 Foo 16 B A
4 John 16 B B
5 Tom 13 A C

The melted dataframe is converted back to the exact same format as the original dataframe.

We first pivot the melted dataframe and then reset the index and remove the column axis name.

Problem 5:

Problem 5 could be solved with melt and groupby like the following:

print(
df.melt(id_vars=["Name", "Age"], var_name="Subject", value_name="Grades")
.groupby("Name", as_index=False)
.agg(", ".join)
)

That melts and groups by Name.

Or you could stack:

print(
df.set_index(["Name", "Age"])
.stack()
.reset_index()
.groupby("Name", as_index=False)
.agg(", ".join)
.rename({"level_2": "Subjects", 0: "Grades"}, axis=1)
)

Both codes output:

   Name       Subjects Grades
0 Alex Math, English D, F
1 Bar Math, English F, A+
2 Bob Math, English A+, C
3 Foo Math, English A, B
4 John Math, English B, B
5 Tom Math, English C, A

Problem 6:

Problem 6 could be solved with melt and no column needed to be specified, just specify the expected column names:

print(df.melt(var_name='Column', value_name='Value'))

That melts the whole dataframe

Or you could stack:

print(
df.stack()
.reset_index(level=1)
.sort_values("level_1")
.reset_index(drop=True)
.set_axis(["Column", "Value"], axis=1)
)

Both codes output:

     Column Value
0 Age 16
1 Age 15
2 Age 15
3 Age 16
4 Age 13
5 Age 13
6 English A+
7 English B
8 English B
9 English A
10 English F
11 English C
12 Math C
13 Math A+
14 Math D
15 Math B
16 Math F
17 Math A
18 Name Alex
19 Name Bar
20 Name Tom
21 Name Foo
22 Name John
23 Name Bob

Conclusion:

melt is a really handy function, often it's required, once you meet these types of problems, don't forget to try melt, it may well solve your problem.

Pandas Reshape dataframe without using melt function

melt is designed for these operations, but an alternative would be to set your index on id and name, using set_index(), and use stack:

df.set_index(['id','name']).stack()\
.reset_index(name='val')\
.query('val == 1')\
.rename({'level_2':'language'},axis=1)\
.drop('val',axis=1)

prints:

   id    name language
0 1 Alex python
1 1 Alex java
2 1 Alex mysql
3 2 Herald python
5 2 Herald mysql
6 3 Jack python
9 4 Mike python

Pandas Melt function for time series data

Instead of using id_vars, you should've used ignore_index=False (by default it is set to True). With ignore_index=True, pandas will not reset your index before unpivoting.

>>> df1 = df1.melt(var_name='FIPS', value_name='Cases', ignore_index=False)
>>> df1
FIPS Cases
date
2020-08-08 40025.0 0.000861
2020-08-09 40025.0 0.001147
2020-08-10 40025.0 0.001431
2020-08-08 21201.0 0.001292
2020-08-09 21201.0 0.001290
2020-08-10 21201.0 0.001288
2020-08-08 30061.0 0.000287
2020-08-09 30061.0 0.000344
2020-08-10 30061.0 0.000401
2020-08-08 46021.0 0.001177
2020-08-09 46021.0 0.001204
2020-08-10 46021.0 0.001231

Melt dataframe based on condition

Use pd.melt instead. Factor in replacement of False with NaN and dropna() eventually.

pd.melt(df.replace(False, np.nan), id_vars=['key'],var_name = 'letter', value_name = 'Bool').dropna()

key letter Bool
0 1 a True
1 2 a True
5 3 b True

Using melt() in Pandas

Use pandas.wide_to_long function as shown below:

pd.wide_to_long(df, ['Weight', 'Height'], 'Name', 'grp', ' ', '\\w+').reset_index()

Name grp Weight Height
0 John Before 200 6
1 Kelly Before 175 5
2 John After 195 7
3 Kelly After 165 6

or you could also use pivot_longer from pyjanitor as follows:

import janitor
df.pivot_longer('Name', names_to = ['.value', 'grp'], names_sep = ' ')

Name grp Weight Height
0 John Before 200 6
1 Kelly Before 175 5
2 John After 195 7
3 Kelly After 165 6


Related Topics



Leave a reply



Submit