How to Melt a Pandas Dataframe

How do I melt a pandas dataframe?

Note for users with pandas version under < 0.20.0, I will be using `df.melt(...)` for my examples, but your version would be too low for `df.melt`, you would need to use `pd.melt(df, ...)` instead.

Documentation references:

Most of the solutions here would be used with melt, so to know the method melt, see the documentaion explanation

Unpivot a DataFrame from wide to long format, optionally leaving
identifiers set.
This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted”
to the row axis, leaving just two non-identifier columns, ‘variable’
and ‘value’.

And the parameters are:

Parameters
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name : scalar
Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name : scalar, default ‘value’
Name to use for the ‘value’ column.
col_level : int or str, optional
If columns are a MultiIndex then use this level to melt.
ignore_index : bool, default True
If True, original index is ignored. If False, the original index is retained. Index labels will be repeated
as necessary.
New in version 1.1.0.

Logic to melting:

Melting merges multiple columns and converts the dataframe from wide to long, for the solution to Problem 1 (see below), the steps are:

First we got the original dataframe.
Then the melt firstly merges the Math and English columns and makes the dataframe replicated (longer).
Then finally adds the column Subject which is the subject of the Grades columns value respectively.

Sample Image

This is the simple logic to what the melt function does.

Solutions:

I will solve my own questions.

Problem 1:

Problem 1 could be solve using pd.DataFrame.melt with the following code:

print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))

This code passes the id_vars argument to ['Name', 'Age'], then automatically the value_vars would be set to the other columns (['Math', 'English']), which is transposed into that format.

You could also solve Problem 1 using stack like the below:

print(
    df.set_index(["Name", "Age"])
    .stack()
    .reset_index(name="Grade")
    .rename(columns={"level_2": "Subject"})
    .sort_values("Subject")
    .reset_index(drop=True)
)

This code sets the Name and Age columns as the index and stacks the rest of the columns Math and English, and resets the index and assigns Grade as the column name, then renames the other column level_2 to Subject and then sorts by the Subject column, then finally resets the index again.

Both of these solutions output:

    Name  Age  Subject Grade
0    Bob   13  English     C
1   John   16  English     B
2    Foo   16  English     B
3    Bar   15  English    A+
4   Alex   17  English     F
5    Tom   12  English     A
6    Bob   13     Math    A+
7   John   16     Math     B
8    Foo   16     Math     A
9    Bar   15     Math     F
10  Alex   17     Math     D
11   Tom   12     Math     C

Problem 2:

This is similar to my first question, but this one I only one to filter in the Math columns, this time the value_vars argument can come into use, like the below:

print(
    df.melt(
        id_vars=["Name", "Age"],
        value_vars="Math",
        var_name="Subject",
        value_name="Grades",
    )
)

Or we can also use stack with column specification:

print(
    df.set_index(["Name", "Age"])[["Math"]]
    .stack()
    .reset_index(name="Grade")
    .rename(columns={"level_2": "Subject"})
    .sort_values("Subject")
    .reset_index(drop=True)
)

Both of these solutions give:

   Name  Age Subject Grade
0   Bob   13    Math    A+
1  John   16    Math     B
2   Foo   16    Math     A
3   Bar   15    Math     F
4  Alex   15    Math     D
5   Tom   13    Math     C

Problem 3:

Problem 3 could be solved with melt and groupby, using the agg function with ', '.join, like the below:

print(
    df.melt(id_vars=["Name", "Age"])
    .groupby("value", as_index=False)
    .agg(", ".join)
)

It melts the dataframe then groups by the grades and aggregates them and joins them by a comma.

stack could be also used to solve this problem, with stack and groupby like the below:

print(
    df.set_index(["Name", "Age"])
    .stack()
    .reset_index()
    .rename(columns={"level_2": "Subjects", 0: "Grade"})
    .groupby("Grade", as_index=False)
    .agg(", ".join)
)

This stack function just transposes the dataframe in a way that is equivalent to melt, then resets the index, renames the columns and groups and aggregates.

Both solutions output:

  Grade             Name                Subjects
0     A         Foo, Tom           Math, English
1    A+         Bob, Bar           Math, English
2     B  John, John, Foo  Math, English, English
3     C         Bob, Tom           English, Math
4     D             Alex                    Math
5     F        Bar, Alex           Math, English

Problem 4:

We first melt the dataframe for the input data:

df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')

Then now we can start solving this Problem 4.

Problem 4 could be solved with pivot_table, we would have to specify to the pivot_table arguments, values, index, columns and also aggfunc.

We could solve it with the below code:

print(
    df.pivot_table("Grades", ["Name", "Age"], "Subject", aggfunc="first")
    .reset_index()
    .rename_axis(columns=None)
)

Output:

   Name  Age English Math
0  Alex   15       F    D
1   Bar   15      A+    F
2   Bob   13       C   A+
3   Foo   16       B    A
4  John   16       B    B
5   Tom   13       A    C

The melted dataframe is converted back to the exact same format as the original dataframe.

We first pivot the melted dataframe and then reset the index and remove the column axis name.

Problem 5:

Problem 5 could be solved with melt and groupby like the following:

print(
    df.melt(id_vars=["Name", "Age"], var_name="Subject", value_name="Grades")
    .groupby("Name", as_index=False)
    .agg(", ".join)
)

That melts and groups by Name.

Or you could stack:

print(
    df.set_index(["Name", "Age"])
    .stack()
    .reset_index()
    .groupby("Name", as_index=False)
    .agg(", ".join)
    .rename({"level_2": "Subjects", 0: "Grades"}, axis=1)
)

Both codes output:

   Name       Subjects Grades
0  Alex  Math, English   D, F
1   Bar  Math, English  F, A+
2   Bob  Math, English  A+, C
3   Foo  Math, English   A, B
4  John  Math, English   B, B
5   Tom  Math, English   C, A

Problem 6:

Problem 6 could be solved with melt and no column needed to be specified, just specify the expected column names:

print(df.melt(var_name='Column', value_name='Value'))

That melts the whole dataframe

Or you could stack:

print(
    df.stack()
    .reset_index(level=1)
    .sort_values("level_1")
    .reset_index(drop=True)
    .set_axis(["Column", "Value"], axis=1)
)

Both codes output:

     Column Value
0       Age    16
1       Age    15
2       Age    15
3       Age    16
4       Age    13
5       Age    13
6   English    A+
7   English     B
8   English     B
9   English     A
10  English     F
11  English     C
12     Math     C
13     Math    A+
14     Math     D
15     Math     B
16     Math     F
17     Math     A
18     Name  Alex
19     Name   Bar
20     Name   Tom
21     Name   Foo
22     Name  John
23     Name   Bob

Conclusion:

melt is a really handy function, often it's required, once you meet these types of problems, don't forget to try melt, it may well solve your problem.

Remember for users with pandas versions under < 0.20.0, you would have to use pd.melt(df, ...) instead of df.melt(...).

Melt pandas dataframe based on condition

Try making a column with the useful values first:

df['Value'] = df.apply(lambda x: x[x.UsefulCol], axis=1)

timestamp   ID    Col1    Col2    Col3    Col4    UsefulCol    Value
16/11/2021  1     0.2             0.1             Col3         0.1
17/11/2021  1     0.3             0.8             Col3         0.8
17/11/2021  2              10                     Col2         10
17/11/2021  3                     0.1     2       Col4         2

Then, you can drop the columns you wanted to melt:

df.drop(['Col1', 'Col2', 'Col3', 'Col4], axis=1, inplace=True)

timestamp   ID    UsefulCol    Value
16/11/2021  1     Col3         0.1
17/11/2021  1     Col3         0.8
17/11/2021  2     Col2         10
17/11/2021  3     Col4         2

Rename your columns if you need:

df.rename({'UsefulCol':'Col'}, axis=1, inplace=True)

df.columns = [timestamp', 'ID', 'Col', 'Value]

Melt on pandas dataframes with multi-level

Use melt:

out = (
  df.set_index(df.columns[0])
    .melt(var_name=['Animal', 'Size', 'Age'], value_name='Count', ignore_index=False)
    .rename_axis('Day')
    .reset_index()
)

Output:

>>> out
    Day Animal    Size    Age  Count
0     1   Cats   Small  young      1
1     2   Cats   Small  young      2
2     3   Cats   Small  young      3
3     1   Cats   Small    old      2
4     2   Cats   Small    old      4
5     3   Cats   Small    old      6
6     1   Cats     Big  young      1
7     2   Cats     Big  young      3
8     3   Cats     Big  young      6
9     1   Cats     Big    old      5
10    2   Cats     Big    old      6
11    3   Cats     Big    old      7
12    1   Dogs  Little  young      3
13    2   Dogs  Little  young      0
14    3   Dogs  Little  young      9
15    1   Dogs  Little    old      2
16    2   Dogs  Little    old      1
17    3   Dogs  Little    old      3
18    1   Dogs   Large  young      8
19    2   Dogs   Large  young      4
20    3   Dogs   Large  young      5
21    1   Dogs   Large    old      6
22    2   Dogs   Large    old      6
23    3   Dogs   Large    old      0

Using melt() in Pandas

Use pandas.wide_to_long function as shown below:

pd.wide_to_long(df, ['Weight', 'Height'], 'Name', 'grp', ' ', '\\w+').reset_index()

    Name     grp  Weight  Height
0   John  Before     200       6
1  Kelly  Before     175       5
2   John   After     195       7
3  Kelly   After     165       6

or you could also use pivot_longer from pyjanitor as follows:

import janitor
df.pivot_longer('Name', names_to = ['.value', 'grp'], names_sep = ' ')

    Name     grp  Weight  Height
0   John  Before     200       6
1  Kelly  Before     175       5
2   John   After     195       7
3  Kelly   After     165       6

How to melt the pd.dataframe with the column names as identifiers?

You can use melt like this:

df.melt(var_name='ticker', value_name='return')

Or, without melt, you can simply use stack, the rest is only formatting/renaming:

(df.rename_axis('ticker', axis=1)
   .stack()
   .rename('return')
   .reset_index(level=1)
   .reset_index(drop=True)
)

output:

   ticker  return
0     MCD     1.0
1    AAPL    -0.3
2    GOOG    -0.8
3    MSFT    -0.1
4     MCD     2.0
...
19   MSFT    -1.3

How do I melt a pandas with custom nam

You can do this with pd.wide_to_long and a little column naming cleanup first, then reshape:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '.+' )
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool level_3   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123  Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123    test  124.0  136.0  140.8  141.88  21.35

Update renaming 'level_3' above:

df = df.rename(columns={'Std':'testStd',
                        'TestP90':'testP90',
                        'TestP99':'testP99', 
                        'TestP50':'testP50'})
df_out = pd.wide_to_long(df, 
                         ['test','Widget'], 
                         ['device_type', 'version', 'pool'], 
                         'Measure', '', '.+' )\
            .rename_axis('Instrument', axis=1) #add this line to rename column header axis
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out

Output:

Measure device_type  version  pool Instrument   Mean    P50    P90     P99    Std
0            PNB0Q7  8108162   123     Widget    2.2    0.0    6.4    9.64   3.92
1            PNB0Q7  8108162   123       test  124.0  136.0  140.8  141.88  21.35

Melt dataframe based on condition

Use pd.melt instead. Factor in replacement of False with NaN and dropna() eventually.

pd.melt(df.replace(False, np.nan), id_vars=['key'],var_name = 'letter', value_name = 'Bool').dropna()



    key letter  Bool
0    1      a  True
1    2      a  True
5    3      b  True

Pandas Melt function for time series data

Instead of using id_vars, you should've used ignore_index=False (by default it is set to True). With ignore_index=True, pandas will not reset your index before unpivoting.

>>> df1 = df1.melt(var_name='FIPS', value_name='Cases', ignore_index=False)
>>> df1
               FIPS     Cases
date
2020-08-08  40025.0  0.000861
2020-08-09  40025.0  0.001147
2020-08-10  40025.0  0.001431
2020-08-08  21201.0  0.001292
2020-08-09  21201.0  0.001290
2020-08-10  21201.0  0.001288
2020-08-08  30061.0  0.000287
2020-08-09  30061.0  0.000344
2020-08-10  30061.0  0.000401
2020-08-08  46021.0  0.001177
2020-08-09  46021.0  0.001204
2020-08-10  46021.0  0.001231

How to Melt a Pandas Dataframe

How do I melt a pandas dataframe?

Note for users with pandas version under < 0.20.0, I will be using `df.melt(...)` for my examples, but your version would be too low for `df.melt`, you would need to use `pd.melt(df, ...)` instead.

Documentation references:

Logic to melting:

Solutions:

Problem 1:

Problem 2:

Problem 3:

Problem 4:

Problem 5:

Problem 6:

Conclusion:

Melt pandas dataframe based on condition

Melt on pandas dataframes with multi-level

Using melt() in Pandas

How to melt the pd.dataframe with the column names as identifiers?

How do I melt a pandas with custom nam

Melt dataframe based on condition

Pandas Melt function for time series data

Related Topics

Leave a reply

How do I melt a pandas dataframe?

Note for users with pandas version under < 0.20.0, I will be using df.melt(...) for my examples, but your version would be too low for df.melt, you would need to use pd.melt(df, ...) instead.

Documentation references:

Logic to melting:

Solutions:

Problem 1:

Problem 2:

Problem 3:

Problem 4:

Problem 5:

Problem 6:

Conclusion:

Melt pandas dataframe based on condition

Melt on pandas dataframes with multi-level

Using melt() in Pandas

How to melt the pd.dataframe with the column names as identifiers?

How do I melt a pandas with custom nam

Melt dataframe based on condition

Pandas Melt function for time series data

Related Topics

Leave a reply

Note for users with pandas version under < 0.20.0, I will be using `df.melt(...)` for my examples, but your version would be too low for `df.melt`, you would need to use `pd.melt(df, ...)` instead.