Rearrange dataframe to a table, the opposite of melt
> dcast(mydf, SN ~ class)
SN A1 B2 C3 D4
1 1 0.1461258 0.8325014 0.33562088 -0.07294576
2 2 0.5964182 0.4593710 -0.23652803 -1.52539568
3 3 2.0247742 -1.1235963 1.79875447 -1.87462227
4 4 0.8184004 1.3486721 0.76076486 -1.18311991
5 5 -0.6577212 0.3666741 -0.06057506 1.38825487
6 6 0.1590443 0.2043661 0.08161778 0.10421797
...
Opposite of melt in python pandas
there are a few ways;
using .pivot
:
>>> origin.pivot(index='label', columns='type')['value']
type a b c
label
x 1 2 3
y 4 5 6
z 7 8 9
[3 rows x 3 columns]
using pivot_table
:
>>> origin.pivot_table(values='value', index='label', columns='type')
value
type a b c
label
x 1 2 3
y 4 5 6
z 7 8 9
[3 rows x 3 columns]
or .groupby
followed by .unstack
:
>>> origin.groupby(['label', 'type'])['value'].aggregate('mean').unstack()
type a b c
label
x 1 2 3
y 4 5 6
z 7 8 9
[3 rows x 3 columns]
Reshaping a data frame by reverse melt
You need to use the walking
-variable as your value.var
:
dcast(df, originId ~ hour, value.var = 'walking')
which gives:
originId 830 930
1 229873828 3.519326 3.523732
2 359727104 3.440248 3.029167
3 359727113 3.178191 2.768297
4 359931904 8.065233 8.093116
5 359931908 20.758961 21.234964
6 359931909 15.050358 15.701993
And probably even better:
dcast(df, originId ~ paste0('hr_',hour), value.var = 'walking')
which gives:
originId hr_830 hr_930
1 229873828 3.519326 3.523732
2 359727104 3.440248 3.029167
3 359727113 3.178191 2.768297
4 359931904 8.065233 8.093116
5 359931908 20.758961 21.234964
6 359931909 15.050358 15.701993
How to 'reverse melt' a data.frame?
You can create a frequency table with table()
and transform the result into a data.frame.
x <- table(df1$sample, df1$LETTER)
df2 <- cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))
sample P Q R S T U V W X Y Z
a a 1 0 1 0 0 0 1 0 0 1 0
b b 0 1 0 0 0 0 0 0 0 0 0
c c 0 1 1 1 1 1 0 1 1 0 1
d d 0 1 0 0 0 0 0 0 1 0 0
e e 0 1 0 0 0 0 1 0 1 0 0
g g 0 1 0 0 1 0 0 0 0 0 0
If you want to include sample = f (not present in df1) in the output you could add the missing value as a factor level to df$sample
before calling table()
:
df1$sample <- factor(df1$sample, levels = letters[1:7])
x <- table(df1$sample2, df1$LETTER)
cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))
sample P Q R S T U V W X Y Z
a a 1 0 1 0 0 0 1 0 0 1 0
b b 0 1 0 0 0 0 0 0 0 0 0
c c 0 1 1 1 1 1 0 1 1 0 1
d d 0 1 0 0 0 0 0 0 1 0 0
e e 0 1 0 0 0 0 1 0 1 0 0
f f 0 0 0 0 0 0 0 0 0 0 0
g g 0 1 0 0 1 0 0 0 0 0 0
Reverse of melt and drop duplicates dataframe
Use Series.where
for repalce non Table
rows to NaN
s and forward fill missing values, then filtere out these rows with inverted mask by ~
in boolean indexing
, rename
column and change ordering of columns:
m = df['type'].eq('Table')
df['Table'] = df['entity'].where(m).ffill()
df = df[~m].rename(columns={'entity':'Cell'})[['event','Table','Cell']]
print (df)
event Table Cell
1 1 T1 C1
2 1 T1 C2
4 2 T2 C3
Un-melt Dataframe and keep rest of columns? Python Pandas
A combination of set_index, crosstab and reindex can 'unmelt' the dataframe, and take care of the null values present in the dataframe:
#set aside required multiindex of country, state, restaurant, stars, and reviews
ind = df.set_index(['COUNTRY','STATE','RESTAURANT','STARS','REVIEWS']).index
#get frequency count for Pizza and Spicy
res = pd.crosstab([df.COUNTRY,df.STATE,df.RESTAURANT,df.STARS,df.REVIEWS],df.CATEGORY)
#reindex frequency dataframe with ind
res = res.reindex(ind,fill_value=0).drop_duplicates()
res
CATEGORY Pizza Spicy
COUNTRY STATE RESTAURANT STARS REVIEWS
US Texas Texas Chicken 4.1 1,157 0 1
Ohio Mamas Shop 3.6 700 0 0
Pizza Hut 4.5 855 1 0
How to reshape data from long to wide format
Using reshape
function:
reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")
Use reshape2 to reorder the first columns of a data frame
No need for reshaping.
Just change the order of your columns, and then arrange..
library( tidyverse )
df %>%
select( mu, sigma, everything() ) %>%
arrange( mu, sigma, n )
# mu sigma n Mean Variance
# 1 0 1 100 0.0068 0.9923
# 2 0 1 1000 0.0005 1.0042
# 3 0 2 100 0.0281 3.9589
# 4 0 2 1000 0.0041 3.9920
# 5 5 1 100 4.9983 1.0055
# 6 5 1 1000 4.9886 1.0004
# 7 5 2 100 4.9850 3.9318
# 8 5 2 1000 5.0019 3.9582
pandas, melt, unmelt preserve index
You need preserve index values by reset_index
and parameter id_vars
:
df2 = pd.melt(df.reset_index(), id_vars='index',value_vars=['asset1','asset2'])
print (df2)
index variable value
0 coper1 asset1 1
1 coper2 asset1 3
2 coper3 asset1 5
3 coper1 asset2 2
4 coper2 asset2 4
5 coper3 asset2 6
Then pivot working nice:
print(df2.pivot(index='index',columns = 'variable', values = 'value'))
variable asset1 asset2
index
coper1 1 2
coper2 3 4
coper3 5 6
Another possible solution with stack
:
df2 = df.stack().reset_index()
df2.columns = list('abc')
print (df2)
a b c
0 coper1 asset1 1
1 coper1 asset2 2
2 coper2 asset1 3
3 coper2 asset2 4
4 coper3 asset1 5
5 coper3 asset2 6
print(df2.pivot(index='a',columns = 'b', values = 'c'))
b asset1 asset2
a
coper1 1 2
coper2 3 4
coper3 5 6
How do I melt a pandas dataframe?
Note for pandas versions < 0.20.0: I will be using df.melt(...)
for my examples, but you will need to use pd.melt(df, ...)
instead.
Documentation references:
Most of the solutions here would be used with melt
, so to know the method melt
, see the documentaion explanation
Unpivot a DataFrame from wide to long format, optionally leaving
identifiers set.This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (id_vars), while all other
columns, considered measured variables (value_vars), are “unpivoted”
to the row axis, leaving just two non-identifier columns, ‘variable’
and ‘value’.Parameters
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that are not set as id_vars.
var_name : scalar
Name to use for the ‘variable’ column. If None it uses frame.columns.name or ‘variable’.
value_name : scalar, default ‘value’
Name to use for the ‘value’ column.
col_level : int or str, optional
If columns are a MultiIndex then use this level to melt.
ignore_index : bool, default True
If True, original index is ignored. If False, the original index is retained. Index labels will be repeated
as necessary.New in version 1.1.0.
Logic to melting:
Melting merges multiple columns and converts the dataframe from wide to long, for the solution to Problem 1 (see below), the steps are:
First we got the original dataframe.
Then the melt firstly merges the
Math
andEnglish
columns and makes the dataframe replicated (longer).Then finally adds the column
Subject
which is the subject of theGrades
columns value respectively.
This is the simple logic to what the melt
function does.
Solutions:
I will solve my own questions.
Problem 1:
Problem 1 could be solve using pd.DataFrame.melt
with the following code:
print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))
This code passes the id_vars
argument to ['Name', 'Age']
, then automatically the value_vars
would be set to the other columns (['Math', 'English']
), which is transposed into that format.
You could also solve Problem 1 using stack
like the below:
print(
df.set_index(["Name", "Age"])
.stack()
.reset_index(name="Grade")
.rename(columns={"level_2": "Subject"})
.sort_values("Subject")
.reset_index(drop=True)
)
This code sets the Name
and Age
columns as the index and stacks the rest of the columns Math
and English
, and resets the index and assigns Grade
as the column name, then renames the other column level_2
to Subject
and then sorts by the Subject
column, then finally resets the index again.
Both of these solutions output:
Name Age Subject Grade
0 Bob 13 English C
1 John 16 English B
2 Foo 16 English B
3 Bar 15 English A+
4 Alex 17 English F
5 Tom 12 English A
6 Bob 13 Math A+
7 John 16 Math B
8 Foo 16 Math A
9 Bar 15 Math F
10 Alex 17 Math D
11 Tom 12 Math C
Problem 2:
This is similar to my first question, but this one I only one to filter in the Math
columns, this time the value_vars
argument can come into use, like the below:
print(
df.melt(
id_vars=["Name", "Age"],
value_vars="Math",
var_name="Subject",
value_name="Grades",
)
)
Or we can also use stack
with column specification:
print(
df.set_index(["Name", "Age"])[["Math"]]
.stack()
.reset_index(name="Grade")
.rename(columns={"level_2": "Subject"})
.sort_values("Subject")
.reset_index(drop=True)
)
Both of these solutions give:
Name Age Subject Grade
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C
Problem 3:
Problem 3 could be solved with melt
and groupby
, using the agg
function with ', '.join
, like the below:
print(
df.melt(id_vars=["Name", "Age"])
.groupby("value", as_index=False)
.agg(", ".join)
)
It melts the dataframe then groups by the grades and aggregates them and joins them by a comma.
stack
could be also used to solve this problem, with stack
and groupby
like the below:
print(
df.set_index(["Name", "Age"])
.stack()
.reset_index()
.rename(columns={"level_2": "Subjects", 0: "Grade"})
.groupby("Grade", as_index=False)
.agg(", ".join)
)
This stack
function just transposes the dataframe in a way that is equivalent to melt
, then resets the index, renames the columns and groups and aggregates.
Both solutions output:
Grade Name Subjects
0 A Foo, Tom Math, English
1 A+ Bob, Bar Math, English
2 B John, John, Foo Math, English, English
3 C Bob, Tom English, Math
4 D Alex Math
5 F Bar, Alex Math, English
Problem 4:
We first melt the dataframe for the input data:
df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')
Then now we can start solving this Problem 4.
Problem 4 could be solved with pivot_table
, we would have to specify to the pivot_table
arguments, values
, index
, columns
and also aggfunc
.
We could solve it with the below code:
print(
df.pivot_table("Grades", ["Name", "Age"], "Subject", aggfunc="first")
.reset_index()
.rename_axis(columns=None)
)
Output:
Name Age English Math
0 Alex 15 F D
1 Bar 15 A+ F
2 Bob 13 C A+
3 Foo 16 B A
4 John 16 B B
5 Tom 13 A C
The melted dataframe is converted back to the exact same format as the original dataframe.
We first pivot the melted dataframe and then reset the index and remove the column axis name.
Problem 5:
Problem 5 could be solved with melt
and groupby
like the following:
print(
df.melt(id_vars=["Name", "Age"], var_name="Subject", value_name="Grades")
.groupby("Name", as_index=False)
.agg(", ".join)
)
That melts and groups by Name
.
Or you could stack
:
print(
df.set_index(["Name", "Age"])
.stack()
.reset_index()
.groupby("Name", as_index=False)
.agg(", ".join)
.rename({"level_2": "Subjects", 0: "Grades"}, axis=1)
)
Both codes output:
Name Subjects Grades
0 Alex Math, English D, F
1 Bar Math, English F, A+
2 Bob Math, English A+, C
3 Foo Math, English A, B
4 John Math, English B, B
5 Tom Math, English C, A
Problem 6:
Problem 6 could be solved with melt
and no column needed to be specified, just specify the expected column names:
print(df.melt(var_name='Column', value_name='Value'))
That melts the whole dataframe
Or you could stack
:
print(
df.stack()
.reset_index(level=1)
.sort_values("level_1")
.reset_index(drop=True)
.set_axis(["Column", "Value"], axis=1)
)
Both codes output:
Column Value
0 Age 16
1 Age 15
2 Age 15
3 Age 16
4 Age 13
5 Age 13
6 English A+
7 English B
8 English B
9 English A
10 English F
11 English C
12 Math C
13 Math A+
14 Math D
15 Math B
16 Math F
17 Math A
18 Name Alex
19 Name Bar
20 Name Tom
21 Name Foo
22 Name John
23 Name Bob
Conclusion:
melt
is a really handy function, often it's required, once you meet these types of problems, don't forget to try melt
, it may well solve your problem.
Related Topics
Which Library Could Be Used to Make a Chord Diagram in R
Clustering List for Hclust Function
What Is a Neat Command Line Equivalent to Rstudio's Knit HTML
How to Syntax Highlight Inline R Code in R Markdown
Dplyr 'Rename' Standard Evaluation Function Not Working as Expected
Getting a Map with Points, Using Ggmap and Ggplot2
Search Within a String That Does Not Contain a Pattern
How to Recreate Same Documenttermmatrix with New (Test) Data
Recode Categorical Factor with N Categories into N Binary Columns
Increase Legend Font Size Ggplot2
How to Get Rstudio to Automatically Compile R Markdown Vignettes
How to Build a Dendrogram from a Directory Tree
Label Minimum and Maximum of Scale Fill Gradient Legend with Text: Ggplot2
How to Get Currency Exchange Rates in R
Reading Objects from Shiny Output Object Not Allowed
Replace Two Dots in a String with Gsub
What Is the Correct/Standard Way to Check If Difference Is Smaller Than MAChine Precision