Reshaping data.frame from wide to long format
reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape, assuming your data frame is called d:
reshape(d,
direction = "long",
varying = list(names(d)[3:7]),
v.names = "Value",
idvar = c("Code", "Country"),
timevar = "Year",
times = 1950:1954)
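Here is a minimal runnable sketch of that call on a made-up two-row d, to show what each argument does. The codes, countries and values are hypothetical stand-ins. Like the call above, it assumes that columns 3 to 7 hold the years 1950 to 1954 in that order.

```r
# Hypothetical data shaped like the question's: two ID columns, five year columns
d <- data.frame(
  Code    = c("A1", "B2"),
  Country = c("Alpha", "Beta"),
  X1950 = c(1, 6), X1951 = c(2, 7), X1952 = c(3, 8),
  X1953 = c(4, 9), X1954 = c(5, 10)
)

long <- reshape(d,
                direction = "long",
                varying   = list(names(d)[3:7]),  # the five year columns, in order
                v.names   = "Value",              # name of the stacked value column
                idvar     = c("Code", "Country"), # together identify each original row
                timevar   = "Year",               # new column holding the period
                times     = 1950:1954)            # labels, matched to varying by position

# Sort so each country's years are listed together
long <- long[order(long$Code, long$Year), ]
long
```

The times vector is matched to the varying columns purely by position. If the year columns were out of order, the Year labels would be wrong without any error.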
Reshape data table from wide to long with transpose
A better approach is to use the newer pivot_longer() and pivot_wider() functions from the tidyr package.
They follow an easier convention and have convenient text-manipulation options built in. In this case, names_prefix removes the "X." that read.table() added to the column names.
df <- read.table(header=TRUE, text="Mill Acid `1_day` `3_days` `1_week` `2_weeks` `4_weeks` `2_months` `3_months` `6-7_months`
Gävle 0 10.5 12.0 10.9 10.7 10.6 10.1 10 9.81
Gävle 0.5 8.79 10 9.29 9.08 9.39 9.13 9.14 8.86
Gävle 0.75 8.05 8.95 8.33 8.26 8.24 8.22 8.25 7.44
Gävle 1 6.7 7.82 7.77 8.02 8.19 7.79 7.97 6.99
Gävle 1.25 6.52 7.43 7.33 7.11 7.72 7.88 7.91 6.96
Gävle 1.5 6.41 7.25 7.28 6.92 7.63 7.01 7.64 6.7
Obbola 0 10.5 12.0 10.9 10.7 10.6 10.1 10 9.81
Obbola 0.5 8.79 10 9.29 9.08 9.39 9.13 9.14 8.86
Obbola 0.75 8.05 8.95 8.33 8.26 8.24 8.22 8.25 7.44
Obbola 1 6.7 7.82 7.77 8.02 8.19 7.79 7.97 6.99
Obbola 1.25 6.52 7.43 7.33 7.11 7.72 7.88 7.91 6.96
Obbola 1.5 6.41 7.25 7.28 6.92 7.63 7.01 7.64 6.7 ")
library(tidyr)
longdf <- df %>% pivot_longer(-c("Mill", "Acid"), names_to = "Time", values_to = "value", names_prefix = "X.")
answer <- longdf %>% pivot_wider(id_cols = c("Time", "Acid"), names_from = "Mill")
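The "X." comes from read.table(), which runs the header through make.names() when check.names = TRUE (the default). A name may not start with a backtick, so "X" is prepended, and the invalid backticks become dots. That also leaves a trailing "." that names_prefix does not remove. If tidyr isn't available, the following is a rough base-R counterpart of the pivot_longer() step. It uses stats::reshape() and a regular expression to strip both the leading "X." and the trailing dot. The two-mill data frame is a hypothetical cut-down of df.

```r
# What read.table(check.names = TRUE) does to the backticked header names
make.names(c("Mill", "Acid", "`1_day`", "`6-7_months`"))

# Hypothetical cut-down of df, with the already-mangled column names
df <- data.frame(Mill = c("A", "B"), Acid = c(0, 0.5),
                 X.1_day. = c(10.5, 8.79), X.3_days. = c(12, 10))

time_cols <- names(df)[-(1:2)]   # every column except Mill and Acid

long <- reshape(df, direction = "long",
                varying = list(time_cols),
                v.names = "value",
                idvar   = c("Mill", "Acid"),
                timevar = "Time",
                # drop the leading "X." and the trailing "." from each name
                times   = gsub("^X\\.|\\.$", "", time_cols))
long
```

An alternative is to pass check.names = FALSE to read.table(). The backticks would then stay in the names and would still need to be stripped.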
reshape dataframe from wide to long in R
Using data.table:
library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'),
measure.vars = patterns(fixed='fixed_', current='current_'),
variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
## id name year fixed current
## 1: 1 A 2020 2300 3000
## 2: 2 A 2019 2100 3100
## 3: 3 A 2018 2600 3200
## 4: 4 A 2017 2600 3300
## 5: 5 A 2016 1900 3400
This should be very fast, although your dataset is not very big.
Note that this assumes the fixed and current columns are in the same order and associated with the same year(s). So if there is a fixed_2020 as the first fixed_* column, there is also a current_2020 as the first current_* column, and so on. Otherwise, the year column will correctly associate with fixed but not with current.
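One way to drop that assumption is to pair the columns by the year in their names before reshaping. This sketch does that in base R rather than data.table. The mydata below is hypothetical: its current_* columns are deliberately in a different order from the fixed_* ones.

```r
# Hypothetical data: current_* columns are in the opposite year order to fixed_*
mydata <- data.frame(id = 1:2, name = c("A", "B"),
                     fixed_2020 = c(2300, 2400), fixed_2019 = c(2100, 2200),
                     current_2019 = c(3100, 3150), current_2020 = c(3000, 3050))

fixed_cols    <- grep("^fixed_", names(mydata), value = TRUE)
current_cols  <- grep("^current_", names(mydata), value = TRUE)
years_fixed   <- sub("^fixed_", "", fixed_cols)
years_current <- sub("^current_", "", current_cols)

# Reorder current_* so position k has the same year as fixed_cols[k]
idx <- match(years_fixed, years_current)
stopifnot(!anyNA(idx))  # every fixed year needs a matching current column
current_cols <- current_cols[idx]

result <- reshape(mydata, direction = "long",
                  varying = list(fixed_cols, current_cols),
                  v.names = c("fixed", "current"),
                  idvar   = "id", timevar = "year",
                  times   = as.numeric(years_fixed))
result
```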
Converting data from wide to long (using multiple columns)
You can use the base reshape() function to (roughly) simultaneously melt over multiple sets of variables, by using the varying parameter and setting direction to "long".
For example, here you are supplying a list of three "sets" (vectors) of variable names to the varying argument:
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header=TRUE)
reshape(dat, direction="long",
varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
v.names=c("f","op","ed"))
You'll end up with this:
cid dyad junk time f op ed id
1.1 1 2 0.876 1 0 2 5 1
2.1 1 5 0.765 1 0 2 4 2
1.2 1 2 0.876 2 0 4 7 1
2.2 1 5 0.765 2 1 4 3 2
Notice that two variables get created, in addition to the three sets being collapsed: an $id variable, which tracks the row number in the original table (dat), and a $time variable, which corresponds to the order of the original variables that were collapsed. There are also now nested row names (1.1, 2.1, 1.2, 2.2), which here are just the values of $id and $time at that row, respectively.
Without knowing exactly what you're trying to track, it's hard to say whether $id or $time is what you want to use as the row identifier, but they're both there.
It might also be useful to play with the parameters timevar and idvar (you can set timevar to NULL, for example).
reshape(dat, direction="long",
varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
v.names=c("f","op","ed"),
timevar="id1", idvar="id2")
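Here is a small runnable sketch of those parameters, re-creating dat so the block runs on its own. timevar and idvar rename the two bookkeeping columns, and times replaces the default 1, 2 with labels. Sorting by the id column then puts both rows of each original record next to each other. The names wave, row, w1 and w2 are arbitrary choices for illustration.

```r
dat <- read.table(text = "
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header = TRUE)

long <- reshape(dat, direction = "long",
                varying = list(c("f1", "f2"), c("op1", "op2"), c("ed1", "ed2")),
                v.names = c("f", "op", "ed"),
                timevar = "wave",          # instead of the default "time"
                times   = c("w1", "w2"),   # labels instead of 1, 2
                idvar   = "row")           # instead of the default "id"

# Group both waves of each original row together, then replace the
# "row.wave" style row names that reshape() generates with 1..n
long <- long[order(long$row, long$wave), ]
rownames(long) <- NULL
long
```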
Reshape data set from wide to long format grouped by variable suffix
Using reshape(), we can tell it where to cut the column names with sep="_".
reshape(d, idvar="ID", varying=2:5, timevar="YEAR", sep="_", direction="long")
# ID YEAR MI FRAC
# 1.1995 1 1995 2 3
# 7.1995 7 1995 3 10
# 10.1995 10 1995 1 2
# 1.1996 1 1996 2 4
# 7.1996 7 1996 12 1
# 10.1996 10 1996 1 1
Data
d <- structure(list(ID = c(1L, 7L, 10L), MI_1995 = c(2L, 3L, 1L),
FRAC_1995 = c(3L, 10L, 2L), MI_1996 = c(2L, 12L, 1L),
FRAC_1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
class = "data.frame")
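The value of sep matters here. According to ?reshape, sep = "" splits each name just before the first numeral that follows a letter, so it fits names like MI1995. When an underscore separates stub and year, as in MI_1995, the separator has to be given explicitly as sep = "_". The following sketch uses hypothetical underscore-free names with the same values as d.

```r
# Hypothetical variant of d without the underscore between stub and year
d2 <- data.frame(ID = c(1L, 7L, 10L),
                 MI1995 = c(2L, 3L, 1L), FRAC1995 = c(3L, 10L, 2L),
                 MI1996 = c(2L, 12L, 1L), FRAC1996 = c(4L, 1L, 1L))

# sep = "": split "MI1995" into stub "MI" and time 1995
long <- reshape(d2, idvar = "ID", varying = 2:5, timevar = "YEAR",
                sep = "", direction = "long")
long
```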
Using Reshape from wide to long in R
Here are three examples (along with some sample data that I think is representative of what you described).
Here's the sample data:
set.seed(1)
mydf <- data.frame(
company = LETTERS[1:4],
earnings_2012 = runif(4),
earnings_2011 = runif(4),
earnings_2010 = runif(4),
assets_2012 = runif(4),
assets_2011 = runif(4),
assets_2010 = runif(4)
)
mydf
# company earnings_2012 earnings_2011 earnings_2010 assets_2012 assets_2011 assets_2010
# 1 A 0.2655087 0.2016819 0.62911404 0.6870228 0.7176185 0.9347052
# 2 B 0.3721239 0.8983897 0.06178627 0.3841037 0.9919061 0.2121425
# 3 C 0.5728534 0.9446753 0.20597457 0.7698414 0.3800352 0.6516738
# 4 D 0.9082078 0.6607978 0.17655675 0.4976992 0.7774452 0.1255551
Option 1: reshape
One limitation is that it won't handle "unbalanced" datasets (for example, if you didn't have "assets_2010" as part of your data, this wouldn't work).
reshape(mydf, direction = "long", idvar="company",
varying = 2:ncol(mydf), sep = "_")
# company time earnings assets
# A.2012 A 2012 0.26550866 0.6870228
# B.2012 B 2012 0.37212390 0.3841037
# C.2012 C 2012 0.57285336 0.7698414
# D.2012 D 2012 0.90820779 0.4976992
# A.2011 A 2011 0.20168193 0.7176185
# B.2011 B 2011 0.89838968 0.9919061
# C.2011 C 2011 0.94467527 0.3800352
# D.2011 D 2011 0.66079779 0.7774452
# A.2010 A 2010 0.62911404 0.9347052
# B.2010 B 2010 0.06178627 0.2121425
# C.2010 C 2010 0.20597457 0.6516738
# D.2010 D 2010 0.17655675 0.1255551
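One possible workaround for the unbalanced case, which is not part of the original answer, is to add any missing stub/year columns as NA before calling reshape() and to put the columns in a consistent order. This sketch drops assets_2010 on purpose and then restores it as NA. It assumes the stubs and years are known in advance.

```r
set.seed(1)
mydf <- data.frame(company = LETTERS[1:4],
                   earnings_2012 = runif(4), earnings_2011 = runif(4),
                   earnings_2010 = runif(4), assets_2012 = runif(4),
                   assets_2011 = runif(4))   # deliberately no assets_2010

stubs  <- c("earnings", "assets")
years  <- c(2012, 2011, 2010)
# earnings_2012, earnings_2011, earnings_2010, assets_2012, ...
wanted <- paste(rep(stubs, each = length(years)), years, sep = "_")

# Add every missing stub/year column as NA, then fix the column order
for (nm in setdiff(wanted, names(mydf))) mydf[[nm]] <- NA_real_
mydf <- mydf[, c("company", wanted)]

long <- reshape(mydf, direction = "long", idvar = "company",
                varying = 2:ncol(mydf), sep = "_")
long
```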
Option 2: The "reshape2" package
It's quite popular for its syntax. It needs a little processing before it can work, since the column names have to be split to get this "double-wide" type of data. It can handle unbalanced data, but it isn't the best choice if your varying columns are of different types (numeric, character, factor).
library(reshape2)
dfL <- melt(mydf, id.vars="company")
dfL <- cbind(dfL, colsplit(dfL$variable, "_", c("var", "year")))
dcast(dfL, company + year ~ var, value.var="value")
# company year assets earnings
# 1 A 2010 0.9347052 0.62911404
# 2 A 2011 0.7176185 0.20168193
# 3 A 2012 0.6870228 0.26550866
# 4 B 2010 0.2121425 0.06178627
# 5 B 2011 0.9919061 0.89838968
# 6 B 2012 0.3841037 0.37212390
# 7 C 2010 0.6516738 0.20597457
# 8 C 2011 0.3800352 0.94467527
# 9 C 2012 0.7698414 0.57285336
# 10 D 2010 0.1255551 0.17655675
# 11 D 2011 0.7774452 0.66079779
# 12 D 2012 0.4976992 0.90820779
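The same melt, split and cast sequence can also be sketched in base R, which may help show what each reshape2 step does. Here stack() stands in for melt(), sub() for colsplit(), and reshape(direction = "wide") for dcast(). This is a swapped-in base-R technique, not reshape2's implementation.

```r
set.seed(1)
mydf <- data.frame(
  company = LETTERS[1:4],
  earnings_2012 = runif(4), earnings_2011 = runif(4), earnings_2010 = runif(4),
  assets_2012 = runif(4), assets_2011 = runif(4), assets_2010 = runif(4)
)

# "melt": stack() puts every measure column into one 'values' column, with the
# source column name in the factor 'ind'; company is recycled to match
molten <- data.frame(company = mydf$company, stack(mydf[-1]))

# "colsplit": break e.g. "earnings_2012" into var = "earnings", year = 2012
molten$var  <- sub("_.*$", "", molten$ind)
molten$year <- as.integer(sub("^.*_", "", molten$ind))

# "dcast": one row per company/year, one column per var
wide <- reshape(molten[, c("company", "year", "var", "values")],
                direction = "wide", idvar = c("company", "year"),
                timevar = "var", v.names = "values")
wide <- wide[order(wide$company, wide$year), ]
wide
```

The resulting columns are named values.earnings and values.assets, because reshape() pastes v.names and the timevar values together with ".".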
Option 3: merged.stack from "splitstackshape"
merged.stack from my "splitstackshape" package has fairly straightforward syntax and should be quite fast if you need to end up with this "double-wide" type of structure. It was created to handle unbalanced data and, since it treats columns separately, it won't have problems converting column types.
library(splitstackshape)
merged.stack(mydf, id.vars="company",
var.stubs=c("earnings", "assets"), sep = "_")
# company .time_1 earnings assets
# 1: A 2010 0.62911404 0.9347052
# 2: A 2011 0.20168193 0.7176185
# 3: A 2012 0.26550866 0.6870228
# 4: B 2010 0.06178627 0.2121425
# 5: B 2011 0.89838968 0.9919061
# 6: B 2012 0.37212390 0.3841037
# 7: C 2010 0.20597457 0.6516738
# 8: C 2011 0.94467527 0.3800352
# 9: C 2012 0.57285336 0.7698414
# 10: D 2010 0.17655675 0.1255551
# 11: D 2011 0.66079779 0.7774452
# 12: D 2012 0.90820779 0.4976992
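The "treats columns separately" idea can be illustrated in base R. Each stub is stacked on its own, so unlist() never has to mix numeric and character values, and the pieces are then merged on id and time. The helper stack_stub() and the rating_* columns are hypothetical. This is not how merged.stack is implemented internally.

```r
# Hypothetical mixed-type data: one numeric stub and one character stub
df <- data.frame(company = c("A", "B"),
                 earnings_2011 = c(1.5, 2.5), earnings_2012 = c(3.5, 4.5),
                 rating_2011 = c("AA", "B"), rating_2012 = c("A", "BB"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: stack the columns of ONE stub into id / time / value
stack_stub <- function(data, id, stub) {
  prefix <- paste0("^", stub, "_")
  cols <- grep(prefix, names(data), value = TRUE)
  out <- data.frame(
    id    = rep(data[[id]], times = length(cols)),
    time  = sub(prefix, "", rep(cols, each = nrow(data))),
    value = unlist(data[cols], use.names = FALSE),  # one type per stub
    stringsAsFactors = FALSE
  )
  names(out) <- c(id, "time", stub)
  out
}

pieces <- lapply(c("earnings", "rating"), stack_stub, data = df, id = "company")
# all = TRUE keeps company/time pairs present for only some stubs (NA-filled)
long <- Reduce(function(a, b) merge(a, b, by = c("company", "time"), all = TRUE),
               pieces)
long
```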