Reshaping time series data from wide to tall format (for plotting)
you can also use melt() from the 'reshape' library (I think it's easier to use than reshape() itself) - that'll save you the extra step of having to add the time column back in...
> library(reshape)
> m <- melt(dat,id="t",variable_name="symbol")
> names(m) <- sub("value","price",names(m))
> head(m)
t symbol price
1 2009-01-01 X -1.14945096
2 2009-01-02 X -0.07619870
3 2009-01-03 X 0.01547395
4 2009-01-04 X -0.31493143
5 2009-01-05 X 1.26985167
6 2009-01-06 X 1.31492397
> class(m$t)
[1] "Date"
> library(lattice)
> xyplot( price ~ t | symbol, data=m ,type ="l", layout = c(1,3) )
For this particular task, however, I would consider using the 'zoo' library, which would not require you to reshape the data frame:
> library(zoo)
> zobj <- zoo(dat[,-1],dat[,1])
> plot(zobj,col=rainbow(ncol(zobj)))
R developers/contributors (Gabor and Hadley in this case) have blessed us with many great choices. (and can't forget Deepayan for the lattice package)
Convert /reshape a dataset from 'wide to long' format and convert the time column into time format for time-series analysis
I made a few small edits to your sample dataframe based on my comment above:
import pandas as pd
df = pd.DataFrame({'level':['A','B','C','D','E'],
'Time_30':[1993.05,1999.45, 2001.11, 2007.39, 2219.77],
'Time_60':[2123.15,2299.59, 2339.19, 2443.37, 2553.15],
'Time_90':[2323.56,2495.99,2499.13, 2548.71, 2656.0],
'Time_120':[2355.52,2491.19,2519.92,2611.81, 2753.11],
'Time_150':[2425.31,2599.51, 2539.9, 2713.77, 2893.58],
'Time_180':[2443.35,2609.92, 2632.49, 2774.03, 2901.25]} )
First, manipulate the Time_*
column names to be integer values:
timecols = [int(c.replace("Time_","")) for c in df.columns if c != 'level']
df.columns = ['level'] + timecols
After that you can pd.melt()
like you were thinking, yielding a datarame with all those "series" you mentioned above concatenated together:
df1 = df.melt(id_vars=['level'], value_vars=timecols, var_name='time', value_name='count').sort_values(['level','time']).reset_index(drop=True)
print(df1.head(10))
level time count
0 A 30 1993.05
1 A 60 2123.15
2 A 90 2323.56
3 A 120 2355.52
4 A 150 2425.31
5 A 180 2443.35
6 B 30 1999.45
7 B 60 2299.59
8 B 90 2495.99
9 B 120 2491.19
If you want to loop over the level
s, select them with:
for level in df1['level'].unique():
tmp = df1[df1['level']==level]
or
for level in df1['level'].unique():
tmp = df1[df1['level']==level].copy()
...if you intend to modify/add data to the tmp dataframe.
As for making timestamps, you could do:
df1['time'] = pd.to_timedelta(df1['time'], unit='min')
...like you were attempting, but it depends on how you're using it. If you just want strings that look like "00:30:00", etc, you can try something like:
df1['time'] = pd.to_timedelta(df1['time'], unit='min').apply(lambda x:str(x)[-8:])
Anyway, hope that gets you on track for what you need.
Reshaping data.frame from wide to long format
reshape()
takes a while to get used to, just as melt
/cast
. Here is a solution with reshape, assuming your data frame is called d
:
reshape(d,
direction = "long",
varying = list(names(d)[3:7]),
v.names = "Value",
idvar = c("Code", "Country"),
timevar = "Year",
times = 1950:1954)
Reshape data table from wide to long with transpose
A better approach is to use the new pivot_londer and pivot_wider functions from the tidyr package.
Easier convention to use and has convenient text manipulation options built in. In this case removing the "X." that was added to column names.
df <- read.table(header=TRUE, text="Mill Acid `1_day` `3_days` `1_week` `2_weeks` `4_weeks` `2_months` `3_months` `6-7_months`
Gävle 0 10.5 12.0 10.9 10.7 10.6 10.1 10 9.81
Gävle 0.5 8.79 10 9.29 9.08 9.39 9.13 9.14 8.86
Gävle 0.75 8.05 8.95 8.33 8.26 8.24 8.22 8.25 7.44
Gävle 1 6.7 7.82 7.77 8.02 8.19 7.79 7.97 6.99
Gävle 1.25 6.52 7.43 7.33 7.11 7.72 7.88 7.91 6.96
Gävle 1.5 6.41 7.25 7.28 6.92 7.63 7.01 7.64 6.7
Obbola 0 10.5 12.0 10.9 10.7 10.6 10.1 10 9.81
Obbola 0.5 8.79 10 9.29 9.08 9.39 9.13 9.14 8.86
Obbola 0.75 8.05 8.95 8.33 8.26 8.24 8.22 8.25 7.44
Obbola 1 6.7 7.82 7.77 8.02 8.19 7.79 7.97 6.99
Obbola 1.25 6.52 7.43 7.33 7.11 7.72 7.88 7.91 6.96
Obbola 1.5 6.41 7.25 7.28 6.92 7.63 7.01 7.64 6.7 ")
library(tidyr)
longdf <- df %>% pivot_longer(-c("Mill", "Acid"), names_to="Time", values_to = "value", names_prefix="X.")
answer <-longdf %>% pivot_wider(id_cols= c("Time", "Acid" ), names_from = "Mill" )
Reshape long form panel data to wide stacked time series
This is a perfect time to try out pivot_table
user_id order_date values
0 11039591 2017-01-01 3277.466
1 11039591 2017-01-02 587.553
2 13629086 2017-01-03 501.882
3 13629086 2017-01-02 1352.546
4 6084613 2017-01-01 441.151
df.pivot_table(index='user_id',columns='order_date',values='values')
Output
order_date 2017-01-01 2017-01-02 2017-01-03
user_id
6084613 441.151 NaN NaN
11039591 3277.466 587.553 NaN
13629086 NaN 1352.546 501.882
How do you convert a wide time series data frame into a long data frame?
you have to specify the id.vars
as second argument in the melt function and it works:
require(reshape)
df <- data.frame(date = as.Date(c("1992-03-23", "1992-03-24")),
item_1 = c(8.63, 7.98),
item_2 = c(7.609, 7.634),
item_3 = c(1.6546, 1.6533))
melt(df, "date")
you will get:
date variable value
1 1992-03-23 item_1 8.6300
2 1992-03-24 item_1 7.9800
3 1992-03-23 item_2 7.6090
4 1992-03-24 item_2 7.6340
5 1992-03-23 item_3 1.6546
6 1992-03-24 item_3 1.6533
hope this helps
Reshaping the HRS data from wide to long format and creating a time variable
An option using data.table
that makes use of the measure/patterns
would be to use melt
. In the example, the column names have common patterns
as 'weight', 'height' which we specify it in measure
parameter to convert it to 'long' format, then extract the numeric part with sub
to create 'Ind'
library(data.table)
melt(setDT(df), measure = patterns("weight", "height"), value.name = c("W", "H"),
variable.name = "time")[,
Ind := as.integer(sub("\\D+", "", data))][order(Ind)][, .(time, Ind, W, H)]
# time Ind W H
# 1: 1 1 56 151
# 2: 2 1 57 154
# 3: 3 1 56 NA
# 4: 4 1 56 153
# 5: 5 1 55 152
# 6: 1 2 76 163
# 7: 2 2 75 164
# 8: 3 2 76 165
# 9: 4 2 73 162
#10: 5 2 77 161
#11: 1 3 87 173
#12: 2 3 88 NA
#13: 3 3 87 NA
#14: 4 3 85 172
#15: 5 3 84 171
#16: 1 4 64 153
#17: 2 4 66 154
#18: 3 4 65 152
#19: 4 4 63 154
#20: 5 4 65 154
Difficulty with Long to Wide to Long Format in R
An option is to convert to matrix
and then melt
with the na.rm = TRUE
option
library(reshape2)
melt(as.matrix(r.data_lower), na.rm = TRUE)
Reshaping data in R (wide - Long)
We can extract the patterns from the column names using sub
, split
the sequence of that vector with 'nm1', use that as measure
in melt
to convert from 'wide' to 'long' format.
library(data.table)
nm1 <- sub("\\d+[[:alpha:]]+$", '', names(df1)[-(1:2)])
lst <- split(seq_along(nm1)+2, nm1)
melt(setDT(df1), measure = lst,
value.name= names(lst), variable.name= 'time')[order(ID)]
# ID Group time M1a M1r M2a M2r
#1: 1 1 1 0.2 200 0.3 300
#2: 1 1 2 0.4 400 0.6 600
#3: 2 1 1 0.3 300 0.4 400
#4: 2 1 2 0.6 600 0.6 600
data
df1 <- structure(list(ID = 1:2, Group = c(1L, 1L),
M1a2hB = c(0.2, 0.3
), M1a3hB = c(0.4, 0.6), M2a2hB = c(0.3, 0.4),
M2a3hB = c(0.6,
0.6), M1r2hB = c(200L, 300L), M1r3hB = c(400L, 600L),
M2r2hB = c(300L,
400L), M2r3hB = c(600L, 600L)), .Names = c("ID", "Group", "M1a2hB",
"M1a3hB", "M2a2hB", "M2a3hB", "M1r2hB", "M1r3hB",
"M2r2hB", "M2r3hB"
), class = "data.frame", row.names = c(NA, -2L))
Related Topics
Long/Bigint/Decimal Equivalent Datatype in R
What Is the Purpose of Setting a Key in Data.Table
Read Multiple CSV Files into Separate Data Frames
Using Data.Table Package Inside My Own Package
Pass Arguments to Dplyr Functions
How to Use Facets With a Dual Y-Axis Ggplot
Why Is the Parallel Package Slower Than Just Using Apply
Chopping a String into a Vector of Fixed Width Character Elements
Summarizing by Subgroup Percentage in R
Calculate Cumulative Sum (Cumsum) by Group
Getting Warning: " 'Newdata' Had 1 Row But Variables Found Have 32 Rows" on Predict.Lm
Assign Multiple Columns Using := in Data.Table, by Group
Capitalize the First Letter of Both Words in a Two Word String
What Are the "Standard Unambiguous Date" Formats For String-To-Date Conversion in R
How to Assign a Unique Id Number to Each Group of Identical Values in a Column
Incomplete Final Line' Warning When Trying to Read a .Csv File into R