Transforming a Time-Series into a Data Frame and Back

Transforming a time-series into a data frame and back

Here are two ways. The first creates dimnames for the matrix about to be built, strings the data out into a matrix, transposes it, and converts it to a data frame. The second creates a by list of year and month variables, applies tapply over it, and then converts the result to a data frame and adds names.

# create test data
set.seed(123)
tt <- ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)

1) matrix. This solution requires that we have whole, consecutive years:

dmn <- list(month.abb, unique(floor(time(tt))))
as.data.frame(t(matrix(tt, 12, dimnames = dmn)))

If we don't care about the nice names, it is just as.data.frame(t(matrix(tt, 12))).

We could replace the dmn<- line with the following simpler line using @thelatemail's comment:

dmn <- dimnames(.preformat.ts(tt))

2) tapply. A more general solution using tapply is the following:

Month <- factor(cycle(tt), levels = 1:12, labels = month.abb)
tapply(tt, list(year = floor(time(tt)), month = Month), c)

Note: To invert this suppose X is any of the solutions above. Then try:

ts(c(t(X)), start = 1981, freq = 12)
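To check the round trip end to end, here is a self-contained sketch that rebuilds the test series, widens it with the matrix solution from 1), and inverts it:

```r
# Rebuild the test series and the wide data frame from solution 1).
set.seed(123)
tt <- ts(rnorm(12 * 5, 17, 8), start = c(1981, 1), frequency = 12)
dmn <- list(month.abb, unique(floor(time(tt))))
X <- as.data.frame(t(matrix(tt, 12, dimnames = dmn)))

# Invert: flatten X row by row and restore the ts attributes.
tt2 <- ts(c(t(X)), start = 1981, freq = 12)
all.equal(c(tt2), c(tt))
```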

Update

Improvement motivated by comments of @thelatemail.

Converting time series to data frame in R

Are you talking about an xts time series? The other answers are overthinking this to the nth degree while overlooking the real issue: data.frame is not a conversion function.

You want as.data.frame(); xts defines an as.data.frame.xts S3 method.
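For example, a minimal sketch with a small, made-up xts object:

```r
library(xts)

# Three daily observations.
x <- xts(1:3, order.by = as.Date("2020-01-01") + 0:2)

# as.data.frame() dispatches to the as.data.frame.xts method;
# the dates become the data frame's row names.
df <- as.data.frame(x)
df
```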

How to convert a multi variate time series object to a data frame?

You can try the following:

library(xts)

m <- decompose(co2)
str(m)
#> List of 6
#> $ x : Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
#> $ seasonal: Time-Series [1:468] from 1959 to 1998: -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ trend : Time-Series [1:468] from 1959 to 1998: NA NA NA NA NA ...
#> $ random : Time-Series [1:468] from 1959 to 1998: NA NA NA NA NA ...
#> $ figure : num [1:12] -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ type : chr "additive"
#> - attr(*, "class")= chr "decomposed.ts"

df <- as.data.frame(m[c("x", "seasonal", "trend", "random")])
str(df)
#> 'data.frame': 468 obs. of 4 variables:
#> $ x : Time-Series from 1959 to 1998: 315 316 316 318 318 ...
#> $ seasonal: Time-Series from 1959 to 1998: -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ trend : Time-Series from 1959 to 1998: NA NA NA NA NA ...
#> $ random : Time-Series from 1959 to 1998: NA NA NA NA NA ...

df2 <- data.frame(date = index(m$x),
                  apply(df, 2, as.numeric))
str(df2)
#> 'data.frame': 468 obs. of 5 variables:
#> $ date : num 1959 1959 1959 1959 1959 ...
#> $ x : num 315 316 316 318 318 ...
#> $ seasonal: num -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ trend : num NA NA NA NA NA ...
#> $ random : num NA NA NA NA NA ...

Created on 2020-03-13 by the reprex package (v0.3.0)

You can also try tsibble and feasts

library(tsibble)
library(feasts)

as_tsibble(co2) %>%
  model(decomp = classical_decomposition(value, type = "additive")) %>%
  components()
#> # A dable: 468 x 7 [1M]
#> # Key: .model [1]
#> # Classical Decomposition: value = trend + seasonal + random
#>    .model index     value trend seasonal  random season_adjust
#>    <chr>  <mth>     <dbl> <dbl>    <dbl>   <dbl>         <dbl>
#>  1 decomp 1959 Jan   315.    NA  -0.0536 NA               315.
#>  2 decomp 1959 Feb   316.    NA   0.611  NA               316.
#>  3 decomp 1959 Mar   316.    NA   1.38   NA               315.
#>  4 decomp 1959 Apr   318.    NA   2.52   NA               315.
#>  5 decomp 1959 May   318.    NA   3.00   NA               315.
#>  6 decomp 1959 Jun   318     NA   2.33   NA               316.
#>  7 decomp 1959 Jul   316.   316.  0.813  -0.284           316.
#>  8 decomp 1959 Aug   315.   316. -1.25   -0.0170          316.
#>  9 decomp 1959 Sep   314.   316. -3.05    0.758           317.
#> 10 decomp 1959 Oct   313.   316. -3.25    0.362           316.
#> # … with 458 more rows


How to convert time series dates into data frame dates

"ts" series do not understand the "Date" class, but you can encode the dates as numbers and then decode them back. Assuming you want a series with frequency 52, the first week of 2016 is represented by 2016, the second by 2016 + 1/52, ..., and the last by 2016 + 51/52.

For example,

tt <- ts(rnorm(305), start = 2016, freq = 52)

Now decode the dates.

toDate <- function(tt) {
  yr <- as.integer(time(tt))
  week <- as.integer(cycle(tt)) # first week of year is 1, etc.
  as.Date(ISOdate(yr, 1, 1)) + 7 * (week - 1)
}

data.frame(dates = toDate(tt), series = c(tt))

We can also convert from Date class back to a year/week number:

# input is a Date class object
to_yw <- function(date) {
  yr <- as.numeric(format(date, "%Y"))
  yday <- as.POSIXlt(date)$yday # Jan 1st is 0
  week <- pmin(floor(yday / 7), 51) + 1 # 1st week of year is 1
  yw <- yr + (week - 1) / 52
  list(yw = yw, year = yr, yday = yday, week = week)
}
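To see the encoding at work, here is the arithmetic from to_yw applied step by step to a single date, 2016-01-08 (the start of the second week of 2016):

```r
date <- as.Date("2016-01-08")
yr   <- as.numeric(format(date, "%Y"))  # 2016
yday <- as.POSIXlt(date)$yday           # 7, since Jan 1st is day 0
week <- pmin(floor(yday / 7), 51) + 1   # 2
yw   <- yr + (week - 1) / 52            # 2016 + 1/52
yw
```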

Converting time series to data frame, matrix, or table

You do not need a time series, just tapply:

res <- tapply(AVG_LOSCAT2$AVG_LOSCAT,
              list(year = AVG_LOSCAT2$YEAR, month = AVG_LOSCAT2$MONTH),
              round, 2)
res
     month
year     1    2    3    4    5    6    7    8    9   10   11   12
2012    NA   NA   NA   NA   NA 7.51 7.31 8.33 7.66 5.36 6.46 8.30
2013  5.74 7.89 6.49 7.09 5.91 6.31 8.24 6.73 8.56 8.19 6.54 6.49
2014  8.03 6.80 6.25 7.10 5.38 6.21 7.78 8.87 6.62 6.09 8.40 8.37
2015  8.00 5.73 6.32 6.71 6.32 6.75   NA   NA   NA   NA   NA   NA

Time series to data frame

The date you get is actually a decimal year, so the month is still in there. If you want two columns, year and month, you can try something like:

res <- data.frame(as.matrix(Seatbelts), date = time(Seatbelts))
res$year <- trunc(res$date)
res$month <- round((res$date - res$year) * 12 + 1) # round() guards against floating-point error
res

##   DriversKilled drivers front rear   kms PetrolPrice VanKilled law     date year month
## 1           107    1687   867  269  9059   0.1029718        12   0 1969.000 1969     1
## 2            97    1508   825  265  7685   0.1023630         6   0 1969.083 1969     2
## 3           102    1507   806  319  9963   0.1020625        12   0 1969.167 1969     3
## 4            87    1385   814  407 10955   0.1008733         8   0 1969.250 1969     4
## 5           119    1632   991  454 11823   0.1010197        10   0 1969.333 1969     5
## 6           106    1511   945  427 12391   0.1005812        13   0 1969.417 1969     6

How to convert data frame for time series analysis in Python?

Depending on the task you are trying to solve, I can see two options for this dataset.

  • Either, as you show in your example, count the number of occurrences of the text field in each day, independently of the value of the text field.
  • Or, count the number of occurrences of each unique value of the text field each day. You will then have one column for each possible value of the text field, which may make more sense if the values are purely categorical.

First things to do :

import pandas as pd
df = pd.DataFrame(data={'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
                        'Text': ['A', 'B', 'C', 'A', 'A']})
df['Date'] = pd.to_datetime(df['Date'])  # convert to datetime type if not already done

Date Text
0 2018-01-01 A
1 2018-01-01 B
2 2018-01-01 C
3 2018-01-02 A
4 2018-01-03 A

Then for option one :

df = df.groupby('Date').count()

Text
Date
2018-01-01 3
2018-01-02 1
2018-01-03 1

For option two :

df[df['Text'].unique()] = pd.get_dummies(df['Text'])
df = df.drop('Text', axis=1)
df = df.groupby('Date').sum()

A B C
Date
2018-01-01 1 1 1
2018-01-02 1 0 0
2018-01-03 1 0 0

The get_dummies function creates one column per possible value of the Text field. Each column is then a boolean indicator telling us, for each row of the data frame, whether that value of the Text field occurred. We can then simply aggregate with sum after grouping by the Date field.

If you are not familiar with the use of groupby and aggregation operations, I recommend reading a pandas groupby guide first.

Converting ts object to data.frame

How about

data.frame(Y=as.matrix(dat), date=time(dat))

This returns

          Y    date
1  86.04519 1959.25
2  93.78866 1959.50
3  88.04912 1959.75
4  94.30623 1960.00
5  72.82405 1960.25
6  58.31859 1960.50
7  66.25477 1960.75
8  75.46122 1961.00
9  86.38526 1961.25
10 99.48685 1961.50

Transformation of time series data arranged as value per column into Pandas 2D Dataframe

I'm not sure I'm fully understanding the rebasing logic, but this does lead from your sample input to your sample output.

# Convert to pandas datetime.
df.acq_stamp = pd.to_datetime(df.acq_stamp)

# Melt your dataframe, keeping the acq_stamp column.
df = df.melt('acq_stamp', var_name='rawdata')

# Get the numerical value out of the rawdata column name.
df.rawdata = df.rawdata.str.extract(r'(\d+)', expand=False).astype(int)

# Find the absolute difference between each rawdata point and the max rawdata point,
# Make this into units of 100ns and subtract it from the acq_stamp.
rawdiff_as_ns = df.rawdata.sub(df.rawdata.max()).abs().mul(100)
df['timestamp'] = df.acq_stamp.sub(pd.to_timedelta(rawdiff_as_ns, unit='ns'))

# Sort your data.
df = df.sort_values('timestamp', ignore_index=True)

# Outputting just the columns you wanted:
print(df[['timestamp', 'acq_stamp', 'value']])

Output:

                      timestamp                  acq_stamp  value
0 2022-05-15 21:00:02.660159700 2022-05-15 21:00:02.660160      1
1 2022-05-15 21:00:02.660159800 2022-05-15 21:00:02.660160      2
2 2022-05-15 21:00:02.660159900 2022-05-15 21:00:02.660160      3
3 2022-05-15 21:00:02.660160000 2022-05-15 21:00:02.660160      4
4 2022-05-15 21:00:04.660159700 2022-05-15 21:00:04.660160      5
5 2022-05-15 21:00:04.660159800 2022-05-15 21:00:04.660160      6
6 2022-05-15 21:00:04.660159900 2022-05-15 21:00:04.660160      7
7 2022-05-15 21:00:04.660160000 2022-05-15 21:00:04.660160      8

