Transforming a Time-Series into a Data Frame and Back

Transforming a time-series into a data frame and back

Here are two ways. The first creates dimnames for the matrix about to be built, strings the data out into a matrix, transposes it, and converts it to a data frame. The second creates a by list of year and month variables, applies tapply over it, and then converts the result to a data frame and adds names.

# create test data
set.seed(123)
tt <- ts(rnorm(12*5, 17, 8), start=c(1981,1), frequency = 12)

1) matrix. This solution requires that we have whole, consecutive years:

dmn <- list(month.abb, unique(floor(time(tt))))
as.data.frame(t(matrix(tt, 12, dimnames = dmn)))

If we don't care about the nice names, it is just as.data.frame(t(matrix(tt, 12))).

We could replace the dmn<- line with the following simpler line using @thelatemail's comment:

dmn <- dimnames(.preformat.ts(tt))

2) tapply. A more general solution using tapply is the following:

Month <- factor(cycle(tt), levels = 1:12, labels = month.abb)
tapply(tt, list(year = floor(time(tt)), month = Month), c)

Note: To invert this suppose X is any of the solutions above. Then try:

ts(c(t(X)), start = 1981, freq = 12)
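To check the round trip end to end, here is a self-contained sketch that rebuilds the test series, widens it with the matrix solution from 1), and inverts it:

```r
# Rebuild the test series and the wide data frame from solution 1).
set.seed(123)
tt <- ts(rnorm(12 * 5, 17, 8), start = c(1981, 1), frequency = 12)
dmn <- list(month.abb, unique(floor(time(tt))))
X <- as.data.frame(t(matrix(tt, 12, dimnames = dmn)))

# Invert: flatten X row by row and restore the ts attributes.
tt2 <- ts(c(t(X)), start = 1981, freq = 12)
all.equal(c(tt2), c(tt))
```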

Update

Improvement motivated by comments of @thelatemail.

Converting time series to data frame in R

Are you talking about an xts time series? The other answers are overthinking this to the nth degree while overlooking the real issue: data.frame is not a conversion function.

You want as.data.frame(); xts defines an as.data.frame.xts S3 method.
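For example, a minimal sketch with a small, made-up xts object:

```r
library(xts)

# Three daily observations.
x <- xts(1:3, order.by = as.Date("2020-01-01") + 0:2)

# as.data.frame() dispatches to the as.data.frame.xts method;
# the dates become the data frame's row names.
df <- as.data.frame(x)
df
```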

How to convert a multi variate time series object to a data frame?

You can try the following:

library(xts)

m <- decompose(co2)
str(m)
#> List of 6
#> $ x : Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
#> $ seasonal: Time-Series [1:468] from 1959 to 1998: -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ trend : Time-Series [1:468] from 1959 to 1998: NA NA NA NA NA ...
#> $ random : Time-Series [1:468] from 1959 to 1998: NA NA NA NA NA ...
#> $ figure : num [1:12] -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ type : chr "additive"
#> - attr(*, "class")= chr "decomposed.ts"

df <- as.data.frame(m[c("x", "seasonal", "trend", "random")])
str(df)
#> 'data.frame': 468 obs. of 4 variables:
#> $ x : Time-Series from 1959 to 1998: 315 316 316 318 318 ...
#> $ seasonal: Time-Series from 1959 to 1998: -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ trend : Time-Series from 1959 to 1998: NA NA NA NA NA ...
#> $ random : Time-Series from 1959 to 1998: NA NA NA NA NA ...

df2 <- data.frame(date = index(m$x),
                  apply(df, 2, as.numeric))
str(df2)
#> 'data.frame': 468 obs. of 5 variables:
#> $ date : num 1959 1959 1959 1959 1959 ...
#> $ x : num 315 316 316 318 318 ...
#> $ seasonal: num -0.0536 0.6106 1.3756 2.5168 3.0003 ...
#> $ trend : num NA NA NA NA NA ...
#> $ random : num NA NA NA NA NA ...

Created on 2020-03-13 by the reprex package (v0.3.0)

You can also try tsibble and feasts

library(tsibble)
library(feasts)

as_tsibble(co2) %>%
  model(decomp = classical_decomposition(value, type = "additive")) %>%
  components()
#> # A dable: 468 x 7 [1M]
#> # Key: .model [1]
#> # Classical Decomposition: value = trend + seasonal + random
#>    .model index     value trend seasonal  random season_adjust
#>    <chr>  <mth>     <dbl> <dbl>    <dbl>   <dbl>         <dbl>
#>  1 decomp 1959 Jan   315.    NA  -0.0536 NA               315.
#>  2 decomp 1959 Feb   316.    NA   0.611  NA               316.
#>  3 decomp 1959 Mar   316.    NA   1.38   NA               315.
#>  4 decomp 1959 Apr   318.    NA   2.52   NA               315.
#>  5 decomp 1959 May   318.    NA   3.00   NA               315.
#>  6 decomp 1959 Jun   318     NA   2.33   NA               316.
#>  7 decomp 1959 Jul   316.   316.  0.813  -0.284           316.
#>  8 decomp 1959 Aug   315.   316. -1.25   -0.0170          316.
#>  9 decomp 1959 Sep   314.   316. -3.05    0.758           317.
#> 10 decomp 1959 Oct   313.   316. -3.25    0.362           316.
#> # … with 458 more rows


How to convert time series dates into data frame dates

"ts" series do not understand the "Date" class, but you can encode the dates as numbers and then decode them back. Assuming you want a series with frequency 52, the first week of 2016 is represented by 2016, the second by 2016 + 1/52, ..., and the last by 2016 + 51/52.

For example,

tt <- ts(rnorm(305), start = 2016, freq = 52)

Now decode the dates.

toDate <- function(tt) {
  yr <- as.integer(time(tt))
  week <- as.integer(cycle(tt)) # first week of year is 1, etc.
  as.Date(ISOdate(yr, 1, 1)) + 7 * (week - 1)
}

data.frame(dates = toDate(tt), series = c(tt))

We can also convert from Date class back to a year/week number:

# input is a Date class object
to_yw <- function(date) {
  yr <- as.numeric(format(date, "%Y"))
  yday <- as.POSIXlt(date)$yday # Jan 1st is 0
  week <- pmin(floor(yday / 7), 51) + 1 # 1st week of year is 1
  yw <- yr + (week - 1) / 52
  list(yw = yw, year = yr, yday = yday, week = week)
}
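To see the encoding at work, here is the arithmetic from to_yw applied step by step to a single date, 2016-01-08 (the start of the second week of 2016):

```r
date <- as.Date("2016-01-08")
yr   <- as.numeric(format(date, "%Y"))  # 2016
yday <- as.POSIXlt(date)$yday           # 7, since Jan 1st is day 0
week <- pmin(floor(yday / 7), 51) + 1   # 2
yw   <- yr + (week - 1) / 52            # 2016 + 1/52
yw
```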

Converting time series to data frame, matrix, or table

You do not need a time series, just tapply:

res <- tapply(AVG_LOSCAT2$AVG_LOSCAT,
              list(year = AVG_LOSCAT2$YEAR, month = AVG_LOSCAT2$MONTH),
              round, 2)
res
     month
year     1    2    3    4    5    6    7    8    9   10   11   12
2012    NA   NA   NA   NA   NA 7.51 7.31 8.33 7.66 5.36 6.46 8.30
2013  5.74 7.89 6.49 7.09 5.91 6.31 8.24 6.73 8.56 8.19 6.54 6.49
2014  8.03 6.80 6.25 7.10 5.38 6.21 7.78 8.87 6.62 6.09 8.40 8.37
2015  8.00 5.73 6.32 6.71 6.32 6.75   NA   NA   NA   NA   NA   NA

Time series to data frame

The date you get is actually a decimal year, so the month is still in there. If you want two columns, year and month, you can try something like:

res <- data.frame(as.matrix(Seatbelts), date = time(Seatbelts))
res$year <- trunc(res$date)
res$month <- round((res$date - res$year) * 12 + 1) # round() guards against floating-point error
res

##   DriversKilled drivers front rear   kms PetrolPrice VanKilled law     date year month
## 1           107    1687   867  269  9059   0.1029718        12   0 1969.000 1969     1
## 2            97    1508   825  265  7685   0.1023630         6   0 1969.083 1969     2
## 3           102    1507   806  319  9963   0.1020625        12   0 1969.167 1969     3
## 4            87    1385   814  407 10955   0.1008733         8   0 1969.250 1969     4
## 5           119    1632   991  454 11823   0.1010197        10   0 1969.333 1969     5
## 6           106    1511   945  427 12391   0.1005812        13   0 1969.417 1969     6

How to convert data frame for time series analysis in Python?

Depending on the task you are trying to solve, I can see two options for this dataset.

  • Either, as you show in your example, count the number of occurrences of the text field in each day, independently of the value of the text field.
  • Or, count the number of occurrences of each unique value of the text field each day. You will then have one column for each possible value of the text field, which may make more sense if the values are purely categorical.

First things to do :

import pandas as pd
df = pd.DataFrame(data={'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
                        'Text': ['A', 'B', 'C', 'A', 'A']})
df['Date'] = pd.to_datetime(df['Date'])  # convert to datetime type if not already done

Date Text
0 2018-01-01 A
1 2018-01-01 B
2 2018-01-01 C
3 2018-01-02 A
4 2018-01-03 A

Then for option one :

df = df.groupby('Date').count()

Text
Date
2018-01-01 3
2018-01-02 1
2018-01-03 1

For option two :

df[df['Text'].unique()] = pd.get_dummies(df['Text'])
df = df.drop('Text', axis=1)
df = df.groupby('Date').sum()

A B C
Date
2018-01-01 1 1 1
2018-01-02 1 0 0
2018-01-03 1 0 0

The get_dummies function creates one column per possible value of the Text field. Each column is then a boolean indicator telling us, for each row of the data frame, whether that value of the Text field occurred. We can then simply aggregate with sum after grouping by the Date field.

If you are not familiar with the use of groupby and aggregation operations, I recommend reading a pandas groupby guide first.

Converting ts object to data.frame

How about

data.frame(Y=as.matrix(dat), date=time(dat))

This returns

          Y    date
1  86.04519 1959.25
2  93.78866 1959.50
3  88.04912 1959.75
4  94.30623 1960.00
5  72.82405 1960.25
6  58.31859 1960.50
7  66.25477 1960.75
8  75.46122 1961.00
9  86.38526 1961.25
10 99.48685 1961.50

Transformation of time series data arranged as value per column into Pandas 2D Dataframe

I'm not sure I'm fully understanding the rebasing logic, but this does lead from your sample input to your sample output.

# Convert to pandas datetime.
df.acq_stamp = pd.to_datetime(df.acq_stamp)

# Melt your dataframe, keeping the acq_stamp column.
df = df.melt('acq_stamp', var_name='rawdata')

# Get the numerical value out of the rawdata column name.
df.rawdata = df.rawdata.str.extract(r'(\d+)', expand=False).astype(int)

# Find the absolute difference between each rawdata point and the max rawdata point,
# Make this into units of 100ns and subtract it from the acq_stamp.
rawdiff_as_ns = df.rawdata.sub(df.rawdata.max()).abs().mul(100)
df['timestamp'] = df.acq_stamp.sub(pd.to_timedelta(rawdiff_as_ns, unit='ns'))

# Sort your data.
df = df.sort_values('timestamp', ignore_index=True)

# Outputting just the columns you wanted:
print(df[['timestamp', 'acq_stamp', 'value']])

Output:

                      timestamp                  acq_stamp  value
0 2022-05-15 21:00:02.660159700 2022-05-15 21:00:02.660160      1
1 2022-05-15 21:00:02.660159800 2022-05-15 21:00:02.660160      2
2 2022-05-15 21:00:02.660159900 2022-05-15 21:00:02.660160      3
3 2022-05-15 21:00:02.660160000 2022-05-15 21:00:02.660160      4
4 2022-05-15 21:00:04.660159700 2022-05-15 21:00:04.660160      5
5 2022-05-15 21:00:04.660159800 2022-05-15 21:00:04.660160      6
6 2022-05-15 21:00:04.660159900 2022-05-15 21:00:04.660160      7
7 2022-05-15 21:00:04.660160000 2022-05-15 21:00:04.660160      8

