Convert Data Frame with Date Column to Timeseries

How to convert dataframe into time series?

R has multiple ways of representing time series. Since you're working with daily stock prices, keep in mind that financial markets are closed on weekends and business holidays, so trading days and calendar days are not the same. However, you may need to work with your time series in terms of both trading days and calendar days. For example, daily returns are calculated from sequential daily closing prices regardless of whether a weekend intervenes. But you may also want to do calendar-based reporting such as weekly price summaries. For these reasons the xts package, an extension of zoo, is commonly used with financial data in R. An example of how it could be used with your data follows.

Assuming the data shown in your example is in the data frame df:

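library(xts)
# build an xts object from every column except the first, ordered by the parsed dates
stocks <- xts(df[, -1], order.by = as.Date(df[, 1], "%m/%d/%Y"))
#
# daily returns: ratio of consecutive prices, minus 1
#
returns <- diff(stocks, arithmetic = FALSE) - 1
#
# weekly open, high, low, close reports
#
to.weekly(stocks$Hero_close, name = "Hero")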

which gives the output

           Hero.Open Hero.High Hero.Low Hero.Close
2013-03-15    1669.1   1684.45   1669.1    1684.45
2013-03-22    1690.5   1690.50   1623.3    1659.60
2013-03-28    1617.7   1617.70   1542.0    1542.00
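
For readers working in pandas rather than xts, the same two steps can be sketched as follows. The prices are made-up illustration values, not the data from the question, and `hero_close` is a hypothetical name. One difference to be aware of: `resample("W-FRI")` labels each week by its Friday, whereas the `to.weekly` output above is labeled by the last trading day observed in the week.

```python
import pandas as pd

# Hypothetical closing prices on ten business days (illustration only).
dates = pd.bdate_range("2013-03-11", periods=10)  # Mon 11 Mar .. Fri 22 Mar
hero_close = pd.Series(
    [1660.0, 1672.5, 1655.0, 1669.1, 1684.45,
     1690.5, 1675.0, 1623.3, 1640.0, 1659.6],
    index=dates,
    name="Hero_close",
)

# Daily simple returns: p[t] / p[t-1] - 1 (the first value is NaN).
returns = hero_close.pct_change()

# Weekly open/high/low/close, with weeks ending on Friday.
weekly = hero_close.resample("W-FRI").ohlc()
print(weekly)
```

Because the returns are computed from consecutive rows, a weekend between two trading days does not affect them.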

How to convert data frame for time series analysis in Python?

Depending on the task you are trying to solve, I can see two options for this dataset.

  • Either, as you show in your example, count the number of occurrences of the text field on each day, regardless of its value.
  • Or count the number of occurrences of each unique value of the text field on each day. You will then have one column for each possible value of the text field, which may make more sense if the values are purely categorical.

First, build the data frame and make sure the Date column has a datetime type:

import pandas as pd
df = pd.DataFrame(data={'Date':['2018-01-01','2018-01-01','2018-01-01', '2018-01-02', '2018-01-03'], 'Text':['A','B','C','A','A']})
df['Date'] = pd.to_datetime(df['Date']) #convert to datetime type if not already done

        Date Text
0 2018-01-01    A
1 2018-01-01    B
2 2018-01-01    C
3 2018-01-02    A
4 2018-01-03    A

Then, for option one (the result is stored under a new name so that df stays available for option two):

daily_counts = df.groupby('Date').count()

            Text
Date
2018-01-01     3
2018-01-02     1
2018-01-03     1
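
If the counts are meant to feed a time series model, days without any rows usually need to appear with a count of 0 instead of being absent. A minimal sketch, assuming a hypothetical data set (`events`) that has no rows for 2018-01-03:

```python
import pandas as pd

# Hypothetical data: nothing was recorded on 2018-01-03.
events = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-04"]
    ),
    "Text": ["A", "B", "A", "C"],
})

# Index by date, then count rows per calendar day; empty days yield 0.
counts_per_day = events.set_index("Date")["Text"].resample("D").count()
print(counts_per_day)
```

Unlike a plain groupby, `resample("D")` produces one entry for every calendar day between the first and last date.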

For option two:

dummies = pd.get_dummies(df['Text'])  # one indicator column per distinct Text value
value_counts = dummies.groupby(df['Date']).sum()  # add up the indicators for each day

            A  B  C
Date
2018-01-01  1  1  1
2018-01-02  1  0  0
2018-01-03  1  0  0

The get_dummies function creates one column per distinct value of the Text field. Each column is a boolean indicator that tells us, for each row of the dataframe, whether that value occurred in the row. Summing these indicators within each Date group then gives the daily count of each value.

If you are not familiar with groupby and aggregation operations, I recommend that you read this guide first.
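
The same table can also be produced in a single call with `pd.crosstab`, which counts how often each combination of date and text value occurs. A sketch using the example data from above (`per_value` is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-01",
                            "2018-01-02", "2018-01-03"]),
    "Text": ["A", "B", "C", "A", "A"],
})

# Rows are dates, columns are the distinct Text values, cells are counts.
per_value = pd.crosstab(df["Date"], df["Text"])
print(per_value)
```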

Convert data frame with date column to timeseries

Your DATE column may represent a date, but it is actually stored as a character, factor, integer, or numeric vector.

First, you need to convert the DATE column to a Date object. Then you can create an xts object from the CLOSE and DATE columns of your PRICE data.frame. Finally, you can use the xts object to calculate returns and the Calmar ratio.

PRICE <- structure(list(
DATE = c(20070103L, 20070104L, 20070105L, 20070108L, 20070109L,
20070110L, 20070111L, 20070112L, 20070115L),
CLOSE = c(54.7, 54.77, 55.12, 54.87, 54.86, 54.27, 54.77, 55.36, 55.76)),
.Names = c("DATE", "CLOSE"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))

library(PerformanceAnalytics) # loads/attaches xts
# Convert DATE to Date class
PRICE$DATE <- as.Date(as.character(PRICE$DATE),format="%Y%m%d")
# create xts object
x <- xts(PRICE$CLOSE,PRICE$DATE)
CalmarRatio(Return.calculate(x))
#                  [,1]
# Calmar Ratio 52.82026
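
The same first step applies in pandas: integer dates such as 20070103 have to be parsed explicitly before they can index a series. A sketch using the prices from the example above; it stops at simple returns and does not attempt to reproduce `CalmarRatio`.

```python
import pandas as pd

price = pd.DataFrame({
    "DATE": [20070103, 20070104, 20070105, 20070108, 20070109,
             20070110, 20070111, 20070112, 20070115],
    "CLOSE": [54.7, 54.77, 55.12, 54.87, 54.86, 54.27, 54.77, 55.36, 55.76],
})

# Parse the yyyymmdd integers via their string form.
price["DATE"] = pd.to_datetime(price["DATE"].astype(str), format="%Y%m%d")

# A Series indexed by date plays the role of the xts object here.
close = price.set_index("DATE")["CLOSE"]

# Simple returns between consecutive observations; drop the leading NaN.
simple_returns = close.pct_change().dropna()
print(simple_returns.head())
```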

Convert data frame to time series in R

1) zoo Probably the easiest approach is to convert the data frame to "zoo" class and from that to "ts" class. "yearmon" is a class provided by the zoo package for representing monthly data, and it closely corresponds to frequency 12 data in ts. The result is a "ts" class series having the same length as the number of rows in df.

library(zoo)
as.ts(read.zoo(df, FUN = as.yearmon))

##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
## 2015  6434  5595  3101  3475  6519  7251  4200  3622  4782  6503  9460 15623
## 2016 18393 14410 11210 10582 14316 11876 13676 12466 17326 15845 15569 24933
## 2017 35050 26008 25767 17858 21089 13570

Depending on what you want to do, you may prefer to leave it as a "zoo" class time series, in which case omit the as.ts call.

1a) An alternative way to use zoo would be:

ts(df$inflow, start = as.yearmon(df$date[1]), freq = 12)

2) base This is longer but does not use any packages:

mo <- as.numeric(format(df$date[1], "%m"))
yr <- as.numeric(format(df$date[1], "%Y"))
ts(df$inflow, start = c(yr, mo), freq = 12)

If it were known that the series always starts in January then we could omit the definition of mo and write:

ts(df$inflow, start = yr, freq = 12)
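
For comparison, a monthly series in pandas is often represented with a `PeriodIndex`, which is roughly the counterpart of a frequency 12 `ts`. A sketch using the first six `inflow` values from the question; the month-start dates are regenerated with `pd.date_range` instead of being copied from `df`.

```python
import pandas as pd

# First six monthly observations from the question's data.
df = pd.DataFrame({
    "date": pd.date_range("2015-01-01", periods=6, freq="MS"),  # month starts
    "inflow": [6434, 5595, 3101, 3475, 6519, 7251],
})

# Index by date, then convert the timestamps to monthly periods.
inflow = df.set_index("date")["inflow"].to_period("M")
print(inflow)
```

As with the zoo approach, the series ends up with one observation per row of the data frame.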

Note: The input df from the question is:

df <- 
structure(list(date = structure(c(16436, 16467, 16495, 16526,
16556, 16587, 16617, 16648, 16679, 16709, 16740, 16770, 16801,
16832, 16861, 16892, 16922, 16953, 16983, 17014, 17045, 17075,
17106, 17136, 17167, 17198, 17226, 17257, 17287, 17318), class = "Date"),
inflow = c(6434L, 5595L, 3101L, 3475L, 6519L, 7251L, 4200L,
3622L, 4782L, 6503L, 9460L, 15623L, 18393L, 14410L, 11210L,
10582L, 14316L, 11876L, 13676L, 12466L, 17326L, 15845L, 15569L,
24933L, 35050L, 26008L, 25767L, 17858L, 21089L, 13570L)), row.names =
c(NA, 30L), class = "data.frame", .Names = c("date", "inflow"))

