Calculate Monthly Returns from Daily Returns in Pandas (Compound)

Python Compounded Return Group By

Assume we have this DataFrame:

DATE      STOCK  RETURN  Year  Month
1/1/2020  A      0.02    2020  1
2/1/2020  A      0.01    2020  1
3/1/2020  A      0.04    2020  1
1/1/2020  B      0.03    2020  1
2/1/2020  B      0.01    2020  1
3/1/2020  B      0.04    2020  1
1/2/2020  A      0.05    2020  2
2/2/2020  A      0.02    2020  2
3/2/2020  A      0.01    2020  2
4/2/2020  A      0.04    2020  2
1/2/2020  B      0.01    2020  2
2/2/2020  B      0.03    2020  2
3/2/2020  B      0.02    2020  2
4/2/2020  B      0.05    2020  2

To calculate the geometric mean of daily returns (see the definition) for each month and stock, use:

df.groupby(['STOCK','Year','Month'])[['RETURN']].apply(lambda g: ((g+1).cumprod()**(1/len(g))-1).iloc[-1])

Result:

                  RETURN
STOCK Year Month
A     2020 1      0.023258
           2      0.029879
B     2020 1      0.026591
           2      0.027394

You can check manually that the calculation is correct, e.g. for stock A in January:

(1.02*1.01*1.04)**(1/3)-1 = 0.023258

Note that iloc[-1] in the lambda function selects the last calculated return in each group.
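A self-contained sketch reproducing the example above (since only the last cumprod value is kept, prod() gives the same result in one step):

import pandas as pd

df = pd.DataFrame({
    'STOCK':  ['A'] * 3 + ['B'] * 3 + ['A'] * 4 + ['B'] * 4,
    'RETURN': [0.02, 0.01, 0.04, 0.03, 0.01, 0.04,
               0.05, 0.02, 0.01, 0.04, 0.01, 0.03, 0.02, 0.05],
    'Year':   [2020] * 14,
    'Month':  [1] * 6 + [2] * 8,
})

# Geometric mean per (stock, year, month): prod(1 + r)**(1/n) - 1
monthly = df.groupby(['STOCK', 'Year', 'Month'])['RETURN'].apply(
    lambda g: (g + 1).prod() ** (1 / len(g)) - 1
)
print(monthly)  # A/Jan prints 0.023258, matching the manual check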

Calculating cumulative returns with pandas dataframe

If performance is important, use numpy.cumprod:

np.cumprod(1 + df['Daily_rets'].values) - 1

Timings:

#7k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [191]: %timeit np.cumprod(1 + df['Daily_rets'].values) - 1
41 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [192]: %timeit (1 + df.Daily_rets).cumprod() - 1
554 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
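A sketch of how such a benchmark might be set up; the Daily_rets column name comes from the question, and the sample values here are invented:

import numpy as np
import pandas as pd

# Hypothetical daily returns, replicated to ~7k rows as in the timings
df = pd.DataFrame({'Daily_rets': [0.01, -0.02, 0.005, 0.03, -0.01, 0.02, 0.0]})
df = pd.concat([df] * 1000, ignore_index=True)

# Pure NumPy: operates on the raw ndarray, skipping index machinery
cum_np = np.cumprod(1 + df['Daily_rets'].values) - 1

# Pandas: same math, but pays for index alignment on every step
cum_pd = (1 + df['Daily_rets']).cumprod() - 1

assert np.allclose(cum_np, cum_pd.values)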

Calculate Daily Returns with Pandas DataFrame

Because arithmetic operations align on the index, you can convert one of the operands to a NumPy array:

prices[:-1].values / prices[1:] - 1

or

prices[:-1] / prices[1:].values - 1

depending on which index you want for the result.

Or use the shift() method:

prices.shift(1) / prices - 1

and:

prices / prices.shift(1) - 1
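A short, self-contained sketch with a made-up price series; note that prices / prices.shift(1) - 1 is exactly what pct_change() computes:

import numpy as np
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0])

rets = prices / prices.shift(1) - 1   # day-over-day simple return
# First value is NaN: there is no prior price to compare against.
print(rets)

# pct_change() gives the same thing in one call
assert np.allclose(rets[1:], prices.pct_change()[1:])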

Compute a compounded return series in Python

There is a fantastic module called pandas, written by a guy at AQR (a hedge fund), that excels at calculations like this. What you need is a way to handle missing data. As someone mentioned above, the basics are the NaN (not a number) capabilities of scipy or numpy; however, even those libraries don't make financial calculations that much easier. If you use pandas, you can mark the data you don't want to consider as NaN, and any future calculations will reject it while performing normal operations on the other data.

I have been using pandas on my trading platform for about 8 months... I wish I had started using it sooner.

Wes (the author) gave a talk at PyCon 2010 about the capabilities of the module; see the slides and video on the PyCon 2010 webpage. In that video, he demonstrates how to get daily returns, run thousands of linear regressions on a matrix of returns (in a fraction of a second), and timestamp and graph data, all done with this module. Combined with psyco, this is a beast of a financial analysis tool.

The other great thing it handles is cross-sectional data: you can grab daily close prices, their rolling means, etc., then timestamp every calculation and store all of it in something similar to a Python dictionary (see the pandas.DataFrame class). You then access slices of the data as simply as:

close_prices['stdev_5d']

See the pandas rolling-moments docs for more information on how to calculate the rolling stdev (it's a one-liner).
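The old rolling-moments functions have since been folded into the .rolling() API; a minimal sketch with made-up closes (the 'close' and 'stdev_5d' names are illustrative):

import pandas as pd

close_prices = pd.DataFrame({'close': [10.0, 10.2, 10.1, 10.4, 10.3, 10.5, 10.6]})

# 5-observation rolling standard deviation; first 4 rows are NaN
close_prices['stdev_5d'] = close_prices['close'].rolling(5).std()
print(close_prices)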

Wes has gone out of his way to speed the module up with Cython, although I'll concede that I'm considering upgrading my server (an older Xeon) due to my analysis requirements.

EDIT FOR STRIMP's QUESTION:
After you've converted your code to use pandas data structures, it's still unclear to me how you're indexing your data in a pandas DataFrame, and what your compounding function requires for handling missing data (or, for that matter, the conditions for a 0.0 return, or whether you are using NaN in pandas). I will demonstrate using my data indexing. A day was picked at random; df is a DataFrame with ES futures quotes in it, indexed per second; missing quotes are filled in with numpy.nan. DataFrame indexes are datetime objects, offset by the pytz module's timezone objects.

>>> df.info
<bound method DataFrame.info of <class 'pandas.core.frame.DataFrame'>
Index: 86400 entries, 2011-03-21 00:00:00-04:00 to 2011-03-21 23:59:59-04:00
etf 18390 non-null values
etfvol 18390 non-null values
fut 29446 non-null values
futvol 23446 non-null values
...
>>> # ET is a pytz object...
>>> et
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
>>> # To get the futures quote at 9:45, eastern time...
>>> df.xs(et.localize(dt.datetime(2011,3,21,9,45,0)))['fut']
1291.75
>>>

To give a simple example of how to calculate a column of continuous returns (in a pandas Series) that references the quote from 10 minutes ago (filling in missing ticks), I would do this:

>>> df['fut'].ffill() / df['fut'].ffill().shift(600)

No lambda is required in that case; we're just dividing the column of values by itself 600 seconds ago (ffill() is the modern spelling of the old fill(method='pad')). The .shift(600) part is because my data is indexed per second.
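A self-contained sketch of the same pattern; the per-second index, quote values, and fill cadence here are all invented for illustration:

import numpy as np
import pandas as pd

# One hour of per-second timestamps; NaN marks a missing tick
idx = pd.date_range('2011-03-21 09:00', periods=3600, freq='s', tz='US/Eastern')
fut = pd.Series(np.nan, index=idx)
fut.iloc[::300] = 1290.0 + np.arange(12) * 0.25   # a quote every 5 minutes

filled = fut.ffill()                    # pad missing ticks with the last quote
rel_10min = filled / filled.shift(600)  # price relative to 600 seconds earlier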

HTH,
\mike

Pandas: Cumulative return function

There is a pandas cumprod() method for that. This will work for every column.

df.loc["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]

This is about two times faster than the other solutions on a large dataset:

In[106]: %timeit df.loc["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.loc['Cumulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop

I would suggest never using apply if you can find a built-in method, since apply loops over the DataFrame, which makes it slow. Built-in methods are highly efficient, and you are normally not going to beat them with apply.
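A runnable sketch of this approach, using the modern .loc accessor and made-up returns (the column names are hypothetical):

import pandas as pd

df = pd.DataFrame({'fund_a': [0.01, 0.02, -0.01],
                   'fund_b': [0.03, -0.02, 0.01]})

# Compound each column: prod(1 + r) - 1, stored as a labeled row
df.loc['Cumulative'] = ((df + 1).cumprod() - 1).iloc[-1]
print(df)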

How do I calculate a compounding return in a vector indexed to 100?

We can take the cumulative product of the returns expressed as coefficients, i.e. 1 + pct_monthly_return:

dat$index = 100 * cumprod(1+dat$pct_monthly_return)

Then to add the header row you described:

rbind(data.frame(date = "2020-12-31", index = 100, pct_monthly_return = 0),
      dat)

date index pct_monthly_return
1 2020-12-31 100.0000 0.0000
2 2021-01-31 100.2300 0.0023
3 2021-02-28 101.2323 0.0100
4 2021-03-31 104.7754 0.0350

Sample data

dat <- tibble::tribble(~date,        ~pct_monthly_return,
                       "2021-01-31", 0.0023,
                       "2021-02-28", 0.01,
                       "2021-03-31", 0.035)

