How to Use the Box-Cox Power Transformation in R

Mutate a column in a tsibble dataframe, applying a Box-Cox transformation

The issue is that inside dplyr verbs on a tsibble you refer to the column as Median_price, not chicago_sales$Median_price. With tsibbles I would advise using fable and fabletools, but if you are using forecast, it should work like this:

library(tsibble)
library(dplyr)
library(forecast)

pedestrian %>%
  mutate(bc = BoxCox(Count, BoxCox.lambda(Count)))
# A tsibble: 66,037 x 6 [1h] <Australia/Melbourne>
# Key:       Sensor [4]
   Sensor         Date_Time           Date        Time Count    bc
   <chr>          <dttm>              <date>     <int> <int> <dbl>
 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01     0  1630 11.3
 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01     1   826  9.87
 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01     2   567  9.10
 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01     3   264  7.65
 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01     4   139  6.52
 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01     5    77  5.54
 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01     6    44  4.67
 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01     7    56  5.04
 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01     8   113  6.17
10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01     9   166  6.82
# ... with 66,027 more rows

I used pedestrian, a built-in dataset from the tsibble package, because you did not provide a dput of chicago_sales.
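For reference, here is a minimal base R sketch of the textbook Box-Cox transform and its inverse for a fixed lambda, which is the calculation that BoxCox() performs on positive data. The names bc_transform and bc_inverse are my own, not part of forecast or tsibble, and the sketch assumes strictly positive input, so it does not reproduce how forecast handles zeros or negative values.

```r
# Textbook Box-Cox transform for strictly positive y (hypothetical helper).
bc_transform <- function(y, lambda) {
  stopifnot(all(y > 0, na.rm = TRUE))
  if (abs(lambda) < 1e-8) {
    log(y)                       # limiting case as lambda -> 0
  } else {
    (y^lambda - 1) / lambda
  }
}

# Inverse transform, for getting back to the original scale (hypothetical helper).
bc_inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) {
    exp(z)
  } else {
    (lambda * z + 1)^(1 / lambda)
  }
}

# Round trip on a few made-up positive values with an arbitrary lambda.
y <- c(44, 77, 139, 264, 567)
z <- bc_transform(y, lambda = 0.3)
all.equal(bc_inverse(z, lambda = 0.3), y)
```

The inverse matters if you model on the transformed scale and want results back in the original units.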

Box-Cox transformation with survey data in R

The link you have given appears to point to a user-defined function in SAS that runs within a data step. It should be possible to reprogram the method in R.

If you look at the suggested SAS method here, you'll see it uses proc transreg to estimate the power transformation required. That SAS proc does not accept survey weights. I am not sure what the weight option does in that proc (see here).

Update: I had a closer look at the first link you gave here. It appears that the weighting is being done in proc univariate with the weight option activated if the data contains weights. However, if you look at the detail for weight from here, you'll see that the weights are used to manipulate the variances. I'm not sure that you want to run with that assumption for your data.
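As a rough idea of what a reprogrammed version in R could look like, the sketch below picks lambda by maximising a weighted Box-Cox profile log-likelihood for an intercept-only normal model. This is my own assumption about one possible approach, not a port of the SAS code, and the function names are hypothetical. It treats the weights as pseudo-likelihood (frequency-like) weights, gives no design-based standard errors, and requires strictly positive data, so check whether that assumption suits your survey before relying on it.

```r
# Box-Cox transform for strictly positive y (hypothetical helper).
bc <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Weighted profile log-likelihood of lambda, up to an additive constant.
# Assumption: weights enter as pseudo-likelihood weights.
weighted_bc_loglik <- function(lambda, y, w) {
  z  <- bc(y, lambda)
  W  <- sum(w)
  mu <- sum(w * z) / W                 # weighted mean of transformed data
  s2 <- sum(w * (z - mu)^2) / W        # weighted variance of transformed data
  -0.5 * W * log(s2) + (lambda - 1) * sum(w * log(y))  # includes Jacobian term
}

# One-dimensional search for the lambda that maximises the criterion.
estimate_lambda <- function(y, w, interval = c(-2, 2)) {
  stopifnot(length(y) == length(w), all(y > 0), all(w >= 0))
  optimize(weighted_bc_loglik, interval = interval,
           y = y, w = w, maximum = TRUE)$maximum
}

# Made-up example: log-normal data, so the estimate should be near 0.
set.seed(1)
y <- exp(rnorm(500))
w <- runif(500, min = 1, max = 5)      # made-up weights
lambda_hat <- estimate_lambda(y, w)
lambda_hat
```

With all weights equal this reduces to the usual unweighted profile likelihood, which is a quick sanity check before applying it to real survey weights.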


