Calculating Inter-Purchase Time in R

You can use plyr:

library(plyr)
ddply(df, "id", transform, inter.time = c(0, diff(date2)))

or ave:

transform(df, inter.time = ave(as.numeric(date2), id,
                               FUN = function(x) c(0, diff(x))))

Both give

#   id     date      date2 inter.time
# 1 1 23-01-07 2007-01-23 0
# 2 1 27-01-07 2007-01-27 4
# 3 1 30-01-07 2007-01-30 3
# 4 3 11-12-07 2007-12-11 0
# 5 3 12-12-07 2007-12-12 1
# 6 3 01-01-08 2008-01-01 20

P.S.: You might want to replace these zeros with NA, since the first purchase for each id has no preceding purchase to measure from.
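As a minimal sketch of that suggestion, the ave call can pad each group with NA instead of 0. The data frame here is rebuilt from the output shown above, and the column is assigned with base R rather than transform, so treat it as an illustration rather than the original poster's exact setup.

```r
# Data frame rebuilt from the printed output above
df <- data.frame(
  id   = c(1, 1, 1, 3, 3, 3),
  date = c("23-01-07", "27-01-07", "30-01-07",
           "11-12-07", "12-12-07", "01-01-08")
)
df$date2 <- as.Date(df$date, format = "%d-%m-%y")

# Pad each id's first row with NA rather than 0
df$inter.time <- ave(as.numeric(df$date2), df$id,
                     FUN = function(x) c(NA, diff(x)))

df$inter.time
# NA 4 3 NA 1 20
```

With NA in place, summaries such as mean(df$inter.time, na.rm = TRUE) no longer count the first purchases as zero-day gaps.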

Calculating Interpurchase time with the first transaction subtracted from a reference date

You can use the lag function to compare each date with the previous one within each id. For the first row of an id there is no previous date, so lag falls back to a default, which here is the reference date (2011-01-01 in your case).

id <- c(1,1,2,2)
date <- c("2011-01-18","2011-01-31","2011-01-02","2011-01-15")
df <- data.frame(id,date)

library(dplyr)
library(lubridate)

df %>%
  group_by(id) %>%
  mutate(date = ymd(date),
         int_time = as.numeric(date - lag(date, default = ymd("2011-01-01")))) %>%
  ungroup()

# # A tibble: 4 x 3
# id date int_time
# <dbl> <date> <dbl>
# 1 1 2011-01-18 17
# 2 1 2011-01-31 13
# 3 2 2011-01-02 1
# 4 2 2011-01-15 13

Obtaining average inter-purchase time with all dates in one column in R

In base R, you could use aggregate together with a custom function:

aggregate(order_date ~ cust_id, data=df, FUN=function(x) mean(diff(x)))
#   cust_id order_date
# 1       1        7.5
# 2       2        5.0

Here, we take the differences between consecutive order dates within each customer and then calculate their mean. Note that this requires the data to be sorted by date. You can guarantee this by sorting the data frame inside the call, for example with data=df[order(df$order_date),].
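A minimal sketch of that precaution, assuming the rows arrive unsorted. The shuffled row order is made up for illustration (the values are the same ones used in the data below), and as.numeric() is added so the result is a plain number of days.

```r
# Same values as the data below, but deliberately out of date order
df <- data.frame(
  cust_id    = c(1, 2, 1, 2, 1),
  order_date = as.Date(c(15581, 15522, 15566, 15527, 15575),
                       origin = "1970-01-01")
)

# Sort by customer and date before differencing
df_sorted <- df[order(df$cust_id, df$order_date), ]

res <- aggregate(order_date ~ cust_id, data = df_sorted,
                 FUN = function(x) mean(as.numeric(diff(x))))
res$order_date
# 7.5 5.0
```

Without the sort, diff() would produce negative gaps for customer 1 and the mean would no longer be an inter-purchase time.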

data
Includes a couple of typo corrections from OP.

df <- 
structure(list(cust_id = c(1, 2, 1, 2, 1), order_date = structure(c(15566,
15522, 15575, 15527, 15581), class = "Date")), .Names = c("cust_id",
"order_date"), row.names = c(NA, -5L), class = "data.frame")

Calculating the average difference in purchase dates by customer id

I've assumed you have read your CSV into a data frame named df. I've also renamed your variables in snake case, since variable names containing spaces are inconvenient to work with in R.

Here is a base R solution:

mean(sapply(by(df$purchase_date, df$customer_id, diff), mean), na.rm=TRUE)

[1] 60.75

You may notice that the result is 60.75 rather than the 60 you expected. This is because customer 1's purchases are 31 days apart (January has 31 days, so the gap to February 1 is 31 days), and the same effect applies to customer 2's purchases: a month does not always have 30 days.

Explanation

by(df$purchase_date, df$customer_id, diff)

The by() function applies another function to data by groupings. Here, we are applying diff() to df$purchase_date by the unique values of df$customer_id. By itself, this would result in the following output:

df$customer_id: 1
Time difference of 31 days
-----------------------------------------------------------
df$customer_id: 2
Time differences in days
[1] 59 122

We then use

sapply(by(df$purchase_date, df$customer_id, diff), mean)

to apply mean() to the elements of the previous result. This gives us each customer's average time to repurchase:

   1    2    3    4 
31.0 90.5 NaN NaN

Customers 3 and 4 never repurchased, which is why their averages are NaN. Finally, we need to average these per-customer averages, which means we also need to deal with the NaN values, so we use:

mean(sapply(by(df$purchase_date, df$customer_id, diff), mean), na.rm=TRUE)

which averages the previous results while ignoring missing values (in R, na.rm=TRUE drops NaN as well as NA).
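The steps above can be reproduced end to end with a small data frame. The dates here are hypothetical, chosen only so that the gaps match the output shown (31 days for customer 1, 59 and 122 days for customer 2, and a single purchase each for customers 3 and 4).

```r
# Hypothetical purchase data that reproduces the gaps shown above
df <- data.frame(
  customer_id   = c(1, 1, 2, 2, 2, 3, 4),
  purchase_date = as.Date(c("2021-01-01", "2021-02-01",
                            "2021-01-01", "2021-03-01", "2021-07-01",
                            "2021-05-10", "2021-06-20"))
)

# Average gap per customer; single-purchase customers yield NaN
per_customer <- sapply(by(df$purchase_date, df$customer_id, diff), mean)

# Average of the per-customer averages, skipping the NaN entries
overall <- mean(per_customer, na.rm = TRUE)
overall
# 60.75
```

Note that this is an average of averages: customer 2's two gaps carry the same total weight as customer 1's single gap.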

Getting latest date with count of customers in R

Does this work?

library(dplyr)
df %>%
  group_by(Showroom, Item) %>%
  summarise(Total_Customers = n(), Quantity = mean(Quantity)) %>%
  left_join(df %>% group_by(Showroom, Item) %>% filter(Date_X == max(Date_X)),
            by = c('Showroom', 'Item')) %>%
  select(Showroom, Item, Total_Customers, 'Last_Purchase_Date' = Date_X, 'Quantity' = Quantity.x)
# `summarise()` regrouping output by 'Showroom' (override with `.groups` argument)
# # A tibble: 3 x 5
# # Groups: Showroom [1]
#   Showroom Item  Total_Customers Last_Purchase_Date Quantity
#   <chr>    <chr>           <int> <chr>                 <dbl>
# 1 A        z1                  1 2020-01-01             12
# 2 A        z2                  3 2020-05-01             18.7
# 3 A        z3                  2 2020-06-01            134

Calculating time slept

1 - Transform character vectors to a date-time object

bed <- lubridate::parse_date_time(bed, '%H%M')
wake <- lubridate::parse_date_time(wake, '%H%M')

2 - Calculate time difference

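# Request hours explicitly; a bare `wake - bed` lets R pick the units,
# which would make the "+ 24" correction below unreliable.
time_diff <- as.numeric(difftime(wake, bed, units = "hours"))

3 - Correct negative values by adding 24 hours.

time_diff_corrected <- ifelse(time_diff < 0, time_diff + 24, time_diff)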
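For a self-contained illustration of the three steps, here is a sketch in base R rather than lubridate. The bed and wake times are hypothetical "HHMM" strings, to_time is a helper introduced only for this example, and the modulo operator is swapped in for the ifelse correction (it gives the same result for differences between -24 and 24 hours).

```r
# Hypothetical bed and wake times as "HHMM" strings
bed  <- c("2230", "0100", "2345")
wake <- c("0630", "0900", "0715")

# Illustrative helper: parse against a fixed dummy date so only the
# clock time matters
to_time <- function(x) {
  as.POSIXct(paste("2000-01-01", x), format = "%Y-%m-%d %H%M", tz = "UTC")
}

# Signed difference in hours; negative when sleep crossed midnight
hours <- as.numeric(difftime(to_time(wake), to_time(bed), units = "hours"))

# Modulo 24 wraps negative differences back into the 0-24 range
slept <- hours %% 24
slept
# 8.0 8.0 7.5
```

Fixing the units to hours is what makes the 24-hour wrap meaningful; with automatically chosen units the same arithmetic could silently operate on minutes or seconds.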

