Calculating Inter-purchase Time in R
You can use plyr:
library(plyr)
ddply(df, "id", transform, inter.time = c(0, diff(date2)))
or ave:
transform(df, inter.time = ave(as.numeric(date2), id,
                               FUN = function(x) c(0, diff(x))))
Both give
# id date date2 inter.time
# 1 1 23-01-07 2007-01-23 0
# 2 1 27-01-07 2007-01-27 4
# 3 1 30-01-07 2007-01-30 3
# 4 3 11-12-07 2007-12-11 0
# 5 3 12-12-07 2007-12-12 1
# 6 3 01-01-08 2008-01-01 20
P.S.: you might want to replace these zeroes with NA.
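If NA is preferable for each customer's first purchase, one way to sketch it is to pad the differences with NA instead of 0. This is a minimal base R example that rebuilds the data frame from the output shown above; it is an illustration rather than the only way to do it.

```r
# Data reconstructed from the printed output above
df <- data.frame(
  id    = c(1, 1, 1, 3, 3, 3),
  date2 = as.Date(c("2007-01-23", "2007-01-27", "2007-01-30",
                    "2007-12-11", "2007-12-12", "2008-01-01"))
)

# Pad with NA (not 0) so that a first purchase has no inter-purchase time
df$inter.time <- ave(as.numeric(df$date2), df$id,
                     FUN = function(x) c(NA, diff(x)))
print(df)
```

Keeping NA here also means that a later mean(..., na.rm = TRUE) will not be pulled down by artificial zeroes.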
Calculating Interpurchase time with the first transaction subtracted from a reference date
You can use the lag function to compare each date with the previous one within each id; when there is no previous date, you can fall back to a default one (2011-01-01 in your case).
id <- c(1,1,2,2)
date <- c("2011-01-18","2011-01-31","2011-01-02","2011-01-15")
df <- data.frame(id,date)
library(dplyr)
library(lubridate)
df %>%
  group_by(id) %>%
  mutate(date = ymd(date),
         int_time = as.numeric(date - lag(date, default = ymd("2011-01-01")))) %>%
  ungroup()
# # A tibble: 4 x 3
# id date int_time
# <dbl> <date> <dbl>
# 1 1 2011-01-18 17
# 2 1 2011-01-31 13
# 3 2 2011-01-02 1
# 4 2 2011-01-15 13
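For comparison, the same idea can be sketched in base R without dplyr or lubridate: prepend the reference date to each customer's dates before differencing. The variable name ref is my own, and the sketch assumes the rows are already sorted by date within each id.

```r
# Same example data as above, with the dates parsed up front
df <- data.frame(
  id   = c(1, 1, 2, 2),
  date = as.Date(c("2011-01-18", "2011-01-31", "2011-01-02", "2011-01-15"))
)

# Reference date used when a customer has no previous purchase (assumed name)
ref <- as.Date("2011-01-01")

# Within each id, difference the dates after prepending the reference date
df$int_time <- ave(as.numeric(df$date), df$id,
                   FUN = function(x) diff(c(as.numeric(ref), x)))
print(df)
```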
Obtaining average inter-purchase time with all dates in one column in R
In base R, you could use aggregate together with a custom function:
aggregate(order_date ~ cust_id, data=df, FUN=function(x) mean(diff(x)))
cust_id order_date
1 1 7.5
2 2 5.0
Here, we take the differences between consecutive order dates and then calculate their mean. Note that this requires the data to be sorted by date. You can make sure of this by calling order in the data argument, as in data=df[order(df$order_date),], for example.
data
Includes a couple of typo corrections from OP.
df <-
structure(list(cust_id = c(1, 2, 1, 2, 1), order_date = structure(c(15566,
15522, 15575, 15527, 15581), class = "Date")), .Names = c("cust_id",
"order_date"), row.names = c(NA, -5L), class = "data.frame")
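To see why the sorting matters, here is a small sketch that deliberately shuffles the rows of the data above and compares the unsorted and sorted results. The names shuffled, unsorted and sorted are my own, and as.numeric() is added only so that both results are plain numbers.

```r
# Same values as the data above, written with data.frame()
df <- data.frame(
  cust_id    = c(1, 2, 1, 2, 1),
  order_date = as.Date(c(15566, 15522, 15575, 15527, 15581),
                       origin = "1970-01-01")
)

# Deliberately scramble the row order (illustrative only)
shuffled <- df[c(3, 1, 5, 4, 2), ]

avg_gap <- function(x) as.numeric(mean(diff(x)))

# Unsorted input: diff() sees the dates out of order, so the means are wrong
unsorted <- aggregate(order_date ~ cust_id, data = shuffled, FUN = avg_gap)

# Sorting by date first restores the intended result
sorted <- aggregate(order_date ~ cust_id,
                    data = shuffled[order(shuffled$order_date), ],
                    FUN = avg_gap)
print(unsorted)
print(sorted)
```

With the sorted input, the result matches the 7.5 and 5.0 shown earlier; with the shuffled input it does not.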
Calculating the average difference in purchase dates by customer id
I've assumed you have read your CSV into a data frame named df, and I've renamed your variables using snake case, since variable names containing spaces are inconvenient to work with (which is why many people use either snake case or camel case naming conventions).
Here is a base R solution:
mean(sapply(by(df$purchase_date, df$customer_id, diff), mean), na.rm=TRUE)
[1] 60.75
You may notice that we get 60.75 rather than 60 as you expected. This is because there are 31 days between customer 1's purchases (31 days in January until February 1), and similarly for customer 2's purchases: months do not always have 30 days.
Explanation
by(df$purchase_date, df$customer_id, diff)
The by() function applies another function to data by groupings. Here, we are applying diff() to df$purchase_date by the unique values of df$customer_id. By itself, this would result in the following output:
df$customer_id: 1
Time difference of 31 days
-----------------------------------------------------------
df$customer_id: 2
Time differences in days
[1] 59 122
We then use
sapply(by(df$purchase_date, df$customer_id, diff), mean)
to apply mean() to the elements of the previous result. This gives us each customer's average time to repurchase:
1 2 3 4
31.0 90.5 NaN NaN
(we see that customers 3 and 4 never repurchased). Finally, we need to average these average repurchase times, which means we also need to deal with those NaN values, so we use:
mean(sapply(by(df$purchase_date, df$customer_id, diff), mean), na.rm=TRUE)
which averages the previous results, ignoring missing values (which, in R, include NaN values).
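The steps above can be tried end to end on a small made-up data set. The dates below are hypothetical, chosen only so that the gaps match the ones discussed (31 days for customer 1, 59 and 122 days for customer 2, and a single purchase each for customers 3 and 4). This sketch uses split() instead of by(), which is a swapped-in but closely related way to group the dates.

```r
# Hypothetical purchases, built as day offsets from an arbitrary start date
df <- data.frame(
  customer_id   = c(1, 1, 2, 2, 2, 3, 4),
  purchase_date = as.Date("2020-01-01") + c(0, 31, 0, 59, 181, 10, 20)
)

# Average gap in days per customer; a single purchase gives NaN
per_customer <- sapply(split(df$purchase_date, df$customer_id),
                       function(d) mean(as.numeric(diff(d))))
print(per_customer)

# Average of the per-customer averages, ignoring the NaN entries
overall <- mean(per_customer, na.rm = TRUE)
print(overall)
```

Note that this is an average of averages: every repeat customer counts equally, regardless of how many purchases they made.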
Getting latest date with count of customers in R
Does this work:
library(dplyr)
df %>% group_by(Showroom, Item) %>% summarise(Total_Customers = n(), Quantity = mean(Quantity)) %>%
left_join(df %>% group_by(Showroom, Item) %>% filter(Date_X == max(Date_X)), by = c('Showroom', 'Item')) %>%
select(Showroom, Item, Total_Customers, 'Last_Purchase_Date' = Date_X, 'Quantity' = Quantity.x)
`summarise()` regrouping output by 'Showroom' (override with `.groups` argument)
# A tibble: 3 x 5
# Groups: Showroom [1]
Showroom Item Total_Customers Last_Purchase_Date Quantity
<chr> <chr> <int> <chr> <dbl>
1 A z1 1 2020-01-01 12
2 A z2 3 2020-05-01 18.7
3 A z3 2 2020-06-01 134
Calculating time slept
1 - Transform character vectors to a date-time object
bed <- lubridate::parse_date_time(bed, '%H%M')
wake <- lubridate::parse_date_time(wake, '%H%M')
2 - Calculate time difference
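# Request hours explicitly so that adding 24 in step 3 is valid
time_diff <- as.numeric(difftime(wake, bed, units = "hours"))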
3 - Correct negative values by adding 24 hours.
time_diff_corrected <- ifelse(time_diff < 0, time_diff + 24, time_diff)
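As a worked example of the three steps, here is a minimal sketch with made-up bed and wake times. It uses base R's strptime() rather than lubridate so that it runs without extra packages, and it requests the difference in hours explicitly, since the 24-hour correction only makes sense in that unit. The sample values are hypothetical.

```r
# Hypothetical inputs in "HHMM" form
bed  <- c("2300", "0130")
wake <- c("0700", "0900")

# 1 - Parse to date-times (both get the same date, only the time differs)
bed_t  <- as.POSIXct(strptime(bed,  "%H%M", tz = "UTC"))
wake_t <- as.POSIXct(strptime(wake, "%H%M", tz = "UTC"))

# 2 - Difference in hours, as a plain number
time_diff <- as.numeric(difftime(wake_t, bed_t, units = "hours"))

# 3 - A negative value means bedtime was before midnight, so add 24 hours
time_diff_corrected <- ifelse(time_diff < 0, time_diff + 24, time_diff)
print(time_diff_corrected)
```

An equivalent one-step correction would be time_diff %% 24, because the modulo operator in R returns a non-negative result for a positive divisor.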