Average Difference Between Two Dates, Grouped by a Third Field

Average difference between two dates, grouped by a third field?

You don't specify the granularity you want for the difference. This gives it in days:

select username, avg(end_date - start_date) as avg_days
from mytable
group by username

If you want the difference in seconds, use datediff() (SQL Server):

select username, avg(datediff(ss, start_date, end_date)) as avg_seconds
...

datediff can measure the difference in any unit from seconds up to years by varying the first parameter, which can be ss, mi, hh, dd, wk, mm, or yy.
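The same per-group average can be sketched in plain Python; the rows below are hypothetical (username, start_date, end_date) tuples standing in for the table:

```python
from datetime import date
from collections import defaultdict

# hypothetical sample rows: (username, start_date, end_date)
rows = [
    ("alice", date(2023, 1, 1), date(2023, 1, 4)),
    ("alice", date(2023, 2, 1), date(2023, 2, 6)),
    ("bob",   date(2023, 1, 1), date(2023, 1, 11)),
]

# group the day differences by username, then average each group
diffs = defaultdict(list)
for user, start, end in rows:
    diffs[user].append((end - start).days)

avg_days = {user: sum(d) / len(d) for user, d in diffs.items()}
print(avg_days)  # {'alice': 4.0, 'bob': 10.0}
```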

Average time between dates in same field by groups

I think what you are looking for is calculated like this: take the maximum and minimum dates, get the difference between them, and divide by the number of purchases minus one.

SELECT id_usuarioweb,
       CASE
           WHEN COUNT(*) < 2 THEN 0
           ELSE DATEDIFF(dd, MIN(dt_fechaventa), MAX(dt_fechaventa)) / (COUNT(*) - 1)
       END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb

EDIT: (by @GordonLinoff)

The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.

The average time between them is:

((b - a) + (c - b)) / 2

This simplifies to:

(c - a) / 2

In other words, the intermediate value cancels out. And, this continues regardless of the number of intermediate values.
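The telescoping can be checked numerically in a small Python sketch (hypothetical dates for one user): the mean of the consecutive gaps equals (max - min) / (n - 1).

```python
from datetime import date

# hypothetical purchase dates for one user, already sorted
dates = [date(2023, 1, 1), date(2023, 1, 4), date(2023, 1, 10), date(2023, 1, 13)]

# mean of the consecutive gaps: (b - a) for each adjacent pair
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
mean_gap = sum(gaps) / len(gaps)

# telescoping shortcut: intermediate dates cancel out
shortcut = (dates[-1] - dates[0]).days / (len(dates) - 1)

print(mean_gap, shortcut)  # 4.0 4.0
```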

calculate average difference between dates using pyspark

You can use datediff with a window function to calculate the difference between consecutive rows, then take the average.

lag is a window function that takes a value from the previous row within the window.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

# define the window: one partition per ID, ordered by Date
w = Window.partitionBy('ID').orderBy('Date')

# datediff(end, start) returns end - start in days;
# lag('Date') pulls the previous row's date within the window
(df.withColumn('diff', F.datediff(F.col('Date'), F.lag('Date').over(w)))
 .groupby('ID')  # aggregate over ID
 .agg(F.avg(F.col('diff')).alias('average difference'))
)

How to calculate average time differences by group?

You could create a test-number column for each patient and then, for each test number, calculate the average of Date.diff.

library(dplyr)

data %>%
group_by(PID) %>%
mutate(test_number = row_number()) %>%
group_by(test_number) %>%
summarise(Date.diff = mean(Date.diff)) -> result
result
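The same "nth-gap average across patients" idea can be sketched in plain Python; the rows below are hypothetical (PID, Date.diff) pairs in visit order:

```python
from collections import defaultdict

# hypothetical rows: (PID, Date.diff) in visit order per patient
rows = [
    ("p1", 7), ("p1", 14),
    ("p2", 5), ("p2", 10),
]

# assign a running test number per patient, then average by test number
counters = defaultdict(int)
by_test = defaultdict(list)
for pid, diff in rows:
    counters[pid] += 1
    by_test[counters[pid]].append(diff)

result = {n: sum(v) / len(v) for n, v in by_test.items()}
print(result)  # {1: 6.0, 2: 12.0}
```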

Average Time Difference Between Two Dates PER Group

Try data.table:

require(lubridate)
require(data.table)
so <- data.frame(visit_dates = c("12/4/2016","12/6/2016","12/7/2016","12/3/2016","12/7/2016","12/10/2016"), person = c("1","1","1","2","2","2"))

so$visit_dates <- mdy(so$visit_dates)
so <- data.table(so, key = c("person", "visit_dates"))
res <- so[, .(avgTimeBetweenVisit = mean(diff(visit_dates))), by = person]
print(res)
# person avgTimeBetweenVisit
# 1: 1 1.5 days
# 2: 2 3.5 days

Average of difference between 2 datetime fields then cast to time

Try the date_diff function instead; it returns an integer that you should be able to work with more effectively. You'll likely need to calculate at the delivery level first, then average up.

select CarrierName
     , AVG(date_diff(DeliveryDate, ShipDate, DAY)) as avg_delivery_days
FROM `jan2022floyd.floyd_jan22.leaf_Q4_Jan22`
GROUP BY CarrierName

https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date_diff


