Average difference between two dates, grouped by a third field?
You don't specify the granularity you want for the diff. This does it in days:
select username, avg(end_date - start_date) as avg_days
from mytable
group by username
If you want the difference in seconds, use datediff():
select username, avg(datediff(ss, start_date, end_date)) as avg_seconds
...
datediff() can measure the difference in any unit from seconds up to years by varying the first parameter, which can be ss, mi, hh, dd, wk, mm, or yy.
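The same per-user average can be sketched in plain Python, which may help if you want to check the SQL result. The rows below are hypothetical stand-ins for mytable:

```python
from datetime import date
from collections import defaultdict

# Hypothetical rows standing in for mytable: (username, start_date, end_date)
rows = [
    ("alice", date(2023, 1, 1), date(2023, 1, 5)),
    ("alice", date(2023, 2, 1), date(2023, 2, 3)),
    ("bob",   date(2023, 3, 1), date(2023, 3, 11)),
]

# Group the day differences by username, then average each group
diffs = defaultdict(list)
for username, start, end in rows:
    diffs[username].append((end - start).days)

avg_days = {user: sum(d) / len(d) for user, d in diffs.items()}
print(avg_days)  # alice: (4 + 2) / 2 = 3.0, bob: 10.0
```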
Average time between dates in same field by groups
I think what you are looking for is calculated like this: take the maximum and minimum dates, get the difference between them, and divide by the number of intervals between purchases (the count minus one).
SELECT id_usuarioweb,
       CASE
           WHEN COUNT(*) < 2 THEN 0
           ELSE DATEDIFF(dd, MIN(dt_fechaventa), MAX(dt_fechaventa)) / (COUNT(*) - 1)
       END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb
EDIT: (by @GordonLinoff)
The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.
The average time between them is:
((b - a) + (c - b)) / 2
This simplifies to:
(c - a) / 2
In other words, the intermediate value cancels out. And, this continues regardless of the number of intermediate values.
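The telescoping argument above can be checked numerically. This sketch, with made-up dates, shows that the average of the consecutive gaps equals (max - min) / (count - 1):

```python
from datetime import date

# Hypothetical purchase dates for one user, in order
dates = sorted([date(2023, 1, 1), date(2023, 1, 4), date(2023, 1, 10), date(2023, 1, 11)])

# Average of the consecutive gaps: (3 + 6 + 1) / 3
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
avg_gaps = sum(gaps) / len(gaps)

# Telescoped form: the intermediate dates cancel out
telescoped = (dates[-1] - dates[0]).days / (len(dates) - 1)

assert avg_gaps == telescoped  # both are 10 / 3
```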
calculate average difference between dates using pyspark
You can use datediff with a window function to calculate the difference, then take the average. lag is one of the window functions; it takes the value from the previous row within the window.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# define the window: one partition per ID, ordered by date
w = Window.partitionBy('ID').orderBy('Date')

# datediff takes the date difference from the first arg to the second arg (first - second).
(df.withColumn('diff', F.datediff(F.col('Date'), F.lag('Date').over(w)))
   .groupby('ID')  # aggregate over ID
   .agg(F.avg(F.col('diff')).alias('average difference'))
)
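The lag-then-average logic can also be sketched without Spark, which may clarify what the window function is doing. The (ID, Date) rows here are hypothetical stand-ins for df:

```python
from datetime import date
from itertools import groupby

# Hypothetical (ID, Date) rows standing in for df
rows = [
    (1, date(2022, 1, 1)), (1, date(2022, 1, 3)), (1, date(2022, 1, 6)),
    (2, date(2022, 1, 1)), (2, date(2022, 1, 8)),
]

result = {}
for key, group in groupby(sorted(rows), key=lambda r: r[0]):
    dates = [d for _, d in group]
    # "lag": pair each date with its predecessor, as the window function does
    diffs = [(b - a).days for a, b in zip(dates, dates[1:])]
    result[key] = sum(diffs) / len(diffs)

print(result)  # {1: 2.5, 2: 7.0}
```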
How to calculate average time differences by group?
You could create a test number column for each patient and then, for each test number, calculate the average of Date.diff.
library(dplyr)
data %>%
group_by(PID) %>%
mutate(test_number = row_number()) %>%
group_by(test_number) %>%
summarise(Date.diff = mean(Date.diff)) -> result
result
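The same grouping idea (a per-patient row number, then averaging across patients per test number) can be sketched in Python with hypothetical records:

```python
from collections import defaultdict

# Hypothetical per-patient records: (PID, Date.diff) in visit order
records = [
    ("p1", 0), ("p1", 7), ("p1", 14),
    ("p2", 0), ("p2", 9),
]

# Assign a test number within each patient (like row_number per PID),
# then average Date.diff per test number across patients
counters = defaultdict(int)
by_test = defaultdict(list)
for pid, diff in records:
    counters[pid] += 1
    by_test[counters[pid]].append(diff)

result = {n: sum(v) / len(v) for n, v in by_test.items()}
print(result)  # {1: 0.0, 2: 8.0, 3: 14.0}
```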
Average Time Difference Between Two Dates PER Group
Try data.table:
require(lubridate)
require(data.table)
so <- data.frame(visit_dates = c("12/4/2016","12/6/2016","12/7/2016","12/3/2016","12/7/2016","12/10/2016"), person = c("1","1","1","2","2","2"))
so$visit_dates <- mdy(so$visit_dates)
so <- data.table(so, key = c("person", "visit_dates"))
res <- so[, .(avgTimeBetweenVisit = mean(diff(visit_dates))), by = person]
print(res)
# person avgTimeBetweenVisit
# 1: 1 1.5 days
# 2: 2 3.5 days
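As a cross-check of the numbers above, the same per-person averages can be computed directly from the example dates in plain Python:

```python
from datetime import date

# The visit dates from the example, grouped by person
visits = {
    "1": [date(2016, 12, 4), date(2016, 12, 6), date(2016, 12, 7)],
    "2": [date(2016, 12, 3), date(2016, 12, 7), date(2016, 12, 10)],
}

avg = {}
for person, dates in visits.items():
    dates = sorted(dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    avg[person] = sum(gaps) / len(gaps)

print(avg)  # {'1': 1.5, '2': 3.5}, matching the data.table output
```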
Average of difference between 2 datetime fields then cast to time
Try the date_diff function instead; it returns an integer that you can work with more effectively. You'll likely need to calculate at the delivery level first, then average up.
select CarrierName
, AVG(date_diff(DeliveryDate, ShipDate, DAY))
FROM `jan2022floyd.floyd_jan22.leaf_Q4_Jan22`
GROUP BY CarrierName
https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#date_diff