Group data in intervals
If you want the intervals to be calendar-based (i.e., four per hour starting at 0, 15, 30, and 45 minutes), then you can use:
select id, min(begin_date), max(begin_date)
from t
group by id, convert(date, begin_date),
datepart(hour, begin_date), datepart(minute, begin_date) / 15;  -- integer division: minutes 0-14 -> slot 0, 15-29 -> 1, 30-44 -> 2, 45-59 -> 3
Note that begin_date and end_date have the same value, so I just used begin_date in this answer.
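For comparison only, here is a minimal pandas sketch of the same calendar-based bucketing, flooring each timestamp to the quarter-hour; the DataFrame t and its sample values are invented for illustration and are not part of the original question:
import pandas as pd

t = pd.DataFrame({
    'id': [1, 1, 1, 2],
    'begin_date': pd.to_datetime(['2023-01-01 09:03', '2023-01-01 09:14',
                                  '2023-01-01 09:17', '2023-01-01 09:31']),
})
t['bucket'] = t['begin_date'].dt.floor('15min')   # snaps each timestamp to :00, :15, :30 or :45
out = t.groupby(['id', 'bucket'])['begin_date'].agg(['min', 'max'])
print(out)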
Grouping data based on time interval
You can groupby first and then do a cumsum to get the participant column the way you want. Make sure the time column is in datetime format, and sort it before you do this.
import numpy as np
import pandas as pd

df['time'] = pd.to_datetime(df['time'])
df['time_diff'] = df.groupby(['tablet'])['time'].diff().dt.total_seconds() / 60  # gap in minutes within each tablet
df['participant'] = np.where((df['time_diff'].isnull()) | (df['time_diff'] > 10), 1, 0).cumsum()  # new group on the first row of a tablet or a gap > 10 min
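As a quick sanity check, here is the same three-line recipe run on a tiny invented DataFrame (column names tablet and time as in the answer):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'tablet': ['A', 'A', 'A', 'B'],
    'time': ['2023-01-01 10:00', '2023-01-01 10:05', '2023-01-01 10:30', '2023-01-01 10:02'],
})
df['time'] = pd.to_datetime(df['time'])
df = df.sort_values(['tablet', 'time'])
df['time_diff'] = df.groupby(['tablet'])['time'].diff().dt.total_seconds() / 60
df['participant'] = np.where((df['time_diff'].isnull()) | (df['time_diff'] > 10), 1, 0).cumsum()
# participant -> 1, 1, 2, 3: the first row of each tablet and the 25-minute gap each start a new group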
Group by id and store time differences (intervals) in a list
Use summarise to store the data in a list.
library(dplyr)
d %>%
  group_by(ID) %>%
  summarise(Time_interval = list(as.numeric(na.omit(round(difftime(Time,
    lag(Time), units = 'mins')))))) -> result
result
# A tibble: 2 x 2
# ID Time_interval
# <int> <list>
#1 1 <dbl [3]>
#2 2 <dbl [1]>
result$Time_interval
#[[1]]
#[1] 2 3 80
#[[2]]
#[1] 6
data
d <- structure(list(ID = c(1L, 2L, 1L, 1L, 2L, 1L), Time = structure(c(1581266398,
1582134325, 1581266545, 1581266734, 1582134665, 1581271525), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -6L), class = "data.frame")
How to group data into arrays by intervals
Essentially the startOf function sets the specified date fields to zero.
For grouping by 5 minutes you take the current minutes and round them down to the start of their interval; to do this you compute Math.floor(minutes / 5) * 5.
Replace 5 with 15 for 15 minutes.
Obviously this works with intervals starting from 0:
- 0-4 = 0
- 5-9 = 5
and so on
(result) => moment(result['localcol'], 'DD/MM/YYYY').minutes(Math.floor(moment(result['localcol'], 'DD/MM/YYYY').minutes() / 5) * 5);
moment(date, 'DD/MM/YYYY').minutes(minutes);
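The snippet above only shows how to round a timestamp down to its 5-minute slot; the grouping into arrays then amounts to collecting rows under each slot start. A small Python sketch of that step, with invented rows and assuming localcol also carries a time component:
from collections import defaultdict
from datetime import datetime

rows = [{'localcol': '01/02/2023 10:01'},
        {'localcol': '01/02/2023 10:03'},
        {'localcol': '01/02/2023 10:07'}]

groups = defaultdict(list)
for row in rows:
    ts = datetime.strptime(row['localcol'], '%d/%m/%Y %H:%M')
    slot = ts.replace(minute=ts.minute // 5 * 5, second=0, microsecond=0)  # 0-4 -> 0, 5-9 -> 5, ...
    groups[slot].append(row)
# the 10:00 slot collects the first two rows, the 10:05 slot the third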
Group by data intervals
WITH t AS (
SELECT ts, (random()*100)::int AS bandwidth
FROM generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
)
SELECT date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,count(*) AS rows_in_timeslice -- optional
,sum(bandwidth) AS sum_bandwidth
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range
AND ts < '2012-09-03 00:00:00+02'::timestamptz -- careful with borders
GROUP BY 1, 2
ORDER BY 1, 2;
The CTE t provides data like your table might hold: one timestamp ts per minute with a bandwidth number. (You don't need that part; you would work with your table instead.)
Here is a very similar solution for a very similar question - with a detailed explanation of how this particular aggregation works:
- date_trunc 5 minute interval in PostgreSQL
Here is a similar solution for a similar question concerning running sums - with a detailed explanation and links for the various functions used:
- PostgreSQL: running count of rows for a query 'by minute'
Additional question in comment
WITH -- same as above ...
SELECT DISTINCT ON (1,2)
date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,bandwidth AS bandwith_sample_at_min15
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz
AND ts < '2012-09-03 00:00:00+02'::timestamptz
ORDER BY 1, 2, ts DESC;
Retrieves one un-aggregated sample per 15-minute interval - from the last available row in the window. This will be the 15th minute if the row is not missing. Crucial parts are DISTINCT ON and ORDER BY.
More information about the used technique here:
- Select first row in each GROUP BY group?
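For readers more comfortable in pandas, the DISTINCT ON / ORDER BY trick corresponds roughly to sort, group by the slot, and keep the last row per group; a minimal sketch with generated data standing in for the table:
import numpy as np
import pandas as pd

t = pd.DataFrame({'ts': pd.date_range('2012-09-02', periods=180, freq='min')})
t['bandwidth'] = np.random.randint(0, 100, len(t))

t['hour_stump'] = t['ts'].dt.floor('h')
t['min15_slot'] = t['ts'].dt.minute // 15
sample = (t.sort_values('ts')
            .groupby(['hour_stump', 'min15_slot'], as_index=False)
            .last())   # the last available row in each 15-minute slot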
Is there a way to group timestamp data by 30-day intervals starting from the min(date) and add them as columns
If you are using BigQuery, I would recommend:
- countif() to count a boolean value
- timestamp_add() to add intervals to timestamps
The exact boundaries are a bit vague, but I would go for:
select pc.url,
       countif(pv.date >= pc.dt_crtd and
               pv.date < timestamp_add(pc.dt_crtd, interval 30 day)
              ) as Interval_00_29,
       countif(pv.date >= timestamp_add(pc.dt_crtd, interval 30 day) and
               pv.date < timestamp_add(pc.dt_crtd, interval 60 day)
              ) as Interval_30_59,
       countif(pv.date >= timestamp_add(pc.dt_crtd, interval 60 day) and
               pv.date < timestamp_add(pc.dt_crtd, interval 90 day)
              ) as Interval_60_89
from page_creation pc join
     page_visits pv
     on pc.link = pv.url
group by pc.url
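If you want to prototype the same 30-day window counts locally, a rough pandas equivalent could look like this; the table and column names mirror the query (for simplicity the sketch joins on url, whereas the original query joins pc.link = pv.url), and the sample rows are invented:
import pandas as pd

page_creation = pd.DataFrame({'url': ['a'],
                              'dt_crtd': pd.to_datetime(['2023-01-01'])})
page_visits = pd.DataFrame({'url': ['a', 'a', 'a'],
                            'date': pd.to_datetime(['2023-01-05', '2023-02-10', '2023-03-20'])})

m = page_creation.merge(page_visits, on='url')
age = m['date'] - m['dt_crtd']          # how long after creation each visit happened
day = pd.Timedelta(days=1)
out = (m.assign(Interval_00_29=(age >= 0 * day) & (age < 30 * day),
                Interval_30_59=(age >= 30 * day) & (age < 60 * day),
                Interval_60_89=(age >= 60 * day) & (age < 90 * day))
         .groupby('url')[['Interval_00_29', 'Interval_30_59', 'Interval_60_89']].sum())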
Group data by 15-day intervals
You can use the LAG function with an offset parameter to find the date of the 2nd previous post, then calculate the date difference:
WITH questions AS (
SELECT OwnerUserId
, CreationDate AS PostDate
, LAG(CreationDate, 2) OVER (PARTITION BY OwnerUserId ORDER BY CreationDate) AS PrevDate
FROM Posts
WHERE OwnerUserId IS NOT NULL -- not community owned
AND PostTypeId = 1 -- questions only
AND CreationDate >= '2018-01-01' -- between 2018
AND CreationDate < '2019-01-01'
AND Tags LIKE '%<sql>%' -- tagged sql
)
SELECT *
FROM questions
WHERE DATEDIFF(DAY, PrevDate, PostDate) <= 14
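Outside SQL, the LAG(..., 2) lookback maps to groupby + shift(2); a small pandas sketch with invented sample data standing in for the Posts table:
import pandas as pd

posts = pd.DataFrame({
    'OwnerUserId': [10, 10, 10, 10, 20, 20],
    'CreationDate': pd.to_datetime(['2018-03-01', '2018-03-05', '2018-03-09',
                                    '2018-05-01', '2018-06-01', '2018-06-20']),
})
posts = posts.sort_values(['OwnerUserId', 'CreationDate'])
posts['PrevDate'] = posts.groupby('OwnerUserId')['CreationDate'].shift(2)  # date of the 2nd previous question
hits = posts[(posts['CreationDate'] - posts['PrevDate']).dt.days <= 14]
# hits keeps rows where a user asked at least 3 questions within a 15-day span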