Efficiently Generate a Random Sample of Times and Dates Between Two Dates

Ahh, another date/time problem we can reduce to working in floats :)

Try this function

R> latemail <- function(N, st="2012/01/01", et="2012/12/31") {
+     st <- as.POSIXct(as.Date(st))
+     et <- as.POSIXct(as.Date(et))
+     dt <- as.numeric(difftime(et, st, units="secs"))  # interval length in seconds
+     ev <- sort(runif(N, 0, dt))                       # sorted uniform offsets
+     st + ev                                           # return the timestamps visibly
+ }
R>

We compute the difftime in seconds, and then "merely" draw uniforms over it, sorting the result. Add that to the start and you're done:

R> set.seed(42); print(latemail(5))     ## round to date, or hour, or ...
[1] "2012-04-14 05:34:56.369022 CDT" "2012-08-22 00:41:26.683809 CDT"
[3] "2012-10-29 21:43:16.335659 CDT" "2012-11-29 15:42:03.387701 CST"
[5] "2012-12-07 18:46:50.233761 CST"
R> system.time(latemail(100000))
   user  system elapsed
  0.024   0.000   0.021
R> system.time(latemail(200000))
   user  system elapsed
  0.044   0.000   0.045
R> system.time(latemail(10000000)) ## a few more than in your example :)
   user  system elapsed
  3.240   0.172   3.428
R>
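
For comparison, the same reduce-to-floats idea can be sketched in Python with only the standard library. The function name `random_timestamps` and its parameters are illustrative assumptions, not part of the R answer, and the sketch works on naive datetimes (no time zone handling).

```python
import random
from datetime import datetime, timedelta


def random_timestamps(n, start, end, seed=None):
    """Return n sorted datetimes drawn uniformly from [start, end]."""
    rng = random.Random(seed)               # local generator, so seeding is reproducible
    span = (end - start).total_seconds()    # interval length as a float
    offsets = sorted(rng.uniform(0.0, span) for _ in range(n))
    return [start + timedelta(seconds=s) for s in offsets]


sample = random_timestamps(5, datetime(2012, 1, 1), datetime(2012, 12, 31), seed=42)
for ts in sample:
    print(ts.isoformat())
```

As in the R version, the only real work is drawing and sorting the offsets; adding them back to the start is a cheap final step.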

Sample time between time range 08:00:00 and 15:00:00

Create a sequence of times, one per second, covering the required window, and sample from it. as.POSIXct attaches today's date to each time, so we use format to keep only the time component.

all_times <- format(seq(as.POSIXct('08:00:00', format = "%T"),
                        as.POSIXct('15:00:00', format = "%T"), by = "sec"), "%T")

sample(all_times, 3)
#[1] "11:51:16" "09:50:10" "13:09:21"
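
A rough Python counterpart, as a sketch only: it works on seconds since midnight and samples from a lazy `range`, so the full list of candidate times is never built. The name `random_times_of_day` and its defaults are assumptions chosen to mirror the 08:00:00 to 15:00:00 window.

```python
import random
from datetime import time


def random_times_of_day(k, start=time(8, 0, 0), end=time(15, 0, 0), seed=None):
    """Return k distinct "HH:MM:SS" strings between start and end, inclusive."""
    rng = random.Random(seed)
    lo = start.hour * 3600 + start.minute * 60 + start.second
    hi = end.hour * 3600 + end.minute * 60 + end.second
    # range() is lazy, so the candidate seconds are never materialized
    picks = rng.sample(range(lo, hi + 1), k)
    return ["%02d:%02d:%02d" % (s // 3600, s % 3600 // 60, s % 60) for s in picks]


print(random_times_of_day(3, seed=1))
```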

Generate dates between two dates in a dataframe

Assuming the 'min_date' and 'max_date' columns are of class Date, we use Map to build, for each row, the daily sequence from 'min_date' to the corresponding 'max_date', collected in a list ('lst'). We then repeat each row index of 'df1' as many times as its list element has dates ('i1'), expand the dataset with that index, and create the 'dates' column by concatenating the elements of 'lst'.

lst <- Map(function(x, y) seq(x,y, by = "1 day"), df1$min_date, df1$max_date)
i1 <- rep(1:nrow(df1), lengths(lst))
data.frame(df1[i1,-3], dates = do.call("c", lst))

Or if we are using dplyr

library(dplyr)
df1 %>%
    rowwise() %>%
    do(data.frame(.[1:2], date = seq(.$min_date, .$max_date, by = "1 day")))

Or using data.table, we can do this in a single line of code

library(data.table) 
setDT(df1)[,.(date = seq(min_date, max_date, by = "1 day")) ,.(id1, id2)]
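
The same row expansion can be sketched in plain Python. The input below is a hypothetical stand-in for 'df1' (a list of dicts with the 'id1', 'id2', 'min_date' and 'max_date' fields used above), and `expand_date_ranges` is an illustrative name rather than an existing library function.

```python
from datetime import date, timedelta


def expand_date_ranges(rows):
    """Yield one (id1, id2, day) tuple per day in each row's inclusive range."""
    for row in rows:
        n_days = (row["max_date"] - row["min_date"]).days
        for offset in range(n_days + 1):          # +1 keeps max_date itself
            yield row["id1"], row["id2"], row["min_date"] + timedelta(days=offset)


# Hypothetical input standing in for df1
rows = [
    {"id1": 1, "id2": "a", "min_date": date(2020, 1, 30), "max_date": date(2020, 2, 1)},
    {"id1": 2, "id2": "b", "min_date": date(2020, 3, 5), "max_date": date(2020, 3, 5)},
]
expanded = list(expand_date_ranges(rows))
for record in expanded:
    print(record)
```

The first row spans three days and the second a single day, so four records come out in total.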

What is a good way to select random dates over a given interval using R?

I would use sample.int:

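Start <- as.Date("2013-01-01")
End <- as.Date("2013-01-31")
n_days <- as.integer(End - Start) + 1L       # days in the interval, both endpoints included
Samp <- Start + sample.int(n_days, 5) - 1L   # offsets 0..n_days-1, drawn without replacement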
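
A comparable draw in Python, sketched under the assumption that both endpoints should be selectable and that dates must not repeat. The helper name `random_dates` is illustrative.

```python
import random
from datetime import date, timedelta


def random_dates(start, end, k, seed=None):
    """Return k distinct dates from start..end, both endpoints included."""
    rng = random.Random(seed)
    n_days = (end - start).days + 1
    offsets = rng.sample(range(n_days), k)   # raises ValueError if k > n_days
    return [start + timedelta(days=d) for d in offsets]


picked = random_dates(date(2013, 1, 1), date(2013, 1, 31), 5, seed=7)
print(sorted(picked))
```

Sampling offsets starting at 0 rather than 1 is what allows the start date itself to be picked.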

How to generate a random date and time between two dates?

Time.at((date2.to_f - date1.to_f)*rand + date1.to_f)

You'll get a Time object that lies between the two given datetimes. This assumes date1 and date2 are Time objects, so that to_f returns seconds since the epoch.

Generate random list of timestamps within multiple time intervals in python

Here is a way to do it. The idea is to subtract the total duration of all periods from the available time, draw sorted start delays within what is left, and then shift each start by the cumulative duration of the periods before it. Because the delays are distinct and each period is pushed past all earlier ones, the intervals cannot overlap.

from datetime import datetime, timedelta
import random


def generate_periods(start, end, durations):
    durations = [timedelta(minutes=m) for m in durations]

    total_duration = sum(durations, timedelta())
    nb_periods = len(durations)
    open_duration = (end - start) - total_duration

    delays = sorted(timedelta(seconds=s)
                    for s in random.sample(range(0, int(open_duration.total_seconds())), nb_periods))
    periods = []
    periods_before = timedelta()
    for delay, duration in zip(delays, durations):
        periods.append((start + delay + periods_before,
                        start + delay + periods_before + duration))
        periods_before += duration

    return periods

Sample run:

durations = [32, 24, 4, 20, 40, 8, 27, 18, 3, 4] 
start_time = datetime(2019, 9, 2, 9, 0, 0)
end_time = datetime(2019, 9, 2, 17, 0, 0)
generate_periods(start_time, end_time, durations)

# [(datetime.datetime(2019, 9, 2, 9, 16, 1),
# datetime.datetime(2019, 9, 2, 9, 48, 1)),
# (datetime.datetime(2019, 9, 2, 9, 58, 57),
# datetime.datetime(2019, 9, 2, 10, 22, 57)),
# (datetime.datetime(2019, 9, 2, 10, 56, 41),
# datetime.datetime(2019, 9, 2, 11, 0, 41)),
# (datetime.datetime(2019, 9, 2, 11, 2, 37),
# datetime.datetime(2019, 9, 2, 11, 22, 37)),
# (datetime.datetime(2019, 9, 2, 11, 48, 17),
# datetime.datetime(2019, 9, 2, 12, 28, 17)),
# (datetime.datetime(2019, 9, 2, 13, 4, 28),
# datetime.datetime(2019, 9, 2, 13, 12, 28)),
# (datetime.datetime(2019, 9, 2, 15, 13, 3),
# datetime.datetime(2019, 9, 2, 15, 40, 3)),
# (datetime.datetime(2019, 9, 2, 16, 6, 44),
# datetime.datetime(2019, 9, 2, 16, 24, 44)),
# (datetime.datetime(2019, 9, 2, 16, 37, 42),
# datetime.datetime(2019, 9, 2, 16, 40, 42)),
# (datetime.datetime(2019, 9, 2, 16, 42, 50),
# datetime.datetime(2019, 9, 2, 16, 46, 50))]
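
To sanity-check a result like this, a small validator can confirm that the periods are ordered, non-overlapping, and inside the allowed window. This is a sketch: `periods_are_disjoint` is a hypothetical helper, and the `bad` list is a hand-made counterexample, not output of the function above.

```python
from datetime import datetime, timedelta


def periods_are_disjoint(periods, start, end):
    """Check that (begin, finish) pairs are ordered, non-overlapping and inside [start, end]."""
    previous_finish = start
    for begin, finish in periods:
        if begin < previous_finish or finish < begin:
            return False
        previous_finish = finish
    return previous_finish <= end


day_start = datetime(2019, 9, 2, 9, 0, 0)
day_end = datetime(2019, 9, 2, 17, 0, 0)

# First three periods copied from the sample output above
good = [
    (datetime(2019, 9, 2, 9, 16, 1), datetime(2019, 9, 2, 9, 48, 1)),
    (datetime(2019, 9, 2, 9, 58, 57), datetime(2019, 9, 2, 10, 22, 57)),
    (datetime(2019, 9, 2, 10, 56, 41), datetime(2019, 9, 2, 11, 0, 41)),
]
# Hand-made counterexample: the second period starts before the first ends
bad = [good[0], (good[0][0] + timedelta(minutes=10), good[1][1])]

print(periods_are_disjoint(good, day_start, day_end))
print(periods_are_disjoint(bad, day_start, day_end))
```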

Generate random dates within a range in numpy

There is a much easier way to achieve this, without needing to explicitly call any libraries beyond numpy.

NumPy's datetime64 type is quite powerful here: you can add or subtract plain integers, and each integer is treated as a count of the value's own time unit, which is the smallest unit present in the string it was created from. For example, for a %Y-%m-%d value the unit is one day:

import numpy as np

exampledatetime1 = np.datetime64('2017-01-01')
print(exampledatetime1 + 1)
# 2017-01-02

However, for a %Y-%m-%d %H:%M:%S value the unit is one second:

exampledatetime2 = np.datetime64('2017-01-01 00:00:00')
print(exampledatetime2 + 1)
# 2017-01-01T00:00:01

In this case, since you only have information down to day resolution, you can simply do the following:

import numpy as np

bimonthly_days = np.arange(0, 60)
base_date = np.datetime64('2017-01-01')
random_date = base_date + np.random.choice(bimonthly_days)

Or, if you wanted to be even cleaner about it:

import numpy as np

def random_date_generator(start_date, range_in_days):
    days_to_add = np.arange(0, range_in_days)
    random_date = np.datetime64(start_date) + np.random.choice(days_to_add)
    return random_date

And then just use:

yourdate = random_date_generator('2012-01-15', 60)
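
If many dates are needed at once, the same arithmetic can be vectorized. The sketch below uses NumPy's `default_rng` generator and an explicit cast of the integer offsets to `timedelta64[D]`; the function name `random_dates_np` and its `size` and `seed` parameters are assumptions added for illustration.

```python
import numpy as np


def random_dates_np(start_date, range_in_days, size, seed=None):
    """Return `size` random dates in [start_date, start_date + range_in_days)."""
    rng = np.random.default_rng(seed)
    offsets = rng.integers(0, range_in_days, size=size)   # upper bound is exclusive
    # Cast explicitly so the offsets are unambiguously counted in days
    return np.datetime64(start_date, "D") + offsets.astype("timedelta64[D]")


dates = random_dates_np("2012-01-15", 60, size=5, seed=0)
print(dates)
```

Dates may repeat here because the offsets are drawn with replacement.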

Generate list of months between interval in python

>>> from datetime import datetime, timedelta
>>> from collections import OrderedDict
>>> dates = ["2014-10-10", "2016-01-07"]
>>> start, end = [datetime.strptime(_, "%Y-%m-%d") for _ in dates]
>>> list(OrderedDict(((start + timedelta(_)).strftime("%b-%y"), None) for _ in range((end - start).days + 1)))
['Oct-14', 'Nov-14', 'Dec-14', 'Jan-15', 'Feb-15', 'Mar-15', 'Apr-15', 'May-15', 'Jun-15', 'Jul-15', 'Aug-15', 'Sep-15', 'Oct-15', 'Nov-15', 'Dec-15', 'Jan-16']

Update: a bit of explanation, as requested in one comment. There are three problems here: parsing the dates into appropriate data structures (strptime); getting the date range given the two extremes and the step (one month); formatting the output dates (strftime). The datetime type overloads the subtraction operator, so that end - start makes sense. The result is a timedelta object that represents the difference between the two dates, and the .days attribute gets this difference expressed in days. There is no .months attribute, so we iterate one day at a time and convert the dates to the desired output format. This yields a lot of duplicates, which the OrderedDict removes while keeping the items in the right order.

Now this is simple and concise because it lets the datetime module do all the work, but it's also horribly inefficient. We're calling a lot of methods for each day while we only need to output months. If performance is not an issue, the above code will be just fine. Otherwise, we'll have to work a bit more. Let's compare the above implementation with a more efficient one:

from datetime import datetime, timedelta
from collections import OrderedDict

dates = ["2014-10-10", "2016-01-07"]

def monthlist_short(dates):
    start, end = [datetime.strptime(_, "%Y-%m-%d") for _ in dates]
    # Walk day by day (end date included); OrderedDict drops duplicate months in order
    return list(OrderedDict(((start + timedelta(_)).strftime("%b-%y"), None) for _ in range((end - start).days + 1)))

def monthlist_fast(dates):
    start, end = [datetime.strptime(_, "%Y-%m-%d") for _ in dates]
    total_months = lambda dt: dt.month + 12 * dt.year
    mlist = []
    # tot_m counts months since year 0, so divmod recovers (year, month - 1)
    for tot_m in range(total_months(start) - 1, total_months(end)):
        y, m = divmod(tot_m, 12)
        mlist.append(datetime(y, m + 1, 1).strftime("%b-%y"))
    return mlist

assert monthlist_fast(dates) == monthlist_short(dates)

if __name__ == "__main__":
    from timeit import Timer
    for func in "monthlist_short", "monthlist_fast":
        print(func, Timer("%s(dates)" % func, "from __main__ import dates, %s" % func).timeit(1000))

On my laptop, I get the following output:

monthlist_short 2.3209939003
monthlist_fast 0.0774540901184

The concise implementation is about 30 times slower, so I would not recommend it in time-critical applications :)


