Generating Random Dates Within a Given Range in Pandas

Generating random dates within a given range in pandas

We can speed up @akilat90's approach about twofold (in @coldspeed's benchmark) by using the fact that datetime64 is just a rebranded int64 hence we can view-cast:

def pp(start, end, n):
start_u = start.value//10**9
end_u = end.value//10**9

return pd.DatetimeIndex((10**9*np.random.randint(start_u, end_u, n, dtype=np.int64)).view('M8[ns]'))

Sample Image

How to generate random dates between date range inside pandas column?

I would figure out how many days are in your date range, then select 88799 random integers in that range, and finally add that as a timedelta with unit='d' to your minimum date:

min_date = pd.to_datetime('1960-01-01')
max_date = pd.to_datetime('1990-12-31')

d = (max_date - min_date).days + 1

df['dob'] = min_date + pd.to_timedelta(pd.np.random.randint(d,size=88799), unit='d')

>>> df.head()
dob
0 1963-03-05
1 1973-06-07
2 1970-08-24
3 1970-05-03
4 1971-07-03

>>> df.tail()
dob
88794 1965-12-10
88795 1968-08-09
88796 1988-04-29
88797 1971-07-27
88798 1980-08-03

EDIT You can format your dates using .strftime('%m/%d/%Y'), but note that this will slow down the execution significantly:

df['dob'] = (min_date + pd.to_timedelta(pd.np.random.randint(d,size=88799), unit='d')).strftime('%m/%d/%Y')

>>> df.head()
dob
0 02/26/1969
1 04/09/1963
2 08/29/1984
3 02/12/1961
4 08/02/1988
>>> df.tail()
dob
88794 02/13/1968
88795 02/05/1982
88796 07/03/1964
88797 06/11/1976
88798 11/17/1965

Generate a random date between two other dates

Convert both strings to timestamps (in your chosen resolution, e.g. milliseconds, seconds, hours, days, whatever), subtract the earlier from the later, multiply your random number (assuming it is distributed in the range [0, 1]) with that difference, and add again to the earlier one. Convert the timestamp back to date string and you have a random time in that range.

Python example (output is almost in the format you specified, other than 0 padding - blame the American time format conventions):

import random
import time

def str_time_prop(start, end, time_format, prop):
"""Get a time at a proportion of a range of two formatted times.

start and end should be strings specifying times formatted in the
given format (strftime-style), giving an interval [start, end].
prop specifies how a proportion of the interval to be taken after
start. The returned time will be in the specified format.
"""

stime = time.mktime(time.strptime(start, time_format))
etime = time.mktime(time.strptime(end, time_format))

ptime = stime + prop * (etime - stime)

return time.strftime(time_format, time.localtime(ptime))

def random_date(start, end, prop):
return str_time_prop(start, end, '%m/%d/%Y %I:%M %p', prop)

print(random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))

Random date based on certain range in pandas

First you define a function date_range that takes the start date and end dates and the size of the sample and returns a sample.

import pandas as pd
df = pd.DataFrame({'client':['123AASD45', '2345OPU78', '763LKJ90'], 'frequency':[10,9,2]})

def date_range(n, start='1/1/2011', end='4/1/2011'):
date_range = pd.date_range(start, end)
return list(pd.Series(date_range).sample(n))

Then for each client you assign the sample of dates and do some data reshape to so you can join with the original table.

df['dates'] = df['frequency'].apply(lambda x: date_range(x))
df_dates = df['dates'].apply(pd.Series).reset_index()
df_dates = df_dates.melt(id_vars='index').dropna().drop(['variable'], axis=1).set_index('index')

Finally you join on the original dataset assuming there is one row per client.

df.join(df_dates)

add random dates in 400K pandas dataframe

Without the faker package, you can do this:

import numpy as np
import pandas as pd

x["Fake_date"] = np.random.choice(pd.date_range('1980-01-01', '2000-01-01'), len(x))

>>> x
0 1 2 Fake_date
0 228055 231908 1 1999-12-08
1 228056 228899 1 1989-01-25

replacing the 2 date strings in pd.date_range() with the minimum and maximum date that you want to choose random dates from

Creating Random sorted dates from date range in Pandas

I would define a custom function, and apply it with groupby().transform. The function does the following:

1: gets the number of days between the start and end days

2: gets n random integers (n is the size of each group, and the integers represent number of days after the start day), and sorts them

3: adds those random integers as a timedelta of days to the start date.

start_date = pd.to_datetime('2018-06-30')
end_date = pd.to_datetime('2018-11-30')

def gen_rand_date(group, start_date = start_date, end_date = end_date):
# step 1 in description above:
days = (end_date - start_date).days
# step 2:
d = pd.np.sort(pd.np.random.choice(range(days), len(group)))
# step 3:
return start_date + pd.to_timedelta(d,unit='D')

df['Transaction_Date'] = df.groupby('Account_ID').transform(lambda x: gen_rand_date(x))

>>> df
Account_ID Transaction_Type Transaction_Date
0 10001 B 2018-07-19
1 10001 B 2018-08-12
2 10001 B 2018-08-27
3 10001 B 2018-09-29
4 10002 D 2018-10-23
5 10002 D 2018-11-09
6 10002 D 2018-11-14
7 10003 F 2018-08-03
8 10003 F 2018-09-10
9 10004 H 2018-09-16

generate random dates within a range in numpy

There is a much easier way to achieve this, without needing to explicitly call any libraries beyond numpy.

Numpy has a datetime datatype that is quite powerful: specifically for this case you can add and subtract integers and it treats it like the smallest time unit available. for example, for a %Y-%m-%d format:

exampledatetime1 = np.datetime64('2017-01-01')
exampledatetime1 + 1
>>
2017-01-02

however, for a %Y-%m-%d %H:%M:%S format:

exampledatetime2 = np.datetime64('2017-01-01 00:00:00')
exampledatetime2 + 1
>>
2017-01-01 00:00:01

in this case, as you only have information down to a day resolution, you can simply do the following:

import numpy as np

bimonthly_days = np.arange(0, 60)
base_date = np.datetime64('2017-01-01')
random_date = base_date + np.random.choice(bimonthly_days)

or if you wanted to be even cleaner about it:

import numpy as np

def random_date_generator(start_date, range_in_days):
days_to_add = np.arange(0, range_in_days)
random_date = np.datetime64(start_date) + np.random.choice(days_to_add)
return random_date

and then just use:

yourdate = random_date_generator('2012-01-15', 60)

Python generating a list of dates between two dates

You can use pandas.date_range() for this:

import pandas
pandas.date_range(sdate,edate-timedelta(days=1),freq='d')


DatetimeIndex(['2019-03-22', '2019-03-23', '2019-03-24', '2019-03-25',
'2019-03-26', '2019-03-27', '2019-03-28', '2019-03-29',
'2019-03-30', '2019-03-31', '2019-04-01', '2019-04-02',
'2019-04-03', '2019-04-04', '2019-04-05', '2019-04-06',
'2019-04-07', '2019-04-08'],
dtype='datetime64[ns]', freq='D')


Related Topics



Leave a reply



Submit