Iterating Through a Range of Dates in Python

Iterating through a range of dates in Python

Why are there two nested iterations? For me it produces the same list of data with only one iteration:

for single_date in (start_date + timedelta(n) for n in range(day_count)):
    print ...

No list gets stored either; only a single generator is iterated over. The "if" in the generator also seems to be unnecessary.

After all, a linear sequence should only require one iterator, not two.

Update after discussion with John Machin:

Maybe the most elegant solution is using a generator function to completely hide/abstract the iteration over the range of dates:

from datetime import date, timedelta

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2013, 1, 1)
end_date = date(2015, 6, 2)
for single_date in daterange(start_date, end_date):
    print(single_date.strftime("%Y-%m-%d"))

NB: For consistency with the built-in range() function, this iteration stops before reaching end_date. So for inclusive iteration, pass the day after end_date, as you would with range().
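
As a minimal sketch of the inclusive case mentioned in the note, the end date can be advanced by one day before it is passed in. The helper name daterange_inclusive is an assumption made for illustration; it is not part of the original answer.

```python
from datetime import date, timedelta


def daterange(start_date, end_date):
    """Yield each date from start_date up to, but not including, end_date."""
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(n)


def daterange_inclusive(start_date, end_date):
    """Hypothetical wrapper: same iteration, but end_date is included."""
    return daterange(start_date, end_date + timedelta(days=1))


exclusive = list(daterange(date(2013, 1, 1), date(2013, 1, 4)))
inclusive = list(daterange_inclusive(date(2013, 1, 1), date(2013, 1, 4)))

print(len(exclusive), exclusive[-1])  # 3 2013-01-03
print(len(inclusive), inclusive[-1])  # 4 2013-01-04
```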

Iterate through range of dates and find the sum in each group

df["group_1"] = (df["Date"] >= pd.Timestamp("2019-01-01")) & (
    df["Date"] <= pd.Timestamp("2019-04-01")
)
df["group_2"] = (df["Date"] >= pd.Timestamp("2019-02-01")) & (
    df["Date"] <= pd.Timestamp("2019-05-01")
)

g1 = df[df["group_1"] == True].groupby("ID")
g2 = df[df["group_2"] == True].groupby("ID")

df = pd.concat(
    [
        g1.agg({"Volume": "sum", "Sales": "sum"}),
        g2.agg({"Volume": "sum", "Sales": "sum"}),
    ]
).sort_index()

print(df)

Prints:

       Volume      Sales
ID
1   12.666666   5.999999
1   12.000000   6.666666
2    7.333333  11.333334
2    6.666666  12.000000

If you want Date column:

...

df = pd.concat(
    [
        g1.agg({"Volume": "sum", "Sales": "sum"}).assign(
            Date="2019-01-01 to 2019-04-01"
        ),
        g2.agg({"Volume": "sum", "Sales": "sum"}).assign(
            Date="2019-02-01 to 2019-05-01"
        ),
    ]
).sort_index()

Prints:

       Volume      Sales                      Date
ID
1   12.666666   5.999999  2019-01-01 to 2019-04-01
1   12.000000   6.666666  2019-02-01 to 2019-05-01
2    7.333333  11.333334  2019-01-01 to 2019-04-01
2    6.666666  12.000000  2019-02-01 to 2019-05-01

EDIT: To generalize:

df["Date"] = pd.to_datetime(df["Date"])

# add more (start, end) date ranges to this list:
groups = [
    [pd.Timestamp("2019-01-01"), pd.Timestamp("2019-04-01")],
    [pd.Timestamp("2019-02-01"), pd.Timestamp("2019-05-01")],
    [pd.Timestamp("2019-03-01"), pd.Timestamp("2019-06-01")],
]

grouped = []
for i, (t1, t2) in enumerate(groups, 1):
    df["group_{}".format(i)] = (df["Date"] >= t1) & (df["Date"] <= t2)
    grouped.append(
        df[df["group_{}".format(i)] == True]
        .groupby("ID")
        .agg({"Volume": "sum", "Sales": "sum"})
        .assign(Date="{} to {}".format(t1.date(), t2.date()))
    )

df = pd.concat(grouped).sort_index()

print(df)

Prints:

       Volume      Sales                      Date
ID
1   12.666666   5.999999  2019-01-01 to 2019-04-01
1   12.000000   6.666666  2019-02-01 to 2019-05-01
1   11.333334   7.333333  2019-03-01 to 2019-06-01
2    7.333333  11.333334  2019-01-01 to 2019-04-01
2    6.666666  12.000000  2019-02-01 to 2019-05-01
2    5.999999  12.666666  2019-03-01 to 2019-06-01

Looping through a date range to download data from an API

If you first define your date ranges, you can iterate through each 833-day period and pull data using the API. You'll then need to append the data to the dataframe (or CSV) on each iteration.

import datetime as dt

# Date range to pull data over
start_date = dt.date(2002, 5, 1)
end_date = dt.date.today()
delta = dt.timedelta(days=832)  # 832 so you have a range of 833 days inclusive

# Iterating from start date, recording date ranges of 833 days
date_ranges = []
temp_start_date = start_date
while temp_start_date <= end_date:  # <= so a final single-day range is not skipped
    temp_end_date = temp_start_date + delta
    if temp_end_date > end_date:
        temp_end_date = end_date
    date_ranges.append([temp_start_date, temp_end_date])
    temp_start_date = temp_end_date + dt.timedelta(days=1)

# For each date range, pass dates into API
# Initialise dataframe here
for start_date, end_date in date_ranges:
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")

    # Input to API with start and end dates in correct string format

    # Process data into dataframe

There is no need to count out 833 days one at a time. As you said, the API takes the start and end dates as arguments, so you only need to find those two dates for each range.
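
The chunking step can also be wrapped in a generator so that it is reusable and easy to check on a small range. This is only a sketch: the function name chunk_date_range and its days_per_chunk parameter are assumptions introduced here, and the API call itself is left out.

```python
import datetime as dt


def chunk_date_range(start_date, end_date, days_per_chunk):
    """Yield (chunk_start, chunk_end) pairs covering start_date..end_date.

    Both ends of each pair are inclusive and chunks do not overlap.
    """
    # A chunk of N days inclusive spans N - 1 days from its first date.
    span = dt.timedelta(days=days_per_chunk - 1)
    one_day = dt.timedelta(days=1)
    chunk_start = start_date
    # <= so that a final single-day chunk is still emitted.
    while chunk_start <= end_date:
        chunk_end = min(chunk_start + span, end_date)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + one_day


# Small example: 10 days split into chunks of at most 4 days.
chunks = list(chunk_date_range(dt.date(2002, 5, 1), dt.date(2002, 5, 10), 4))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

For the case discussed here, days_per_chunk would be 833.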

Range of dates and go through each date at a certain hour

This is what I ended up doing:

from datetime import datetime, timedelta

# Checkin and Guest are the Django models being queried
beg_date = datetime(2019, 8, 1)
end_date = datetime(2020, 3, 31)
residents_in_range = Checkin.objects.filter(desk__name="Robinson Hall", datetime__gte=beg_date, datetime__lte=end_date)
guests_in_range = Guest.objects.filter(desk__name="Robinson Hall", datetime__gte=beg_date, datetime__lte=end_date)
rez_by_day = []
guests_by_day = []
one_day = timedelta(days=1)
while beg_date <= end_date:
    rez_by_day.append(residents_in_range.filter(datetime__gte=beg_date, datetime__lt=beg_date+one_day))
    guests_by_day.append(guests_in_range.filter(datetime__gte=beg_date, datetime__lt=beg_date+one_day))
    beg_date += one_day
four = []
five = []
for day in rez_by_day:
    four.append(day.filter(datetime__hour__gte=4, datetime__hour__lt=5))
    five.append(day.filter(datetime__hour__gte=5, datetime__hour__lt=6))
for count in four:
    print(count.count())
for count in five:
    print(count.count())

I had it just print the numbers so I could copy and paste them into an Excel file. From there I kept splitting the information into further lists, narrowing it down until I had the information I wanted.

I tried to fill out the code more, but what I actually did was replace the hall and the guests/residents list every time I wanted different data. I didn't do it in one go-around.

Python: iterate per year between two dates

from_str = '2020-10-01'
end_str = '2022-01-03'

from_year = int(from_str[:4])
end_year = int(end_str[:4])

if from_year != end_year:
    # from_date to end of first year
    extract(from_str, f"{from_year}-12-31")

    # full years
    for y in range(from_year + 1, end_year):
        extract(f"{y}-01-01", f"{y}-12-31")

    # rest
    extract(f"{end_year}-01-01", end_str)
else:
    extract(from_str, end_str)
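
The same splitting can be sketched with date objects and a generator, which avoids slicing strings and makes the pieces easy to inspect. The name year_ranges is an assumption for illustration, and the extract function from the snippet is not called here.

```python
from datetime import date


def year_ranges(start, end):
    """Yield (range_start, range_end) pairs, one per calendar year touched."""
    for year in range(start.year, end.year + 1):
        # Clamp each calendar year to the overall start and end dates.
        range_start = max(start, date(year, 1, 1))
        range_end = min(end, date(year, 12, 31))
        yield range_start, range_end


pieces = list(year_ranges(date(2020, 10, 1), date(2022, 1, 3)))
for range_start, range_end in pieces:
    print(range_start.isoformat(), range_end.isoformat())
```

Each pair could then be formatted with isoformat() and handed to whatever per-range function is in use.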

Iterating through a range of dates in Python with missing dates

I'll create a sample dataset with 40 dates and 40 sample returns, then sample 90 percent of that randomly to simulate the missing dates.

The key here is that you need to convert your date column into datetime if it isn't already, and make sure your df is sorted by the date.

Then you can groupby year/week and take the last value. If you run this repeatedly you'll see that the selected dates can change if the value dropped was the last day of the week.

Based on that:

import pandas as pd
import numpy as np

df = pd.DataFrame()
df['date'] = pd.date_range(start='04-18-2022',periods=40, freq='D')
df['return'] = np.random.uniform(size=40)

# Keep 90 percent of the records so we can see what happens when some days are missing
df = df.sample(frac=.9)

# In case your dates are actually strings
df['date'] = pd.to_datetime(df['date'])

# Make sure they are sorted from oldest to newest
df = df.sort_values(by='date')

df = df.groupby([df['date'].dt.isocalendar().year,
                 df['date'].dt.isocalendar().week], as_index=False).last()

print(df)

Output

        date    return
0 2022-04-24  0.299958
1 2022-05-01  0.248471
2 2022-05-08  0.506919
3 2022-05-15  0.541929
4 2022-05-22  0.588768
5 2022-05-27  0.504419
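
To see the effect of missing days without random sampling, here is a small sketch that applies the same idea (keep the last available date in each ISO week) using only the standard library instead of pandas. The dates and the helper name last_date_per_iso_week are made up for illustration.

```python
from datetime import date


def last_date_per_iso_week(dates):
    """Return the latest date seen in each (ISO year, ISO week)."""
    latest = {}
    for d in sorted(dates):
        iso_year, iso_week, _ = d.isocalendar()
        # Dates are sorted, so later dates overwrite earlier ones.
        latest[(iso_year, iso_week)] = d
    return list(latest.values())


# Hypothetical data with gaps: Sunday 2022-04-24 is missing, so
# Saturday 2022-04-23 becomes the last date of its week.
available = [
    date(2022, 4, 18), date(2022, 4, 22), date(2022, 4, 23),
    date(2022, 4, 25), date(2022, 4, 29), date(2022, 5, 1),
]
weekly = last_date_per_iso_week(available)
print(weekly)
```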

Iterating through a daterange in python

A good place to start is the documentation for the datetime, date and timedelta objects.

First, let's construct our starting date and ending date (today):

>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))

Now let's define the unit of increment -- one day:

>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)

A nice thing about dates (date/datetime) and time deltas (timedelta) is that they can be added:

>>> start + day
datetime.date(2015, 9, 10)

We can also use format() to get that date in a human-readable form:

>>> "{date.day:02}{date.month:02}{date.year}".format(date=start+day)
'10092015'
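
For comparison, strftime produces the same string from format codes instead of attribute lookups. A short sketch:

```python
from datetime import date, timedelta

start = date(2015, 9, 9)
day = timedelta(days=1)

# %d = zero-padded day, %m = zero-padded month, %Y = four-digit year
formatted = (start + day).strftime("%d%m%Y")
print(formatted)  # 10092015
```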

So, when we put all this together:

from datetime import date, timedelta

start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)

mydate = start
while mydate < end:
    print("{date.day:02}{date.month:02}{date.year}".format(date=mydate))
    mydate += week

we get a simple iteration over dates starting with 2015-09-09 and ending with today, incremented by 7 days (a week):

09092015
16092015
23092015
30092015
07102015
...

Iterate through date range - monthwise

You can convert the dates to month periods with Series.dt.to_period and group by them:

months = df['opened_at'].dt.to_period('M')

for month, g in df.groupby(months):
    print(g)

Iterating over date in python

Well, it depends on how you wish to iterate. By days? By months? For days, using timedelta will solve your problem.

from datetime import datetime, timedelta

start_date = "2011-01-01"
stop_date = "2013-05-01"

start = datetime.strptime(start_date, "%Y-%m-%d")
stop = datetime.strptime(stop_date, "%Y-%m-%d")

while start < stop:
    start = start + timedelta(days=1)  # increase day one by one

Another approach, for iterating through months, is to use relativedelta:

from dateutil.relativedelta import relativedelta
start = start + relativedelta(months=+1)
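
If adding the dateutil dependency is not an option, month stepping can also be sketched with plain integer arithmetic on the year and month. This is a swapped-in technique rather than the relativedelta approach, and the name month_starts is hypothetical.

```python
from datetime import date


def month_starts(start, stop):
    """Yield the first day of each month from start's month up to stop (exclusive)."""
    year, month = start.year, start.month
    current = date(year, month, 1)
    while current < stop:
        yield current
        # Advance one month, rolling over to January of the next year.
        month += 1
        if month > 12:
            month = 1
            year += 1
        current = date(year, month, 1)


months = list(month_starts(date(2011, 11, 15), date(2012, 2, 1)))
print([m.isoformat() for m in months])
```

Note that the first value is the first day of the starting month, which can fall before the start date itself.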

