Iterating through a range of dates in Python
Why are there two nested iterations? For me it produces the same list of data with only one iteration:
for single_date in (start_date + timedelta(n) for n in range(day_count)):
    print ...
And no list gets stored, only one generator is iterated over. Also the "if" in the generator seems to be unnecessary.
After all, a linear sequence should only require one iterator, not two.
Update after discussion with John Machin:
Maybe the most elegant solution is using a generator function to completely hide/abstract the iteration over the range of dates:
from datetime import date, timedelta
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)
start_date = date(2013, 1, 1)
end_date = date(2015, 6, 2)
for single_date in daterange(start_date, end_date):
    print(single_date.strftime("%Y-%m-%d"))
NB: For consistency with the built-in range() function, this iteration stops before reaching the end_date. So for inclusive iteration, use the next day, as you would with range().
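A minimal sketch of the inclusive variant mentioned in the note. The helper name daterange_inclusive is hypothetical (it is not part of the original answer); it simply makes the range one day longer so that end_date itself is yielded.

```python
from datetime import date, timedelta

def daterange_inclusive(start_date, end_date):
    """Yield every date from start_date through end_date, inclusive."""
    # +1 so that the last value yielded is end_date itself
    for n in range((end_date - start_date).days + 1):
        yield start_date + timedelta(n)

days = list(daterange_inclusive(date(2013, 1, 1), date(2013, 1, 3)))
print(days[0], days[-1], len(days))
```

Calling daterange(start_date, end_date + timedelta(days=1)) gives the same dates; which form reads better is a matter of taste.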
Iterate through range of dates and find the sum in each group
df["group_1"] = (df["Date"] >= pd.Timestamp("2019-01-01")) & (
    df["Date"] <= pd.Timestamp("2019-04-01")
)
df["group_2"] = (df["Date"] >= pd.Timestamp("2019-02-01")) & (
    df["Date"] <= pd.Timestamp("2019-05-01")
)
g1 = df[df["group_1"] == True].groupby("ID")
g2 = df[df["group_2"] == True].groupby("ID")
df = pd.concat(
    [
        g1.agg({"Volume": "sum", "Sales": "sum"}),
        g2.agg({"Volume": "sum", "Sales": "sum"}),
    ]
).sort_index()
print(df)
Prints:
       Volume      Sales
ID
1   12.666666   5.999999
1   12.000000   6.666666
2    7.333333  11.333334
2    6.666666  12.000000
If you want a Date column:
...
df = pd.concat(
    [
        g1.agg({"Volume": "sum", "Sales": "sum"}).assign(
            Date="2019-01-01 to 2019-04-01"
        ),
        g2.agg({"Volume": "sum", "Sales": "sum"}).assign(
            Date="2019-02-01 to 2019-05-01"
        ),
    ]
).sort_index()
Prints:
       Volume      Sales                      Date
ID
1   12.666666   5.999999  2019-01-01 to 2019-04-01
1   12.000000   6.666666  2019-02-01 to 2019-05-01
2    7.333333  11.333334  2019-01-01 to 2019-04-01
2    6.666666  12.000000  2019-02-01 to 2019-05-01
EDIT: To generalize:
df["Date"] = pd.to_datetime(df["Date"])
# add dates to this group:
groups = [
    [pd.Timestamp("2019-01-01"), pd.Timestamp("2019-04-01")],
    [pd.Timestamp("2019-02-01"), pd.Timestamp("2019-05-01")],
    [pd.Timestamp("2019-03-01"), pd.Timestamp("2019-06-01")],
]
grouped = []
for i, (t1, t2) in enumerate(groups, 1):
    df["group_{}".format(i)] = (df["Date"] >= t1) & (df["Date"] <= t2)
    grouped.append(
        df[df["group_{}".format(i)] == True]
        .groupby("ID")
        .agg({"Volume": "sum", "Sales": "sum"})
        .assign(Date="{} to {}".format(t1.date(), t2.date()))
    )
df = pd.concat(grouped).sort_index()
print(df)
Prints:
       Volume      Sales                      Date
ID
1   12.666666   5.999999  2019-01-01 to 2019-04-01
1   12.000000   6.666666  2019-02-01 to 2019-05-01
1   11.333334   7.333333  2019-03-01 to 2019-06-01
2    7.333333  11.333334  2019-01-01 to 2019-04-01
2    6.666666  12.000000  2019-02-01 to 2019-05-01
2    5.999999  12.666666  2019-03-01 to 2019-06-01
Looping through a data range to download data from an API
If you first define your date ranges, you can iterate through each 833-day period and pull its data using the API. You'll then need to append the data to the DataFrame (or CSV) on each iteration.
import datetime as dt
# Date range to pull data over
start_date = dt.date(2002,5,1)
end_date = dt.date.today()
delta = dt.timedelta(days=832) # 832 so you have a range of 833 days inclusive
# Iterating from start date, recording date ranges of 833 days
date_ranges = []
temp_start_date = start_date
while temp_start_date <= end_date:  # <= so a final one-day range is not dropped
    temp_end_date = temp_start_date + delta
    if temp_end_date > end_date:
        temp_end_date = end_date
    date_ranges.append([temp_start_date, temp_end_date])
    temp_start_date = temp_end_date + dt.timedelta(days=1)
# For each date range, pass dates into API
# Initialise dataframe here
for start_date, end_date in date_ranges:
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")
    # Input to API with start and end dates in correct string format
    # Process data into dataframe
There should be no need to count off 833 days one by one: as you said, the API takes the start and end dates as arguments, so you only need to find those two dates for each range.
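The same chunking can be sketched as a generator, so the ranges are produced lazily instead of being collected in a list first. The name date_chunks and its days_per_chunk parameter are hypothetical, and the end date is fixed here (rather than today) only so the result is reproducible.

```python
from datetime import date, timedelta

def date_chunks(start, end, days_per_chunk):
    """Yield (chunk_start, chunk_end) pairs, both inclusive, covering start..end."""
    span = timedelta(days=days_per_chunk - 1)  # inclusive window of days_per_chunk days
    one_day = timedelta(days=1)
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + span, end)  # clip the last chunk at end
        yield chunk_start, chunk_end
        chunk_start = chunk_end + one_day

chunks = list(date_chunks(date(2002, 5, 1), date(2007, 1, 1), 833))
for chunk_start, chunk_end in chunks:
    # These two strings are what would be passed to the API
    print(chunk_start.strftime("%Y-%m-%d"), chunk_end.strftime("%Y-%m-%d"))
```

Consecutive chunks never overlap: each one starts the day after the previous one ends.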
Range of dates and go through each date at a certain hour
This is what I ended up doing:
from datetime import datetime, timedelta
# Checkin and Guest are Django models defined elsewhere in the project
beg_date = datetime(2019, 8, 1)
end_date = datetime(2020, 3, 31)
residents_in_range = Checkin.objects.filter(desk__name="Robinson Hall", datetime__gte=beg_date, datetime__lte=end_date)
guests_in_range = Guest.objects.filter(desk__name="Robinson Hall", datetime__gte=beg_date, datetime__lte=end_date)
rez_by_day = []
guests_by_day = []
one_day = timedelta(days=1)
# One queryset per calendar day in the range
while beg_date <= end_date:
    rez_by_day.append(residents_in_range.filter(datetime__gte=beg_date, datetime__lt=beg_date+one_day))
    guests_by_day.append(guests_in_range.filter(datetime__gte=beg_date, datetime__lt=beg_date+one_day))
    beg_date += one_day
four = []
five = []
# Narrow each day down to the 4 o'clock and 5 o'clock hours
for day in rez_by_day:
    four.append(day.filter(datetime__hour__gte=4, datetime__hour__lt=5))
    five.append(day.filter(datetime__hour__gte=5, datetime__hour__lt=6))
for count in four:
    print(count.count())
for count in five:
    print(count.count())
I had it just print the numbers so I could copy and paste them into an Excel file. I kept splitting the information into narrower and narrower lists until I had what I wanted.
I tried to fill out the code more, but what I actually did was swap in a different hall and the guests/residents list each time I wanted different data. I didn't do it all in one go.
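Outside of Django, the same day-then-hour bucketing can be sketched with plain datetime objects and a Counter. This is not the ORM code above: the function name count_by_day_and_hour and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime

def count_by_day_and_hour(timestamps, hours):
    """Count timestamps per (date, hour), keeping only the hours of interest."""
    counts = Counter()
    for ts in timestamps:
        if ts.hour in hours:
            counts[(ts.date(), ts.hour)] += 1
    return counts

# Made-up check-in times for illustration
sample = [
    datetime(2019, 8, 1, 4, 15),
    datetime(2019, 8, 1, 4, 50),
    datetime(2019, 8, 1, 5, 5),
    datetime(2019, 8, 2, 4, 30),
    datetime(2019, 8, 2, 9, 0),  # outside the hours of interest, ignored
]
counts = count_by_day_and_hour(sample, hours={4, 5})
for (day, hour), n in sorted(counts.items()):
    print(day, hour, n)
```

A single pass over the data replaces the repeated filtering, which may matter if there are many days or several hours to look at.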
Python: iterate per year between two dates
from_str = '2020-10-01'
end_str = '2022-01-03'
from_year = int(from_str[:4])
end_year = int(end_str[:4])
if from_year != end_year:
    # from_date to end of first year
    extract(from_str, f"{from_year}-12-31")
    # full years
    for y in range(from_year + 1, end_year):
        extract(f"{y}-01-01", f"{y}-12-31")
    # rest
    extract(f"{end_year}-01-01", end_str)
else:
    extract(from_str, end_str)
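A possible variant that works on date objects instead of strings and needs no special case for a single year: each year's piece is clipped to the overall range. The name year_ranges is hypothetical, and the pieces are returned rather than passed to extract().

```python
from datetime import date

def year_ranges(start, end):
    """Split start..end (inclusive) into pieces that never cross a year boundary."""
    ranges = []
    for year in range(start.year, end.year + 1):
        piece_start = max(start, date(year, 1, 1))   # clip to the overall start
        piece_end = min(end, date(year, 12, 31))     # clip to the overall end
        ranges.append((piece_start, piece_end))
    return ranges

pieces = year_ranges(date(2020, 10, 1), date(2022, 1, 3))
for piece_start, piece_end in pieces:
    print(piece_start.isoformat(), piece_end.isoformat())
# 2020-10-01 2020-12-31
# 2021-01-01 2021-12-31
# 2022-01-01 2022-01-03
```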
Iterating through a range of dates in Python with missing dates
I'll create a sample dataset with 40 dates and 40 sample returns, then sample 90 percent of that randomly to simulate the missing dates.
The key here is that you need to convert your date column to datetime if it isn't already, and make sure your df is sorted by date.
Then you can group by year/week and take the last value. If you run this repeatedly, you'll see that the selected dates can change when the value dropped was the last day of the week.
Based on that:
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['date'] = pd.date_range(start='04-18-2022',periods=40, freq='D')
df['return'] = np.random.uniform(size=40)
# Keep 90 percent of the records so we can see what happens when some days are missing
df = df.sample(frac=.9)
# In case your dates are actually strings
df['date'] = pd.to_datetime(df['date'])
# Make sure they are sorted from oldest to newest
df = df.sort_values(by='date')
df = df.groupby([df['date'].dt.isocalendar().year,
                 df['date'].dt.isocalendar().week], as_index=False).last()
print(df)
Output
        date    return
0 2022-04-24  0.299958
1 2022-05-01  0.248471
2 2022-05-08  0.506919
3 2022-05-15  0.541929
4 2022-05-22  0.588768
5 2022-05-27  0.504419
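For comparison, the "last available day per week" idea can be sketched without pandas, using date.isocalendar() from the standard library. The function name last_per_iso_week and the sample values are made up; the input is deterministic here so the result is reproducible.

```python
from datetime import date

def last_per_iso_week(records):
    """Return the latest (day, value) record for each ISO (year, week)."""
    latest = {}
    # Sort by date so that later days overwrite earlier ones within a week
    for day, value in sorted(records, key=lambda record: record[0]):
        iso_year, iso_week, _weekday = day.isocalendar()
        latest[(iso_year, iso_week)] = (day, value)
    return list(latest.values())

# Sunday 2022-04-24 is missing, so Saturday 2022-04-23 closes the first week
records = [
    (date(2022, 4, 29), 0.4),
    (date(2022, 4, 18), 0.1),
    (date(2022, 4, 23), 0.3),
    (date(2022, 4, 20), 0.2),
    (date(2022, 4, 25), 0.5),
]
weekly_last = last_per_iso_week(records)
print(weekly_last)
```

Grouping on the ISO year together with the ISO week, as the pandas version does, keeps weeks that straddle New Year from being split or merged incorrectly.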
Iterating through a daterange in python
A good place to start is the documentation for the datetime, date and timedelta objects.
First, let's construct our starting date and ending date (today):
>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))
Now let's define the unit of increment -- one day:
>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)
A nice thing about dates (date/datetime) and time deltas (timedelta) is that they can be added:
>>> start + day
datetime.date(2015, 9, 10)
We can also use format() to get that date in a human-readable form:
>>> "{date.day:02}{date.month:02}{date.year}".format(date=start+day)
'10092015'
So, when we put all this together:
from datetime import date, timedelta
start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)
mydate = start
while mydate < end:
    print("{date.day:02}{date.month:02}{date.year}".format(date=mydate))
    mydate += week
we get a simple iteration over dates starting with 2015-09-09 and ending with today, incremented by 7 days (a week):
09092015
16092015
23092015
30092015
07102015
...
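The loop above can also be wrapped in a small generator so the step size becomes a parameter. This is only a sketch: date_steps is a hypothetical name, and a fixed end date is used instead of date.today() so the output is reproducible.

```python
from datetime import date, timedelta

def date_steps(start, end, step):
    """Yield dates from start up to (but not including) end, advancing by step."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    current = start
    while current < end:
        yield current
        current += step

weekly = [
    d.strftime("%d%m%Y")
    for d in date_steps(date(2015, 9, 9), date(2015, 10, 8), timedelta(days=7))
]
print(weekly)
# ['09092015', '16092015', '23092015', '30092015', '07102015']
```

Here strftime("%d%m%Y") produces the same zero-padded day, month and year string as the format() call used earlier.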
Iterate through date range - monthwise
You can convert the dates to monthly periods with Series.dt.to_period and group by them:
months = df['opened_at'].dt.to_period('M')
for month, g in df.groupby(months):
    print(g)
Iterating over date in python
Well, it depends on how you wish to iterate. By days? By months? Using timedelta will solve your problem.
from datetime import datetime, timedelta
start_date = "2011-01-01"
stop_date = "2013-05-01"
start = datetime.strptime(start_date, "%Y-%m-%d")
stop = datetime.strptime(stop_date, "%Y-%m-%d")
while start < stop:
    start = start + timedelta(days=1)  # advance one day at a time
Another approach, for iterating through months, is to use relativedelta (from the python-dateutil package):
from dateutil.relativedelta import relativedelta
start = start + relativedelta(months=+1)
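If installing python-dateutil is not an option, month-by-month iteration can be sketched with the standard library alone by stepping the year and month numbers directly. This is a swapped-in alternative, not relativedelta itself, and month_starts is a hypothetical name.

```python
from datetime import date

def month_starts(start, stop):
    """Yield the first day of each month, from start's month up to (not including) stop."""
    year, month = start.year, start.month
    while date(year, month, 1) < stop:
        yield date(year, month, 1)
        # Advance one month, rolling over into the next year after December
        month += 1
        if month > 12:
            year, month = year + 1, 1

months = list(month_starts(date(2011, 1, 1), date(2011, 5, 1)))
print([m.isoformat() for m in months])
# ['2011-01-01', '2011-02-01', '2011-03-01', '2011-04-01']
```

Always yielding the first of the month sidesteps the question of what "one month after January 31" should be, which relativedelta answers by clipping to the last valid day.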