Can Pandas Plot a Histogram of Dates

Can Pandas plot a histogram of dates?

Given this df:

        date
0 2001-08-10
1 2002-08-31
2 2003-08-29
3 2006-06-21
4 2002-03-27
5 2003-07-14
6 2004-06-15
7 2003-08-14
8 2003-07-29

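and, if the column is not already a datetime type (recent pandas versions reject the unit-less "datetime64" dtype, so pd.to_datetime is the safer conversion):

df["date"] = pd.to_datetime(df["date"])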

To show the count of dates by month:

df.groupby(df["date"].dt.month).count().plot(kind="bar")

The .dt accessor gives you access to the datetime properties of the column.

Which will give you:

[Figure: bar chart of date counts grouped by month]

You can replace month with year, day, etc.

If you want to distinguish year and month for instance, just do:

df.groupby([df["date"].dt.year, df["date"].dt.month]).count().plot(kind="bar")

Which gives:

[Figure: bar chart of date counts grouped by year and month]
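As a self-contained sketch of the two groupings above, the snippet below rebuilds the sample frame and computes the counts without plotting. It uses .size() instead of .count() and renames the grouping keys to year and month; both are choices made here for readability, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({"date": [
    "2001-08-10", "2002-08-31", "2003-08-29", "2006-06-21", "2002-03-27",
    "2003-07-14", "2004-06-15", "2003-08-14", "2003-07-29",
]})
df["date"] = pd.to_datetime(df["date"])

# Number of rows per calendar month, regardless of year
per_month = df.groupby(df["date"].dt.month).size()

# Number of rows per (year, month) pair; the renames only label the index levels
year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")
per_year_month = df.groupby([year, month]).size()

print(per_month)
print(per_year_month)
```

Appending .plot(kind="bar") to either result should draw the corresponding bar chart.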

Date histogram per minutes in pandas

I think you are looking for pandas Grouper.

It allows you to specify any frequency or interval needed.

Here is a working example with a 10-minute interval:

import pandas as pd
df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
df.columns = ['smdate', 'smtime']

df['smtime'] = pd.to_datetime(df['smtime'])

df.groupby(pd.Grouper(key='smtime', freq='10Min')).count().plot(kind="bar",figsize=(50,10))

Here I kept the initial DataFrame structure. I couldn't get it to work with the datetime Series alone, because Grouper operates on the index and not on the values of the Series. I tried the axis parameter without success. I would be glad if anyone could improve my answer so that it works directly with the Series.

Non-working example:

import pandas as pd
df = pd.read_csv('mydata.csv',sep=';',usecols=[0,1])
df.columns = ['smdate', 'smtime']

df = pd.to_datetime(df['smtime'])

df.groupby(pd.Grouper(freq='10Min')).count().plot(kind="bar",figsize=(50,10))
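One possible way to work directly with a Series, sketched here with made-up timestamps rather than mydata.csv: since Grouper looks at the index, move the timestamps into the index of a helper Series of ones and sum per bin. The variable names and sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical timestamps standing in for the 'smtime' column
times = pd.Series(pd.to_datetime([
    "2021-01-01 00:01", "2021-01-01 00:07",
    "2021-01-01 00:12", "2021-01-01 00:35",
]), name="smtime")

# Put the timestamps in the index so that Grouper can bin them
ones = pd.Series(1, index=pd.DatetimeIndex(times))

# Sum the ones per 10-minute bin; empty bins come out as 0
counts = ones.groupby(pd.Grouper(freq="10min")).sum()
print(counts)
```

ones.resample("10min").sum() should give the same result, and .plot(kind="bar") can be chained onto either.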

Python / Matplotlib -- Histogram of Dates by Day of Year

Try this code:

# import section
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
from datetime import date
from itertools import product

# generate a dataframe like yours
# (the list is named 'dates' so it does not shadow the imported 'date' class)
dates = [date(2020, m, d).strftime("%B %d") for m, d in product(range(1, 13, 1), range(1, 29, 1))]
value = np.abs(np.random.randn(len(dates)))
data = pd.DataFrame({'date': dates,
                     'value': value})
data.set_index('date', inplace = True)

# convert index from str to date
data.index = pd.to_datetime(data.index, format = '%B %d')

# plot
fig, ax = plt.subplots(1, 1, figsize = (16, 8))
ax.bar(data.index, data['value'])

# formatting xaxis
ax.xaxis.set_major_locator(md.DayLocator(interval = 5))
ax.xaxis.set_major_formatter(md.DateFormatter('%B %d'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation = 90)
ax.set_xlim([data.index[0], data.index[-1]])

plt.show()

that gives me this plot:

[Figure: bar chart of the values, with month-and-day tick labels every 5 days]

I converted the index of the dataframe from string to date, then I applied the xaxis format that I want through ax.xaxis.set_major_locator and ax.xaxis.set_major_formatter methods.
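A minimal sketch of that string-to-date conversion on its own, using three made-up labels. Because the format contains no year, the parsed timestamps fall in the default year 1900, which is harmless here since the tick formatter only prints month and day. Note that 1900 is not a leap year, so a "February 29" label would presumably fail to parse with this format.

```python
import pandas as pd

# Hypothetical labels in the same "Month day" form as the index above
labels = ["January 05", "March 17", "December 28"]

# No year in the format, so the parser falls back to 1900
parsed = pd.to_datetime(labels, format="%B %d")
print(parsed)

# Formatting back with the same pattern restores the original strings
restored = parsed.strftime("%B %d").tolist()
print(restored)
```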

In order to plot that I used matplotlib, but it should not be difficult to translate this approach to pylab.


EDIT

If you want days and months on separate ticks, you can add a secondary axis (check this example) as in this code:

# import section
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
from datetime import date
from itertools import product
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA

# generate a dataframe like yours
# (the list is named 'dates' so it does not shadow the imported 'date' class)
dates = [date(2020, m, d).strftime("%B %d") for m, d in product(range(1, 13, 1), range(1, 29, 1))]
value = np.abs(np.random.randn(len(dates)))
data = pd.DataFrame({'date': dates,
                     'value': value})
data.set_index('date', inplace = True)

# convert index from str to date
data.index = pd.to_datetime(data.index, format = '%B %d')

# prepare days and months axes
fig = plt.figure(figsize = (16, 8))
days = host_subplot(111, axes_class = AA.Axes, figure = fig)
plt.subplots_adjust(bottom = 0.1)
months = days.twiny()

# position months axis
offset = -20
new_fixed_axis = months.get_grid_helper().new_fixed_axis
months.axis['bottom'] = new_fixed_axis(loc = 'bottom',
                                       axes = months,
                                       offset = (0, offset))
months.axis['bottom'].toggle(all = True)

#plot
days.bar(data.index, data['value'])

# formatting days axis
days.xaxis.set_major_locator(md.DayLocator(interval = 10))
days.xaxis.set_major_formatter(md.DateFormatter('%d'))
plt.setp(days.xaxis.get_majorticklabels(), rotation = 0)
days.set_xlim([data.index[0], data.index[-1]])

# formatting months axis
months.xaxis.set_major_locator(md.MonthLocator())
months.xaxis.set_major_formatter(md.DateFormatter('%b'))
months.set_xlim([data.index[0], data.index[-1]])

plt.show()

which produces this plot:

[Figure: bar chart with day ticks on the x-axis and a separate month axis below]

How to make a histogram of pandas datetimes per specific time interval?

pd.Grouper

This allows you to specify regular frequency intervals by which your data will be grouped. Then use groupby to aggregate your df based on these groups. For instance, if col2 holds counts and you want to bin together all of the counts over 2-day intervals, you could do:

import pandas as pd
df.groupby(pd.Grouper(level=0, freq='2D')).col2.sum()

Outputs:

test
2018-06-19 13:49:11.560185 85
2018-06-21 13:49:11.560185 95
2018-06-23 13:49:11.560185 88
2018-06-25 13:49:11.560185 48
Name: col2, dtype: int32

You group by level=0, that is your index labeled 'test' and sum col2 over 2 day bins. The behavior of pd.Grouper can be a little annoying since in this example the bins start and end at 13:49:11..., which likely isn't what you want.
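In pandas 1.1 and later, pd.Grouper accepts an origin argument that controls where the bins are anchored. The sketch below uses four made-up rows (not the data from the answer) to compare origin="start_day", which anchors the bins at midnight of the first day, with origin="start", which anchors them at the first timestamp.

```python
import pandas as pd

# Hypothetical data: four timestamps with a count each
idx = pd.to_datetime([
    "2018-06-19 13:49:11", "2018-06-21 09:00:00",
    "2018-06-22 18:30:00", "2018-06-24 01:15:00",
])
df = pd.DataFrame({"col2": [1, 2, 3, 4]}, index=idx)

# Bins anchored at midnight of the first day
by_day = df.groupby(pd.Grouper(level=0, freq="2D", origin="start_day")).col2.sum()

# Bins anchored at the first timestamp in the index
by_start = df.groupby(pd.Grouper(level=0, freq="2D", origin="start")).col2.sum()

print(by_day)
print(by_start)
```

The second row (June 21st, 09:00) lands in a different bin depending on the anchor, so the two sums differ even though the bin width is the same.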

pd.cut + pd.date_range

You have a bit more control over defining your bins if you define them with pd.date_range and then use pd.cut. Here for instance, you can define bins every 2 days beginning on the 19th.

df.groupby(pd.cut(df.index,
                  pd.date_range('2018-06-19', '2018-06-27', freq='2D'))).col2.sum()

Outputs:

(2018-06-19, 2018-06-21]    85
(2018-06-21, 2018-06-23]    95
(2018-06-23, 2018-06-25]    88
(2018-06-25, 2018-06-27]    48
Name: col2, dtype: int32

This is nice because, if you instead wanted the bins to begin on even days, you can just change the start and end dates in pd.date_range:

df.groupby(pd.cut(df.index,
                  pd.date_range('2018-06-18', '2018-06-28', freq='2D'))).col2.sum()

Outputs:

(2018-06-18, 2018-06-20]     29
(2018-06-20, 2018-06-22]    138
(2018-06-22, 2018-06-24]     48
(2018-06-24, 2018-06-26]     78
(2018-06-26, 2018-06-28]     23
Name: col2, dtype: int32
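For readers without the original df, here is a small runnable sketch of the pd.cut + pd.date_range approach on made-up data. The explicit observed=False is an addition made here so that empty bins are kept as 0 and recent pandas versions do not warn about the changing default.

```python
import pandas as pd

# Hypothetical data: four timestamps with a count each
idx = pd.to_datetime([
    "2018-06-19 13:49", "2018-06-20 08:00",
    "2018-06-21 12:00", "2018-06-24 23:00",
])
df = pd.DataFrame({"col2": [10, 20, 30, 40]}, index=idx)

# Bin edges every 2 days, starting at midnight on the 19th
bins = pd.date_range("2018-06-19", "2018-06-27", freq="2D")

# Each row is assigned to a right-closed interval such as (2018-06-19, 2018-06-21]
binned = df.groupby(pd.cut(df.index, bins), observed=False).col2.sum()
print(binned)
```

The intervals are open on the left by default, so a timestamp exactly at the first edge would fall outside every bin; pd.cut's right and include_lowest parameters can change that.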

If you really wanted to, you could specify 2.6 hour bins beginning on June 19th 2018 at 5 AM:

df.groupby(pd.cut(df.index,
                  pd.date_range('2018-06-19 5:00:00', '2018-06-28 5:00:00', freq='2.6H'))).col2.sum()
#(2018-06-19 05:00:00, 2018-06-19 07:36:00] 0
#(2018-06-19 07:36:00, 2018-06-19 10:12:00] 0
#(2018-06-19 10:12:00, 2018-06-19 12:48:00] 0
#(2018-06-19 12:48:00, 2018-06-19 15:24:00] 29
#....

Histogram.

Just use .plot(kind='bar') after you have aggregated the data.

(df.groupby(pd.cut(df.index,
                   pd.date_range('2018-06-19', '2018-06-28', freq='2D')))
   .col2.sum().plot(kind='bar', color='firebrick', rot=30))

[Figure: bar chart of col2 summed over 2-day bins]

Simplest histogram with dates as x-axis in matplotlib

You probably want a bar graph.

import datetime
import matplotlib
matplotlib.use('agg') # server no need to display graphics
import matplotlib.pyplot as plt

# x-axis is 3 consecutive dates (days)
now = datetime.datetime.now().date()
x = [now, now + datetime.timedelta(days=1), now + datetime.timedelta(days=2)]

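# two series of 3 numbers each, one per subplot
y1 = [10, 0, 3]
y2 = [8, 0, 3]

fig, axarr = plt.subplots(2, sharex=True)
axarr[0].bar(x, y1, edgecolor="k")
axarr[1].bar(x, y2, edgecolor="k")
# with sharex=True, only the bottom subplot shows the tick labels
axarr[1].set_xticks(x)
axarr[1].set_xticklabels(x)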

plt.savefig('a.png', bbox_inches='tight')

[Figure: bar chart with the three dates on the x-axis]

How can I draw the histogram of date values group by month in each year in Python?

According to the groupby docstring, the by parameter is:

list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

So your code simply becomes:

df = pd.read_csv(...)
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df.groupby(by=['month', 'year']).count().plot(kind="bar")

But I would write this as:

ax = (
    pd.read_csv(...)
    .assign(date=lambda df: pd.to_datetime(df['date']))
    .assign(year=lambda df: df['date'].dt.year)
    .assign(month=lambda df: df['date'].dt.month)
    .groupby(by=['year', 'month'])
    .count()
    .plot(kind="bar")
)

And now you have a matplotlib axes object that you can use to modify the tick labels (e.g., matplotlib x-axis ticks dates formatting and locations).
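As a rough illustration of using the returned axes object, the sketch below groups a few made-up dates by year and month and then rewrites the tick labels as "YYYY-MM". The sample dates, the label format, and the Agg backend are assumptions chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is needed
import pandas as pd

# Hypothetical dates standing in for the CSV contents
df = pd.DataFrame({"date": pd.to_datetime([
    "2001-08-10", "2002-08-31", "2002-03-27", "2003-08-29", "2003-08-14",
])})

year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")
counts = df.groupby([year, month]).size()

# plot() returns the matplotlib axes the bars were drawn on
ax = counts.plot(kind="bar")

# Replace the default "(year, month)" tick labels with "YYYY-MM"
labels = [f"{int(y)}-{int(m):02d}" for y, m in counts.index]
ax.set_xticklabels(labels, rotation=45)
ax.set_xlabel("year-month")
ax.set_ylabel("count")
ax.figure.savefig("counts.png", bbox_inches="tight")
```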


