Understanding Dates and Plotting a Histogram with Ggplot2 in R

How to properly plot a histogram with dates using ggplot?

Currently, passing as.character(fechas) to the text argument inside aes() displays the relative counts of the distinct dates within each bin. Note that the height of the first bar is simply the total number of dates between the 6th and the 13th of January.

After a thorough reading of your question, it appears you want the maximum date within each weekly interval. In other words, one date should appear when you hover over each bar. If you plan to convert ggplot objects into plotly objects, I would advise pre-processing the data frame before passing it to ggplot(). First, group by week. Second, pull the date you want to show as text for each weekly interval (i.e., the end date). Finally, pass this new data frame to ggplot(), but use geom_col() instead of a histogram geom. The output looks similar to a histogram, because each bar represents one weekly interval.

library(dplyr)
library(lubridate)
library(ggplot2)
library(plotly)

set.seed(13)
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = TRUE),
                      valores = runif(100))

Ejemplo_stat <- Ejemplo %>%
  arrange(fechas) %>%
  filter(fechas >= ymd("2020-01-01"), fechas <= ymd("2020-04-01")) %>% # specify the limits manually
  mutate(week = week(fechas)) %>% # create a week variable
  group_by(week) %>% # group by week
  summarize(total_days = n(), # number of rows (dates, including repeats) in each week
            last_date = max(fechas)) # pull the maximum date within each weekly interval

dibujo <- ggplot(Ejemplo_stat, aes(x = factor(week), y = total_days, text = as.character(last_date))) +
  geom_col(fill = "darkblue", color = "black") +
  labs(x = "Fecha", y = "Nº casos") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  scale_x_discrete(labels = function(x) paste("Week", x))

ggplotly(dibujo) # default tooltip: week id, count, and end date
ggplotly(dibujo, tooltip = "text") # only the end date is revealed

[Image: plotly end date]

The "end date" is displayed when you hover over each bar, as requested. Note that the value "2020-01-12" is not the last day of the second week; it is the last date observed in the second weekly interval.

The benefit of the preprocessing approach is your ability to modify your grouped data frame, as needed. For example, feel free to limit the date range to a smaller (or larger) subset of weeks, or start your weeks on a different day of the week (e.g., Sunday). Furthermore, if you want more textual options to display, you could also display your total number of unique dates next to each bar, or even display the date ranges for each week.
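As a rough illustration of those options, here is a minimal sketch that starts weeks on Sunday and builds a date-range label for each week. It reuses the toy data from above; the names Ejemplo_rangos, week_start, unique_dates, and label are made up for this example, and it assumes a lubridate version whose floor_date() accepts the week_start argument.

```r
library(dplyr)
library(lubridate)

set.seed(13)
# Same toy data as in the answer above
Ejemplo <- data.frame(fechas = dmy("1-1-20") + sample(1:100, 100, replace = TRUE),
                      valores = runif(100))

Ejemplo_rangos <- Ejemplo %>%
  # floor each date to the Sunday that begins its week (week_start = 7 means Sunday)
  mutate(week_start = floor_date(fechas, unit = "week", week_start = 7)) %>%
  group_by(week_start) %>%
  summarize(total_rows   = n(),                # all rows, repeats included
            unique_dates = n_distinct(fechas), # distinct calendar dates
            first_date   = min(fechas),
            last_date    = max(fechas)) %>%
  # hypothetical tooltip label: observed date range plus the unique-date count
  mutate(label = paste0(first_date, " to ", last_date,
                        " (", unique_dates, " unique dates)"))

head(Ejemplo_rangos)
```

Mapping text = label in aes() instead of as.character(last_date) should then show the range and the count in the plotly tooltip.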

Plotting a line graph by datetime with a histogram/bar graph by date

You can extend your data manipulation by:

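library(dplyr)
library(lubridate)
library(ggplot2)

df <- df |>
  mutate(datetime = mdy_hm(datetime)) |>   # parse the character timestamps
  arrange(datetime) |>
  mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>   # noon of each day
  mutate(totals = row_number()) |>         # running total of submissions
  group_by(midday) |>
  mutate(N = n()) |>                       # submissions per day
  ungroup()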

Then use midday for the bars and datetime for the line:

ggplot(df) +
  geom_bar(aes(x = midday)) +
  geom_line(aes(x = datetime, y = totals), colour = "red") +
  labs(
    title = "Submissions by Day",
    x = "Date",
    y = "Submissions")

[Image: sample plot]

PS. Sorry for Polish locales on X axis.

PS2. With geom_bar it looks much better

Created on 2022-02-03 by the reprex package (v2.0.1)

R Plot Histogram On Dataframe with dates-time object

I converted Date to POSIXct objects, using lubridate's ymd_hms function.
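The conversion step itself is not shown in the answer. A minimal sketch of what it might look like, assuming the Date column holds character strings in year-month-day hour:minute:second order (the data below is invented purely for illustration):

```r
library(lubridate)

# Hypothetical data: the original data frame is not shown in the answer
df <- data.frame(
  Date  = c("2016-10-02 21:00:00", "2016-10-03 02:30:00", "2016-10-03 14:15:00"),
  Value = c(3, 7, 5)
)

# ymd_hms() returns POSIXct; the time zone is UTC unless tz is supplied
df$Date <- ymd_hms(df$Date)
class(df$Date)
```

If the strings were in a different order (for example month/day/year), the matching parser such as mdy_hms() would be needed instead.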

library(ggplot2)
library(lubridate)  # for mdy_hms()

ggplot(df, aes(x = Date, y = Value)) +
  geom_bar(stat = "identity") +
  scale_x_datetime(limits = c(mdy_hms("10/2/16 20:00:00"), mdy_hms("10/3/16 20:00:00")))

[Image: bar plot with 1-day scale]

You get a clearer picture without the scale_x_datetime limits:

[Image: bar plot with no scale limits]

Simply replace geom_bar with geom_line for a line graph:

ggplot(df, aes(x = Date, y = Value)) +
  geom_line()

[Image: line plot]

Formatting histogram x-axis when working with dates using R

Since you effectively challenged us to provide a ggplot solution, here it is:

dates <- seq(as.Date("2011-10-01"), length.out = 60, by = "1 day")

set.seed(1)
# Use `=` (not `<-`) inside data.frame() so the columns get the intended names
dat <- data.frame(
  suburb = rep(LETTERS[24:26], times = c(100, 200, 300)),
  Date_of_Onset = c(
    sample(dates - 30, 100, replace = TRUE),
    sample(dates, 200, replace = TRUE),
    sample(dates + 30, 300, replace = TRUE)
  )
)

library(scales)
library(ggplot2)
ggplot(dat, aes(x = Date_of_Onset, fill = suburb)) +
  stat_bin(binwidth = 1, position = "identity") +
  scale_x_date(breaks = date_breaks(width = "1 month"))

Note the use of position="identity" to force each bar to originate on the axis; otherwise you get a stacked chart by default.


Plotting Variable over Date:Time Issue in ggplot


library(tidyverse)
df %>%
  mutate(Date = as.Date(Date)) %>%
  count(Date, wt = Breaks) %>%
  ggplot(aes(Date, n)) +
  geom_col(colour = "white", fill = "#1380A1")


I'm not sure I understand the comment "But I need the missing values in the graph that represent (o) essentially." Should zeros be represented visually somehow? By the way, the code up to and including the count(Date, wt = Breaks) line produces the table below. Is that what you meant by capturing the missing values?

# A tibble: 5 x 2
  Date           n
  <date>     <dbl>
1 2018-10-26     2
2 2018-12-06     0
3 2018-12-20     0
4 2018-12-26     0
5 2018-12-28     1
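
If the goal is instead for calendar days with no rows at all to show up as zero-height bars, one possible approach is to fill in the missing dates before plotting. This is only a sketch under that assumption: the counts tibble below is invented, and tidyr::complete() is swapped in here; it is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical daily counts with gaps between the observed dates
counts <- tibble(
  Date = as.Date(c("2018-10-26", "2018-10-28", "2018-10-31")),
  n    = c(2, 0, 1)
)

# Add one row per calendar day in the observed range; days that were absent get n = 0
counts_full <- counts %>%
  complete(Date = seq(min(Date), max(Date), by = "day"), fill = list(n = 0))

counts_full
```

Piping counts_full into the same ggplot(aes(Date, n)) + geom_col(...) call would then reserve a slot on the x-axis for every day in the range.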

