ggplot2 Time Series Plotting: How to Omit Periods When There Are No Data Points

ggplot2 time series plotting: how to omit periods when there are no data points?

First, create a grouping variable. Here, a new group starts whenever the gap between consecutive time points is larger than 1 (one minute, if Time is measured in minutes):

Group <- c(0, cumsum(diff(Time) > 1))
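To see what this grouping does, here is a minimal sketch on made-up data. The Time and Value vectors are hypothetical (Time is assumed to be numeric, in minutes); they are not taken from the original question.

```r
# Hypothetical data: Time in minutes, with gaps after minute 3 and after minute 12
Time  <- c(1, 2, 3, 10, 11, 12, 20, 21)
Value <- c(5, 6, 4, 7, 8, 6, 3, 4)

# diff(Time) > 1 is TRUE wherever a gap starts a new run;
# cumsum() turns those TRUEs into a running group counter
Group <- c(0, cumsum(diff(Time) > 1))
print(Group)
```

Each run of consecutive minutes gets its own group id (three groups for this data), so the gaps no longer have to share one continuous axis.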

Now one panel per group (three in this case) can be created using facet_grid and the argument scales = "free_x":

library(ggplot2)
g <- ggplot(data.frame(Time, Value, Group)) +
  geom_line(aes(x = Time, y = Value)) +
  facet_grid(~ Group, scales = "free_x")


R: How to remove weekday after-hours periods (as well as weekends and holidays) from the x-axis when plotting intraday data

I'm not sure there's an easy solution to change the breaks in the way you desire without converting your scale to a discrete one.

The disadvantage of doing this is that you lose the ability to set your breaks flexibly, as you can with scale_x_datetime. To get around this, the example below adds a few helper columns to the data frame that make it easy to define breaks. I've also converted the timestamp column to character for use with the discrete scale.

I've assumed that, since you're grabbing data for market hours, the market hours are already defined by the timestamp column in your data. This saves defining a custom scale that excludes holidays etc.

# convert to character column and set up handy columns for making breaks
library(dplyr)
library(ggplot2)

df <- df %>%
  mutate(timestamp_chr = as.character(timestamp),
         day = lubridate::day(timestamp),
         hour = lubridate::hour(timestamp),
         minute = lubridate::minute(timestamp),
         # flag the first row of each day (the first row overall has no lag)
         new_day = if_else(day != lag(day) | is.na(lag(day)), 1, 0))

df

# # A tibble: 100 x 7
#    timestamp               p   day  hour minute timestamp_chr       new_day
#    <dttm>              <dbl> <int> <int>  <int> <chr>                 <dbl>
#  1 2019-07-16 14:15:00  300.    16    14     15 2019-07-16 14:15:00       1
#  2 2019-07-16 14:20:00  300.    16    14     20 2019-07-16 14:20:00       0
#  3 2019-07-16 14:25:00  300.    16    14     25 2019-07-16 14:25:00       0
#  4 2019-07-16 14:30:00  300.    16    14     30 2019-07-16 14:30:00       0
#  5 2019-07-16 14:35:00  300.    16    14     35 2019-07-16 14:35:00       0
#  6 2019-07-16 14:40:00  300.    16    14     40 2019-07-16 14:40:00       0
#  7 2019-07-16 14:45:00  300.    16    14     45 2019-07-16 14:45:00       0
#  8 2019-07-16 14:50:00  300.    16    14     50 2019-07-16 14:50:00       0
#  9 2019-07-16 14:55:00  300.    16    14     55 2019-07-16 14:55:00       0
# 10 2019-07-16 15:00:00  300.    16    15      0 2019-07-16 15:00:00       0
# # … with 90 more rows

# breaks equally spaced
my_breaks <- df$timestamp_chr[seq.int(1, length(df$timestamp_chr), by = 10)]

ggplot(df, aes(x = timestamp_chr, y = p, group = 1)) +
  geom_line() +
  scale_x_discrete(breaks = my_breaks) +
  theme(axis.text.x = element_text(angle = 90))


In the above example I've used equally spaced breaks but you could also specify, for example:

# breaks on the hour
my_breaks <- df[df$minute == 0,]$timestamp_chr

# breaks at start of each new day
my_breaks <- df[df$new_day == 1,]$timestamp_chr
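For readers without the original data, the break logic can be tried on a small made-up series. This is only a sketch in base R: the session() helper, the two trading days, and the 14:00 to 15:00 window are all hypothetical, and UTC is used just to keep the result reproducible.

```r
# Hypothetical helper: 5-minute timestamps for one 14:00-15:00 session
session <- function(day) {
  seq(as.POSIXct(paste(day, "14:00:00"), tz = "UTC"),
      as.POSIXct(paste(day, "15:00:00"), tz = "UTC"),
      by = "5 min")
}

# Two sessions on consecutive days; the overnight period is simply absent
timestamp <- c(session("2019-07-16"), session("2019-07-17"))

# Character labels for the discrete axis (time zone given explicitly)
timestamp_chr <- format(timestamp, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Flag the first observation of each day, then keep those labels as breaks
day <- format(timestamp, "%d", tz = "UTC")
new_day <- c(TRUE, day[-1] != day[-length(day)])
my_breaks <- timestamp_chr[new_day]
print(my_breaks)
```

The resulting my_breaks vector can be passed to scale_x_discrete(breaks = my_breaks) in the same way as in the plot above.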

Line break when no data in ggplot2

You'll have to set the group aesthetic so that the points you'd like to be connected share a common value. Here, you can set the first 4 values to, say, 1 and the last 2 to 2, and keep them as a factor. That is,

df1$grp <- factor(rep(1:2, c(4, 2)))
g <- ggplot(df1, aes(x = Date, y = Counts)) +
  geom_line(aes(group = grp)) +
  geom_point()

Edit: Once you have your data.frame loaded (df1 here), you can use this code to generate the grp column automatically:

idx <- c(1, diff(df1$Date))
i2 <- c(1, which(idx != 1), nrow(df1) + 1)
df1$grp <- rep(seq_along(diff(i2)), diff(i2))

Note: It is important to add geom_point() as well. If a group consists of a single point (for example, when the LAST entry in the data.frame follows a gap), geom_line() won't draw it, because a line needs at least 2 points to connect. In this case, geom_point() will still plot it.

As an example, I'll generate data with more gaps:

# generate test data
set.seed(1234)
df <- data.frame(Date = seq(as.POSIXct("05:00", format = "%H:%M"),
                            as.POSIXct("23:00", format = "%H:%M"), by = "hours"))
df$Counts <- sample(19)
df <- df[-c(4, 7, 17, 18), ]

# generate the groups automatically and plot
idx <- c(1, diff(df$Date))
i2 <- c(1, which(idx != 1), nrow(df) + 1)
df$grp <- rep(seq_along(diff(i2)), diff(i2))
g <- ggplot(df, aes(x = Date, y = Counts)) +
  geom_line(aes(group = grp)) +
  geom_point()
g
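The grouping step can also be written with cumsum(), with the time units made explicit so that the comparison does not depend on the units diff() happens to pick. This is a minimal sketch, assuming hourly data like the example above; the fixed date and the UTC time zone are assumptions made only to keep it reproducible.

```r
# Hourly timestamps from 05:00 to 23:00 on an arbitrary (assumed) date
dates <- seq(as.POSIXct("2020-01-01 05:00", tz = "UTC"),
             as.POSIXct("2020-01-01 23:00", tz = "UTC"), by = "hour")
df <- data.frame(Date = dates, Counts = seq_along(dates))
df <- df[-c(4, 7, 17, 18), ]   # drop rows to create gaps

# Approach from the text: positions where a new group starts, then rep()
idx <- c(1, diff(df$Date))
i2 <- c(1, which(idx != 1), nrow(df) + 1)
grp_rep <- rep(seq_along(diff(i2)), diff(i2))

# cumsum() alternative: convert the gaps to hours explicitly,
# then count how many gaps have been passed so far
gap_hours <- as.numeric(diff(df$Date), units = "hours")
grp_cumsum <- cumsum(c(TRUE, gap_hours != 1))

print(grp_cumsum)
```

Both vectors describe the same four groups here; the last group contains a single point, which is why geom_point() is needed to make it visible.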


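Edit: For your NEW data (assuming it is df), first build a date-time column (stored as POSIXct, the class that ggplot2's date-time scales expect):

df$t <- as.POSIXct(paste(df$Date, df$Time), format = "%d/%m/%Y %H:%M:%S")

idx <- c(10, diff(df$t))
i2 <- c(1, which(idx != 10), nrow(df) + 1)
df$grp <- rep(seq_along(diff(i2)), diff(i2))

Now plot with aes(x = t, ...). Here 10 is the regular interval between observations, in whatever units diff() reports for your data.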

ggplot time series: messed up x axis for data with missing values

A remedy is to coerce month_1 to a factor and group the observations by year like so:

ggplot(data_to_plot, aes(x = as.factor(month_1), y = avg_dlt_calc, group = year)) +
  geom_line(size = 0.5) +
  scale_x_discrete(name = "months") +
  facet_grid(~year, scales = "free")

Note that I've moved y = avg_dlt_calc inside aes() in ggplot(), which is more idiomatic than your approach. You may use the breaks argument in scale_x_discrete() to set breaks manually; see ?scale_x_discrete.


I think a fixed x-axis and adding points is more suitable for conveying the information that data is only available for some periods:

ggplot(data_to_plot, aes(x = as.factor(month_1), y = avg_dlt_calc, group = year)) +
  geom_line(size = 0.5) +
  geom_point() +
  scale_x_discrete(name = "months") +
  facet_grid(~year, scales = "free_y")


Removing connection lines between missing Dates in ggplot

One solution would be to set the group aesthetic so that it matches the groups you want to have connected by lines.

In your case this is the year:

# year() is assumed to come from lubridate
ggplot(data = testsample, aes(x = Date, group = lubridate::year(Date))) +
  geom_line(aes(y = Value1), color = "black", size = 1, alpha = 0.5) +
  geom_line(aes(y = Value2), color = "red", size = 1, alpha = 0.5) +
  labs(subtitle = "testplot",
       x = "year",
       y = "values") +
  scale_x_date(date_labels = "%y", date_breaks = "1 year")


Building on Gregor's comment, we can also change implicit missing values to explicit missing values, e.g. using tidyr::complete. geom_line() does not connect across NA values, so the gaps appear without any group aesthetic:

testsample2 <- tidyr::complete(testsample, Date = seq(min(Date), max(Date), by = "day"))
ggplot(data = testsample2, aes(x = Date)) +
  geom_line(aes(y = Value1), color = "black", size = 1, alpha = 0.5) +
  geom_line(aes(y = Value2), color = "red", size = 1, alpha = 0.5) +
  labs(subtitle = "testplot",
       x = "year",
       y = "values") +
  scale_x_date(date_labels = "%y", date_breaks = "1 year")
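To illustrate what "explicit missing values" means, the same idea can be sketched in base R with merge() instead of tidyr::complete. The small testsample below is hypothetical and has only one value column; it is not the data from the question.

```r
# Hypothetical data: two days are missing between 2019-12-31 and 2020-01-03
testsample <- data.frame(
  Date   = as.Date(c("2019-12-30", "2019-12-31", "2020-01-03", "2020-01-04")),
  Value1 = c(1, 2, 3, 4)
)

# Every calendar day between the first and the last observation
all_days <- data.frame(Date = seq(min(testsample$Date), max(testsample$Date), by = "day"))

# Left join: days without an observation get NA in Value1
testsample2 <- merge(all_days, testsample, by = "Date", all.x = TRUE)
print(testsample2)
```

The two inserted rows carry NA, and it is those NA values that make a line geom leave a gap rather than draw a connecting segment.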

ggplot time series plotting: group by dates

Since the data file has different days for each group's time, one way to get all the groups onto the same plot is to just create a new variable, giving all groups the same "dummy" date but using the actual times collected.

experiment <- data.frame(Time, Value, Group)  #creates a data frame
experiment$hms <- as.POSIXct(paste("2015-01-01", substr(experiment$Time, 12, 19))) # pastes dummy date 2015-01-01 onto the HMS of Time
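As a quick check of the dummy-date trick, here is a small sketch on made-up timestamps. The Time strings are hypothetical and are assumed to be in "YYYY-MM-DD HH:MM:SS" form, so that characters 12 to 19 hold the time of day; UTC is used only to keep the result reproducible.

```r
# Hypothetical timestamps recorded on different days
Time <- c("2015-03-02 08:15:00", "2015-03-05 08:15:00", "2015-03-05 09:30:00")

# Keep only the HH:MM:SS part and attach the same dummy date to all of them
hms <- as.POSIXct(paste("2015-01-01", substr(Time, 12, 19)), tz = "UTC")

print(format(hms, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

The first two observations were taken on different days but at the same time of day, so they now fall on the same x position.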

Now that you have the times with all the same date, you then can plot them easily.

experiment$Grouping <- as.factor(experiment$Group)  # Group must be a factor so that each group gets its own discrete line color
ggplot(experiment, aes(x = hms, y = Value, color = Grouping)) + geom_line(size = 2)

You can change or modify this basic plot as you see fit.


