Problems with dplyr and POSIXlt Data

Problems with dplyr and POSIXlt data

You could use as.POSIXct, as recommended in the comments, but if the hours, minutes, and seconds don't matter, you should just use as.Date:

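library(dplyr)

df <- read.csv("007.csv", header = TRUE, sep = ";")

df2 <- df %>%
  mutate(
    # Parse "dd.mm.yyyy" strings into Date objects (no time of day)
    transaction_date = as.Date(transaction_date, format = "%d.%m.%Y"),
    install_date     = as.Date(install_date, format = "%d.%m.%Y")
  ) %>%
  # Subtracting two Dates gives the difference in days
  group_by(days = transaction_date - install_date) %>%
  summarise(sum = sum(value))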
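A self-contained sketch of the same approach, assuming a small made-up data frame in place of 007.csv (the column names follow the code above; the values are invented for illustration). Because both columns are Dates, the difference is a whole number of days, and as.numeric() turns that difftime into a plain grouping key:

```r
library(dplyr)

# Hypothetical stand-in for 007.csv: dates stored as "dd.mm.yyyy" strings
df <- data.frame(
  transaction_date = c("05.01.2015", "06.01.2015", "10.01.2015"),
  install_date     = c("01.01.2015", "02.01.2015", "03.01.2015"),
  value            = c(10, 20, 5),
  stringsAsFactors = FALSE
)

df2 <- df %>%
  mutate(
    transaction_date = as.Date(transaction_date, format = "%d.%m.%Y"),
    install_date     = as.Date(install_date, format = "%d.%m.%Y")
  ) %>%
  # as.numeric() turns the difftime into a plain count of days
  group_by(days = as.numeric(transaction_date - install_date)) %>%
  summarise(sum = sum(value))

print(df2)
```

The first two rows are both 4 days apart, so they fall into one group.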

dplyr does not group data by date

The lubridate package is useful when dealing with dates.
Here is code that parses Start.Date and End.Date, extracts the weekdays, and then groups by weekday:

Read dates as character vectors

library(dplyr)
library(lubridate)
# For some reason, loading the csv directly from the URL in your question
# didn't work, so I saved the csv to a temporary directory first.
d <- read.csv(
  "/tmp/bike_trip_data.csv",
  colClasses = c("numeric", "numeric", "character", "factor", "numeric",
                 "character", "factor", "numeric", "numeric", "factor",
                 "character"),
  stringsAsFactors = TRUE
)

names(d)[9] <- "BikeNo"
# tbl_df() is deprecated; as_tibble() is the current equivalent
d <- as_tibble(d)

Use lubridate to convert start date and end date

d <- d %>%
  mutate(
    Start.Date = parse_date_time(Start.Date, "%m/%d/%y %H:%M"),
    End.Date   = parse_date_time(End.Date, "%m/%d/%y %H:%M"),
    # Full weekday name (e.g. "Monday"), computed from the parsed start date
    Weekday    = wday(Start.Date, label = TRUE, abbr = FALSE)
  )

Number of rows (trips) per weekday

d %>%
  group_by(Weekday) %>%
  summarise(Total = n())

# Weekday Total
# 1 Sunday 10587
# 2 Monday 23138
# 3 Tuesday 24678
# 4 Wednesday 23651
# 5 Thursday 25265
# 6 Friday 24283
# 7 Saturday 12413
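As an alternative to group_by() plus summarise(n()), dplyr's count() does the same tally in one call. A minimal sketch on a few invented Start.Date values (not the original bike data); note that the weekday labels depend on your locale, and the ordering assumes lubridate's default week start (Sunday):

```r
library(dplyr)
library(lubridate)

# Hypothetical trips; Start.Date uses the same "%m/%d/%y %H:%M" layout as above
d <- tibble(Start.Date = c("8/29/13 14:13", "8/30/13 09:00",
                           "9/1/13 18:30", "9/5/13 07:45"))

weekday_counts <- d %>%
  mutate(
    Start.Date = parse_date_time(Start.Date, "%m/%d/%y %H:%M"),
    Weekday    = wday(Start.Date, label = TRUE, abbr = FALSE)
  ) %>%
  # count() = group_by(Weekday) + summarise(Total = n())
  count(Weekday, name = "Total")

print(weekday_counts)
```

Here 29 August and 5 September 2013 are both Thursdays, so Thursday gets a total of 2.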

dplyr - mutate_each - colswise coercion to POSIXlt fails

Revisiting my question about 4 years later, I realised that I had forgotten to mark it as answered. This also gives me the chance to document how this (relatively) simple type coercion can now be solved elegantly with dplyr and lubridate.

Key lessons learned:

  1. Never use POSIXlt in a data frame (or in its later sibling, the tibble, although list columns now make this possible).
  2. Coerce date-time stamps with the parser functions from the lubridate package.
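A short base-R sketch of why the first lesson holds: a POSIXlt value is internally a list of date-time components, while POSIXct is a single number per value (seconds since 1970-01-01 UTC), the kind of atomic column that data frames, tibbles and dplyr verbs handle well:

```r
# POSIXlt stores each date-time as a list of components (sec, min, hour, ...)
lt <- as.POSIXlt("2013-01-01 04:02:24", tz = "UTC")
# POSIXct stores it as one number: seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct(lt)

is.list(unclass(lt))    # TRUE: a list, awkward as a data frame column
is.double(unclass(ct))  # TRUE: a plain numeric vector with attributes
as.numeric(ct)          # 1357012944
```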

For the example from the question:

library(dplyr)

ICAO_ADEP <- c("DGAA", "ZSPD", "UAAA", "RJTT", "KJFK", "WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA, "01-Jan-2013 04:08:18", NA,
                  "01-Jan-2013 04:17:11", "01-Jan-2013 04:21:52")
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC, stringsAsFactors = FALSE)

# Parse "day-month-year hour:min:sec" strings into POSIXct
flights <- flights %>% mutate(MVT_TIME_UTC = lubridate::dmy_hms(MVT_TIME_UTC))

This coerces the timestamps in MVT_TIME_UTC to POSIXct date-times (dmy_hms() uses UTC unless you pass tz). Check the lubridate documentation for other parsers and for how to handle local time zones.
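On the time-zone point, lubridate offers two distinct operations after parsing: with_tz() keeps the instant and changes the displayed clock time, while force_tz() keeps the clock time and changes the instant. A minimal sketch, using America/New_York purely as an example zone:

```r
library(lubridate)

utc <- dmy_hms("01-Jan-2013 04:02:24", tz = "UTC")

# with_tz(): same instant, shown in another zone (clock time changes)
ny <- with_tz(utc, "America/New_York")
format(ny, "%H:%M:%S")   # "23:02:24" (EST is UTC-5, so the previous day)

# force_tz(): same clock time, reinterpreted in another zone (instant changes)
ny_forced <- force_tz(utc, "America/New_York")
difftime(ny_forced, utc, units = "hours")   # 5 hours
```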

dplyr::if_else changes datetime (POSIXct) values

First, note that readr's parse_datetime() returns a date-time object whose tzone attribute defaults to UTC. You can check it with lubridate::tz(x$A) or attributes(x$A).

The if_else() documentation says that the true and false arguments must be the same type, and that all other attributes are taken from true. Hence, in column C of your tibble:

C = if_else(FALSE, as.POSIXct(NA), A)

as.POSIXct(NA) has no tzone attribute, and since C takes its attributes from it, A's tzone is dropped and C is displayed in your local time zone. C is not actually two hours later: all three columns store the same instant, just with different time zones. To fix this, give as.POSIXct(NA) its own tzone attribute, i.e. replace it with

as.POSIXct(NA_character_, tz = "UTC")

Note: You must use NA_character_ instead of NA because the tz argument in as.POSIXct() only works on character objects.
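To see why C only looks two hours later, here is a minimal base-R sketch. Europe/Berlin is an arbitrary zone picked for illustration (UTC+2 in August); the point is that the tzone attribute changes only how a POSIXct value is printed, not the instant it stores:

```r
a <- as.POSIXct("2020-08-18 19:00", tz = "UTC")

# Copy the value and change only the tzone attribute (the display zone)
b <- a
attr(b, "tzone") <- "Europe/Berlin"

as.numeric(a) == as.numeric(b)   # TRUE: same instant
format(a, "%H:%M")               # "19:00"
format(b, "%H:%M")               # "21:00"

# A character NA parsed with tz carries the requested tzone
attr(as.POSIXct(NA_character_, tz = "UTC"), "tzone")
```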


Finally, revise your code as follows:

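library(dplyr)  # if_else(), tibble()
library(readr)  # parse_datetime()

x <- tibble(
  A = parse_datetime("2020-08-18 19:00"),
  # Give the NA a UTC tzone so the result keeps A's time zone
  B = if_else(TRUE, A, as.POSIXct(NA_character_, tz = "UTC")),
  C = if_else(FALSE, as.POSIXct(NA_character_, tz = "UTC"), A)
)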

# # A tibble: 1 x 3
# A B C
# <dttm> <dttm> <dttm>
# 1 2020-08-18 19:00:00 2020-08-18 19:00:00 2020-08-18 19:00:00

Remember to check their time zones.

R > lubridate::tz(x$A)
[1] "UTC"
R > lubridate::tz(x$B)
[1] "UTC"
R > lubridate::tz(x$C)
[1] "UTC"

how to groupby in dplyr with dataframe containing POSIXlt datatypes in r

Convert the column to POSIXct first, because dplyr doesn't support POSIXlt columns:

library(dplyr)

vessel_data %>%
  mutate(arrival_pilot_station = as.POSIXct(arrival_pilot_station)) %>%
  group_by(Service) %>%
  # Use "=" (not "<-") so the summary column is named average_time
  dplyr::summarise(average_time = mean(diff_pilot_alongside)) %>%
  as.data.frame()
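A self-contained sketch of the same pattern, with a hypothetical vessel_data (the alongside column and all values are invented for illustration) whose times arrive as strings. Everything is converted to POSIXct before grouping, and the waiting time is computed explicitly in hours with difftime():

```r
library(dplyr)

# Hypothetical stand-in for vessel_data; times start out as character
vessel_data <- data.frame(
  Service = c("A", "A", "B"),
  arrival_pilot_station = c("2021-03-01 08:00", "2021-03-02 10:00", "2021-03-03 12:00"),
  alongside = c("2021-03-01 10:00", "2021-03-02 11:00", "2021-03-03 16:00"),
  stringsAsFactors = FALSE
)

result <- vessel_data %>%
  mutate(
    # POSIXct (not POSIXlt) columns work with group_by()/summarise()
    arrival_pilot_station = as.POSIXct(arrival_pilot_station, tz = "UTC"),
    alongside = as.POSIXct(alongside, tz = "UTC"),
    diff_pilot_alongside = as.numeric(
      difftime(alongside, arrival_pilot_station, units = "hours")
    )
  ) %>%
  group_by(Service) %>%
  summarise(average_time = mean(diff_pilot_alongside))

print(result)
```

Converting the difftime to a plain number fixes the unit (hours), so the mean is not affected by difftime's automatic unit selection.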

Error creating R data.table with date-time POSIXlt

Formatting the response from Blue Magister's comment (thanks so much): data.table does not support POSIXlt data types, for performance reasons; see the question "cast string to IDateTime", which was suggested as a possible duplicate.

So the way to go is to cast a time of day as ITime (a type provided by data.table), or a date-time (or a date only) as POSIXct, depending on whether the date information matters:

> library(data.table)
> mdt <- data.table(id=1:3, d=as.ITime(strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S")))
> print(mdt)
id d
1: 1 06:02:36
2: 2 06:02:48
3: 3 07:03:12
> mdt <- data.table(id=1:3, d=as.POSIXct(strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S")))
> print(mdt)
id d
1: 1 2014-01-31 06:02:36
2: 2 2014-01-31 06:02:48
3: 3 2014-01-31 07:03:12
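The "cast string to IDateTime" route mentioned above can be sketched with data.table's IDateTime(), which splits a POSIXct vector into an IDate column and an ITime column. A minimal sketch; UTC is assumed so the split does not depend on the machine's time zone:

```r
library(data.table)

ct <- as.POSIXct(c("2014-01-31 06:02:36", "2014-01-31 07:03:12"), tz = "UTC")

# Returns a data.table with columns idate (IDate) and itime (ITime)
parts <- IDateTime(ct)
print(parts)
```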

As an extra note, in case someone can benefit from it: I wanted to create date-times from input data that had the date and the time in separate fields.
I found it useful to learn (see ?ITime) that adding an ITime to a POSIXct date-time gives a POSIXct date-time:

> mdt <- as.POSIXct("2014-01-31") + as.ITime("06:02:36")
> print(mdt)
[1] "2014-01-31 06:02:36 EST"
> class(mdt)
[1] "POSIXct" "POSIXt"
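Applied column-wise, the same trick builds a date-time column from separate date and time columns. A minimal sketch with a hypothetical data.table (the column names date, time and when are invented for illustration); UTC is assumed so the result does not depend on the local time zone:

```r
library(data.table)

# Hypothetical input with date and time in separate character columns
dt <- data.table(
  date = c("2014-01-31", "2014-02-01"),
  time = c("06:02:36", "07:03:12")
)

# POSIXct date (midnight UTC) + ITime (seconds since midnight) -> POSIXct
dt[, when := as.POSIXct(date, tz = "UTC") + as.ITime(time)]
print(dt)
```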

