Problems with dplyr and POSIXlt data
You could use as.POSIXct, as recommended in the comments, but if the hours, minutes, and seconds don't matter, you should just use as.Date:
library(dplyr)

df <- read.csv("007.csv", header = TRUE, sep = ";")
df2 <- df %>%
  mutate(
    transaction_date = as.Date(transaction_date, "%d.%m.%Y"),
    install_date = as.Date(install_date, "%d.%m.%Y")
  ) %>%
  # the difference of two Dates is a difftime measured in days
  group_by(days = transaction_date - install_date) %>%
  summarise(sum = sum(value))
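Because the difference of two Date objects is a difftime in days, each group key above is simply a day count. A minimal base-R sketch of the same computation, using small hypothetical transaction_date, install_date and value vectors (not the contents of 007.csv) and tapply() in place of group_by()/summarise():

```r
# Hypothetical sample data in the same "%d.%m.%Y" layout as the question
transaction_date <- as.Date(c("05.03.2015", "10.03.2015", "05.03.2015"), "%d.%m.%Y")
install_date     <- as.Date(c("01.03.2015", "01.03.2015", "01.03.2015"), "%d.%m.%Y")
value            <- c(1, 2, 3)

# Date - Date yields a difftime measured in days
days <- transaction_date - install_date
print(class(days))       # "difftime"
print(as.numeric(days))  # 4 9 4

# Sum of value per number of days (base-R stand-in for group_by/summarise)
sum_by_days <- tapply(value, as.numeric(days), sum)
print(sum_by_days)
```

Converting the difftime with as.numeric() before grouping keeps the key a plain number, which is usually easier to join or plot later.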
dplyr does not group data by date
The lubridate package is useful when dealing with dates.
Here is the code to parse Start.Date and End.Date, extract the weekdays, then group by weekday:
Read dates as character vectors
library(dplyr)
library(lubridate)
# For some reason your instruction to load the csv directly from a url
# didn't work, so I saved the csv to a temporary directory.
d <- read.csv("/tmp/bike_trip_data.csv", colClasses = c("numeric", "numeric", "character", "factor", "numeric", "character", "factor", "numeric", "numeric", "factor", "character"), stringsAsFactors = TRUE)
names(d)[9] <- "BikeNo"
# tbl_df() is deprecated in current dplyr; as_tibble() replaces it
d <- tibble::as_tibble(d)
Use lubridate to convert start date and end date
d <- d %>%
  mutate(
    Start.Date = parse_date_time(Start.Date, "%m/%d/%y %H:%M"),
    End.Date = parse_date_time(End.Date, "%m/%d/%y %H:%M"),
    Weekday = wday(Start.Date, label = TRUE, abbr = FALSE)
  )
Number of rows per weekday
d %>%
group_by(Weekday) %>%
summarise(Total = n())
# Weekday Total
# 1 Sunday 10587
# 2 Monday 23138
# 3 Tuesday 24678
# 4 Wednesday 23651
# 5 Thursday 25265
# 6 Friday 24283
# 7 Saturday 12413
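If lubridate is not available, the weekday can also be derived in base R. This is a swapped-in technique rather than the answer's lubridate approach: a sketch assuming a handful of hypothetical start times in the same format as Start.Date, which reads the wday field of POSIXlt (0 = Sunday) and counts rows with table():

```r
# Hypothetical start times in the same "%m/%d/%y %H:%M" layout as Start.Date
start <- as.POSIXct(c("8/29/13 14:13", "8/31/13 9:08", "9/1/13 10:00", "9/1/13 18:30"),
                    format = "%m/%d/%y %H:%M", tz = "UTC")

# POSIXlt's wday field is 0 = Sunday ... 6 = Saturday; mapping it onto a fixed
# vector of names avoids depending on the session's locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(day_names[as.POSIXlt(start)$wday + 1], levels = day_names)

# Count rows per weekday, keeping empty days (similar to group_by(Weekday) + n())
counts <- table(weekday)
print(counts)
```

Using a factor with all seven levels means weekdays without any rows still show up with a count of zero.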
dplyr - mutate_each - colwise coercion to POSIXlt fails
Revisiting my question about 4 years later, I realised that I forgot to mark it as answered. However, this also gives me the chance to document how this (relatively) simple type coercion can now be solved elegantly with dplyr and lubridate.
Key lessons learned:
- never use POSIXlt in a data frame (or its later sibling, the tibble, although you can now work with list columns);
- coerce date-timestamps with the helpful parser functions from the lubridate package.
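The first lesson is easier to see by looking inside the two classes. A minimal base-R sketch, using two timestamps from the flights example rewritten in ISO format (an assumption made so that no month-name parsing is needed):

```r
# The same instants stored as POSIXlt and as POSIXct
lt <- as.POSIXlt(c("2013-01-01 04:02:24", "2013-01-01 04:08:18"), tz = "UTC")
ct <- as.POSIXct(lt)

# POSIXlt is a list of broken-down fields (sec, min, hour, mday, ...)
print(is.list(unclass(lt)))     # TRUE
print(names(unclass(lt))[1:3])  # "sec" "min" "hour"

# POSIXct is an atomic double vector of seconds since 1970-01-01 UTC,
# which is the kind of column data frames and dplyr verbs expect
print(typeof(ct))               # "double"
print(diff(as.numeric(ct)))     # 354 seconds between the two timestamps
```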
For the example from the question:
library(dplyr)

ICAO_ADEP <- c("DGAA","ZSPD","UAAA","RJTT","KJFK","WSSS")
MVT_TIME_UTC <- c("01-Jan-2013 04:02:24", NA,"01-Jan-2013 04:08:18", NA,"01-Jan-2013 04:17:11","01-Jan-2013 04:21:52")
# keep the timestamps as character (before R 4.0 the default was to make factors)
flights <- data.frame(ICAO_ADEP, MVT_TIME_UTC, stringsAsFactors = FALSE)
flights <- flights %>% mutate(MVT_TIME_UTC = lubridate::dmy_hms(MVT_TIME_UTC))
This coerces the timestamps in MVT_TIME_UTC to POSIXct (in UTC by default). Check the lubridate documentation for other parsers and for how to handle local time zones.
dplyr::if_else changes datetime (POSIXct) values
First, note that parse_datetime() (from readr) returns a date-time object whose tzone attribute defaults to UTC. You can check it with lubridate::tz(x$A) or attributes(x$A).
According to the documentation of if_else(), the true and false arguments must be the same type, and all other attributes are taken from true. Hence, for column C of your tibble:
C = if_else(FALSE, as.POSIXct(NA), A)
as.POSIXct(NA) doesn't have a tzone attribute, so A's tzone is dropped and the result is shown in your local time zone. C is not actually two hours later: the three columns hold the same time but have different time zones. To fix it, give as.POSIXct(NA) a tzone attribute, i.e. replace it with
as.POSIXct(NA_character_, tz = "UTC")
Note: you must use NA_character_ instead of NA, because the tz argument of as.POSIXct() only works on character objects.
Finally, revise your code as follows:
library(dplyr)  # tibble() and if_else()
library(readr)  # parse_datetime()

x <- tibble(
  A = parse_datetime("2020-08-18 19:00"),
  B = if_else(TRUE, A, as.POSIXct(NA_character_, tz = "UTC")),
  C = if_else(FALSE, as.POSIXct(NA_character_, tz = "UTC"), A)
)
# # A tibble: 1 x 3
# A B C
# <dttm> <dttm> <dttm>
# 1 2020-08-18 19:00:00 2020-08-18 19:00:00 2020-08-18 19:00:00
Remember to check their time zones.
R > lubridate::tz(x$A)
[1] "UTC"
R > lubridate::tz(x$B)
[1] "UTC"
R > lubridate::tz(x$C)
[1] "UTC"
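The claim that such columns hold the same instant and differ only in time zone can be checked in base R. In this sketch, Europe/Berlin is an assumed local zone chosen for illustration (UTC+2 in August), since the question's actual zone isn't stated; the last lines confirm that an NA built from NA_character_ carries the requested tzone:

```r
# One instant, stored once with tzone "UTC" and once with "Europe/Berlin"
a <- as.POSIXct("2020-08-18 19:00", tz = "UTC")
b <- a
attr(b, "tzone") <- "Europe/Berlin"

# Same underlying number of seconds ...
print(as.numeric(a) == as.numeric(b))  # TRUE
# ... but printed two hours apart (Berlin is UTC+2 in August)
print(format(a, "%H:%M"))              # "19:00"
print(format(b, "%H:%M"))              # "21:00"

# An NA built from a character NA keeps the requested tzone
na_utc <- as.POSIXct(NA_character_, tz = "UTC")
print(attr(na_utc, "tzone"))           # "UTC"
```

Only the tzone attribute changes how the value is displayed; comparisons and arithmetic work on the underlying seconds.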
how to groupby in dplyr with dataframe containing POSIXlt datatypes in r
We need to convert the column to POSIXct, as dplyr doesn't support POSIXlt:
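library(dplyr)

vessel_data %>%
  # dplyr cannot handle POSIXlt columns, so convert to POSIXct first
  mutate(arrival_pilot_station = as.POSIXct(arrival_pilot_station)) %>%
  group_by(Service) %>%
  # name the summary column with =, not <-
  dplyr::summarise(average_time = mean(diff_pilot_alongside)) %>%
  as.data.frame()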
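For comparison, here is a base-R sketch of the same kind of grouped average on POSIXct columns. The data are hypothetical (only the Service and diff_pilot_alongside names come from the answer), and aggregate() stands in for group_by()/summarise():

```r
# Hypothetical vessel data: arrival at the pilot station and time alongside
arrival   <- as.POSIXct(c("2020-01-01 08:00", "2020-01-01 09:30",
                          "2020-01-02 07:15", "2020-01-02 10:00"), tz = "UTC")
alongside <- as.POSIXct(c("2020-01-01 09:00", "2020-01-01 11:30",
                          "2020-01-02 07:45", "2020-01-02 11:00"), tz = "UTC")
Service   <- c("A", "A", "B", "B")

# POSIXct columns are ordinary atomic columns in a data frame
vessels <- data.frame(Service, arrival, alongside)
vessels$diff_pilot_alongside <- as.numeric(
  difftime(vessels$alongside, vessels$arrival, units = "hours"))

# Mean time from pilot station to alongside per service, in hours
average_time <- aggregate(diff_pilot_alongside ~ Service, data = vessels, FUN = mean)
print(average_time)
```

Fixing the difftime units explicitly (here hours) avoids difftime silently picking minutes or days depending on the data.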
Error creating R data.table with date-time POSIXlt
Formatting the response from Blue Magister's comment (thanks so much): data.table does not support POSIXlt data types for performance reasons; see "cast string to IDateTime", which was suggested as a possible duplicate.
So the way to go is to cast a time of day as ITime (a type provided by data.table), or a date-time (or date only) as POSIXct, depending on whether the date information matters:
> mdt <- data.table(id=1:3, d=as.ITime(strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S")))
> print(mdt)
id d
1: 1 06:02:36
2: 2 06:02:48
3: 3 07:03:12
> mdt <- data.table(id=1:3, d=as.POSIXct(strptime(c("06:02:36", "06:02:48", "07:03:12"), "%H:%M:%S")))
> print(mdt)
id d
1: 1 2014-01-31 06:02:36
2: 2 2014-01-31 06:02:48
3: 3 2014-01-31 07:03:12
As an extra note, in case someone can benefit from it: I wanted to create date-times from input data that stored the date and the time in separate fields.
I found it useful to learn (see ?ITime) that one can add an ITime to a POSIXct date-time and get a POSIXct date-time back:
> mdt <- as.POSIXct("2014-01-31") + as.ITime("06:02:36")
> print(mdt)
[1] "2014-01-31 06:02:36 EST"
> class(mdt)
[1] "POSIXct" "POSIXt"
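If data.table is not involved, separate date and time fields can also be combined in plain base R. A sketch under the assumption of ISO dates and HH:MM:SS times in hypothetical date_part and time_part vectors; it shows both parsing a pasted string and adding seconds to midnight, which mirrors the POSIXct + ITime arithmetic above:

```r
# Hypothetical input with date and time of day in separate fields
date_part <- c("2014-01-31", "2014-01-31", "2014-02-01")
time_part <- c("06:02:36", "06:02:48", "07:03:12")

# Option 1: paste the fields together and parse once
dt1 <- as.POSIXct(paste(date_part, time_part),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Option 2: midnight of each date plus the time of day in seconds
hms  <- strsplit(time_part, ":", fixed = TRUE)
secs <- vapply(hms, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
dt2  <- as.POSIXct(date_part, tz = "UTC") + secs

print(format(dt2, "%Y-%m-%d %H:%M:%S"))
```

Passing tz explicitly in both options keeps the result independent of the machine's local time zone.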