First day of the month from a POSIXct date time using lubridate
lubridate has a function called floor_date
which rounds date-times down. Calling it with unit = "month"
does exactly what you want:
library(lubridate)
full.date <- ymd_hms("2013-01-01 00:00:21")
floor_date(full.date, "month")
[1] "2013-01-01 UTC"
Given a date and get the beginning date of previous month using lubridate package
You could substract the month after floor
:
end_date <- '2021-10-31'
floor_date(as.Date(end_date), 'month') - months(1)
[1] "2021-09-01"
How to extract Month from date in R
?month
states:
Date-time must be a POSIXct, POSIXlt, Date, Period, chron, yearmon,
yearqtr, zoo, zooreg, timeDate, xts, its, ti, jul, timeSeries, and fts
objects.
Your object is a factor, not even a character vector (presumably because of stringsAsFactors = TRUE
). You have to convert your vector to some datetime class, for instance to POSIXlt
:
library(lubridate)
some_date <- c("01/02/1979", "03/04/1980")
month(as.POSIXlt(some_date, format="%d/%m/%Y"))
[1] 2 4
There's also a convenience function dmy
, that can do the same (tip proposed by @Henrik):
month(dmy(some_date))
[1] 2 4
Going even further, @IShouldBuyABoat gives another hint that dd/mm/yyyy character formats are accepted without any explicit casting:
month(some_date)
[1] 2 4
For a list of formats, see ?strptime
. You'll find that "standard unambiguous format" stands for
The default formats follow the rules of the ISO 8601 international
standard which expresses a day as "2001-02-28" and a time as
"14:01:02" using leading zeroes as here.
changing POSIXct date vaules to first day of each week
Using bosom buddy of plyr
,
library(lubridate)
library(dplyr)
df %>%
group_by(Week = floor_date(Date, unit="week")) %>%
summarize(WeeklyAveDist=mean(Dist))
#Source: local data frame [3 x 2]
#
# Week WeeklyAveDist
#1 2012-02-12 381.7755
#2 2012-02-19 252.1116
#3 2012-02-26 175.4097
There are also ceiling_date
, round_date
options.
Determine season from Date using lubridate in R
I packaged @Lars Arne Jordanger's much more elegant approach into a function:
getTwoSeasons <- function(input.date){
numeric.date <- 100*month(input.date)+day(input.date)
## input Seasons upper limits in the form MMDD in the "break =" option:
cuts <- base::cut(numeric.date, breaks = c(0,415,1015,1231))
# rename the resulting groups (could've been done within cut(...levels=) if "Winter" wasn't double
levels(cuts) <- c("Winter", "Summer","Winter")
return(cuts)
}
Testing it on some sample data seems to work fine:
getTwoSeasons(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))
Trying to add more axis marks in Base R with date/time format using lubridate()
First, format date as.POSIXct
, this is important for which plot
method is called, apparently you already have done that.
dat <- transform(dat, date=as.POSIXct(date))
Then, subset on the substr
ings where hours are e.g. '00'
. Next plot without x-axis and build custom axis using axis
and mtext
.
st <- substr(dat$date, 12, 13) == '00'
plot(dat, type='b', col='blue', xaxt='n')
axis(1, dat$date[st], labels=F)
mtext(strftime(dat$date[st], '%b %d'), 1, 1, at=dat$date[st])
Data:
set.seed(42)
dat <- data.frame(
date=as.character(seq.POSIXt(as.POSIXct('2021-06-22'), as.POSIXct('2021-06-29'), 'hour')),
v=runif(169)
)
Create end of the month date from a date variable
To get the end of months you could just create a Date
vector containing the 1st of all the subsequent months and subtract 1 day.
date.end.month <- seq(as.Date("2012-02-01"),length=4,by="months")-1
date.end.month
[1] "2012-01-31" "2012-02-29" "2012-03-31" "2012-04-30"
Extract month and year from datetime in R
lubridate
month
and year
will work.
as.data.frame(Order.Date) %>%
mutate(Month = lubridate::month(Order.Date, label = FALSE),
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 10 2011
2 2011-12-25 12 2011
3 2012-04-15 4 2012
4 2012-08-23 8 2012
5 2013-09-25 9 2013
If you want month format as Jan
, use month.abb
and as January
, use month.name
as.data.frame(Order.Date) %>%
mutate(Month = month.abb[lubridate::month(Order.Date, label = TRUE)],
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 Oct 2011
2 2011-12-25 Dec 2011
3 2012-04-15 Apr 2012
4 2012-08-23 Aug 2012
5 2013-09-25 Sep 2013
as.data.frame(Order.Date) %>%
mutate(Month = month.name[lubridate::month(Order.Date, label = TRUE)],
Year = lubridate::year(Order.Date))
Order.Date Month Year
1 2011-10-20 October 2011
2 2011-12-25 December 2011
3 2012-04-15 April 2012
4 2012-08-23 August 2012
5 2013-09-25 September 2013
Transform string in date through Lubridate with variation in month, day, year hour min am/pm and time zone
First thing I did was to setup a tibble with your 2 date vectors
tibble(
date1 = c("February 11th 2017, 6:05am PST", "April 24th 2018, 4:09pm PDT"),
date2 = c("2013-12-14 00:58:00 CET", "2013-06-19 18:00:00 CEST"),
) %>%
{. ->> my_dates}
my_dates
# # A tibble: 2 x 2
# date1 date2
# <chr> <chr>
# February 11th 2017, 6:05am PST 2013-12-14 00:58:00 CET
# April 24th 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST
Then, make a tibble of the timezone abbreviations and their offset from UTC
# setup timezones and UTC offsets
tribble(
~tz, ~offset,
'PST', -8,
'PDT', -7,
'CET', +1,
'CEST', +2
) %>%
{. ->> my_tz}
my_tz
# # A tibble: 4 x 2
# tz offset
# <chr> <dbl>
# PST -8
# PDT -7
# CET 1
# CEST 2
Then, we tidy the datetimes up by removing the character suffix after the day number in date1
(the 'th' bit after '11th'). We also pull out the timezone code and put that in a separate column; the timezone column allows us to left_join()
my_tz
in, giving us the UTC offset.
We use string-handling functions from the stringr
package, and regex expressions to find, extract and replace the components. A very handy tool for testing regex patterns can be found here https://regex101.com/r/5pr3LL/1/
my_dates %>%
mutate(
# remove the character suffix after the day number (eg 11th)
day_suffix = str_extract(date1, '[0-9]+[a-z]+') %>% str_extract('[a-z]+'),
date1 = str_replace(date1, day_suffix, ''),
day_suffix = NULL,
# extract timezone info
date1_tz = str_extract(date1, '[a-zA-Z]+$'),
date2_tz = str_extract(date2, '[a-zA-Z]+$'),
) %>%
# join in timezones for date1
left_join(my_tz, by = c('date1_tz' = 'tz')) %>%
rename(
offset_date1 = offset
) %>%
# join in timezones for date2
left_join(my_tz, by = c('date2_tz' = 'tz')) %>%
rename(
offset_date2 = offset
) %>%
{. ->> my_dates_info}
my_dates_info
# # A tibble: 2 x 6
# date1 date2 date1_tz date2_tz offset_date1 offset_date2
# <chr> <chr> <chr> <chr> <dbl> <dbl>
# February 11 2017, 6:05am PST 2013-12-14 00:58:00 CET PST CET -8 1
# April 24 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST PDT CEST -7 2
So now, we can use lubridate::as_datetime()
to convert date1
and date2
to dttm
(datetime) format. as_datetime()
takes a character-format datetime and converts it to datetime format. You must specify the format of the character string using symbols and abbreviations explained here. For example, here we use %B
to refer to the full name of the month, %d
is the day number and %Y
is the (4-digit) year number etc.
Note: because we don't specify the timezone inside as_datetime()
, the underlying timezone stored with these datetimes defaults to UTC
(as seen by using tz()
). This is why we call these columns date*_orig
, to remind us the timezone is the original datetime's timezone. Then we add the offset to the datetime object, so we now have these times in UTC (and the underlying timezone signature of these values is UTC
, so that's ideal).
# now define datetimes in local and UTC timezones (note: technically the tz is UTC for both)
my_dates_info %>%
mutate(
date1_orig = as_datetime(date1, format = '%B %d %Y, %I:%M%p '),
date1_utc = date1_orig + hours(offset_date1),
date2_orig = as_datetime(date2, format = '%Y-%m-%d %H:%M:%S'),
date2_utc = date2_orig + hours(offset_date2),
) %>%
{. ->> my_dates_utc}
my_dates_utc
# # A tibble: 2 x 10
# date1 date2 date1_tz date2_tz offset_date1 offset_date2 date1_orig date1_utc date2_orig date2_utc
# <chr> <chr> <chr> <chr> <dbl> <dbl> <dttm> <dttm> <dttm> <dttm>
# February 11 2017, 6:05am PST 2013-12-14 00:58:00 CET PST CET -8 1 2017-02-11 06:05:00 2017-02-10 22:05:00 2013-12-14 00:58:00 2013-12-14 01:58:00
# April 24 2018, 4:09pm PDT 2013-06-19 18:00:00 CEST PDT CEST -7 2 2018-04-24 16:09:00 2018-04-24 09:09:00 2013-06-19 18:00:00 2013-06-19 20:00:00
Now that we have both sets of dates in datetime format, and in the same timezone, we can calculate time differences between them.
# now calculate difference between them
my_dates_utc %>%
select(date1_utc, date2_utc) %>%
mutate(
difference_days = interval(start = date1_utc, end = date2_utc) %>% time_length(unit = 'days')
)
# # A tibble: 2 x 3
# date1_utc date2_utc difference_days
# <dttm> <dttm> <dbl>
# 2017-02-10 22:05:00 2013-12-14 01:58:00 -1155.
# 2018-04-24 09:09:00 2013-06-19 20:00:00 -1770.
This should be fine for small-scale operations. If you had more than 2 different datetime format vectors, it would be worth considering a more complex operation where you transform the data from wide to long format. This would save repeating the same/similar code for each column, like we have done for date1
and date2
in this example.
Related Topics
Partially Color Histogram in R
Automate Zip File Reading in R
How to Create a Bar Plot for Two Variables Mirrored Across the X-Axis in R
How to Return 5 Topmost Values from Vector in R
Represent Numeric Value with Typical Dollar Amount Format
Dplyr Summarize with Subtotals
Remove Text After Final Period in String
Get Margin Line Locations in Log Space
Twitter Data Analysis - Error in Term Document Matrix
Substitute Dt1.X with Dt2.Y When Dt1.X and Dt2.X Match in R
How to Plot a Heat Map on a Spatial Map
R 3.3.0 Installing a Package on Windows: Gcc Not Found Error
Rounding Time to Nearest Quarter Hour
Get the Index of the Values of One Vector in Another
Assign Names to Data Frame with As.Data.Frame Function