Data Difference in 'As.Posixct' with Excel

data difference in `as.POSIXct` with Excel

The problem is that either R of Excel is rounding the number to two decimals. When you convert the for example the cell with 8/8/2013 15:10 to text formatting (in Excel on Mac OSX), you get the number 41494.63194.

When you use:

as.POSIXct(41494.63194*86400, origin="1899-12-30",tz="GMT")

it will give you:

[1] "2013-08-08 15:09:59 GMT"

This is 1 second off from the original date (which is also an indication that 41494.63194 is rounded to five decimals).

Probably the best solution to do is export your excel-file to a .csv or a tab-separated .txt file and then read it into R. This gives me at least the correct dates:

> df
datum
1 8/8/2013 15:10
2 7/26/2013 10:30
3 7/11/2013 14:20
4 3/28/2013 16:15
5 3/18/2013 15:50

excel date time conversion to POSXIct

One needs to know the key differences between the two time systems here.

In Excel 43306.29 means 43306 days from Jan 1, 1900 (day 1) and the 0.29 is the fraction of the day (about 7 hours here).

R uses Unix time keeping standard so it tracks the number of seconds from the Jan 1, 1970 (the beginning of time for a Unix programmer).

So in order to convert from Excel to R, you need to covert the number of days from the origin to the number of seconds (60 sec * 60 min * 24 hours).

as.POSIXct(43306.29*3600*24 , origin="1899-12-30")
#"2018-07-25 02:57:36 EDT"

as.POSIXct(43306.29*3600*24 , origin="1899-12-30", tz="UTC")
#"2018-07-25 06:57:36 UTC"

Note: Windows and Excel assumes there was a leap year in 1900 which there wasn't so the origin needs a correction to Dec 30, 1899.

Interconverting POSIXct and numeric in R

I found an answer based on chris holbrook's answer here: How do you convert dates/times from one time zone to another in R?

This worked:

 mydata$DateTime2 <- as.POSIXct(as.Date(mydata$DateTime1) +
mydata$Time2)
attributes(mydata$DateTime2)$tzone <- "GMT"

@MichaelChirico and I were correct that the time zone was the problem. I'm still not sure why, but the time zone for DateTime2 was apparently PST. It didn't list "PST" when I checked str(mydata$DateTime2), but based on the time difference, it must have been, in fact, PST until I set the attributes. Crazy. It did that even though DateTime1 was GMT.

Problems using POSIXct with and in R

You probably need to get everything in the same timezone first. Try

as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC", tz="UTC"))
#[1] 1419667800
# equivalent to "2014-12-27 08:10:00 UTC"

vs.

as.numeric(as.POSIXct("2014-12-27 08:10:00 UTC"))
#[1] 1419631800
# equivalent to 8:10 in local timezone - in my case Aust. EST.
# "2014-12-27 08:10:00 AEST"

You can see that they are actually numerically different.

To fix this, specify the tz= explicitly when importing as the "UTC" in your text strings will not be detected on input.

Timestamp changes when writing a R dataframe to an excel file depending upon UTC offset

The data in the R data frame is with the timezone CEST. When writing to excel, excel automatically changes the timezone to GMT which causes the time shift in excel.
One workaround is by changing the timezone in R to GMT without changing the time data, by using force_tz.

df$ts = as.POSIXct(strptime(df$ts, "%d.%m.%Y %H:%M:%S"))
Sys.setenv(TZ="")
df$ts = force_tz(df$ts,tzone="GMT")
write.xlsx(df, "output.xlsx", sheetName="output")

R as.POSIXct try two input formats

In what follows I will use package lubridate.

I have added two extra rows to the example dataset, with date/time values in the "%m/%d/%Y %H:%M" format. Note that that column is of class character, if it is of class factor it will probably throw an error.

As for the warnings, don't worry, they are just lubridate telling you that it found several formats and cannot process them all in one go.

tmp <- data$GMT_DateTime    # work on a copy

na <- is.na(ymd_hms(tmp))
data$GMT_DateTime[!na] <- ymd_hms(tmp)[!na]
data$GMT_DateTime[na] <- mdy_hm(tmp)[na]
data$GMT_DateTime <- as.POSIXct(as.numeric(data$GMT_DateTime),
format = "%Y-%m-%d",
origin = "1970-01-01", tz = "GMT")

rm(tmp) # final clean up

Data in dput() format.

data <-
structure(list(GMT_DateTime = c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
"2016/07/19 17:55:00", "2016/07/19 17:56:00", "07/22/2016 17:02",
"07/23/2016 17:15"), northing = c(3674.64416424279, 3674.65121597935,
3674.65474186293, 3674.65826775671, 3674.662, 3674.665), easting = c(354.266660979476,
354.246972537617, 354.237128326737, 354.227284122559, 354.2702,
354.3123)), row.names = c(NA, -6L), class = "data.frame")

as.POSIXct does not recognise date format = %Y-%W

This is a variant on a quasi-FAQ on 'by can one not parse year and month as a date': because a date is day and month and year.

Or year, a year and week and day. Otherwise you are indeterminate:

> as.Date(format(Sys.Date(), "%Y-%W-%d"), "%Y-%W-%d")
[1] "2017-12-04"
>

using

> Sys.Date()
[1] "2017-12-04"
> format(Sys.Date(), "%Y-%W-%d")
[1] "2017-49-04"
>

so %W works on input and output provided you also supply a day.

For input data that does not have a day, you could just add a given weekday, say 1:

> as.Date(paste0(c("2016-46", "2016-47", "2016-48"), "-1"), "%Y-%W-%w")
[1] "2016-11-14" "2016-11-21" "2016-11-28"
>


Related Topics



Leave a reply



Submit