Specify custom Date format for colClasses argument in read.table/read.csv

You can write your own function that accepts a string and converts it to a Date using the format you want, then use setAs() to register it as a coercion ("as") method for a new class name. You can then use that class name in colClasses.

Try:

setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") )

tmp <- c("1, 15/08/2008", "2, 23/05/2010")
con <- textConnection(tmp)

tmp2 <- read.csv(con, colClasses=c('numeric','myDate'), header=FALSE)
str(tmp2)

Then modify if needed to work for your data.

Edit ---

You might want to run setClass('myDate') before calling setAs() to avoid the warning about the undefined class. The warning is harmless, but it gets annoying if you do this a lot, and this one simple call gets rid of it.
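Putting the two pieces together, here is a minimal sketch of the warning-free version. The class name dmyDate, the column names and the sample rows are made up for illustration; only setClass(), setAs() and read.csv() come from the answer above.

```r
library(methods)  # setClass() and setAs() live in the methods package

# Register the (hypothetical) class name first so that setAs() does not warn
setClass("dmyDate")
setAs("character", "dmyDate",
      function(from) as.Date(from, format = "%d/%m/%Y"))

# Made-up sample data; the last row is deliberately not a valid date
csv_text <- "id,when
1,15/08/2008
2,23/05/2010
3,not a date"

dat <- read.csv(text = csv_text, colClasses = c("numeric", "dmyDate"))
str(dat)

# Values that do not match the format come back as NA instead of raising an error
bad_rows <- dat[is.na(dat$when), ]
```

Checking for NA after the read is a cheap way to spot rows where the format string did not fit.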

colClasses date and time read.csv

You can't do this while reading the data into R with the colClasses argument, because the data span two "columns" in the CSV file. Instead, load the data and then combine the date and time columns into a single POSIXlt variable:

dat <- read.csv(textConnection("date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5"))
dat <- within(dat, Datetime <- as.POSIXlt(paste(date, time),
                                          format = "%Y%m%d %H:%M:%S"))

[I presume the dates are in year, month, day order. If not, use "%Y%d%m %H:%M:%S".]

Which gives:

> head(dat)
      date    time   val1 val2            Datetime
1 20090503 0:05:12 107.25    1 2009-05-03 00:05:12
2 20090503 0:05:17 108.25   20 2009-05-03 00:05:17
3 20090503 0:07:45 110.25    5 2009-05-03 00:07:45
4 20090503 0:07:56 106.25    5 2009-05-03 00:07:56
> str(dat)
'data.frame': 4 obs. of  5 variables:
 $ date    : int  20090503 20090503 20090503 20090503
 $ time    : Factor w/ 4 levels "0:05:12","0:05:17",..: 1 2 3 4
 $ val1    : num  107 108 110 106
 $ val2    : int  1 20 5 5
 $ Datetime: POSIXlt, format: "2009-05-03 00:05:12" "2009-05-03 00:05:17" ...

You can now delete the date and time columns if you wish:

> dat <- dat[, -(1:2)]
> head(dat)
    val1 val2            Datetime
1 107.25    1 2009-05-03 00:05:12
2 108.25   20 2009-05-03 00:05:17
3 110.25    5 2009-05-03 00:07:45
4 106.25    5 2009-05-03 00:07:56
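A variation on the same idea, as a sketch only: colClasses can still help here by reading date and time as plain character, which avoids the integer and factor conversions visible in the str() output. The choice of POSIXct over POSIXlt and the tz = "UTC" setting are assumptions made to keep the result compact and reproducible; use whatever time zone your data were recorded in.

```r
# Same sample data as in the answer, shortened to two rows
csv_text <- "date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20"

# Read date and time as character so they can be pasted together unchanged
dat <- read.csv(text = csv_text,
                colClasses = c("character", "character", "numeric", "integer"))

# Combine the two columns; tz = "UTC" is an assumption, adjust as needed
dat$Datetime <- as.POSIXct(paste(dat$date, dat$time),
                           format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Keep only the combined column and the values
dat <- dat[, c("Datetime", "val1", "val2")]
str(dat)
```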

effect of colClasses in read.csv function

Factors (the data type R uses to store categorical variables) carry their possible levels along with them, and these are printed by default. There are a variety of solutions:

  • use colClasses when reading in the data as you suggested;
  • use stringsAsFactors=FALSE
  • read the file as usual, then use print(as.character(z1[1]))
  • use print(z1[1],max.levels=0)
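The four options can be compared side by side in a small sketch. The data and the object names (z_factor, z_plain, z_cols) are invented for illustration. Note that since R 4.0.0 stringsAsFactors defaults to FALSE, so it is set explicitly here to reproduce the factor behaviour.

```r
# Made-up sample data with one categorical column
csv_text <- "id,grade
1,low
2,high
3,low"

# Strings read as factors (the default before R 4.0.0)
z_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Two ways to get plain character columns at read time
z_plain <- read.csv(text = csv_text, stringsAsFactors = FALSE)
z_cols  <- read.csv(text = csv_text, colClasses = c("integer", "character"))

print(z_factor$grade[1])                  # value followed by a "Levels:" line
print(z_factor$grade[1], max.levels = 0)  # value only, no "Levels:" line
print(as.character(z_factor$grade[1]))    # converted after reading
print(z_plain$grade[1])                   # never was a factor
```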

Read csv with timestamp to R. Define colClass in table.read

For an unconventional date-time format, you can import the column as character (step 1) and then convert it with strptime (step 2).

step 1

df <- read.table(file = "data.csv",
                 header = TRUE,
                 sep = ",",
                 dec = ".",
                 colClasses = "character",
                 comment.char = "")

step 2

strptime(df$v1, "%m/%d/%Y %H:%M:%S")

v1 is the name of the column to convert (in this case a date-time in the unconventional format 12/13/2014 15:16:17, hence %Y for the four-digit year and %S for the seconds).
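The two steps can be tried without a file on disk. This is a sketch under assumptions: the inline data, the second column v2 and the tz = "UTC" setting are invented, and the format string is chosen to match the sample value 12/13/2014 15:16:17.

```r
# Made-up data in the unconventional month/day/year format
csv_text <- "v1,v2
12/13/2014 15:16:17,3.5
12/14/2014 08:00:00,4.0"

# Step 1: read every column as character
df <- read.table(text = csv_text,
                 header = TRUE,
                 sep = ",",
                 colClasses = "character",
                 comment.char = "")

# Step 2: parse the timestamp; as.POSIXct() gives a compact column type
df$v1 <- as.POSIXct(strptime(df$v1, format = "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Other columns were read as character too, so convert them explicitly
df$v2 <- as.numeric(df$v2)
str(df)
```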

Notes
Setting the sep argument is necessary because read.table defaults to sep = "", which splits fields on whitespace rather than on commas.

When using read.csv there is no need to use the sep argument, which defaults to ",".

Using comment.char = "" (when possible) improves reading time.

Useful info at http://cran.r-project.org/doc/manuals/r-release/R-data.pdf

read.csv2 date formatting in R

When you convert character to date, you need to specify the format if it is not a standard one. The error you got is the result of calling as.Date("08.09.2016"). If you call as.Date("08.09.2016", format = "%m.%d.%Y") instead, it works.

I am not sure whether it is possible to pass a format to read.csv2 for correct date formatting (maybe not). I would simply read in this date column as a factor, then apply as.Date(as.character(), format = "%m.%d.%Y") to the column myself.

Generally we use the format "dd/mm/yy". How can I reorganise the date to that format?

Use format(, format = "%d/%m/%y").


A complete example:

format(as.Date("08.09.2016", format = "%m.%d.%Y"), format = "%d/%m/%y")
# [1] "09/08/16"
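Applied to a whole column read with read.csv2, the same conversion might look like the sketch below. The semicolon-separated sample data and the column names are assumptions. The date column is read as character here rather than as a factor, which saves the as.character() step.

```r
# Made-up data in read.csv2 layout: ";" as separator, "," as decimal mark
csv_text <- "id;date;value
1;08.09.2016;1,5
2;12.31.2016;2,25"

# Named colClasses: only the date column is forced to character
df <- read.csv2(text = csv_text, colClasses = c(date = "character"))

# Parse month.day.year, then build a dd/mm/yy label for display
df$date <- as.Date(df$date, format = "%m.%d.%Y")
df$date_label <- format(df$date, format = "%d/%m/%y")
str(df)
```

Keeping the Date column and adding a separate character label preserves the ability to sort and do date arithmetic.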

read.csv() and colClasses

You really don't need the [1:24] in every assignment; this is what is causing your problems. You are assigning to a subset of an indexed vector of some description.

The error message arises because you are trying to assign to data[1:24] without data having been assigned beforehand. (In your earlier usage, which you mentioned worked, data was probably a list or data.frame you had created.) With no object of your own by that name, data refers to the function for loading data sets associated with packages (see ?data), and the error you saw is saying exactly that: a function is a closure, and a closure cannot be subset.
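The error is easy to reproduce in a fresh session. This is a sketch for illustration only; the exact wording of the message may differ between R versions, and the list of length 24 is a made-up stand-in for whatever object the original code created.

```r
# With no object called `data` defined, the name resolves to utils::data()
msg <- tryCatch({
  data[1:24] <- 0
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # the message mentions a 'closure', that is, a function

# Once `data` is an ordinary object, the same assignment works
data <- vector("list", 24)
data[1:24] <- 0
length(data)
```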

I would suggest something like

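# pattern is a regular expression, so escape the dot and anchor the end
Precipfiles <- list.files(pattern = "\\.csv$")
# read each file into a data frame, then stack them
DFlist <- lapply(Precipfiles, read.table, sep = '\t',
                 na.strings = '', header = TRUE)
bigDF <- do.call(rbind, DFlist)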

# or, much faster using data.table
library(data.table)
bigDF <- rbindlist(DFlist)
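To try the base R version of the pattern without real data, the following sketch writes two small tab-separated files into a temporary directory and reads them back. The file names, column names and values are all invented.

```r
# Create a scratch directory holding two made-up tab-separated files
tmp_dir <- tempfile("precip_")
dir.create(tmp_dir)
write.table(data.frame(station = "A", mm = c(1.2, NA)),
            file.path(tmp_dir, "a.csv"),
            sep = "\t", na = "", row.names = FALSE, quote = FALSE)
write.table(data.frame(station = "B", mm = 3.4),
            file.path(tmp_dir, "b.csv"),
            sep = "\t", na = "", row.names = FALSE, quote = FALSE)

# full.names = TRUE returns paths that work from any working directory
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and stack the results into one data frame
df_list <- lapply(files, read.table, sep = "\t",
                  na.strings = "", header = TRUE)
big_df <- do.call(rbind, df_list)
str(big_df)

unlink(tmp_dir, recursive = TRUE)  # clean up the scratch directory
```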

Converting a date in R returns NA

as.Date(gsub("/", "-", td3$date), format = '%m-%d-%Y')
[1] "2016-05-06" "2016-05-07" "2016-04-13" "2016-04-14"
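One common cause of NA results is a format string whose separators or field order do not match the input. The sketch below assumes that td3$date holds strings such as "05/06/2016"; the original data are not shown, so this is a guess based on the output above.

```r
# Assumed input: month/day/year strings separated by slashes
td3 <- data.frame(date = c("05/06/2016", "05/07/2016",
                           "04/13/2016", "04/14/2016"),
                  stringsAsFactors = FALSE)

# Separators in the format do not match the data, so every value is NA
wrong <- as.Date(td3$date, format = "%m-%d-%Y")

# Matching the slashes directly makes the gsub() step unnecessary
right <- as.Date(td3$date, format = "%m/%d/%Y")
print(wrong)
print(right)
```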

