Specify custom Date format for colClasses argument in read.table/read.csv
You can write your own function that accepts a string and converts it to a Date using the format you want, then use setAs to register it as an as method. Then you can use your function as part of the colClasses.
Try:
setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") )
tmp <- c("1, 15/08/2008", "2, 23/05/2010")
con <- textConnection(tmp)
tmp2 <- read.csv(con, colClasses=c('numeric','myDate'), header=FALSE)
str(tmp2)
Then modify if needed to work for your data.
Edit ---
You might want to run setClass('myDate') first to avoid the warning (you can ignore the warning, but it gets annoying if you do this a lot, and this simple call gets rid of it).
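Putting the two pieces together, a minimal sketch might look like the following. The class name myDate and the sample data come from the answer above; passing the data through the text argument instead of a textConnection is just a convenience for the illustration.

```r
library(methods)

# Declare the (virtual) class first so that setAs() does not warn
setClass("myDate")

# Register a coercion from character to "myDate" that parses day/month/year
setAs("character", "myDate",
      function(from) as.Date(from, format = "%d/%m/%Y"))

# read.csv() coerces the second column through the method registered above
tmp2 <- read.csv(text = "1,15/08/2008\n2,23/05/2010",
                 colClasses = c("numeric", "myDate"),
                 header = FALSE)
str(tmp2)
```

The second column should come back as a Date, even though the class named in colClasses is myDate.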
colClasses date and time read.csv
You can't do this while reading the data into R using the colClasses argument, because the data span two "columns" in the CSV file. Instead, load the data and process the date and time columns into a single POSIXlt variable:
dat <- read.csv(textConnection("date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5"))
dat <- within(dat, Datetime <- as.POSIXlt(paste(date, time),
format = "%Y%m%d %H:%M:%S"))
[I presume the order is year, month, day. If not, use "%Y%d%m %H:%M:%S".]
Which gives:
> head(dat)
date time val1 val2 Datetime
1 20090503 0:05:12 107.25 1 2009-05-03 00:05:12
2 20090503 0:05:17 108.25 20 2009-05-03 00:05:17
3 20090503 0:07:45 110.25 5 2009-05-03 00:07:45
4 20090503 0:07:56 106.25 5 2009-05-03 00:07:56
> str(dat)
'data.frame': 4 obs. of 5 variables:
$ date : int 20090503 20090503 20090503 20090503
$ time : Factor w/ 4 levels "0:05:12","0:05:17",..: 1 2 3 4
$ val1 : num 107 108 110 106
$ val2 : int 1 20 5 5
$ Datetime: POSIXlt, format: "2009-05-03 00:05:12" "2009-05-03 00:05:17" ...
You can now delete date and time if you wish:
> dat <- dat[, -(1:2)]
> head(dat)
val1 val2 Datetime
1 107.25 1 2009-05-03 00:05:12
2 108.25 20 2009-05-03 00:05:17
3 110.25 5 2009-05-03 00:07:45
4 106.25 5 2009-05-03 00:07:56
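A variation on the same idea, as a sketch only: colClasses can still be used to keep date and time as plain character (avoiding the integer and factor columns seen above), and the combined value can be stored as POSIXct, which tends to behave better inside data frames than POSIXlt. The tz = "UTC" setting is an assumption made here so that the result does not depend on the local time zone.

```r
csv_text <- "date,time,val1,val2
20090503,0:05:12,107.25,1
20090503,0:05:17,108.25,20
20090503,0:07:45,110.25,5
20090503,0:07:56,106.25,5"

# Keep date and time as character so nothing is lost before parsing
dat <- read.csv(text = csv_text,
                colClasses = c("character", "character", "numeric", "integer"))

# Combine the two columns and parse them in one go (time zone is an assumption)
dat$Datetime <- as.POSIXct(paste(dat$date, dat$time),
                           format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Drop the original columns by name rather than by position
dat <- dat[, c("Datetime", "val1", "val2")]
str(dat)
```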
effect of colClasses in read.csv function
Factors (the data type R uses to store categorical variables) carry their possible levels along with them, and these are printed by default. There are a variety of solutions:
- use colClasses when reading in the data, as you suggested;
- use stringsAsFactors=FALSE;
- read the file as usual, then use print(as.character(z1[1]));
- use print(z1[1], max.levels=0).
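The options above can be compared side by side in a small sketch. The two-column data and the object names here are made up for illustration. Note that since R 4.0.0 stringsAsFactors defaults to FALSE, so it is set explicitly to reproduce the factor behaviour.

```r
csv_text <- "id,name\n1,alpha\n2,beta"

# Old default behaviour: strings become factors, and levels are printed
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
print(as_factor$name[1])

# Option 1: declare the column type up front
by_class <- read.csv(text = csv_text, colClasses = c("integer", "character"))

# Option 2: turn off factor conversion for the whole file
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Options 3 and 4: keep the factor, but change how it is printed
print(as.character(as_factor$name[1]))
print(as_factor$name[1], max.levels = 0)
```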
Read csv with timestamp to R. Define colClass in table.read
For an unconventional date-time format, one can import the column as character (step 1) and then coerce it with strptime (step 2).
step 1
df <- read.table(file = "data.csv",
header = TRUE,
sep = "," ,
dec = "." ,
colClasses = "character",
comment.char = ""
)
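step 2
strptime(df$v1, "%m/%d/%Y %H:%M:%S")
v1 being the name of the column to coerce (in this case a date-time in the unconventional format 12/13/2014 15:16:17). The format string has to match the data exactly: %Y for a four-digit year, and %S if seconds are present.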
Notes
The sep argument is necessary because read.table defaults to sep = "" (any whitespace).
When using read.csv there is no need to set sep, which defaults to ",".
Using comment.char = "" (when possible) improves reading time.
Useful info at http://cran.r-project.org/doc/manuals/r-release/R-data.pdf
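As a sketch of the two steps end to end: the sample data and the second column v2 are hypothetical, the text argument stands in for a real file, and tz = "UTC" is an assumption so the result does not depend on the local time zone. The strptime result is wrapped in as.POSIXct because POSIXlt columns can be awkward inside data frames.

```r
csv_text <- "v1,v2
12/13/2014 15:16:17,1.5
01/02/2015 03:04:05,2.5"

# Step 1: import everything as character
df <- read.table(text = csv_text,
                 header = TRUE,
                 sep = ",",
                 dec = ".",
                 colClasses = "character",
                 comment.char = "")

# Step 2: coerce the date-time column, then the remaining columns as needed
df$v1 <- as.POSIXct(strptime(df$v1, "%m/%d/%Y %H:%M:%S", tz = "UTC"))
df$v2 <- as.numeric(df$v2)
str(df)
```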
read.csv2 date formatting in R
When you convert character to date, you need to specify the format if it is not standard. The error you got is the result of as.Date("08.09.2016"). But if you do as.Date("08.09.2016", format = "%m.%d.%Y"), it is fine.
I am not sure whether it is possible to pass a format to read.csv2 for correct date formatting (maybe not). I would simply read in this date column as a factor, then do as.Date(as.character(), format = "%m.%d.%Y") on this column myself.
Generally we use the following format: "dd/mm/yy". How can I reorganise the date to that format?
Use format(, format = "%d/%m/%y").
A complete example:
format(as.Date("08.09.2016", format = "%m.%d.%Y"), format = "%d/%m/%y")
# [1] "09/08/16"
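Applied to a whole column, the same approach might look like this sketch. The semicolon-separated sample data and the column names are made up, and the date column is read as character rather than factor, which skips the as.character() step.

```r
csv_text <- "id;date
1;08.09.2016
2;12.25.2016"

# Read the date column as plain text; other columns are left to read.csv2
df <- read.csv2(text = csv_text, colClasses = c(date = "character"))

# Parse month.day.year into a proper Date
df$date <- as.Date(df$date, format = "%m.%d.%Y")

# Build a display string in dd/mm/yy; the result is character, not Date
df$date_label <- format(df$date, format = "%d/%m/%y")
print(df)
```

Keeping the Date column alongside the formatted label means sorting and date arithmetic still work on df$date.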
read.csv() and colClasses
You really don't need the [1:24] in every assignment; this is what is causing your problems. You are assigning to a subset of an indexed vector of some description.
The error message arises when you try to assign to data[1:24] without data having been assigned previously (in your previous usage, which you mentioned worked, data was probably a list or data.frame you had created). Without such an assignment, data is a function (for loading data sets associated with packages, see ?data), and the error you saw is saying exactly that (a closure is a kind of function).
I would suggest something like
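Precipfiles <- list.files(pattern = "\\.csv$")  # file names ending in .csv
# read every file into a list of data frames
DFlist <- lapply(Precipfiles, read.table, sep = '\t',
na.strings = '', header = TRUE)
# stack the list into a single data frame
bigDF <- do.call(rbind, DFlist)
# or, much faster using data.table
library(data.table)
bigDF <- rbindlist(DFlist)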
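To try the pattern without the original files, here is a self-contained sketch in base R. It writes two small tab-separated files to a temporary directory first; the directory name, file names and columns are all hypothetical.

```r
# Create a scratch directory with two tab-separated demo files
tmp_dir <- file.path(tempdir(), "precip_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.table(data.frame(station = c("A", "B"), mm = c(1.2, 0.0)),
            file.path(tmp_dir, "jan.csv"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(station = c("A", "B"), mm = c(3.4, 5.6)),
            file.path(tmp_dir, "feb.csv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# List the files, read each into a data frame, then stack them
precip_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
df_list <- lapply(precip_files, read.table, sep = "\t",
                  na.strings = "", header = TRUE)
big_df <- do.call(rbind, df_list)
print(big_df)
```

No indexed assignment such as data[1:24] is needed; lapply returns the list of data frames directly.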
Converting a date in R returns NA
as.Date(gsub("/", "-", td3$date), format = "%m-%d-%Y")
[1] "2016-05-06" "2016-05-07" "2016-04-13" "2016-04-14"
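The NA typically appears when the separators in the format string do not match those in the data. A small sketch, assuming the dates in td3$date look like "05/06/2016" (the vector below is a stand-in reconstructed from the output above, not the original data):

```r
# Hypothetical stand-in for td3$date
dates <- c("05/06/2016", "05/07/2016", "04/13/2016", "04/14/2016")

# Separators in the format ("-") do not match the data ("/"): all NA
mismatch <- as.Date(dates, format = "%m-%d-%Y")

# Fix 1: rewrite the data to match the format
fixed_gsub <- as.Date(gsub("/", "-", dates), format = "%m-%d-%Y")

# Fix 2: write the format to match the data
fixed_format <- as.Date(dates, format = "%m/%d/%Y")

print(mismatch)
print(fixed_gsub)
print(fixed_format)
```

Both fixes give the same Date values, so adjusting the format string is usually the simpler choice.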