using colClasses in fread
UPDATE: This is now implemented in v1.8.11 on R-Forge as of commit 966. From NEWS:
fread's drop, select and NULL in colClasses are implemented, to drop or select columns by name or by number. See examples in ?fread.
The examples in ?fread are:
library(data.table)
data = "A,B,C,D\n1,3,5,7\n2,4,6,8\n"
# colClasses
fread(data, colClasses=c(B="character",C="character",D="character"))
fread(data, colClasses=list(character=c("B","C","D"))) # saves typing
fread(data, colClasses=list(character=2:4)) # same using column numbers
# drop
fread(data, colClasses=c("B"="NULL","C"="NULL")) # as read.csv
fread(data, colClasses=list(NULL=c("B","C"))) # same
fread(data, drop=c("B","C")) # same but less typing, easier to read
fread(data, drop=2:3) # same using column numbers
# select
# (in read.csv you need to work out which to drop)
fread(data, select=c("A","D")) # less typing, easier to read
fread(data, select=c(1,4)) # same using column numbers
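To check what these variants actually produce, you can inspect the column classes and names of the results. A minimal sketch, assuming a recent data.table release that accepts the text= argument (the examples above pass the string as the first argument, which fread also accepts when it contains a newline):

```r
library(data.table)

data = "A,B,C,D\n1,3,5,7\n2,4,6,8\n"

# B, C and D are forced to character by name; A keeps its detected type (integer)
DT1 = fread(text = data, colClasses = list(character = c("B", "C", "D")))
print(sapply(DT1, class))

# drop by name and select by number are two routes to the same two-column result
DT2 = fread(text = data, drop = c("B", "C"))
DT3 = fread(text = data, select = c(1, 4))
print(isTRUE(all.equal(DT2, DT3)))
```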
Using colClasses and select arguments of fread simultaneously
I actually found a solution on a more careful reading of this illustration of the drop/select/colClasses options by Mr. Dowle:
DT <- fread("data.txt", select = c("V1", "V2", "V3"),
colClasses = list(character = c("char_names"),
factor = c("factor_names"),
numeric = c("numeric_names")))
I didn't realize this before because there were some other problems with my fread attempts due to bad formatting of my .csv file.
Still, I am wont to call it a bug that the natural approach doesn't work:
DT <- fread("data.txt", select = c("V1", ..., "Vn"),
colClasses = c("type1", ..., "typen"))
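A minimal sketch of the combination that does work: give colClasses as a named vector that refers to the selected columns by name, so the types cannot be misaligned with column positions. The inline data and the V1 to V4 columns are hypothetical stand-ins for data.txt, and a recent data.table release with the text= argument is assumed:

```r
library(data.table)

# Hypothetical stand-in for data.txt
txt = "V1,V2,V3,V4\na,1,2.5,x\nb,2,3.5,y\n"

# Name each selected column's type instead of relying on position
DT = fread(text = txt,
           select = c("V1", "V2", "V3"),
           colClasses = c(V1 = "character", V2 = "numeric", V3 = "numeric"))
print(sapply(DT, class))
```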
fread - Specify data type of one specific column
As Roland pointed out in a comment, we can use the colClasses argument "with a named vector specifying types for a subset of the columns by name".
Hence, in the above mini example we can do something like:
df = fread(file="path/to/my_file.csv", colClasses = c('id'='character'))
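A self-contained sketch of the same idea, using a hypothetical inline CSV in place of my_file.csv. A typical reason to force a single column to character is an id with leading zeros, which default type detection reads as a number; the other columns are still detected automatically:

```r
library(data.table)

# Hypothetical CSV whose id column has leading zeros
txt = "id,value\n007,1.5\n012,2.5\n"

default = fread(text = txt)                                # id read as a number, zeros lost
fixed   = fread(text = txt, colClasses = c(id = "character"))  # only id is overridden
print(default$id)
print(fixed$id)
```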
How do I use col.names and colClasses together in `data.table::fread`?
One possible solution is to use the index of the column and not the name.
data.table::fread("cars.csv", col.names = c("a","b"), colClasses = list(numeric = 1))
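A runnable sketch of the index-based approach, with a small hypothetical inline table (header speed, dist) standing in for cars.csv:

```r
library(data.table)

# Hypothetical stand-in for cars.csv
txt = "speed,dist\n4,2\n7,4\n8,16\n"

# Column 1 is forced to double by position; both columns are renamed via col.names
DT = data.table::fread(text = txt, col.names = c("a", "b"), colClasses = list(numeric = 1))
str(DT)
```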
How to use colClass in R for Columns matching specific name in fread()
You could read just the header first to find which columns have names starting with oid, then set their classes accordingly:
library(data.table)
x = fread('
A B C oid.a oid.b D E
1 2 3 NA NA 6 7',
nrows = 0)
colNames = grep('^oid', names(x), value = TRUE)
colClasses = rep('numeric', length(colNames))
names(colClasses) = colNames
x = fread('
A B C oid.a oid.b D E
1 2 3 NA NA 6 7',
colClasses = colClasses)
str(x)
# Classes ‘data.table’ and 'data.frame': 1 obs. of 7 variables:
# $ A : int 1
# $ B : int 2
# $ C : int 3
# $ oid.a: num NA
# $ oid.b: num NA
# $ D : int 6
# $ E : int 7
An alternative, of course, would be to recast the columns after reading them, but with big data, setting the class correctly up front as shown is probably preferable.
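For comparison, a minimal sketch of that alternative: read with default type detection (the all-NA oid columns come back as logical), then convert the matching columns in place with data.table's := and .SDcols. The conversion is an extra pass over those columns, which is why setting colClasses at read time is the better choice on large files:

```r
library(data.table)

# Read with default type detection
x = fread(text = "A B C oid.a oid.b D E\n1 2 3 NA NA 6 7\n")

# Convert the matching columns by reference
cols = grep('^oid', names(x), value = TRUE)
x[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
str(x)
```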
R data.table fread: specify column data type
You can use the colClasses argument.
fread("mytable.csv", colClasses = c("character", "character", "numeric"))
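An unnamed colClasses vector like this is matched to the columns by position, so it should list one type per column in file order. A minimal sketch with a hypothetical three-column inline table in place of mytable.csv:

```r
library(data.table)

# Hypothetical stand-in for mytable.csv
txt = "code,label,amount\n001,a,10\n002,b,20\n"

# Types are applied to columns 1, 2 and 3 in order
DT = fread(text = txt, colClasses = c("character", "character", "numeric"))
print(sapply(DT, class))
```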