Using colClasses in fread

using colClasses in fread

UPDATE: This is now implemented in v1.8.11 on R-Forge as of commit 966. From NEWS:

fread's drop, select and NULL in colClasses are implemented. To
drop or select columns by name or by number. See examples in ?fread.

The examples in ?fread are:

data = "A,B,C,D\n1,3,5,7\n2,4,6,8\n"

# colClasses
fread(data, colClasses=c(B="character",C="character",D="character"))
fread(data, colClasses=list(character=c("B","C","D"))) # saves typing
fread(data, colClasses=list(character=2:4)) # same using column numbers

# drop
fread(data, colClasses=c("B"="NULL","C"="NULL")) # as read.csv
fread(data, colClasses=list(NULL=c("B","C"))) # same
fread(data, drop=c("B","C")) # same but less typing, easier to read
fread(data, drop=2:3) # same using column numbers

# select
# (in read.csv you need to work out which to drop)
fread(data, select=c("A","D")) # less typing, easier to read
fread(data, select=c(1,4)) # same using column numbers
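
To check what these calls return, here is a minimal sketch that reuses the same inline data and inspects the resulting column classes and names. It assumes the data.table package is installed and a version of fread that accepts a string containing newlines as inline input; the names dt_chr, dt_drop and dt_sel are illustrative only.

```r
library(data.table)

data <- "A,B,C,D\n1,3,5,7\n2,4,6,8\n"

# Force B, C and D to character; A is left to fread's type detection
dt_chr <- fread(data, colClasses = list(character = c("B", "C", "D")))
classes_chr <- sapply(dt_chr, class)
print(classes_chr)

# Dropping B and C should leave the same columns as selecting A and D
dt_drop <- fread(data, drop = c("B", "C"))
dt_sel  <- fread(data, select = c("A", "D"))
print(names(dt_drop))
print(identical(names(dt_drop), names(dt_sel)))
```

Inspecting classes with sapply() after a read is a cheap way to confirm that a colClasses specification was actually applied.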

Using colClasses and select arguments of fread simultaneously

I found a solution on a more careful reading of this illustration of the drop/select/colClasses options by Mr. Dowle:

DT <- fread("data.txt", select = c("V1", "V2", "V3"),
            colClasses = list(character = c("char_names"),
                              factor = c("factor_names"),
                              numeric = c("numeric_names")))

I didn't realize this before because there were some other problems with my fread attempts due to bad formatting of my .csv file.

Still, I am inclined to call it a bug that the natural approach doesn't work:

DT <- fread("data.txt", select = c("V1", ..., "Vn"),
colClasses = c("type1", ..., "typen"))
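
As a small runnable version of the named form that does work, the sketch below combines select with a colClasses list keyed by type. The inline data and the column names A and D are made up for illustration; it assumes a data.table version in which fread accepts inline text.

```r
library(data.table)

data <- "A,B,C,D\n1,3,5,7\n2,4,6,8\n"

# Keep only A and D, and read D as character by referring to it by name
dt <- fread(data,
            select = c("A", "D"),
            colClasses = list(character = "D"))

col_classes <- sapply(dt, class)
print(col_classes)
```

Because the types are attached to column names rather than positions, the specification does not depend on how many columns select keeps.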

fread - Specify data type of one specific column

As Roland pointed out in a comment, we can use the argument colClasses "with a named vector specifying types for a subset of the columns by name".

Hence, in the question's mini example, we can do something like:

df = fread(file="path/to/my_file.csv", colClasses = c('id'='character'))
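
A common reason to do this is an identifier column with leading zeros. The following sketch uses made-up inline data (the column names id and value are assumptions, not from the question) to compare the default read with a read that forces id to character.

```r
library(data.table)

csv_text <- "id,value\n001,10\n002,20\n"

# Default read: fread guesses the type of every column
dt_default <- fread(csv_text)
print(class(dt_default$id))

# Named vector: only id is forced to character, value is still auto-detected
dt_typed <- fread(csv_text, colClasses = c(id = "character"))
print(dt_typed$id)
print(class(dt_typed$value))
```

With default settings the leading zeros are typically lost in the first read, while the second read keeps the id strings exactly as they appear in the file.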

How do I use col.names and colClasses together in `data.table::fread`?

One possible solution is to use the index of the column and not the name.

data.table::fread("cars.csv", col.names = c("a","b"), colClasses = list(numeric = 1))
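
Here is a self-contained sketch of the same idea with inline data standing in for cars.csv (the contents and the original header names speed and dist are assumptions). Referring to the column by position sidesteps the question of whether colClasses should use the old or the new name.

```r
library(data.table)

cars_text <- "speed,dist\n4,2\n7,4\n"

# Rename both columns and set the type of the first column by position
dt <- fread(cars_text,
            col.names = c("a", "b"),
            colClasses = list(numeric = 1))

print(names(dt))
print(sapply(dt, class))
```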

How to use colClasses in fread() for columns matching a specific name

You could read just the header first, to find which columns have oid in their name. Then set classes accordingly:

x = fread('
A B C oid.a oid.b D E
1 2 3 NA NA 6 7',
nrows = 0)

colNames = grep('^oid', names(x), value = TRUE)
colClasses = rep('numeric', length(colNames))
names(colClasses) = colNames
x = fread('
A B C oid.a oid.b D E
1 2 3 NA NA 6 7',
colClasses = colClasses)
str(x)
# Classes ‘data.table’ and 'data.frame': 1 obs. of 7 variables:
# $ A : int 1
# $ B : int 2
# $ C : int 3
# $ oid.a: num NA
# $ oid.b: num NA
# $ D : int 6
# $ E : int 7

An alternative, of course, would be to recast the columns after reading them, but on big data, setting the class correctly at read time as shown would probably be preferable.
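
For completeness, the recast-after-reading alternative could look roughly like the sketch below. It uses comma-separated inline data with hypothetical columns rather than the exact input above, and converts the matching columns by reference with `:=` and .SDcols.

```r
library(data.table)

x <- fread("A,B,oid.a,oid.b\n1,2,NA,NA\n3,4,NA,NA\n")

# Find the columns whose names start with "oid"
oid_cols <- grep("^oid", names(x), value = TRUE)

# Convert those columns to numeric in place, leaving the others untouched
x[, (oid_cols) := lapply(.SD, as.numeric), .SDcols = oid_cols]

str(x)
```

This costs an extra pass over the affected columns after the file has already been parsed.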

R data.table fread: specify column data type

You can use the colClasses = argument.

fread("mytable.csv", colClasses = c("character", "character", "numeric"))
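
A minimal sketch of this positional form, with made-up inline data in place of mytable.csv: an unnamed vector is matched to columns by position, so this example supplies one entry per column in file order.

```r
library(data.table)

table_text <- "name,code,score\nann,007,1\nbob,012,2\n"

# One class per column, in file order: name, code, score
dt <- fread(table_text,
            colClasses = c("character", "character", "numeric"))

print(sapply(dt, class))
print(dt$code)
```

If only a few columns need explicit types, the named forms shown earlier are less fragile, because they do not break when columns are added or reordered.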


