Specifying colClasses in read.csv

When the columns need different classes, give colClasses one entry per column in the file (a shorter unnamed vector is recycled). Supposing the first column is character and the remaining 5 columns of your dataset are numeric:

colClasses=c("character",rep("numeric",5))
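To see this in action, here is a minimal sketch. It assumes a hypothetical six-column file (one ID column followed by five numeric columns), written to a temporary file purely for illustration; the names tmp, df and dat are not from the original question.

```r
# Hypothetical data: one character column followed by five numeric columns.
tmp <- tempfile(fileext = ".csv")
df <- data.frame(id = c("001", "002", "003"),
                 matrix(1:15, nrow = 3))
write.csv(df, tmp, row.names = FALSE)

# One class per column: the first stays character, the other five become numeric.
dat <- read.csv(tmp, colClasses = c("character", rep("numeric", 5)))
col_types <- sapply(dat, class)
print(col_types)
```

Reading the ID column as character is what keeps values such as "001" from being converted to numbers.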

Effect of colClasses in the read.csv function

Factors (the data type R uses to store categorical variables) carry their possible levels along with them, and these are printed by default. There are a variety of solutions:

  • use colClasses when reading in the data, as you suggested;
  • use stringsAsFactors=FALSE (this is the default since R 4.0.0);
  • read the file as usual, then use print(as.character(z1[1]));
  • use print(z1[1],max.levels=0).
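The options above can be compared side by side. This is only a sketch: z1 here is a small made-up factor standing in for the column from the question, and the inline CSV text is hypothetical.

```r
# A small factor standing in for a column that was read in as a factor.
z1 <- factor(c("apple", "banana", "apple"))

print(z1[1])                    # prints the value followed by a "Levels:" line
print(as.character(z1[1]))      # plain character string, no levels
print(z1[1], max.levels = 0)    # still a factor, but the levels line is suppressed

# Reading with stringsAsFactors = FALSE avoids creating factors in the first place.
dat <- read.csv(text = "fruit,n\napple,1\nbanana,2", stringsAsFactors = FALSE)
print(class(dat$fruit))
```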

Warning message in R when using colClasses when reading csv files

The warning is letting you know that you are keeping just one column of the data out of three. Note that your NULL is in quotation marks: in colClasses, the string "NULL" tells read.csv to skip that column, which is not the same thing as the unquoted NULL object.

An example:

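# Write two small example files, each with three columns.
write.csv(data.frame(fi = letters[1:3],
                     fy = rnorm(3, 500, 1),
                     fo = rnorm(3, 50, 2)),
          file = "a.csv", row.names = FALSE)

write.csv(data.frame(fib = letters[2:4],
                     fyb = rnorm(3, 5, 1),
                     fob = rnorm(3, 50, 2)),
          file = "b.csv", row.names = FALSE)

file_list <- list("a.csv", "b.csv")

# Read only the first line of each file, keeping the first column.
lapply(file_list, read.csv, sep = ",", header = FALSE, col.names = FALSE,
       nrows = 1, colClasses = c("character", "NULL", "NULL"))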

Which results in:

[[1]]
  FALSE.
1     fi

[[2]]
  FALSE.
1    fib

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
  cols = 1 != length(data) = 3

Which is the same as if you used:

lapply(file_list, read.csv, sep = ",", header = FALSE, col.names = FALSE,
       nrows = 1, colClasses = c("character", "asdasd", "asdasd"))

But the warning goes away (and you get the rest of the row as a result) if you use the unquoted NULL, because c() drops NULL elements and the vector collapses to just 'character':

lapply(file_list, read.csv, sep = ",", header = FALSE, col.names = FALSE,
       nrows = 1, colClasses = c("character", NULL, NULL))
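The difference between the quoted and unquoted forms can be checked directly. This is a sketch under stated assumptions: the inline CSV text is made up, and the header row is read so that read.csv knows all three column names.

```r
# c() drops NULL elements, so the unquoted version is a length-one vector.
same <- identical(c("character", NULL, NULL), "character")
print(same)

# The quoted string "NULL" is what tells read.csv to skip a column.
csv_text <- "fi,fy,fo\na,500.1,49.2\nb,499.8,51.0"   # hypothetical data
kept <- read.csv(text = csv_text, colClasses = c("character", "NULL", "NULL"))
print(names(kept))
```

Only the first column should remain in kept, since the other two are marked "NULL".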

You can see where errors and warnings come from in a function's source code by entering its name at the console without parentheses (for example, read.table) and then searching the printed source for your particular warning.
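The same search can be done programmatically. The sketch below assumes the warning text still appears in read.table in your version of R; src and hits are illustrative names.

```r
# deparse() returns a function's source as a character vector, one element per line,
# which can then be searched for a fragment of the warning message.
src <- deparse(read.table)
hits <- grep("length(data)", src, fixed = TRUE, value = TRUE)
print(trimws(hits))
```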

Read all the columns of a dataframe as characters, by means of `read.csv`

From ?read.csv:

colClasses
character. A vector of classes to be assumed for the columns. If unnamed, recycled as necessary.

Recycled means that it will be applied to as many columns as you have. So there's no need to specify the number of columns - you can just use:

read.csv("your_file.csv", colClasses = "character")  # "your_file.csv" is a placeholder path
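As a quick check of the recycling behaviour, here is a minimal sketch using a made-up three-column file written to a temporary path; the column names and values are assumptions for illustration.

```r
# Hypothetical file with mixed content; leading zeros would be lost if read as numbers.
tmp <- tempfile(fileext = ".csv")
writeLines(c("zip,count,flag", "01234,10,TRUE", "98765,20,FALSE"), tmp)

# A single unnamed class is recycled across all three columns.
dat <- read.csv(tmp, colClasses = "character")
print(sapply(dat, class))
print(dat$zip)
```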

