Override column types when importing data using readr::read_csv() when there are many columns

Here is a more general answer in case someone stumbles upon this in the future. Using "skip" to jump over columns is not advisable, because it stops working as soon as the structure of the imported data source changes.

In your example it may be easier to set a default column type and then declare only the columns that differ from that default.

E.g., if most columns are doubles ("d") but the date column should be a date ("D"), load the data as follows:

  read_csv(df, col_types = cols(.default = "d", date = "D"))

or if, e.g., the column date should be "D" and the column xxx should be "i" (integer), do so as follows:

  read_csv(df, col_types = cols(.default = "d", date = "D", xxx = "i"))

The use of ".default" above is powerful if you have many columns and only a few exceptions (such as "date" and "xxx").
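
To make the behaviour concrete, here is a minimal sketch that writes a small made-up CSV to a temporary file and reads it back with a default type plus two exceptions. The column names a, b, date and xxx and the sample values are assumptions for illustration only.

```r
library(readr)

# Hypothetical sample data: two numeric columns, one date, one integer.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,date,xxx",
  "1.5,2.5,2024-01-15,7",
  "3.0,4.25,2024-02-20,8"
), path)

# Every column is read as double unless it is listed explicitly.
dat <- read_csv(path, col_types = cols(.default = "d", date = "D", xxx = "i"))

# Inspect the resulting column classes.
sapply(dat, class)
```

If a new numeric column is later added to the file, it should simply pick up the default type without any change to the call.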

Pass text as specified columns to open as [type] using read_csv from readr in R

You can make a helper function that combines your extra spec with the default column spec and then pulls the full spec together with do.call.

extra_spec = list(
"c_text" = "c",
"d_decimals" = "i",
"e_more_text" = "c"
)

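# Integer is the default type; entries in extra_spec override it per column.
read_csv_with_default_int = function(path, extra_spec) {
  col_types = do.call(readr::cols, c(extra_spec, list(.default = readr::col_integer())))
  readr::read_csv(path, col_types = col_types)
}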

read_csv_with_default_int("file.csv", extra_spec = extra_spec)

You could also avoid some of the nesting with a partially applied helper like

cols_default_int = purrr::partial(readr::cols, .default = readr::col_integer())

read_csv_with_default_int = function(path, col_types) {
  readr::read_csv(path, col_types = do.call(cols_default_int, col_types))
}

read_csv_with_default_int("file.csv", col_types = extra_spec)
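
As a quick check of the helper idea, the sketch below repeats the do.call version on a throwaway file and prints the column specification that was used. The file contents and the column names a_count, b_count and c_text are hypothetical.

```r
library(readr)

# Same idea as the helper above, repeated here so the snippet runs on its own.
read_csv_with_default_int <- function(path, extra_spec) {
  read_csv(path, col_types = do.call(cols, c(extra_spec, list(.default = col_integer()))))
}

# Hypothetical sample file: two integer columns and one text column.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a_count,b_count,c_text",
  "1,2,foo",
  "3,4,bar"
), path)

dat <- read_csv_with_default_int(path, extra_spec = list(c_text = "c"))

# spec() returns the column specification attached to the result.
spec(dat)
```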

How to specify column types with abbreviations when skipping columns with read_csv

Sorry, I've rewritten what I wrote earlier to be more clear based on an assumed understanding of what you are asking.

If you want to get the col_types for the columns in your csv file before any skipping or manual changes, the easiest thing to do is to call spec_csv() on your file. It generates a column specification that shows how read_csv() would classify each column.

From there you can copy, paste and edit that into your col_types argument to bring in only the columns and column types that you want. That can be done using the cols_only() function instead of cols().

spec_csv("test.csv")

This prints the following to your console:

cols(
a = col_double(),
b = col_double(),
c = col_double()
)

The output tells you what the default guessed column types would be. (PS: spec_csv() accepts the same arguments as read_csv(), so you can, e.g., raise guess_max to have more rows inspected when the column types are guessed.)
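
A minimal sketch of that step, assuming a small made-up file in place of test.csv; the value 5000 for guess_max is arbitrary. With many columns, cols_condense() from readr can shorten the printed spec by folding the most frequent type into .default.

```r
library(readr)

# Hypothetical stand-in for test.csv.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), path)

# Inspect more rows than the default when guessing the column types.
guessed <- spec_csv(path, guess_max = 5000)
guessed

# Fold the most frequent type into .default so that only the
# exceptions are listed explicitly.
cols_condense(guessed)
```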

# manually copied and pasted the above output, changed the guessed types to the desired ones and deleted the columns I didn't want

read_csv("test.csv",
  col_types = cols_only(
    a = col_character(),
    c = col_character()
  )
)

I used the long form (col_character()), but you can use the abbreviation instead, as you already indicated earlier.
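
For reference, a sketch of the same call with the one-letter abbreviations, again assuming a made-up three-column file in place of test.csv:

```r
library(readr)

# Hypothetical stand-in for test.csv.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), path)

# Only a and c are read, both as character; b is not listed and is skipped.
dat <- read_csv(path, col_types = cols_only(a = "c", c = "c"))
names(dat)
```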

Please let me know if this is what you were asking or if there is any clarity that I can provide.

Unusual error using read_csv and col_select and id argument in R

I have submitted the below as an issue on readr and it has been labeled as a bug.

https://github.com/tidyverse/readr/issues/1395


