Override column types when importing data using readr::read_csv() when there are many columns
Here is a more generic answer to this question, in case someone stumbles upon it in the future. Using "skip" to skip columns is less advisable, because it will stop working if the structure of the imported data source changes.
It could be easier in your example to simply set a default column type, and then define any columns that differ from the default.
E.g., if all columns typically are "d", but the date column should be "D", load the data as follows:
read_csv(df, col_types = cols(.default = "d", date = "D"))
or if, e.g., column date should be "D" and column "xxx" be "i", do so as follows:
read_csv(df, col_types = cols(.default = "d", date = "D", xxx = "i"))
The use of "default" above is powerful if you have multiple columns and only specific exceptions (such as "date" and "xxx").
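To make the default-plus-exceptions pattern above concrete, here is a small self-contained sketch. The sample data, the temporary file, and the column names value1 and value2 are made up for illustration; only the cols(.default = ...) call comes from the answer.

```r
library(readr)

# Hypothetical sample data written to a temporary file for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "date,xxx,value1,value2",
  "2021-01-01,1,1.5,2.5",
  "2021-01-02,2,3.5,4.5"
), path)

# Every column defaults to double, except the two named exceptions.
df <- read_csv(path, col_types = cols(.default = "d", date = "D", xxx = "i"))

# Inspect the resulting column classes.
sapply(df, class)
```

Columns that are added to the file later would fall under the default as well, so the call should keep working as long as the default type suits them.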
pass text as specified columns to open as [type] using read_csv from readr in R
You can make a helper function that combines your extra spec with the default column spec, pulling the spec together with do.call.
# Columns whose types differ from the default
extra_spec = list(
  "c_text" = "c",
  "d_decimals" = "i",
  "e_more_text" = "c"
)
# Read a csv where every column not named in extra_spec is parsed as integer
read_csv_with_default_int = function(path, extra_spec) {
  readr::read_csv(path, col_types = do.call(readr::cols, c(extra_spec, list(.default = readr::col_integer()))))
}
read_csv_with_default_int("file.csv", extra_spec = extra_spec)
You could also avoid the nested logic with a helper like
# cols() with the default column type pre-filled as integer
cols_default_int = purrr::partial(readr::cols, .default = readr::col_integer())
read_csv_with_default_int = function(path, col_types) {
  readr::read_csv(path, col_types = do.call(cols_default_int, col_types))
}
read_csv_with_default_int("file.csv", col_types = extra_spec)
How to specify column types with abbreviations when skipping columns with read_csv
Sorry, I've rewritten what I wrote earlier to be more clear based on an assumed understanding of what you are asking.
If you want to get the col_types for the columns in your csv file prior to any skipping or manual changes, the easiest thing to do is to call spec_csv() on your file. It generates a column specification that shows how read_csv() will classify each column.
From there you can copy, paste and edit that into your col_types argument to bring in only the columns and column types that you want. To do that, use cols_only() instead of cols().
spec_csv("test.csv")
This will automatically generate in your output console:
cols(
a = col_double(),
b = col_double(),
c = col_double()
)
The output tells you what the default column types would be. (PS: spec_csv() accepts the same arguments as read_csv(), so you can, e.g., raise guess_max to base the column type guesses on more rows.)
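As a rough illustration of the guess_max remark, the sketch below calls spec_csv() with a larger guess_max. The sample file and the value 5000 are arbitrary choices made for this example, not something from the original answer.

```r
library(readr)

# Hypothetical sample data written to a temporary file for illustration.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,x", "2,y"), path)

# Ask readr to look at up to 5000 rows when guessing the column types.
spec <- spec_csv(path, guess_max = 5000)
spec
```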
# manually copied and pasted the above output, changed the default to the desired type and deleted the columns I didn't want
read_csv("test.csv",
         col_types = cols_only(a = col_character(),
                               c = col_character())
)
I used the long form (col_character()), but you can instead use the abbreviations, as you already indicated earlier.
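For comparison, here is a minimal sketch of the same cols_only() call written with the one-letter abbreviations. It uses a small made-up file in place of test.csv so that it can be run on its own.

```r
library(readr)

# Hypothetical stand-in for test.csv, written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6"), path)

# "c" is the abbreviation for col_character(); column b is not listed,
# so cols_only() drops it.
df <- read_csv(path, col_types = cols_only(a = "c", c = "c"))
names(df)
```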
Please let me know if this is what you were asking or if there is any clarity that I can provide.
Unusual error using read_csv and col_select and id argument in R
I have submitted the below as an issue on readr and it has been labeled as a bug.
https://github.com/tidyverse/readr/issues/1395