How to load comma separated data into R?
If you look at the help for read.table, you'll discover some extra functions (read.csv, read.csv2, read.delim and read.delim2) that are essentially read.table with different defaults. If you tend to read in lots of files that are best read with those defaults, use them instead of read.table for conciseness.
This code will read in your file:
x <- read.table("C:\\flatFile.txt", header=TRUE, sep = ',')
or this code:
x <- read.csv("C:\\flatFile.txt")
Note that, while you can set any of the arguments of these read.table-based functions just as you would with read.table itself, it is rather pointless to use them and then restate their defaults. For example, don't bother with read.csv if you're going to set header = TRUE and/or sep = ',' every time as well. read.csv already uses those values, so you might as well just use read.table in that case.
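As a quick check of the point above, the following sketch reads the same small file both ways and compares the results. The file contents and the temporary path are made up for illustration; they are not the flatFile.txt from the question.

```r
# Write a small comma-separated file to a temporary location (hypothetical data).
path <- tempfile(fileext = ".txt")
writeLines(c("id,score", "1,3.5", "2,4.0"), path)

# read.table with the CSV-style settings spelled out ...
a <- read.table(path, header = TRUE, sep = ",")

# ... and read.csv, which already uses header = TRUE and sep = "," by default.
b <- read.csv(path)

# Compare the two data frames.
all.equal(a, b)

unlink(path)
```

read.csv also differs from read.table in a few other defaults (for example fill and comment.char), so the two calls can disagree on messier files than this one.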
Read files that are pipe AND comma delimited: |column1|,|column2|
read.table("./temp.csv", sep=",", quote = "|")
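will do the trick: quote = "|" makes read.table treat the pipes as quote characters, so only the commas act as separators.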
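Here is a minimal sketch of that call on a made-up file. The column names, the values and the temporary path are assumptions for illustration, and header = TRUE is added only because this sample file has a header row.

```r
# Write a small file whose fields are wrapped in pipes and separated by commas
# (hypothetical data).
path <- tempfile(fileext = ".csv")
writeLines(c("|name|,|city|",
             "|Smith, John|,|Paris|",
             "|Jane Doe|,|Rome|"), path)

# Treat "|" as the quote character so the pipes are stripped and a comma
# inside a pipe-quoted field is kept as part of the value.
x <- read.table(path, sep = ",", quote = "|", header = TRUE)
x

unlink(path)
```

Because the pipes act as quotes, the comma inside |Smith, John| should not split that field.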
Reading comma-separated strings with read.csv()
1) read.pattern. read.pattern (in the gsubfn package) can read such files:
library(gsubfn)
pat <- "(.*),(.*)"
read.pattern("test.csv", pattern = pat, header = TRUE, as.is = TRUE)
giving:
         name age
1  John Smith  34
2 Smith, John  34
2) two pass. Another possibility is to read the file in, fix it up, and then re-read it. This uses no packages and gives the same output.
L <- readLines("test.csv")
read.table(text = sub("(.*),", "\\1|", L), header = TRUE, sep = "|", as.is = TRUE)
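The fix-up step works because `.*` is greedy, so `sub` replaces only the last comma on each line. A small sketch on in-memory lines, which are assumed to mirror the test.csv contents shown in the output above:

```r
# Lines as they might come back from readLines("test.csv") (assumed contents).
L <- c("name,age", "John Smith,34", "Smith, John,34")

# "(.*)," matches as much as possible before a comma, so the comma that gets
# replaced is the last one on each line.
fixed <- sub("(.*),", "\\1|", L)
fixed
```

This yields "name|age", "John Smith|34" and "Smith, John|34", which read.table can then split on "|".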
Note: For 3 fields, where only the first (text) field may contain commas, use this in (1):
pat <- "(.*),([^,]+),([^,]+)"
For the same situation, use this in (2), assuming that there are non-space characters on both sides of each of the last two commas, that at least one space is adjacent to any comma inside the text field, and that fields have at least 2 characters:
text = gsub("(\\S),(\\S)", "\\1|\\2", L)
If you have some other arrangement, just modify the regular expression in (1) appropriately, and the sub or gsub in (2).
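As an illustration of the three-field variant of (2), here is a sketch on made-up lines that satisfy the stated assumptions. The city column and its values are hypothetical and only serve to give a third field.

```r
# Three fields; only the first one may contain a comma, and that comma is
# followed by a space (hypothetical data).
L <- c("name,age,city",
       "John Smith,34,Paris",
       "Smith, John,34,Rome")

# Replace only commas that have a non-space character on both sides,
# which leaves the ", " inside the text field untouched.
fixed <- gsub("(\\S),(\\S)", "\\1|\\2", L)

# Re-read the repaired lines using "|" as the separator.
df <- read.table(text = fixed, header = TRUE, sep = "|", as.is = TRUE)
df
```

The requirement that fields have at least 2 characters matters here: each match consumes the character after the comma, so a one-character field between two commas would leave the second comma unreplaced.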
Issues importing csv data into R where the data contains additional commas
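# Read the data rows; the extra comma is assumed to split one field across V2 and V3.
df <- read.csv("C:/dataextract.csv", skip = 1, header = FALSE)
# Read the header row separately so its names can be reapplied afterwards.
df_cnames <- read.csv("C:/dataextract.csv", nrows = 1, header = FALSE, stringsAsFactors = FALSE)
# Glue the two halves of the split field back together and drop the originals.
df <- within(df, V2V3 <- paste(V2, V3, sep = ''))
df <- subset(df, select = c("V1", "V2V3", "V4"))
colnames(df) <- unlist(df_cnames[1, ])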
It may need some modification depending on the actual source data.