How to Load Comma Separated Data into R

How to load comma separated data into R?

If you look at the help for read.table (?read.table), you'll find some extra functions (read.csv, read.csv2, read.delim and read.delim2) that are essentially read.table with different defaults. If you often read files that suit those defaults, use these functions instead of read.table for conciseness.

This code will read in your file:

x <- read.table("C:\\flatFile.txt", header=TRUE, sep = ',')

or this code:

x <- read.csv("C:\\flatFile.txt")

Note that, while you can set any argument of these read.table-based functions just as you would in read.table, it is rather pointless to use them and then restate their default settings. For example, don't bother with read.csv if you are also going to set header = TRUE and/or sep = ',' every time. You might as well just use read.table in that case.
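A minimal sketch to check that the two calls above agree, assuming a small throwaway file written to a temporary path (the file contents and the names tmp, a and b are made up for illustration):

```r
# Write a small comma-separated file to a temporary location (illustrative data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id,name", "1,apple", "2,pear"), tmp)

# read.csv relies on its own defaults; read.table needs them spelled out.
a <- read.csv(tmp)
b <- read.table(tmp, header = TRUE, sep = ",")

# The two defaults that matter here can be inspected directly.
csv_header <- formals(read.csv)$header
csv_sep    <- formals(read.csv)$sep

# Compare the two results; all.equal() reports any difference it finds.
same <- isTRUE(all.equal(a, b))
```

read.csv also differs from read.table in a few other defaults (quote, fill, comment.char), so the two calls can diverge on files that contain apostrophes, '#' characters or ragged rows.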

Read files that are pipe AND comma delimited: |column1|,|column2|

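read.table("./temp.csv", sep = ",", quote = "|") will do the trick: the pipes are treated as quote characters, so they are stripped and only the commas separate fields.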
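A small sketch of the quote = "|" approach on inline data, assuming the first line holds the column names (the second row and header = TRUE are illustrative additions, not part of the original question):

```r
# Illustrative input: every field is wrapped in pipes, fields are comma separated.
lines <- c("|column1|,|column2|",
           "|a, with comma|,|b|")

# quote = "|" makes read.table treat the pipes as quoting characters, so they
# are stripped and a comma between two pipes is not taken as a separator.
x <- read.table(text = lines, sep = ",", quote = "|", header = TRUE,
                stringsAsFactors = FALSE)
```

Passing text = instead of a file name is only a convenience for the sketch; with a real file, give the path as the first argument as in the call above.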

Reading comma-separated strings with read.csv()

1) read.pattern

read.pattern (in the gsubfn package) can read such files:

library(gsubfn)

pat <- "(.*),(.*)"
read.pattern("test.csv", pattern = pat, header = TRUE, as.is = TRUE)

giving:

         name age
1  John Smith  34
2 Smith, John  34

2) two pass

Another possibility is to read the file in, fix it up and then re-read it. This uses no packages and gives the same output.

L <- readLines("test.csv")
read.table(text = sub("(.*),", "\\1|", L), header = TRUE, sep = "|", as.is = TRUE)
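To see what the two-pass fix does at each step, here is a sketch on inline data. The vector L below is an illustrative stand-in for readLines("test.csv"), with contents guessed from the output shown above:

```r
# Illustrative stand-in for readLines("test.csv").
L <- c("name,age", "John Smith,34", "Smith, John,34")

# The greedy (.*) swallows as much as it can, so sub() ends up replacing
# only the LAST comma on each line with "|".
fixed <- sub("(.*),", "\\1|", L)

# The repaired lines now have an unambiguous separator.
DF <- read.table(text = fixed, header = TRUE, sep = "|", as.is = TRUE)
```

This relies on "|" not appearing anywhere in the data; pick another separator character if it does.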

Note: For 3 fields, where the last two fields contain no commas, use this pattern in (1):

pat <- "(.*),([^,]+),([^,]+)"

For the same situation, use this in (2), assuming that there are non-space characters adjacent to each of the last two commas, that at least one space is adjacent to any comma in the text field, and that fields have at least 2 characters:

text = gsub("(\\S),(\\S)", "\\1|\\2", L)

If you have some other arrangement just modify the regular expression in (1) appropriately and the sub or gsub in (2).
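A sketch of the three-field variant of (2) in base R, under the assumptions listed above. The vector L3 and its city column are made up for illustration:

```r
# Illustrative three-field input; only the first field may contain ", ".
L3 <- c("name,age,city",
        "John Smith,34,Boston",
        "Smith, John,34,Denver")

# Replace a comma only when it has a non-space character on BOTH sides.
# The comma in "Smith, John" is followed by a space, so it is left alone.
fixed3 <- gsub("(\\S),(\\S)", "\\1|\\2", L3)

DF3 <- read.table(text = fixed3, header = TRUE, sep = "|", as.is = TRUE)
```

The two-character minimum matters because each match consumes the character on either side of the comma: a one-character middle field such as "a,b,c" would have only its first comma replaced.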

Issues importing csv data into R where the data contains additional commas

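# Read the data rows and the header row separately; the extra comma is assumed
# to give each data row one more field than the header.
df <- read.csv("C:/dataextract.csv", skip = 1, header = FALSE)
df_cnames <- read.csv("C:/dataextract.csv", nrows = 1, header = FALSE,
                      stringsAsFactors = FALSE)

# Rejoin the two halves of the split field and keep the repaired columns.
df <- within(df, V2V3 <- paste(V2, V3, sep = ''))
df <- subset(df, select = c("V1", "V2V3", "V4"))
colnames(df) <- unlist(df_cnames, use.names = FALSE)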

It may need some modification depending on the actual source data.
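A self-contained sketch of the same idea on made-up data, assuming the extra comma is a thousands separator in the second field. The file contents, the column names and the use of colClasses = "character" are illustrative choices, not part of the original answer:

```r
# Illustrative file: the header has 3 fields, the data rows split into 4.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,amount,unit", "1,1,234,kg", "2,5,678,lb"), tmp)

# Read body and header separately. colClasses = "character" keeps the digits
# exactly as typed, so a piece like "034" would not lose its leading zero.
body <- read.csv(tmp, skip = 1, header = FALSE, colClasses = "character")
hdr  <- read.csv(tmp, nrows = 1, header = FALSE, colClasses = "character")

# Glue the two halves of the split field back together and drop the pieces.
body$V2V3 <- paste0(body$V2, body$V3)
body <- body[, c("V1", "V2V3", "V4")]
names(body) <- unlist(hdr, use.names = FALSE)

# Convert the repaired column once it is whole again.
body$amount <- as.numeric(body$amount)
```

If only some rows contain the extra comma, the columns will not line up this neatly, and a per-line fix with sub or gsub before parsing is likely the safer route.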


