Reading Text File with Multiple Space as Delimiter in R

You need to change your delimiter. sep = " " means exactly one space character, whereas sep = "" means that any run of whitespace, of any length, acts as the delimiter:

data <- read.table(file, sep = "", header = FALSE, nrows = 100,
                   na.strings = "", stringsAsFactors = FALSE)

From the manual:

If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
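A minimal sketch of the difference, using a made-up three-line sample (the sample text and the variable names are assumptions for illustration, not taken from the question). It reads the data with sep = "" and then uses count.fields to show how many fields each line has under each separator:

```r
# Hypothetical sample: columns separated by runs of spaces of varying length
txt <- "1    alpha   10
2    beta      20
3  gamma  30"

tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

# sep = "" (the default): any run of whitespace counts as one delimiter
dat <- read.table(tmp, sep = "", header = FALSE, stringsAsFactors = FALSE)
str(dat)

# count.fields() reports the number of fields per line for a given separator
fields_any <- count.fields(tmp, sep = "")
fields_one <- count.fields(tmp, sep = " ")
fields_any  # three fields on every line
fields_one  # more than three: each extra space produces an empty field

unlink(tmp)
```

The inflated counts under sep = " " are what show up as extra empty or NA columns when such a file is read with a single-space separator.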

Also, with a large data file you may want to consider data.table::fread, which quickly reads data straight into a data.table. I was using this function myself this morning. It was still described as experimental when this was written, but I find it works very well indeed.

Reading a text file with double space delimiter in R

You can read the lines of your text file with the readLines function. This returns a character vector in which each element corresponds to one line. You can split these strings with the strsplit function. Finally, you can combine the pieces into a matrix with the rbind function.

do.call(rbind, strsplit(readLines("filename.txt"), "  "))

If you need a data frame, you can convert the matrix with the function as.data.frame.
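The steps above can be sketched end to end as follows. The sample file contents are made up for illustration, and the final type.convert step is an addition that the answer does not mention:

```r
# Hypothetical sample file: fields separated by exactly two spaces,
# with single spaces allowed inside a field
tmp <- tempfile(fileext = ".txt")
writeLines(c("New York  8.4  USA",
             "Sao Paulo  12.3  Brazil"), tmp)

lines <- readLines(tmp)                      # one string per line
parts <- strsplit(lines, "  ", fixed = TRUE) # split on the double space
mat   <- do.call(rbind, parts)               # character matrix, one row per line
df    <- as.data.frame(mat, stringsAsFactors = FALSE)

# Every column is character at this point; type.convert() guesses the types
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)

unlink(tmp)
```

Note that rbind only gives a clean matrix when every line splits into the same number of pieces; lines with a different field count are recycled with a warning.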

Importing multi space delimited file

You can use tidyr::separate to split the data into columns by three spaces.

df <- read.table(text = "Var1   Var2   Var3
30000   Sedan   Model 2014
30000   CHEVROLET   Corvette Stingray", sep = "%", skip = 1)

tidyr::separate(df, V1, c("Var1", "Var2", "Var3"), sep = "\\s{3}")

   Var1      Var2              Var3
1 30000     Sedan        Model 2014
2 30000 CHEVROLET Corvette Stingray
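If tidyr is not available, one base R alternative is to turn every run of two or more spaces into a tab and then read the result with sep = "\t". This is a different technique from the tidyr::separate call above, offered only as a sketch; the sample rows are modeled on the example, and the three-space gaps between columns are an assumption about the original spacing:

```r
# Sample rows modeled on the example; single spaces occur inside values
lines <- c("Var1   Var2   Var3",
           "30000   Sedan   Model 2014",
           "30000   CHEVROLET   Corvette Stingray")

# Collapse every run of 2+ spaces into one tab, leaving single spaces alone
tabbed <- gsub(" {2,}", "\t", lines)

cars <- read.table(text = tabbed, sep = "\t", header = TRUE,
                   stringsAsFactors = FALSE)
cars
```

Unlike the separate approach, this keeps the header row and lets read.table infer the column types.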

Reading multiple space-delimited text files from a folder in R

Try this (you do not want spaces as the delimiter, since your paragraphs contain many of them):

dat <- setNames( lapply(lf, read.table, sep="|", header=FALSE), lf)

Choose a separator that you expect never to appear in the text. I'm afraid that sep="" was a bad choice, because it is interpreted as the read.table default, which is "whitespace". Because of setNames, each list entry is named after the file it was read from.
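A self-contained sketch of this approach, assuming that lf holds the file paths returned by list.files. The folder, the file names and their contents are invented for the demonstration, and the quote and comment.char arguments are additions that are not part of the answer:

```r
# Hypothetical folder with two text files whose lines contain many spaces
dir_path <- file.path(tempdir(), "paragraphs_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines(c("First line of file one.", "Second line of file one."),
           file.path(dir_path, "one.txt"))
writeLines("Only line of file two.", file.path(dir_path, "two.txt"))

# lf plays the role of the file list from the answer
lf <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)

# "|" never occurs in the text, so each line lands in a single column.
# quote = "" and comment.char = "" stop apostrophes and '#' from being
# interpreted as quoting or comments.
dat <- setNames(
  lapply(lf, read.table, sep = "|", header = FALSE, quote = "",
         comment.char = "", stringsAsFactors = FALSE),
  basename(lf)
)

names(dat)
dat[["one.txt"]]$V1

unlink(dir_path, recursive = TRUE)
```

Here basename(lf) is used for the names so that the list is indexed by file name rather than by full path; passing lf itself, as in the answer, works the same way.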

How to read space delimited data into a data frame from your script/document file?

The "trick" is a text connection as the "file" argument to read.table:

dat <- read.table(textConnection("person1    12    15
person2 15 18
person3 20 14"), stringsAsFactors=FALSE
)
str(dat)
'data.frame': 3 obs. of  3 variables:
 $ V1: chr  "person1" "person2" "person3"
 $ V2: int  12 15 20
 $ V3: int  15 18 14

The default 'sep' argument handles whitespace separation. If the fields are tab-separated, pass sep="\t" to read.table, after the closing parenthesis of the textConnection call.

Edit: This was later built into R itself. A subsequent revision of the underlying scan function gained a 'text' argument, so the code can now simply be:

dat <- read.table(text="person1    12    15
person2 15 18
person3 20 14", stringsAsFactors=FALSE
)

The readLines function still requires the use of textConnection to read from a 'character'-object, since it does not use scan.
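For completeness, a short sketch of reading lines from a character object through a text connection. The variable names are illustrative; the sample string reuses the data from the example above:

```r
txt <- "person1    12    15\nperson2 15 18\nperson3 20 14"

con <- textConnection(txt)  # wrap the string so it can be read like a file
lines <- readLines(con)
close(con)                  # release the connection once the lines are read

lines          # one element per line of the original string
length(lines)
```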

How to read a file with more than one tab as separator and where the space is part of column value

I think I found a workaround for this: 1) replace each run of extra tabs with a single tab, then 2) read the file or text. For example:

read.csv(text = gsub("[\t]+", "\t", readLines(textConnection(text3)), perl = TRUE), sep = "\t")

and also using a file instead:

temp <- tempfile()
writeLines(text3, temp)
read.csv(text = gsub("[\t]+", "\t", readLines(temp), perl = TRUE), sep = "\t")

The value passed to the text argument is:

> text
[1] "a\tb\tc" "11\t12\t1 3" "21\t22\t2 3" ""

and the result of read.csv will be:

   a  b   c
1 11 12 1 3
2 21 22 2 3

This is similar to @Badger's suggestion, just in one step.
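Since text3 is not defined in the snippets above, here is a self-contained sketch of the same workaround. The contents of text3 are an assumption, chosen so that the result matches the output shown:

```r
# Hypothetical input: columns separated by one or more tabs,
# with single spaces inside the values of column c
text3 <- "a\tb\tc\n11\t\t12\t1 3\n21\t22\t\t\t2 3"

con <- textConnection(text3)
lines <- readLines(con)
close(con)

collapsed <- gsub("\t+", "\t", lines)  # every run of tabs becomes one tab
res <- read.csv(text = collapsed, sep = "\t", stringsAsFactors = FALSE)
res
```

Collapsing runs of tabs also removes genuinely empty fields, so this only works when no cell in the data is empty.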


