Reading a text file with multiple spaces as delimiter in R
You need to change your delimiter. sep = " " refers to exactly one space character, so every extra space in a run produces an empty field. sep = "" treats a run of whitespace of any length as a single delimiter:
data <- read.table(file, sep = "", header = FALSE, nrows = 100,
                   na.strings = "", stringsAsFactors = FALSE)
From the manual:
If sep = "" (the default for read.table) the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns.
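A quick way to see the difference is count.fields, which reports how many fields each line splits into for a given sep. This is a small sketch on made-up lines whose fields are separated by runs of two or three spaces:

```r
# Made-up lines: fields separated by runs of two or three spaces
txt <- c("x  y   z", "1  2   3", "4  5   6")

# sep = "" (the default): each run of whitespace counts as one separator
con <- textConnection(txt)
n_default <- count.fields(con, sep = "")
close(con)

# sep = " ": every single space is a separator, so runs yield empty fields
con <- textConnection(txt)
n_single <- count.fields(con, sep = " ")
close(con)

print(n_default)  # 3 fields per line
print(n_single)   # 6 fields per line

# With the default separator the data read cleanly into three columns
df <- read.table(text = txt, sep = "", header = TRUE)
str(df)
```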
Also, with a large data file you may want to consider data.table::fread to read data quickly, straight into a data.table. I was using this function myself this morning. It was still marked experimental when I wrote this, but I find it works very well indeed.
Reading a text file with double space delimiter in R
You can read the lines of your text file with the readLines function. This returns a character vector where each element corresponds to a line. You can split these strings with the strsplit function. Finally, you can combine the pieces into a matrix with the rbind function.
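# readLines() gives one string per line; split each on a literal double space
do.call(rbind, strsplit(readLines("filename.txt"), "  ", fixed = TRUE))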
If you need a data frame, you can convert the matrix with the as.data.frame function.
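Putting these steps together, here is a self-contained sketch. It writes a small hypothetical file to a temporary path (the contents are made up for illustration), splits each line on a literal double space, and converts the result to a data frame. Because only double spaces split, a value containing a single space, such as "Ann Lee", stays in one field:

```r
# Hypothetical input file: fields separated by exactly two spaces
path <- tempfile(fileext = ".txt")
writeLines(c("name  score", "Ann Lee  12", "Bo Ng  7"), path)

lines <- readLines(path)                       # one string per line
parts <- strsplit(lines, "  ", fixed = TRUE)   # literal double-space split
m <- do.call(rbind, parts)                     # character matrix

# First row holds the column names; the rest is data
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- m[1, ]
df[] <- lapply(df, type.convert, as.is = TRUE) # "12" -> 12L, names stay character
str(df)
```

The type.convert step is optional; without it every column stays character.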
Importing multi space delimited file
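You can read each line into a single column by choosing a separator that does not occur in the data (here "%"), and then use tidyr::separate to split that column into columns on runs of three spaces.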
df <- read.table(text = "Var1 Var2 var3
30000 Sedan Model 2014
30000 CHEVROLET Corvette Stingray", sep = "%", skip = 1)
tidyr::separate(df, V1, c("Var1", "Var2", "Var3"), sep = "\\s{3}")
Var1 Var2 Var3
1 30000 Sedan Model 2014
2 30000 CHEVROLET Corvette Stingray
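If you would rather not depend on tidyr, the same regular expression works with base R's strsplit. This swaps in base R for tidyr::separate and is a minimal sketch, assuming each field boundary is a run of three spaces; the rows below are hypothetical:

```r
# Hypothetical rows: three spaces between fields, single spaces inside values
rows <- c("30000   Sedan Model   2014",
          "12000   Hatchback   2019")

# "\\s{3}" matches three consecutive whitespace characters,
# the same pattern used in the separate() call above
parts <- strsplit(rows, "\\s{3}")
df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(df) <- c("Var1", "Var2", "Var3")
df
```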
Reading multiple space-delimited text files from a folder in R
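Try this, where lf is a character vector of file paths (for example from list.files(..., full.names = TRUE)). You do not want spaces as the delimiter, since your paragraphs contain many of them:
# quote = "" stops apostrophes in the text from being read as quote marks
dat <- setNames(lapply(lf, read.table, sep = "|", header = FALSE, quote = ""), lf)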
Choose a separator that you expect will not occur in the text. I'm afraid sep = "" was a bad choice, because that is the default for read.table and means "any run of whitespace". Each element of the resulting list is named after its file.
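For a self-contained illustration, the sketch below creates two small files in a temporary folder (the folder name, file names and contents are hypothetical), lists them with list.files, and reads each one with sep = "|". It passes quote = "" so that apostrophes in the paragraphs are not treated as quote characters, and uses basename() so the list names are the bare file names:

```r
# Hypothetical folder with two small text files, one paragraph per line
dir <- file.path(tempdir(), "paragraphs")
dir.create(dir, showWarnings = FALSE)
writeLines(c("First paragraph of file one.", "It's the second paragraph."),
           file.path(dir, "one.txt"))
writeLines("Only one paragraph here.", file.path(dir, "two.txt"))

lf <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# "|" does not occur in the text, so each line becomes a single field;
# quote = "" keeps the apostrophe in "It's" from opening a quoted string
dat <- setNames(
  lapply(lf, read.table, sep = "|", header = FALSE,
         quote = "", stringsAsFactors = FALSE),
  basename(lf)
)
dat[["one.txt"]]$V1
```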
How to read space delimited data into a data frame from your script/document file?
The "trick" is a text connection as the "file" argument to read.table:
dat <- read.table(textConnection("person1 12 15
person2 15 18
person3 20 14"), stringsAsFactors=FALSE
)
str(dat)
'data.frame': 3 obs. of 3 variables:
$ V1: chr "person1" "person2" "person3"
$ V2: int 12 15 20
$ V3: int 15 18 14
The default sep argument splits on whitespace. If you need tabs as the separator, use sep = "\t" (as a read.table argument, after the closing parenthesis of the textConnection call).
Edit: This was later incorporated into a subsequent revision of the underlying scan function, which was given a text argument. The code can now simply be:
dat <- read.table(text="person1 12 15
person2 15 18
person3 20 14", stringsAsFactors=FALSE
)
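With the text argument, tab-separated input works the same way once you add sep = "\t", and a value that contains a space then stays in one field. The rows here are made up for illustration:

```r
# Made-up tab-separated rows; the first field contains a space
tsv <- "person 1\t12\t15\nperson 2\t15\t18"

# sep = "\t": only tabs separate fields, so "person 1" stays one value
dat_tab <- read.table(text = tsv, sep = "\t", stringsAsFactors = FALSE)
str(dat_tab)
```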
The readLines function still requires textConnection to read from a character object, since it does not use scan.
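A short sketch of that pattern: read lines from a character string through a text connection, close the connection, then split each line on runs of whitespace (the string is made up):

```r
txt <- "person1 12 15\nperson2 15 18"

con <- textConnection(txt)
lines <- readLines(con)   # c("person1 12 15", "person2 15 18")
close(con)                # textConnection() opened it, so close it explicitly

# Split each line on runs of whitespace and stack into a matrix
fields <- strsplit(lines, "[[:space:]]+")
do.call(rbind, fields)
```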
How to read a file with more than one tab as separator and where spaces are part of column values
I think I found a workaround: 1) replace each run of tabs with a single tab, then 2) read the file or text. For example:
read.csv(text = gsub("[\t]+", "\t", readLines(textConnection(text3)), perl = TRUE), sep = "\t")
or, using a file instead:
temp <- tempfile()
writeLines(text3, temp)
read.csv(text = gsub("[\t]+", "\t", readLines(temp), perl = TRUE), sep = "\t")
The cleaned lines passed as the text argument are:
> text
[1] "a\tb\tc" "11\t12\t1 3" "21\t22\t2 3" ""
and read.csv returns:
a b c
1 11 12 1 3
2 21 22 2 3
This is similar to @Badger's suggestion, just in one step.
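Here is a self-contained version of the same idea. The text3 value below is a hypothetical stand-in for the question's data, with extra tabs between some fields and a space inside the c values. The default regex engine already handles this pattern, so perl = TRUE is left out:

```r
# Hypothetical stand-in for text3: runs of tabs, and spaces inside values
text3 <- c("a\t\tb\tc", "11\t12\t\t1 3", "21\t\t\t22\t2 3")

# Collapse every run of tabs to one tab, then parse as tab-separated
cleaned <- gsub("\t+", "\t", text3)
df <- read.csv(text = cleaned, sep = "\t", stringsAsFactors = FALSE)
str(df)
```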