R: Why Does read.table Stop Reading a File?

R: Why does read.table stop reading a file?

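With read.table, the default is quote = "\"'", so the single quote (apostrophe) counts as a quote character alongside the double quote. I'm guessing your description field contains some unmatched single quotes: each one opens a quoted string that only closes at the next single quote, so everything in between, tabs and line breaks included, is pooled into one entry and you end up with far fewer rows than lines.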

With read.delim, the default quote character is only the double quote, so this isn't a problem there.

Specify your quote character and you should be all set.

> genes<-read.table("genes.txt",sep="\t",quote="\"",na.strings="-",fill=TRUE, col.names=c("GeneSymbol","synonyms","description"))
> nrow(genes)
[1] 42476
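A small self-contained sketch of the failure mode, using made-up gene lines (hypothetical data, not the asker's genes.txt). The apostrophes are placed after the first five lines, which read.table inspects to work out the column count. With the default quote = "\"'", the apostrophe in "Alzheimer's" opens a quote that only closes at "Parkinson's", so three lines collapse into one row. Restricting quote to the double quote keeps one row per line.

```r
# Hypothetical tab-separated lines; two descriptions contain apostrophes
lines <- c(sprintf("G%d\tx\tplain description", 1:5),
           "APP\tx\tAlzheimer's amyloid precursor",
           "ABC1\tx\tATP-binding cassette",
           "PARK7\tx\tParkinson's protein 7",
           "G9\tx\tplain description")

# Default quote = "\"'": the first apostrophe opens a quoted field that only
# closes at the second one, swallowing the tabs and line breaks in between
default_quotes <- read.table(text = lines, sep = "\t")

# Only the double quote is treated as a quote character
double_only <- read.table(text = lines, sep = "\t", quote = "\"")

nrow(default_quotes)  # fewer rows than input lines
nrow(double_only)     # one row per input line
```

Setting quote = "" would switch quoting off entirely, which is also fine as long as no field relies on quotes to protect embedded tabs.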

Why is R not reading a specific text file when it can read others in the same directory?

Looking at your file, it is not really a CSV (comma-separated) file but probably a TSV (tab-separated) one, so you should use readr's read_tsv() function instead.

Moreover, the file probably starts with a byte order mark (BOM), so three extra characters end up at the beginning of the first column's name. The best tidyverse fix I know of is to rename that column with rename():

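library(tidyverse)

# The file is tab-separated despite its .csv extension
read_tsv('filename.csv') %>%
  # Rename the first column, whose name carries the BOM characters
  rename(userid.ID = colnames(.)[1])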
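If you would rather stay in base R, a minimal sketch (a swapped-in alternative to the read_tsv() plus rename() route, run on a made-up file) is to let the file connection strip the BOM itself: R accepts the encoding name "UTF-8-BOM" when reading and removes the byte order mark if one is present. The file name and column names below are hypothetical.

```r
# Hypothetical demo file: tab-separated content saved as .csv, with a UTF-8 BOM
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)   # the 3-byte UTF-8 byte order mark
writeBin(charToRaw("userid.ID\tscore\nu1\t10\nu2\t20\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" drops the BOM while reading,
# so the first column name comes through clean
users <- read.delim(tmp, fileEncoding = "UTF-8-BOM")
names(users)  # "userid.ID" "score"
```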

R stops reading a table when coming across #

You can completely turn off read.table()'s interpretation of comment characters (by default set to "#") by setting comment.char="" in your call to read.table().
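A short sketch of the difference, on hypothetical data where "#" is part of the values (hex colour codes): with the default comment.char = "#", everything from the "#" to the end of the line is discarded, while comment.char = "" keeps it.

```r
# Hypothetical tab-separated data in which '#' belongs to the values
lines <- c("id\tcolor",
           "1\t#ff0000",
           "2\t#00ff00")

# Default comment.char = "#": the rest of each line after '#' is dropped
truncated <- read.table(text = lines, sep = "\t", header = TRUE,
                        stringsAsFactors = FALSE)

# comment.char = "" turns comment handling off, so the values survive
intact <- read.table(text = lines, sep = "\t", header = TRUE,
                     comment.char = "", stringsAsFactors = FALSE)
intact$color  # "#ff0000" "#00ff00"
```

read.delim() and read.csv() already default to comment.char = "", so this only bites with read.table() itself.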

Skip over all lines in a data file before and including a regular string in a loop in R

Read the input line by line with readLines():

all_content <- readLines("input.txt")
> all_content
[1] "# Header information"
[2] "# Header information"
[3] "# Header information"
[4] "# Header information"
[5] "# Header information"
[6] "*END*"
[7] " 0.571 26.6331 8.2733 103.145 0.0842 -0.000049 0.000e+00"
[8] " 0.576 26.6316 8.2756 103.171 0.3601 -0.000049 0.000e+00"
[9] " 0.574 26.6322 8.2744 103.157 0.3613 -0.000046 0.000e+00"

Then drop every line up to and including the *END* line, using grep() to find it:

end_line <- grep("*END*", all_content, fixed = TRUE)[1]  # fixed = TRUE: match "*END*" literally, not as a regex
skip <- all_content[-(1:end_line)]

Now read the remaining lines with the usual read.table() function:

input <- read.table(textConnection(skip))
> input
V1 V2 V3 V4 V5 V6 V7
1 0.571 26.6331 8.2733 103.145 0.0842 -4.9e-05 0
2 0.576 26.6316 8.2756 103.171 0.3601 -4.9e-05 0
3 0.574 26.6322 8.2744 103.157 0.3613 -4.6e-05 0

You get the desired result.
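The same steps can be wrapped in a small function. This is a sketch: read_after_marker is a hypothetical name, and it assumes the marker appears on its own line before the data. It matches the marker literally (fixed = TRUE, because "*" is a regex metacharacter), stops with an error if the marker is missing, and uses read.table(text = ...) as an alternative to textConnection().

```r
# Hypothetical helper: read the table that follows a marker line
read_after_marker <- function(path, marker = "*END*") {
  all_content <- readLines(path)
  # fixed = TRUE treats the marker as plain text rather than a regex
  end_line <- grep(marker, all_content, fixed = TRUE)
  if (length(end_line) == 0) {
    stop("marker '", marker, "' not found in ", path)
  }
  # Drop everything up to and including the first marker line
  data_lines <- all_content[-seq_len(end_line[1])]
  read.table(text = data_lines)
}

# Demo with a small made-up file laid out like the one above
tmp <- tempfile(fileext = ".cnv")
writeLines(c(rep("# Header information", 5), "*END*",
             " 0.571 26.6331 8.2733 103.145 0.0842 -0.000049 0.000e+00",
             " 0.576 26.6316 8.2756 103.171 0.3601 -0.000049 0.000e+00",
             " 0.574 26.6322 8.2744 103.157 0.3613 -0.000046 0.000e+00"),
           tmp)
input <- read_after_marker(tmp)
dim(input)  # 3 7
```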

UPDATE

In your loop, just use:

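df <- NULL  # start empty; df is also the name of R's F-density function
for (x in list.files(pattern = "\\.cnv$", recursive = TRUE)) {  # regex: names ending in .cnv
  all_content <- readLines(x)
  end_line <- grep("*END*", all_content, fixed = TRUE)[1]  # literal match
  skip <- all_content[-(1:end_line)]
  input <- read.table(text = skip)  # text = avoids leaving a textConnection open per file
  df <- rbind(df, input)
}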
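An alternative sketch that avoids growing df inside the loop: read each file into a list with lapply() and bind once with do.call(rbind, ...). The function names read_cnv and read_all_cnv are hypothetical, and the demo builds two made-up .cnv files in subfolders of a temporary directory.

```r
# Hypothetical helper: drop everything up to the marker, read the rest
read_cnv <- function(path, marker = "*END*") {
  all_content <- readLines(path)
  end_line <- grep(marker, all_content, fixed = TRUE)[1]
  stopifnot(!is.na(end_line))  # fail loudly if a file lacks the marker
  read.table(text = all_content[-seq_len(end_line)])
}

# Read every .cnv file under dir and bind the results once
read_all_cnv <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.cnv$", recursive = TRUE,
                      full.names = TRUE)
  do.call(rbind, lapply(files, read_cnv))
}

# Demo: two made-up files in separate subfolders
demo_dir <- file.path(tempdir(), "cnv_demo")
dir.create(file.path(demo_dir, "cast1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "cast2"), recursive = TRUE, showWarnings = FALSE)
demo_lines <- c("# Header information", "*END*",
                " 0.571 26.6331 8.2733", " 0.576 26.6316 8.2756")
writeLines(demo_lines, file.path(demo_dir, "cast1", "a.cnv"))
writeLines(demo_lines, file.path(demo_dir, "cast2", "b.cnv"))

df <- read_all_cnv(demo_dir)
nrow(df)  # 4
```

Binding once is generally cheaper than calling rbind() on every iteration, since each rbind() copies the accumulated data frame.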

R - read.table imports half of the dataset - no errors nor warnings

You may have a comment character (#) in the file: try setting comment.char = "" in read.table. Also check that the quote argument is set correctly, because an unmatched quote can merge many lines into a single field.
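To find the offending lines before tweaking the arguments, a quick sketch is to list every line that contains a single quote, double quote or "#". The helper name find_suspect_lines and the demo file are hypothetical.

```r
# Hypothetical helper: line numbers containing a quote or comment character,
# i.e. the lines that read.table's default quote/comment.char may mangle
find_suspect_lines <- function(path) {
  grep("['\"#]", readLines(path))
}

tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tO'Brien", "2\tSmith", "3\t#tag"), tmp)
find_suspect_lines(tmp)  # 2 4
```

Comparing length(readLines(path)) with nrow() of the imported table is another quick way to spot silently lost rows.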

R: Reading a delimited table when end of each row is not delimited

The delimiter was not the problem. Instead, I downloaded the .txt file and opened it in Microsoft Excel with '|' as the delimiter. Scrolling down to the rows that caused trouble showed that Spanish (accented) characters were the issue.
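If the accented characters come from a file encoding that R is not expecting, one thing to try is declaring the encoding with read.table's fileEncoding argument. This is a sketch under an assumption: it supposes the file is Latin-1 encoded, which you should verify for the real file, and the data below is made up.

```r
# Made-up pipe-delimited data with Spanish characters, saved as Latin-1
# (assumption: the real file's encoding may differ; check it first)
lines <- c("name|city", "Jos\u00e9|Le\u00f3n", "Mar\u00eda|Ca\u00f1ada")
tmp <- tempfile(fileext = ".txt")
latin1_bytes <- iconv(paste0(lines, "\n", collapse = ""),
                      from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]
writeBin(latin1_bytes, tmp)

# fileEncoding re-encodes the input as it is read, so the accented
# characters arrive intact
people <- read.table(tmp, sep = "|", header = TRUE, fileEncoding = "latin1",
                     quote = "", comment.char = "", stringsAsFactors = FALSE)
people$name
```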


