R: Why does read.table stop reading a file?
With read.table one of the default quote characters is the single quote. I'm guessing you have some unmatched single quotes in your description field and all the data between single quotes is being pooled together into one entry.
With read.delim the default quote character is the double quote only, so this isn't a problem.
Specify your quote character and you should be all set.
> genes<-read.table("genes.txt",sep="\t",quote="\"",na.strings="-",fill=TRUE, col.names=c("GeneSymbol","synonyms","description"))
> nrow(genes)
[1] 42476
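To see the effect in isolation, here is a minimal sketch in base R. The three-row sample text and the variable names are made up for illustration and are not taken from the original genes.txt. With the default quote set, the apostrophe in the first row should open a quoted string that only closes at the apostrophe in the third row, so the rows in between get pooled into one entry.

```r
# Hypothetical tab-separated sample: rows 1 and 3 each contain one apostrophe.
txt <- "A1\tsyn1\t5' nucleotidase\nB2\tsyn2\tkinase\nC3\tsyn3\tprotein 3' end\n"

# Default quote set is "\"'", so the single quote is treated as a quote character.
default_q <- read.table(text = txt, sep = "\t", fill = TRUE)

# Only the double quote counts as a quote character here.
double_q <- read.table(text = txt, sep = "\t", quote = "\"", fill = TRUE)

nrow(default_q)  # expected to be smaller than the true row count
nrow(double_q)   # one row per input line
```

If the file never uses quoting at all, quote = "" disables it completely.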
Why is R not reading a specific text file when it can read others in the same directory?
Looking at your file, it is not really CSV (comma-separated) but probably TSV (tab-separated). Because of that, you should use the read_tsv() function instead.
Moreover, the file probably has a BOM (byte order mark), so the name of the first column will get 3 extra symbols at the beginning. I don't know a better way to handle this in the tidyverse than using rename():
library(tidyverse)
read_tsv('filename.csv') %>%
rename(userid.ID = colnames(.)[1])
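As a possible alternative outside the tidyverse, base R accepts the encoding name "UTF-8-BOM" when reading, which strips a leading byte order mark. The sketch below is only an illustration: the temporary file, its contents, and the column names are invented, and it assumes the file is UTF-8 encoded.

```r
# Write a hypothetical tab-separated file that starts with a UTF-8 BOM.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)                   # the BOM bytes
writeBin(charToRaw("userid\tscore\n1\t10\n2\t20\n"), con)    # the actual data
close(con)

# "UTF-8-BOM" tells the connection to drop the BOM before parsing.
dat <- read.delim(path, fileEncoding = "UTF-8-BOM")
names(dat)
```

With the BOM removed on read, the first column name should come through clean, so no rename step is needed.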
R stops reading a table when coming across #
You can completely turn off read.table()'s interpretation of comment characters (by default set to "#") by setting comment.char="" in your call to read.table().
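A small sketch of the difference, using made-up inline data rather than the asker's file (the column names and values are hypothetical):

```r
# Hypothetical data in which "#" is part of a value, not a comment.
txt <- "id\tlabel\n1\tsample #1\n2\tsample #2\n"

# Default comment.char = "#": everything from "#" to the end of the line is dropped.
with_default <- read.table(text = txt, sep = "\t", header = TRUE)

# comment.char = "": "#" is read as ordinary data.
no_comments <- read.table(text = txt, sep = "\t", header = TRUE, comment.char = "")

with_default$label
no_comments$label
```

When "#" falls in the middle of a row, the rest of that row is silently discarded, which can look like read.table stopping early or dropping columns.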
Skip over all lines in a data file before and including a regular string in a loop in R
Read the input line by line using
all_content = readLines("input.txt")
>all_content
[1] "# Header information"
[2] "# Header information"
[3] "# Header information"
[4] "# Header information"
[5] "# Header information"
[6] "*END*"
[7] " 0.571 26.6331 8.2733 103.145 0.0842 -0.000049 0.000e+00"
[8] " 0.576 26.6316 8.2756 103.171 0.3601 -0.000049 0.000e+00"
[9] " 0.574 26.6322 8.2744 103.157 0.3613 -0.000046 0.000e+00"
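Then remove every line up to and including *END*, using grep to find it. Pass fixed = TRUE so that the asterisks are matched literally instead of being interpreted as regular-expression operators:
skip = all_content[-c(1:grep("*END*", all_content, fixed = TRUE))]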
Now read the remaining lines using the normal read.table function as follows
input <- read.table(textConnection(skip))
> input
V1 V2 V3 V4 V5 V6 V7
1 0.571 26.6331 8.2733 103.145 0.0842 -4.9e-05 0
2 0.576 26.6316 8.2756 103.171 0.3601 -4.9e-05 0
3 0.574 26.6322 8.2744 103.157 0.3613 -4.6e-05 0
You get the desired result.
UPDATE
In your loop just use
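df <- data.frame()  # start empty (skip this if df already exists) so rbind() can append to it
# pattern is a regular expression, not a glob, so match the extension explicitly
for (x in list.files(pattern = "\\.cnv$", recursive = TRUE)) {
  all_content <- readLines(x)
  # fixed = TRUE matches the literal string "*END*"
  skip <- all_content[-c(1:grep("*END*", all_content, fixed = TRUE))]
  input <- read.table(textConnection(skip))
  df <- rbind(df, input)
}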
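The same idea can be wrapped in a function and applied to all files at once, which avoids growing a data frame inside a loop. This is only a sketch: read_after_marker is a hypothetical helper name, and the two temporary .cnv files are generated here just so the example runs on its own.

```r
# Hypothetical helper: read the table that follows the first marker line.
read_after_marker <- function(path, marker = "*END*") {
  lines <- readLines(path)
  end_line <- grep(marker, lines, fixed = TRUE)[1]  # literal match, first hit only
  if (is.na(end_line)) stop("marker not found in ", path)
  read.table(text = lines[-seq_len(end_line)])
}

# Create two small demo files in a temporary directory (invented contents).
demo_dir <- tempfile()
dir.create(demo_dir)
for (i in 1:2) {
  writeLines(
    c("# Header information", "*END*", " 0.571 26.6331", " 0.576 26.6316"),
    file.path(demo_dir, paste0("cast", i, ".cnv"))
  )
}

# Read every .cnv file and stack the results in one call.
files <- list.files(demo_dir, pattern = "\\.cnv$", recursive = TRUE, full.names = TRUE)
df <- do.call(rbind, lapply(files, read_after_marker))
dim(df)
```

Stopping with an error when the marker is missing is a design choice for this sketch; returning NULL for such files would let do.call(rbind, ...) skip them instead.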
R - read.table imports half of the dataset - no errors nor warnings
You may have a comment character (#) in the file; try setting the option comment.char = "" in read.table. Also, check that the quote option is set correctly.
R: Reading a delimited table when end of each row is not delimited
There was no issue with the delimiter. I downloaded the .txt file and opened it in Microsoft Excel using '|' as the delimiter. Looking at the rows where the import went wrong, it appears that Spanish characters were causing the problem.