R - Reading lines from a .txt-file after a specific line
1) read.pattern
read.pattern in gsubfn can be used to read only lines matching a specific pattern. In this example we match beginning of line, optional space(s), 1 or more digits, 1 or more spaces, an optional minus followed by 1 or more digits, optional space(s), end of line. The portions matching the parenthesized portions of the regexp are returned as columns in a data.frame. text = Lines in this self-contained example can be replaced with "myfile.txt", say, if the data is coming from a file. Modify the pattern to suit.
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"
library(gsubfn)
DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")
giving:
> DF
V1 V2
1 131071 -2065
2 131070 -4137
3 131069 -6408
4 131068 -8043
2) read twice
Another possibility using only base R is simply to read it once to determine the value of skip= and a second time to do the actual read using that value. To read from a file myfile.txt, replace text = Lines and textConnection(Lines) with "myfile.txt".
read.table(text = Lines,
skip = grep("##XYDATA=", readLines(textConnection(Lines))))
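As a minimal sketch of the file-based variant described above, the lines below write the sample data to a temporary file and read it twice. The temporary path and the variable name marker are assumptions made for illustration; the path stands in for "myfile.txt".

```r
# Sample data written to a temporary file; the path stands in for "myfile.txt"
Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"
path <- tempfile(fileext = ".txt")
writeLines(Lines, path)

# First pass: find the line number of the marker line
marker <- grep("##XYDATA=", readLines(path), fixed = TRUE)

# Second pass: skip everything up to and including the first marker line
DF <- read.table(path, skip = marker[1])
DF
```

Using marker[1] guards against the marker appearing more than once; whether the first occurrence is the right one depends on the file.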
How to Read Certain Lines of A Data File Into R
Check this out:
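con <- file("test1.txt", "r")
lines <- c()
while (TRUE) {
  # read one line at a time; a zero-length result means end of file
  line <- readLines(con, 1)
  if (length(line) == 0) break
  # keep lines that start with F (after optional whitespace) and contain the literal (0,0)
  if (grepl("^\\s*F{1}", line) && grepl("(0,0)", line, fixed = TRUE)) lines <- c(lines, line)
}
close(con)  # release the connection once the loop is done
lines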
# [1] "F 20160602 14:25:11.321 F7982D50 GET 156.145.15.85:37525 xqixh8sl AES \"/pcgc/public/Other/exome/fastq/PCGC0077248_HS_EX__1-06808__v3_FCC49HJACXX_L7_p1of1_P1.fastq.gz\" \"\" 3322771022 (0,0) \"1499.61 seconds (17.7 megabits/sec)\""
Pass the file connection to readLines so that it can read the file line by line. Use the regular expression ^\\s*F{1} to capture lines starting with the letter F, with possible leading white space, where ^ denotes the beginning of a string. Use fixed = TRUE to match (0,0) literally. If both checks are TRUE, append the line to lines.
Data:
D 20160602 14:15:43.559 F7982D62 Req Agr:131 Mra:0 Exp:0 Mxr:0 Mnr:0 Mxd:0 Mnd:0 Nro:0
D 20160602 14:15:43.559 F7982D62 Set Agr:130 Mra:0 Exp:0 Mxr:0 Mnr:0 Mxd:0 Mnd:0 Nro:0
I 20160602 14:15:43.559 F7982D62 GET 156.145.15.85:36773 xqixh8sl AES "/pcgc/public/Other/exome/fastq/PCGC0065109_HS_EX__1-04692__v3_FCAD2HMUACXX_L4_p1of1_P2.fastq.gz" ""
M 20160602 14:15:43.595 DOC1: F7982D62 Request for unencrypted meta data on encrypted transaction
M 20160602 14:15:48.353 DOC1: F7982D62 Transaction has been acknowledged at 722875647
F 20160602 14:15:48.398 F7982D62 GET 156.145.15.85:36773 xqixh8sl AES "/pcgc/public/Other/exome/fastq/PCGC0065109_HS_EX__1-04692__v3_FCAD2HMUACXX_L4_p1of1_P2.fastq.gz" "" 50725464 (4,32) "Remote Application: Session Aborted: Aborted by user interrupt"
M 20160602 14:15:48.780 DOC1: F7982D63 New download request
D 20160602 14:15:48.780 F7982D63 META: 134 Path: /pcgc/public/CTD/exome/fastq/PCGC0033175_HS_EX__1-00304-01__v1_FCBC0RE4ACXX_L3_p32of96_P2.fastq.gz user: xqixh8sl pack: arg: feat: cE,s
F 20160602 14:25:11.321 F7982D50 GET 156.145.15.85:37525 xqixh8sl AES "/pcgc/public/Other/exome/fastq/PCGC0077248_HS_EX__1-06808__v3_FCC49HJACXX_L7_p1of1_P1.fastq.gz" "" 3322771022 (0,0) "1499.61 seconds (17.7 megabits/sec)"
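For files that fit in memory, the same two checks can be applied to all lines at once. The following is a sketch of that alternative, not the loop from the answer. The helper name filter_log_lines is hypothetical, and the sample lines are shortened, made-up records in the same shape as the log above.

```r
# Hypothetical helper: keep lines that start with F and contain the literal "(0,0)"
filter_log_lines <- function(path) {
  all_lines <- readLines(path)
  starts_with_f <- grepl("^\\s*F", all_lines)
  has_zero_pair <- grepl("(0,0)", all_lines, fixed = TRUE)
  all_lines[starts_with_f & has_zero_pair]
}

# Shortened, made-up sample written to a temporary file
sample_log <- c(
  "D 20160602 14:15:43.559 F7982D62 Req Agr:131",
  "F 20160602 14:15:48.398 F7982D62 GET 50725464 (4,32) \"Aborted\"",
  "F 20160602 14:25:11.321 F7982D50 GET 3322771022 (0,0) \"1499.61 seconds\""
)
path <- tempfile(fileext = ".txt")
writeLines(sample_log, path)

kept <- filter_log_lines(path)
kept
```

The loop version reads one line at a time and so never holds the whole file in memory; the vectorized version is shorter but loads the whole file at once.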
Reading a txt file line by line with skip function of every second line and the output saved as a dataframe using R
We read the data with readLines:
lines <- readLines('file.txt')
Then use a recycled logical index, c(FALSE, TRUE), to keep every second line, remove the tabs, and split each remaining line into single characters, giving a list:
lst1 <- strsplit(gsub("\t", "", lines[c(FALSE, TRUE)]), "")
lst1
#[[1]]
# [1] "D" "M" "E" "S" "P" "V" "F" "A" "F" "P" "K" "A" "L" "D" "L" "E" "T" "H" "I" "E" "K" "L" "F" "L" "Y"
#[[2]]
# [1] "D" "D" "T" "L" "D" "D" "S" "D" "E" "D" "D" "I" "V" "V" "E" "S" "Q" "D" "P" "P" "L" "P" "S" "W" "G"
#[[3]]
# [1] "P" "R" "R" "E" "T" "E" "E" "F" "N" "D" "L" "K" "A" "L" "D" "F" "I" "L" "S" "N" "S" "L" "T" "H" "P"
#[[4]]
# [1] "E" "K" "A" "R" "M" "I" "Y" "E" "D" "D" "E" "T" "Y" "L" "S" "P" "K" "E" "V" "S" "L" "D" "S" "R" "V"
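Since the question asks for a data frame, one possible last step is to bind the list elements row-wise. This is only a sketch: it assumes every kept line has the same number of characters, and the input file contents are invented for illustration.

```r
# Made-up input: odd lines are labels, even lines are tab-separated letters
path <- tempfile(fileext = ".txt")
writeLines(c("seq1", "D\tM\tE\tS", "seq2", "D\tD\tT\tL"), path)

lines <- readLines(path)
# c(FALSE, TRUE) is recycled along the vector, keeping lines 2, 4, 6, ...
lst1 <- strsplit(gsub("\t", "", lines[c(FALSE, TRUE)]), "")

# Bind the character vectors row-wise; this assumes they all have the same length
df <- as.data.frame(do.call(rbind, lst1), stringsAsFactors = FALSE)
df
```

If the lines differ in length, rbind recycles the shorter vectors with a warning, so the lengths are worth checking first.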
Read lines by number from a large file
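The trick is to use a connection AND open it before calling read.table. An open connection keeps its position between reads, so each skip counts from where the previous read stopped: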
con <- file('filename')
open(con)
read.table(con, skip = 5, nrows = 1)   # 6th line
read.table(con, skip = 20, nrows = 1)  # 27th line
...
close(con)
You may also try scan; it is faster and gives more control.
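A sketch of the scan variant, assuming whole lines are wanted as character strings. The 30-line temporary file is made up for the example; skip and nlines are standard scan arguments.

```r
# Made-up file with 30 numbered lines
path <- tempfile(fileext = ".txt")
writeLines(paste("line", 1:30), path)

con <- file(path)
open(con, "r")
# sep = "\n" makes scan treat each line as one field;
# each call continues from where the previous one stopped
sixth <- scan(con, what = character(), sep = "\n",
              skip = 5, nlines = 1, quiet = TRUE)
twenty_seventh <- scan(con, what = character(), sep = "\n",
                       skip = 20, nlines = 1, quiet = TRUE)
close(con)
```

As with read.table, the second skip is relative to the current position of the open connection, not to the start of the file.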
Remove certain lines (with ---- and empty lines) from txt file using readLines() or read_lines()
1) read.table
If we can assume that the only occurrence of - is where shown in the question and that ? does not occur anywhere in the file, then this reads in the data, regarding every line as a single field and throwing away the header. Since - is the comment character, lines containing only - are regarded as blank and are thrown away. This reads the file into a one-column data frame, and the [[1]] returns that column as a character vector. If you want to keep the header, omit header = TRUE.
read.table("myfile", sep = "?", comment.char = "-", header = TRUE)[[1]]
2) grep
Another possibility is to read in the file and then remove lines that are empty or contain only - characters.
grep("^-*$", readLines("myfile"), invert = TRUE, value = TRUE)
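To see what the grep filter keeps, here is a small self-contained sketch. The file contents are invented for illustration and the temporary path stands in for "myfile".

```r
# Made-up file with dashed separator lines and an empty line
path <- tempfile(fileext = ".txt")
writeLines(c("Header", "-----", "", "line one", "line two", "-----", "line three"), path)

# "^-*$" matches lines made of zero or more dashes, i.e. empty or dash-only lines;
# invert = TRUE keeps everything else and value = TRUE returns the text itself
kept <- grep("^-*$", readLines(path), invert = TRUE, value = TRUE)
kept
```

Unlike approach 1, this keeps the header line and does not depend on - and ? being absent from the data lines.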
3) pipe
We could process the input using a filter and then pipe that into R. On Windows, grep is found in C:\Rtools40\usr\bin if you have Rtools40 installed; if it is not on your path, use the complete path, and if you don't have it at all, replace grep with findstr. On UNIX/Linux the escaping may vary according to which shell you are using.
readLines(pipe('grep -v "^-*$" myfile'))
Read a text file in R line by line
Here is the solution with a for loop. Importantly, it takes the one call to readLines out of the for loop so that it is not improperly called again and again. Here it is:
fileName <- "up_down.txt"
conn <- file(fileName, open = "r")
linn <- readLines(conn)  # read all lines once, outside the loop
for (i in seq_along(linn)) {  # seq_along is safe when the file is empty
  print(linn[i])
}
close(conn)
How can I ignore lines while reading a text file in R?
You can use read.table (or another function) in combination with grep:
read.table(text=grep("Trial", readLines(path_to_your_file), value=TRUE))
Does this solve your problem?
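A self-contained sketch of the same idea, with made-up file contents: only lines containing "Trial" are kept, and read.table then parses them as whitespace-separated fields. The variable name wanted is an assumption for illustration.

```r
# Made-up file mixing data lines with lines to ignore
path_to_your_file <- tempfile(fileext = ".txt")
writeLines(c("Session notes", "Trial 1 0.52", "break", "Trial 2 0.61"),
           path_to_your_file)

# Keep only the lines that contain "Trial", then parse them as a table
wanted <- grep("Trial", readLines(path_to_your_file), value = TRUE)
df <- read.table(text = wanted)
df
```

If the pattern could also occur inside lines that should be ignored, anchoring it (for example "^Trial") makes the filter stricter.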
How to read specific lines into R?
Use lapply:
lns <- lapply(index, function(i) <your scan line>)
do.call(rbind, lns)
# or
data.table::rbindlist(lns)
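The scan call in the snippet above is left as a placeholder. One way it might be filled in is sketched below, assuming a file of two whitespace-separated numeric columns; the file contents, the column names id and value, and the line numbers in index are all invented for illustration.

```r
# Made-up file: six lines of "id value"
path <- tempfile(fileext = ".txt")
writeLines(paste(1:6, (1:6) / 10), path)

index <- c(2, 5)  # line numbers to read (hypothetical)
lns <- lapply(index, function(i) {
  # skip the first i - 1 lines, then read exactly one line into two named fields
  fields <- scan(path, what = list(id = integer(), value = numeric()),
                 skip = i - 1, nlines = 1, quiet = TRUE)
  as.data.frame(fields)
})
res <- do.call(rbind, lns)
res
```

Each call reopens the file and skips from the top, which is simple but slow when many lines are requested from a large file.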