R - Reading Lines from a .Txt-File After a Specific Line

1) read.pattern

read.pattern in the gsubfn package reads only the lines that match a given regular expression. The pattern in this example matches, in order: beginning of line, optional spaces, one or more digits, one or more spaces, an optional minus sign followed by one or more digits, optional spaces, end of line. The portions of each line that match the parenthesized groups of the regexp are returned as the columns of a data.frame. The example is self-contained; if the data comes from a file, replace text = Lines with "myfile.txt", say. Modify the pattern to suit.

Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"

library(gsubfn)
DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")

giving:

> DF
V1 V2
1 131071 -2065
2 131070 -4137
3 131069 -6408
4 131068 -8043
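
To see which lines that pattern keeps without installing gsubfn, the same regexp can be tried with base R's grepl. This is only a sketch of the matching step, not a replacement for read.pattern; the variable names are made up for illustration, and perl = TRUE is used so that \\d is interpreted as a digit class.

```r
# A few lines from the example above, one element per line
lines <- c("junk", "##XYDATA= (X++(Y..Y))", "131071 -2065", "131070 -4137")

# Same pattern as in the read.pattern call
pattern <- "^ *(\\d+) +(-?\\d+) *$"

# TRUE for lines that consist of two whitespace-separated integers
keep <- grepl(pattern, lines, perl = TRUE)

# Parse only the matching lines into a data frame
DF <- read.table(text = lines[keep])
DF
```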

2) read twice

Another possibility, using only base R, is to read the input once to determine the value of skip= and a second time to do the actual read using that value. To read from a file myfile.txt, replace both text = Lines and textConnection(Lines) with "myfile.txt".

read.table(text = Lines, 
skip = grep("##XYDATA=", readLines(textConnection(Lines))))
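
A variation on the same idea, sketched here under the assumption that the file is small enough to hold in memory: read the lines once, locate the marker, and parse everything after it. The temporary file and the names path, all_lines and marker are hypothetical and exist only to make the example self-contained.

```r
# Write the example data to a temporary file so the sketch can be run as is
path <- tempfile(fileext = ".txt")
writeLines(c("junk", "junk", "##XYDATA= (X++(Y..Y))",
             "131071 -2065", "131070 -4137", "131069 -6408", "131068 -8043"),
           path)

# Read all lines once
all_lines <- readLines(path)

# Line number of the first line containing the literal marker text
marker <- grep("##XYDATA=", all_lines, fixed = TRUE)[1]
stopifnot(!is.na(marker))

# Drop everything up to and including the marker, then parse the rest
DF <- read.table(text = all_lines[-seq_len(marker)])
unlink(path)
DF
```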

How to Read Certain Lines of A Data File Into R

Check this out:

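con <- file("test1.txt", "r")
lines <- c()
while (TRUE) {
  line <- readLines(con, n = 1)   # read one line per iteration
  if (length(line) == 0) break    # zero-length result means end of file
  # keep lines that start with F and contain the literal text (0,0)
  if (grepl("^\\s*F{1}", line) && grepl("(0,0)", line, fixed = TRUE)) lines <- c(lines, line)
}
close(con)                        # release the file handle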

lines
# [1] "F 20160602 14:25:11.321 F7982D50 GET 156.145.15.85:37525 xqixh8sl AES \"/pcgc/public/Other/exome/fastq/PCGC0077248_HS_EX__1-06808__v3_FCC49HJACXX_L7_p1of1_P1.fastq.gz\" \"\" 3322771022 (0,0) \"1499.61 seconds (17.7 megabits/sec)\""

Pass the open connection to readLines so that it reads the file one line at a time. The regular expression ^\\s*F{1} matches lines that start with the letter F, possibly preceded by whitespace; ^ denotes the beginning of the string, and the {1} quantifier is redundant, so ^\\s*F matches the same lines. fixed = TRUE makes grepl treat (0,0) as literal text rather than as a regular expression. If both checks are TRUE, the line is appended to lines.

Data:

D 20160602 14:15:43.559 F7982D62 Req Agr:131 Mra:0 Exp:0 Mxr:0 Mnr:0 Mxd:0 Mnd:0 Nro:0
D 20160602 14:15:43.559 F7982D62 Set Agr:130 Mra:0 Exp:0 Mxr:0 Mnr:0 Mxd:0 Mnd:0 Nro:0
I 20160602 14:15:43.559 F7982D62 GET 156.145.15.85:36773 xqixh8sl AES "/pcgc/public/Other/exome/fastq/PCGC0065109_HS_EX__1-04692__v3_FCAD2HMUACXX_L4_p1of1_P2.fastq.gz" ""
M 20160602 14:15:43.595 DOC1: F7982D62 Request for unencrypted meta data on encrypted transaction
M 20160602 14:15:48.353 DOC1: F7982D62 Transaction has been acknowledged at 722875647
F 20160602 14:15:48.398 F7982D62 GET 156.145.15.85:36773 xqixh8sl AES "/pcgc/public/Other/exome/fastq/PCGC0065109_HS_EX__1-04692__v3_FCAD2HMUACXX_L4_p1of1_P2.fastq.gz" "" 50725464 (4,32) "Remote Application: Session Aborted: Aborted by user interrupt"
M 20160602 14:15:48.780 DOC1: F7982D63 New download request
D 20160602 14:15:48.780 F7982D63 META: 134 Path: /pcgc/public/CTD/exome/fastq/PCGC0033175_HS_EX__1-00304-01__v1_FCBC0RE4ACXX_L3_p32of96_P2.fastq.gz user: xqixh8sl pack: arg: feat: cE,s
F 20160602 14:25:11.321 F7982D50 GET 156.145.15.85:37525 xqixh8sl AES "/pcgc/public/Other/exome/fastq/PCGC0077248_HS_EX__1-06808__v3_FCC49HJACXX_L7_p1of1_P1.fastq.gz" "" 3322771022 (0,0) "1499.61 seconds (17.7 megabits/sec)"
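
If the file fits in memory, the same two checks can probably be applied to all lines at once instead of in a loop, since grepl is vectorized. The sketch below uses shortened, made-up log lines in the same shape as the data above; log_lines, starts_with_F and has_zero_pair are illustrative names only.

```r
# Shortened, hypothetical log lines (not the real data)
log_lines <- c(
  'D 20160602 14:15:43.559 F7982D62 Req Agr:131',
  'F 20160602 14:15:48.398 F7982D62 GET "a.fastq.gz" "" 50725464 (4,32) "Aborted"',
  'F 20160602 14:25:11.321 F7982D50 GET "b.fastq.gz" "" 3322771022 (0,0) "1499.61 seconds"'
)

# One logical value per line for each condition
starts_with_F <- grepl("^\\s*F", log_lines)
has_zero_pair <- grepl("(0,0)", log_lines, fixed = TRUE)

# Keep the lines for which both conditions hold
selected <- log_lines[starts_with_F & has_zero_pair]
selected
```

With a real file, log_lines would come from readLines("test1.txt") instead of the inline vector.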

Reading a txt file line by line with skip function of every second line and the output saved as a dataframe using R

We read the data with readLines

lines <- readLines('file.txt')

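Then index with the logical vector c(FALSE, TRUE), which is recycled along lines and so keeps every second line, remove the tabs, and split each remaining line into single characters, giving a list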

lst1 <- strsplit(gsub("\t", "", lines[c(FALSE, TRUE)]), "")
lst1
#[[1]]
# [1] "D" "M" "E" "S" "P" "V" "F" "A" "F" "P" "K" "A" "L" "D" "L" "E" "T" "H" "I" "E" "K" "L" "F" "L" "Y"

#[[2]]
# [1] "D" "D" "T" "L" "D" "D" "S" "D" "E" "D" "D" "I" "V" "V" "E" "S" "Q" "D" "P" "P" "L" "P" "S" "W" "G"

#[[3]]
# [1] "P" "R" "R" "E" "T" "E" "E" "F" "N" "D" "L" "K" "A" "L" "D" "F" "I" "L" "S" "N" "S" "L" "T" "H" "P"

#[[4]]
# [1] "E" "K" "A" "R" "M" "I" "Y" "E" "D" "D" "E" "T" "Y" "L" "S" "P" "K" "E" "V" "S" "L" "D" "S" "R" "V"
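
Since the question asks for a data frame, one possible last step is to bind the list elements row-wise. This is a sketch that assumes every kept line has the same number of characters (as in the output above, where each has 25); the input lines here are hypothetical and much shorter than the real file.

```r
# Hypothetical input: every second line holds a tab-separated sequence
lines <- c(">id1", "DM\tES", ">id2", "DD\tTL")

# c(FALSE, TRUE) is recycled, so elements 2, 4, ... are kept
seqs <- gsub("\t", "", lines[c(FALSE, TRUE)], fixed = TRUE)

# One character vector per kept line
lst1 <- strsplit(seqs, "")

# Row-bind into a matrix, then convert; needs equal-length elements
df <- as.data.frame(do.call(rbind, lst1), stringsAsFactors = FALSE)
df
```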

Read lines by number from a large file

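The trick is to use a connection and open it before calling read.table, so that each call continues from where the previous one stopped:

con <- file('filename')
open(con)

read.table(con, skip = 5, nrows = 1)   # line 6
read.table(con, skip = 20, nrows = 1)  # line 27: 20 more lines are skipped after line 6
...
close(con)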

You may also try scan; it is faster and gives more control.
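
A small way to check the line arithmetic is to run the same calls against a temporary file in which line k simply contains the number k. This is a sketch under that assumption; path, first and second are made-up names, and the behaviour relies on the connection keeping its position between calls, as described above.

```r
# Temporary file whose k-th line contains the number k
path <- tempfile()
writeLines(as.character(1:30), path)

con <- file(path)
open(con, "r")

# Skip 5 lines, read one: this should be line 6
first <- read.table(con, skip = 5, nrows = 1)

# The connection now sits after line 6; skip 20 more, read one
second <- read.table(con, skip = 20, nrows = 1)

close(con)
unlink(path)

c(first$V1, second$V1)
```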

Remove certain lines (with ---- and empty lines) from txt file using readLines() or read_lines()

1) read.table

If we can assume that the only occurrence of - is where shown in the question, and that ? does not occur anywhere in the file, then the following reads in the data, treating every line as a single field and throwing away the header. Since - is the comment character, lines consisting only of - are regarded as blank and are thrown away. This reads the file into a one-column data frame, and [[1]] returns that column as a character vector. If you want to keep the header, omit header = TRUE.

read.table("myfile", sep = "?", comment.char = "-", header = TRUE)[[1]]

2) grep

Another possibility is to read in the file and then remove the lines that are empty or contain only - characters.

grep("^-*$", readLines("myfile"), invert = TRUE, value = TRUE)
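
To see what that pattern removes, it can be tried on an in-memory vector first. The contents of raw below are invented for illustration; "^-*$" matches a line made of zero or more - characters and nothing else, so it covers both empty lines and dash-only lines.

```r
# Hypothetical file contents, one element per line
raw <- c("header", "-----", "", "row one", "----", "row two", "")

# invert = TRUE keeps the lines that do NOT match; value = TRUE returns the text
kept <- grep("^-*$", raw, invert = TRUE, value = TRUE)
kept
```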

3) pipe

We could also filter the input with an external command and pipe the result into R. On Windows, grep is found in C:\Rtools40\usr\bin if you have Rtools40 installed; if it is not on your path, use the complete path, and if you don't have it at all, replace grep with findstr. On UNIX/Linux the escaping may vary according to which shell you are using.

readLines(pipe('grep -v "^-*$" myfile'))

Read a text file in R line by line

Here is a solution with a for loop. Importantly, the single call to readLines is made before the loop, so the file is not read again on every iteration:

fileName <- "up_down.txt"
conn <- file(fileName, open = "r")
linn <- readLines(conn)
for (i in seq_along(linn)) {  # seq_along() also copes with an empty file
  print(linn[i])
}
close(conn)
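
When the index itself is not needed, the loop can also run over the lines directly. The sketch below assumes a small temporary file with invented contents and counts how many lines equal "up"; the file contents and the counter n_up are hypothetical.

```r
# Temporary file with made-up contents
path <- tempfile(fileext = ".txt")
writeLines(c("up", "down", "up"), path)

# readLines() accepts a file name and handles opening and closing itself
linn <- readLines(path)

# Iterate over the lines themselves rather than over their positions
n_up <- 0L
for (line in linn) {
  if (identical(line, "up")) n_up <- n_up + 1L
}
unlink(path)
n_up
```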

How can I ignore lines while reading a text file in R?

You can use read.table (or another reading function) in combination with grep:

read.table(text=grep("Trial", readLines(path_to_your_file), value=TRUE))
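
The same call can be tried on an in-memory vector to check which lines survive the filter. The contents of raw are invented for this sketch; only the lines containing "Trial" are passed on to read.table.

```r
# Hypothetical file contents: two data lines mixed with other text
raw <- c("# comment line", "Trial 1 0.52", "Summary follows", "Trial 2 0.61")

# Keep only the lines that contain "Trial", then parse them
DF <- read.table(text = grep("Trial", raw, value = TRUE))
DF
```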

Does this solve your problem?

How to read specific lines into R?

Use lapply to run your scan call once per requested line, then bind the results:

lns <- lapply(index, function(i) <your scan line>) 

do.call(rbind, lns)

# or
data.table::rbindlist(lns)
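
As one possible way to fill in the placeholder, the sketch below assumes that index holds the wanted line numbers and that each line contains whitespace-separated numbers. The temporary file, its contents and the index values are hypothetical; skip, nlines and quiet are standard arguments of scan.

```r
# Temporary file with four numeric lines
path <- tempfile()
writeLines(c("1 2 3", "4 5 6", "7 8 9", "10 11 12"), path)

# Hypothetical line numbers to extract
index <- c(2, 4)

# For line i: skip the i - 1 lines before it, then read exactly one line
lns <- lapply(index, function(i) scan(path, skip = i - 1, nlines = 1, quiet = TRUE))

# One matrix row per requested line
result <- do.call(rbind, lns)
unlink(path)
result
```

Each scan call reopens the file and skips from the top, so for many lines in a very large file a single open connection may be the better fit.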

