Read a Text File in R Line by Line

Read a text file in R line by line

Here is a solution with a for loop. Importantly, the single call to readLines is made before the loop, so the file is not re-read on every iteration:

fileName <- "up_down.txt"
conn <- file(fileName,open="r")
linn <-readLines(conn)
for (i in 1:length(linn)){
print(linn[i])
}
close(conn)
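
Since readLines() already pulls the whole file into a character vector, the explicit connection is not strictly needed; a minimal sketch of the same loop without one (same file name assumed):

linn <- readLines("up_down.txt")
for (i in seq_along(linn)) {
  print(linn[i])
}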

How to read a txt file line by line in R/RStudio?

You can use the readLines function.
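
A minimal sketch, assuming the file is called "myfile.txt" and sits in the working directory (names here are just placeholders):

lines  <- readLines("myfile.txt")         # all lines, one element per line
first5 <- readLines("myfile.txt", n = 5)  # or just the first 5 lines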

What is a good way to read line-by-line in R?

The example Josh linked to is one that I use all the time.

inputFile <- "/home/jal/myFile.txt"
con <- file(inputFile, open = "r")

dataList <- list()
ecdfList <- list()

while (length(oneLine <- readLines(con, n = 1, warn = FALSE)) > 0) {
myVector <- (strsplit(oneLine, " "))
myVector <- list(as.numeric(myVector[[1]]))
dataList <- c(dataList,myVector)

myEcdf <- ecdf(myVector[[1]])
ecdfList <- c(ecdfList,myEcdf)

}

close(con)

I edited the example to create two lists from your example data: dataList, in which each element is the vector of numeric values from one line of your text file, and ecdfList, in which each element is the ecdf for the corresponding line.

You should probably add some try() or tryCatch() logic in there to properly handle situations where the ecdf can't be created because of missing values or some such. But the above example should get you pretty close. Good luck!
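
A rough sketch of that idea, wrapping only the ecdf() step (variable names are taken from the loop above; returning NULL on error is just one possible choice):

myEcdf <- tryCatch(
  ecdf(myVector[[1]]),
  error = function(e) NULL   # give up on lines where the ecdf can't be built
)
if (!is.null(myEcdf)) {
  ecdfList <- c(ecdfList, myEcdf)
}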

R - Reading lines from a .txt-file after a specific line

1) read.pattern: read.pattern in the gsubfn package can be used to read only the lines matching a specific pattern. In this example the pattern matches: beginning of line, optional space(s), 1 or more digits, 1 or more spaces, an optional minus sign followed by 1 or more digits, optional space(s), end of line. The portions matching the parenthesized parts of the regexp are returned as columns of a data.frame. text = Lines in this self-contained example can be replaced with "myfile.txt", say, if the data is coming from a file. Modify the pattern to suit.

Lines <- "junk
junk
##XYDATA= (X++(Y..Y))
131071 -2065
131070 -4137
131069 -6408
131068 -8043"

library(gsubfn)
DF <- read.pattern(text = Lines, pattern = "^ *(\\d+) +(-?\\d+) *$")

giving:

> DF
      V1    V2
1 131071 -2065
2 131070 -4137
3 131069 -6408
4 131068 -8043

2) read twice: Another possibility using only base R is to read the file once to determine the value of skip= and a second time to do the actual read using that value. To read from a file myfile.txt, replace text = Lines and textConnection(Lines) with "myfile.txt".

read.table(text = Lines,
           skip = grep("##XYDATA=", readLines(textConnection(Lines))))

Added: Some revisions and a second approach.
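
For reference, the same two-step read against an actual file (the file name "myfile.txt" and the variable names are just placeholders) would look roughly like this:

first_pass <- readLines("myfile.txt")
skip_to    <- grep("##XYDATA=", first_pass)   # line number of the marker line
DF <- read.table("myfile.txt", skip = skip_to)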

reading text file in r and store what is read conditioned on the next line

This will be somewhat problematic because the format is so irregular from item to item. Here's a run at the first item's codebook text:

txt <- "m5a2                                                     A2. Confirm how much time child lives with respondent
--------------------------------------------------------------------------------------------------------------

type: numeric (byte)
label: BM_101F

range: [-9,7] units: 1
unique values: 8 missing .: 0/4898

tabulation: Freq. Numeric Label
1383 -9 -9 Not in wave
4 -2 -2 Don't know
2 -1 -1 Refuse
3272 1 1 all or most of the time
29 2 2 about half of the time
76 3 3 some of the time
80 4 4 none of the time
52 7 7 only on weekends
"
Lines <- readLines(textConnection(txt))

# isolate lines with a letter in the first column
Lines[grep("^[a-zA-Z]", Lines)]

# now replace long runs of spaces with commas and scan:
scan(text = sub("[ ]{10,100}", ",", Lines[grep("^[a-zA-Z]", Lines)]),
     sep = ",", what = "")
#----
Read 2 items
[1] "m5a2"
[2] "A2. Confirm how much time child lives with respondent"

The "tabulation" line can be used to create column labels.

colnames <- scan(text = sub(".*tabulation[:]", "", Lines[grep("tabulation[:]", Lines)]),
                 sep = "", what = "")
# Read 3 items

The substitution-with-commas strategy needs to be a bit more involved for the lines that follow. First isolate the rows where a numeric digit is the first non-space character:

dataRows <- Lines[grep("^[ ]*\\d", Lines)]

Then substitute commas for the pattern digit-2+spaces and read with read.csv:

myDat <- read.csv(text = gsub("(\\d)[ ]{2,}", "\\1,", dataRows),
                  header = FALSE, col.names = colnames)

#------------
myDat
V1 V2 V3
1 1383 -9 -9 Not in wave
2 4 -2 -2 Don't know
3 2 -1 -1 Refuse
4 3272 1 1 all or most of the time
5 29 2 2 about half of the time
6 76 3 3 some of the time
7 80 4 4 none of the time
8 52 7 7 only on weekends

Looping over multiple items might be possible with a counter generated from cumsum(grepl("^-------", Lines)) if the Lines object were the entire file, such as the one at:

Lines <- readLines("http://fragilefamilies.princeton.edu/sites/fragilefamilies/files/ff_mom_cb9.txt")
sum(grepl("^-------", Lines))
#----------------------
[1] 1966
Warning messages:
1: In grepl("^-------", Lines) :
input string 6995 is invalid in this locale
2: In grepl("^-------", Lines) :
input string 7349 is invalid in this locale
3: In grepl("^-------", Lines) :
input string 7350 is invalid in this locale
4: In grepl("^-------", Lines) :
input string 7352 is invalid in this locale
5: In grepl("^-------", Lines) :
input string 7353 is invalid in this locale

My "hand-held scan()-er" suggested to me that there were only two types of codebook record: "tabulations" (presumably items with fewer than 10 or so intances) and "examples"(ones with more). They had different structures (as can be seen above in your codebook fragment) so maybe only two types of parsing logic would be needed to be built and deployed. So I think the tools to do that are described above.

The warnings all relate to the character "\x92" being used as an apostrophe. Regex and R share the escape character "\", so you need to escape the escapes. The offending lines could be corrected with:

Lines <- gsub("\\\x92", "'", Lines )

Reading a txt file line by line, skipping every second line, and saving the output as a dataframe using R

We read the data with readLines:

lines <- readLines('file.txt')

Then subset every second line with a recycled logical index, strip the tabs, and split each line into single characters, giving a list:

lst1 <- strsplit(gsub("\t", "", lines[c(FALSE, TRUE)]), "")
lst1
#[[1]]
# [1] "D" "M" "E" "S" "P" "V" "F" "A" "F" "P" "K" "A" "L" "D" "L" "E" "T" "H" "I" "E" "K" "L" "F" "L" "Y"

#[[2]]
# [1] "D" "D" "T" "L" "D" "D" "S" "D" "E" "D" "D" "I" "V" "V" "E" "S" "Q" "D" "P" "P" "L" "P" "S" "W" "G"

#[[3]]
# [1] "P" "R" "R" "E" "T" "E" "E" "F" "N" "D" "L" "K" "A" "L" "D" "F" "I" "L" "S" "N" "S" "L" "T" "H" "P"

#[[4]]
# [1] "E" "K" "A" "R" "M" "I" "Y" "E" "D" "D" "E" "T" "Y" "L" "S" "P" "K" "E" "V" "S" "L" "D" "S" "R" "V"

