Iterating on a File Doesn't Work the Second Time

Iterating on a file doesn't work the second time

Yes, that is normal behavior. The first loop reads all the way to the end of the file (you can picture it as reading a tape), so there is nothing left to read until you reset it. You can do that either by calling f.seek(0) to reposition to the start of the file, or by closing the file and opening it again, which starts from the beginning.

If you prefer, you can use the with statement instead, which automatically closes the file for you.

For example:

# 'rU' is not needed in Python 3: universal newlines are the default
with open('baby1990.html', 'r') as f:
    for line in f:
        print(line)

Once this block has finished executing, the file is closed automatically, so you can run the block repeatedly, without closing the file yourself, and read the file from the start each time.
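To make the "tape" picture concrete, here is a minimal sketch that loops over the same file object three times. The temporary file and its contents are made up for the illustration. The second pass finds nothing because the position is already at the end, and f.seek(0) restores it.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path) as f:
    first_pass = [line.rstrip("\n") for line in f]   # reads to end of file
    second_pass = [line.rstrip("\n") for line in f]  # nothing left to read
    f.seek(0)                                        # rewind to the start
    third_pass = [line.rstrip("\n") for line in f]   # full contents again

os.remove(path)

print(first_pass)   # ['alpha', 'beta', 'gamma']
print(second_pass)  # []
print(third_pass)   # ['alpha', 'beta', 'gamma']
```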

Python refuses to iterate through lines in a file more than once

It's because the file = open("somefile.txt") line occurs only once, before the loop. This creates one cursor pointing to one location in the file, so when you reach the end of the first loop, the cursor is at the end of the file. Move it into the loop:

loops = 0
while loops < 5:
    file = open("somefile.txt")
    for line in file:
        print(line)
    loops = loops + 1
    file.close()

How can I iterate over a file more than once?

You need to reset searchfile's current position to the beginning on every iteration over infile. You can use the seek function for this.

# Raw strings keep the backslashes in the Windows-style paths literal.
searchfile = open(r'.\sometext.txt', 'r')
infile = open(r'.\somefile', 'r')

for line1 in infile:
    # Rewind searchfile before each inner pass.
    searchfile.seek(0, 0)
    for line2 in searchfile:
        print(line2)

searchfile.close()
infile.close()

For loop not working twice on the same file descriptor

Because for lines in fd: leaves the read pointer (also called the file pointer or file position) at the end of the file. Call fd.seek(0) between your for loops.

Iteration Over Open Text File Multiple Times in Python

So, you're looking to rewind the file to the start again.

If your file handle is called f, do it like this:

   f.seek(0)

This won't work on streams, serial ports, pipes, or network sockets: only on regular files.
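If you are not sure whether a handle can be rewound, file objects expose a seekable() method you can check first. The sketch below is only an illustration: the rewind helper is a hypothetical name, an in-memory io.StringIO stands in for a regular file, and an os.pipe() stands in for a stream.

```python
import io
import os

def rewind(f):
    """Rewind f to the start if it supports seeking; report whether it did."""
    if f.seekable():
        f.seek(0)
        return True
    return False

# An in-memory text buffer can be rewound just like a regular file.
buffer = io.StringIO("one\ntwo\n")
buffer.read()
buffer_rewound = rewind(buffer)

# A pipe is a one-way stream, so it cannot be rewound.
read_fd, write_fd = os.pipe()
os.close(write_fd)
with os.fdopen(read_fd) as pipe_reader:
    pipe_rewound = rewind(pipe_reader)

print(buffer_rewound, pipe_rewound)  # True False (as observed on Linux/macOS)
```

For a stream that cannot be rewound, the usual fallback is to keep the data you have already read in memory.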

Why for line in file: can only use once?

Because the file is read as a side effect of iterating over its lines. You'll need to reopen the file each time, or, if memory permits, read the whole file into a list of lines (for example with file.readlines()) and iterate over that instead.
Any open file has a "read pointer" that tracks how much has been read, and it advances with each line consumed. Each loop as written consumes the whole file, so the next loop starts at the end and finds nothing.
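As a rough sketch of the read-into-a-list option: once the lines are in a list, you can loop over them as often as you like without touching the file again. Here io.StringIO stands in for an open text file, and the sample contents are made up.

```python
import io

# io.StringIO stands in for an open text file so the sketch is self-contained.
f = io.StringIO("red\ngreen\nblue\n")

lines = f.readlines()  # consumes the file once; the list holds every line

# A list can be iterated any number of times without touching the file.
lengths = [len(line.rstrip("\n")) for line in lines]
upper = [line.strip().upper() for line in lines]

print(lengths)  # [3, 5, 4]
print(upper)    # ['RED', 'GREEN', 'BLUE']
```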

Iterate through a file multiple times

Iterating multiple times through a file is possible (you can reset the file to the start by calling thefile.seek(0)) but likely to be very costly.

Let's say for generality you have a function to identify the key number given a line, e.g.

def getkey(line):
    return line.split()[1]

in your example where the key is the second of three space-separated words in the line. Now, if the data for the second file will fit comfortably in RAM (so up to a few GB -- think how long it would take to iterate hundreds of times on that!-)...:

key2line = {}
with open(secondfile) as f:
    for line in f:
        key2line[getkey(line)] = line

with open(firstfile) as f:
    order = [line.strip() for line in f]

with open(outputfile, 'w') as f:
    for key in order:
        f.write(key2line[key])

Now isn't that a pretty clear and effective approach...?

If the second file is too big by a small factor, say 10 times or so larger than what you can actually fit into memory, then you may still be able to solve it, at the cost of lots of jumping around in the file, by using seek and tell.

The first loop would become:

key2offset = {}
with open(secondfile) as f:
    offset = 0
    # Read with readline() rather than "for line in f": tell() is
    # disabled (Python 3) or unreliable (Python 2) during file iteration.
    for line in iter(f.readline, ''):
        key2offset[getkey(line)] = offset
        offset = f.tell()

and the last loop would become:

with open(secondfile) as f:
    with open(outputfile, 'w') as f1:
        for key in order:
            f.seek(key2offset[key])
            line = f.readline()
            f1.write(line)

A bit more complex, much slower -- but still way faster than re-reading a bazillion times, over and over, a file of tens of GB!-)
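To see the offset idea run end to end, here is a small self-contained sketch. It is one possible arrangement rather than the only one: the helper names build_offset_index and lines_in_order, the sample data, and the choice of binary mode (so that tell() returns plain byte offsets) are all assumptions made for the illustration.

```python
import os
import tempfile

def getkey(line):
    # Same rule as above: the key is the second whitespace-separated word.
    return line.split()[1]

def build_offset_index(path):
    """Map each key to the byte offset at which its line starts."""
    index = {}
    with open(path, "rb") as f:  # binary mode: tell() returns byte offsets
        offset = 0
        for line in iter(f.readline, b""):  # read one line at a time
            index[getkey(line).decode()] = offset
            offset = f.tell()  # start of the next line
    return index

def lines_in_order(path, index, order):
    """Yield the line for each key in order by seeking to its offset."""
    with open(path, "rb") as f:
        for key in order:
            f.seek(index[key])
            yield f.readline().decode()

# Hypothetical sample data, written to a temporary file.
with tempfile.NamedTemporaryFile("w", delete=False, newline="\n") as tmp:
    tmp.write("a 10 x\nb 20 y\nc 30 z\n")
    sample_path = tmp.name

index = build_offset_index(sample_path)
result = list(lines_in_order(sample_path, index, ["30", "10", "20"]))
os.remove(sample_path)

print(index)   # {'10': 0, '20': 7, '30': 14}
print(result)  # ['c 30 z\n', 'a 10 x\n', 'b 20 y\n']
```

Only the dictionary of keys and offsets lives in memory; each line is fetched from disk when it is needed.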


