Iterating on a file doesn't work the second time
Yes, that is normal behavior. The first loop reads to the end of the file (picture it as reading a tape), so there is nothing left to read until you reset it, either by calling f.seek(0) to reposition to the start of the file, or by closing the file and opening it again, which starts from the beginning.
If you prefer, you can use the with statement instead, which automatically closes the file for you, e.g.:
with open('baby1990.html', 'r') as f:
    for line in f:
        print(line)
Once this block has finished executing, the file is closed automatically, so you can run the block repeatedly without closing the file yourself and read the file from the start each time.
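The f.seek(0) alternative mentioned above can be sketched as follows. This is only an illustration: the temporary file and its contents are made up so that the snippet is self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\n")
    path = tmp.name

with open(path) as f:
    first_pass = [line.rstrip("\n") for line in f]
    # The file position is now at the end, so this pass yields nothing.
    second_pass = [line.rstrip("\n") for line in f]
    # Rewind to the start; the next pass sees every line again.
    f.seek(0)
    third_pass = [line.rstrip("\n") for line in f]

os.remove(path)

print(first_pass)   # ['alpha', 'beta']
print(second_pass)  # []
print(third_pass)   # ['alpha', 'beta']
```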
Python refuses to iterate through lines in a file more than once
It's because the file = open("somefile.txt") line occurs only once, before the loop. This creates one cursor pointing to one location in the file, so when you reach the end of the first loop, the cursor is at the end of the file. Move it into the loop:
loops = 0
while loops < 5:
    # Reopen on every pass so each pass starts at the beginning of the file.
    file = open("somefile.txt")
    for line in file:
        print(line)
    loops = loops + 1
    file.close()
How can I iterate over a file more than once?
You need to reset searchfile's current position to the beginning on every iteration over infile. You can use the seek function for this.
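# Raw strings keep the backslashes in the Windows-style paths literal.
searchfile = open(r'.\sometext.txt', 'r')
infile = open(r'.\somefile', 'r')
for line1 in infile:
    # Rewind searchfile before each inner pass.
    searchfile.seek(0, 0)
    for line2 in searchfile:
        print(line2)
searchfile.close()
infile.close()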
For loop not working twice on the same file descriptor
Because for lines in fd: leaves the read pointer (the file position) at the end of the file. Call fd.seek(0) in between your for loops.
Iteration Over Open Text File Multiple Times in Python
So, you're looking to rewind the file to the start again. If your file handle is called f, do it like this:
f.seek(0)
This won't work on streams, serial ports, pipes, or network sockets: only on regular files.
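One way to guard against the non-seekable case is to ask the file object whether it supports seeking before rewinding. The sketch below assumes a text-mode stream and uses a hypothetical helper, rewindable(), that falls back to buffering the whole stream in memory; a pipe created with os.pipe() stands in for any non-seekable source.

```python
import io
import os

def rewindable(f):
    """Return an object that supports seek(0) (hypothetical helper).

    Seekable files are returned unchanged; anything else (pipe, socket,
    ...) is read completely into an in-memory text buffer.
    """
    if f.seekable():
        return f
    return io.StringIO(f.read())

# A pipe is a simple non-seekable stream to demonstrate with.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"one\ntwo\n")
os.close(write_fd)

with os.fdopen(read_fd, "r") as pipe_reader:
    is_seekable = pipe_reader.seekable()
    buf = rewindable(pipe_reader)
    first = buf.readlines()
    buf.seek(0)              # safe: buf is an in-memory copy
    second = buf.readlines()

print(is_seekable)  # False
print(first == second)
```

Buffering only makes sense when the stream is finite and fits in memory; for an endless stream there is no way to "start over".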
Why for line in file: can only use once?
Because the file is read as part of iterating over the lines. You'll need to reopen the file each time, or read the whole file into a list of lines (via file.readlines() perhaps) and iterate over that, if memory limits permit.
Any open file has a "read pointer" that tracks what's been read, which advances with each line consumed. The loops as written will each consume the whole file.
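A minimal sketch of the read-into-a-list option, with io.StringIO standing in for an open text file (the contents are made up for illustration):

```python
import io

# io.StringIO stands in for an open text file here.
f = io.StringIO("first\nsecond\nthird\n")

# Consume the file once and keep every line in memory.
lines = f.readlines()

# The list can be iterated as many times as needed.
lengths = [len(line) for line in lines]
upper = [line.strip().upper() for line in lines]

# The file object itself is exhausted after readlines().
leftover = f.readlines()

print(lengths)   # [6, 7, 6]
print(upper)     # ['FIRST', 'SECOND', 'THIRD']
print(leftover)  # []
```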
Iterate through a file multiple times
Iterating multiple times through a file is possible (you can reset the file to the start by calling thefile.seek(0)) but likely to be very costly.
Let's say for generality you have a function to identify the key number given a line, e.g.
def getkey(line):
    return line.split()[1]
in your example where the key is the second of three space-separated words in the line. Now, if the data for the second file will fit comfortably in RAM (so up to a few GB -- think how long it would take to iterate hundreds of times on that!-)...:
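# Map each key to its full line from the second file.
key2line = {}
with open(secondfile) as f:
    for line in f:
        key2line[getkey(line)] = line
# The first file lists the keys in the desired output order.
with open(firstfile) as f:
    order = [line.strip() for line in f]
with open(outputfile, 'w') as f:
    for key in order:
        f.write(key2line[key])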
Now isn't that a pretty clear and effective approach...?
If the second file is too big to fit into memory by a small factor, say 10 times or so, then you may still be able to solve it, at the cost of lots of jumping around in the file, by using seek and tell.
The first loop would become:
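key2offset = {}
with open(secondfile) as f:
    offset = f.tell()
    # Use readline() rather than "for line in f": in Python 3, tell()
    # raises OSError while a text file is being iterated.
    for line in iter(f.readline, ''):
        key2offset[getkey(line)] = offset
        offset = f.tell()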
and the last loop would become:
with open(secondfile) as f:
    with open(outputfile, 'w') as f1:
        for key in order:
            # Jump straight to the start of the wanted line.
            f.seek(key2offset[key])
            line = f.readline()
            f1.write(line)
A bit more complex, much slower -- but still way faster than re-reading a bazillion times, over and over, a file of tens of GB!-)
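For reference, here is the seek/tell variant put together as one runnable sketch. The reorder() function name, the temporary files and their contents are all made up for illustration; getkey() uses the same "second word is the key" assumption as above.

```python
import os
import tempfile

def getkey(line):
    # Assumption from the text: the key is the second whitespace-separated word.
    return line.split()[1]

def reorder(firstfile, secondfile, outputfile):
    # Pass 1: remember where each key's line starts in the second file.
    key2offset = {}
    with open(secondfile) as f:
        offset = f.tell()
        for line in iter(f.readline, ""):
            key2offset[getkey(line)] = offset
            offset = f.tell()
    # The first file holds one key per line, in the desired output order.
    with open(firstfile) as f:
        order = [line.strip() for line in f]
    # Pass 2: jump straight to each wanted line and copy it out.
    with open(secondfile) as f, open(outputfile, "w") as out:
        for key in order:
            f.seek(key2offset[key])
            out.write(f.readline())

# Tiny made-up data set.
workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first.txt")
second = os.path.join(workdir, "second.txt")
output = os.path.join(workdir, "out.txt")
with open(first, "w") as f:
    f.write("k2\nk3\nk1\n")
with open(second, "w") as f:
    f.write("a k1 x\nb k2 y\nc k3 z\n")

reorder(first, second, output)
with open(output) as f:
    result = f.read()
print(result)
```

Only the key-to-offset dictionary is held in memory, so the cost is one full read of the second file plus one seek per output line.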