Read Lines by Number from a Large File

How do I read only specific line numbers from a large file in Python?

Here are some options:

  1. Go over the file at least once and keep track of the file offsets of the lines you are interested in. This is a good approach if you might be seeking these lines multiple times and the file won't be changed (see the offset-index sketch after this list).
  2. Consider changing the data format, for example CSV instead of JSON (see comments).
  3. If you have no other alternative, use the traditional approach:
def get_lines(..., linenums: list):
    with open(...) as f:
        for lno, ln in enumerate(f):
            if lno in linenums:
                yield ln

On a 4GB file this took ~6s for linenums = [n // 4, n // 2, n - 1] where n = lines_in_file.
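A minimal sketch of the first option (the function and file names here are illustrative, not from the original answer): scan the file once in binary mode, record the byte offset of each wanted line, then seek straight to those offsets on later reads.

def build_offset_index(path, linenums):
    """Map each requested 0-based line number to its byte offset."""
    wanted = set(linenums)
    offsets = {}
    with open(path, "rb") as f:   # binary mode, so offsets are exact byte positions
        offset = 0
        for lno, line in enumerate(f):
            if lno in wanted:
                offsets[lno] = offset
            offset += len(line)
    return offsets

def read_line_at(path, offset):
    """Jump to a previously recorded offset and read a single line."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline()

After the one-time pass, each lookup is a single seek plus one readline, which is what makes this worthwhile when the same lines are requested repeatedly and the file does not change.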

Read lines by number from a large file

The trick is to use a connection AND to open it before calling read.table:

con <- file('filename')
open(con)

read.table(con, skip=5, nrow=1)   # 6th line
read.table(con, skip=20, nrow=1)  # 27th line (the connection has already advanced past line 6)
...
close(con)

You may also try scan; it is faster and gives you more control.

How to read specific lines from a file (by line number)?

If the file to read is big and you don't want to read the whole file into memory at once:

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        pass   # 26th line: do something with 'line'
    elif i == 29:
        pass   # 30th line: do something with 'line'
    elif i > 29:
        break
fp.close()

Note that i == n - 1 for the nth line, since enumerate counts from zero.


In Python 2.6 or later:

with open("file") as fp:
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break

How to read a large file - line by line?

The correct, fully Pythonic way to read a file is the following:

with open(...) as f:
    for line in f:
        pass  # Do something with 'line'

The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered I/O and memory management so you don't have to worry about large files.

There should be one -- and preferably only one -- obvious way to do it.
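As a concrete usage sketch (the log file name and the filtering step are made up for illustration), counting non-empty lines while holding only one line in memory at a time:

count = 0
with open("big.log") as f:
    for line in f:
        if line.strip():   # skip blank lines
            count += 1
print(count)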

How can I read large text files line by line, without loading them into memory?

Use a for loop on a file object to read it line-by-line. Use with open(...) to let a context manager ensure that the file is closed after reading:

with open("log.txt") as infile:
for line in infile:
print(line)

How To Read Line Numbers x Through (x+y) From A Very Large File

When you write:

lines.skip(startLine)

you create a new stream, but you don't save a reference to it, so you lose the operation.

I suspect you want something like:

return lines.skip(startLine)
            .limit(100000)
            .map(fileReader::populateMyModel)
            .collect(toList());
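For comparison, here is a rough Python sketch of the same skip-and-limit idea (the file name and line range are hypothetical); itertools.islice lazily skips to the start line and stops after the requested count, so only that slice is ever handed to your code:

from itertools import islice

def read_range(path, start, count):
    """Yield lines start .. start+count-1 (0-based) without loading the whole file."""
    with open(path) as f:
        yield from islice(f, start, start + count)

# e.g. lines 100000..199999 of a large log file
for line in read_range("big.log", 100000, 100000):
    pass  # process the line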

Reading a particular line by line number in a very large file

pack N creates a 32-bit unsigned integer. The largest value a 32-bit integer can hold corresponds to an offset of only 4GB, so using it to store offsets into a file that's 100GB in size won't work.

Some Perl builds use 64-bit integers. On those, you could use the j format.

Other builds use 32-bit integers. On those, tell returns a floating-point number, which can losslessly index files up to 8,388,608 GB in size; there you should use the F format.

Portable code would look as follows:

use Config qw( %Config );
my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';

...
print $index_file pack($off_t, $offset);
...

Note: I'm assuming the index file is only used by the same Perl that built it (or at least one with the same integer size, seek size and machine endianness). Let me know if that assumption doesn't hold for you.
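A rough Python analogue of the same index-file idea (the file names and function names here are made up): packing every offset with an explicit 64-bit struct format avoids the build-dependent integer-size problem described above, at the cost of 8 bytes of index per line.

import struct

OFFSET_FMT = "<q"                          # explicit little-endian signed 64-bit
OFFSET_SIZE = struct.calcsize(OFFSET_FMT)  # 8 bytes

def write_index(data_path, index_path):
    """Record the byte offset of every line of data_path in index_path."""
    with open(data_path, "rb") as data, open(index_path, "wb") as idx:
        offset = 0
        for line in data:
            idx.write(struct.pack(OFFSET_FMT, offset))
            offset += len(line)

def line_by_number(data_path, index_path, lineno):
    """Fetch a 0-based line number using the prebuilt index."""
    with open(index_path, "rb") as idx:
        idx.seek(lineno * OFFSET_SIZE)
        (offset,) = struct.unpack(OFFSET_FMT, idx.read(OFFSET_SIZE))
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.readline()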


