How to Jump to a Particular Line in a Huge Text File

linecache:

The linecache module allows one to get any line from a Python source file, while attempting to optimize internally, using a cache, the common case where many lines are read from a single file. This is used by the traceback module to retrieve source lines for inclusion in the formatted traceback...
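
For example, a minimal sketch (the file name is a placeholder; getline() returns an empty string rather than raising if the line does not exist):

import linecache

# linecache reads and caches the whole file on first access, so repeated
# lookups against the same file are cheap.
line = linecache.getline("example.py", 4)  # line numbers are 1-based
print(line, end="")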

PERL: Jumping to lines in a huge text file

Are there any methods to jump to a particular line for the required S-value?

Yes, if the file does not change then create an index. This requires reading the file in its entirety once and noting the positions of all the S=# lines using tell. Store it in a DBM file with the key being the number and the value being the byte position in the file. Then you can use seek to jump to that point in the file and read from there.
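
The same idea translates directly to Python; here is a minimal sketch using the dbm module (the file names and the S=42 lookup are placeholders, and the parsing of the S=# lines is an assumption about the data layout):

import dbm

# Build the index once: remember the byte offset of every "S=#" line.
with open("data.txt", "rb") as f, dbm.open("s_index", "c") as idx:
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            break
        if line.startswith(b"S="):
            idx[line[2:].strip()] = str(pos).encode()

# Later runs: seek straight to the record for S=42 and read from there.
with open("data.txt", "rb") as f, dbm.open("s_index", "r") as idx:
    f.seek(int(idx[b"42"]))
    print(f.readline())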

But if you're going to do that, you're better off exporting the data into a proper database such as SQLite. Write a program to insert the data into the database and add normal SQL indexes; this will probably be simpler than writing your own index. Then you can query the data efficiently using normal SQL, and make complex queries. If the file changes, you can either redo the export or use normal INSERT and UPDATE statements to update the database. And it will be easy for anyone who knows SQL to work with, as opposed to a bunch of custom indexing and search code.
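
A minimal Python/sqlite3 sketch of that export (table and column names are illustrative, and the record layout, each record starting with an "S=<number>" line, is an assumption):

import sqlite3

con = sqlite3.connect("records.db")
con.execute("CREATE TABLE IF NOT EXISTS records (s INTEGER PRIMARY KEY, body TEXT)")

# One pass over the flat file: collect each record under its S-value.
with open("data.txt") as f:
    s, body = None, []
    for line in f:
        if line.startswith("S="):
            if s is not None:
                con.execute("INSERT INTO records VALUES (?, ?)", (s, "".join(body)))
            s, body = int(line[2:]), []
        else:
            body.append(line)
    if s is not None:
        con.execute("INSERT INTO records VALUES (?, ?)", (s, "".join(body)))
con.commit()

# Any S-value is now an indexed SQL lookup instead of a file scan.
print(con.execute("SELECT body FROM records WHERE s = ?", (42,)).fetchone())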

While reading a text file sequentially, how to go back to a particular line and start again from there

You can build a list of line start positions (file offsets), and then use file.seek(line_offsets[n]) to go back to the nth line (counting from zero). After that you can read the line (and those following it sequentially) once again.

Here's example code that builds such a list incrementally:

filepath = "somefile"
line_offsets = []

with open(filepath, "r") as file:
    while True:
        posn = file.tell()
        line = file.readline()
        if not line:  # end-of-file?
            break
        line_offsets.append(posn)  # Remember where line started.
        # Process line here.

Reading a specific line from a huge text file (C# 4.0)

In C#, an int is a 32-bit number, and so is limited to about 2 billion.

That said, if you have to read a random line from the file, and all you know is that the file is made up of lines of unknown length, you will have to read it line by line until you reach the line you want. You can use some buffering to ease up on the I/O a little (it's on by default), but you won't get any better performance than that.

Unless you change the way the file is saved, that is. If you could create an index file containing the position of each line in the main file, you could make reading a line infinitely faster.

Well, not infinitely, but a lot faster: from O(N) to almost O(1) (almost, because seeking to a random byte in a file may not be an O(1) operation, depending on how the OS does it).
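
A minimal Python sketch of that index-file idea (file names are placeholders): store every line's byte offset as a fixed-width integer, so finding the offset of line N is itself a single seek.

import struct

DATA, INDEX = "big.txt", "big.idx"  # hypothetical file names
RECORD = struct.Struct("<Q")        # one 8-byte offset per line

# Build the index once: the k-th record holds the byte offset of line k.
with open(DATA, "rb") as data, open(INDEX, "wb") as idx:
    pos = 0
    for line in data:
        idx.write(RECORD.pack(pos))
        pos += len(line)

def read_line(n):
    """Return line n (0-based) in roughly O(1): two seeks and two small reads."""
    with open(INDEX, "rb") as idx, open(DATA, "rb") as data:
        idx.seek(n * RECORD.size)
        (offset,) = RECORD.unpack(idx.read(RECORD.size))
        data.seek(offset)
        return data.readline()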

Only read specific line numbers from a large file in Python?

Here are some options:

  1. Go over the file at least once and keep track of the file offsets of the lines you are interested in. This is a good approach if you might be seeking these lines multiple times and the file won't be changed.
  2. Consider changing the data format, for example CSV instead of JSON.
  3. If you have no other alternative, use the traditional:
def get_lines(..., linenums: list):
    with open(...) as f:
        for lno, ln in enumerate(f):
            if lno in linenums:
                yield ln

On a 4GB file this took ~6s for linenums = [n // 4, n // 2, n - 1] where n = lines_in_file.

How to get some specific lines from huge text file in unix?

To print line N, use (substituting the actual number for N, e.g. sed '994123q;d' file to print line 994123):

sed 'Nq;d' file

To print multiple lines (assuming they are in ascending order) e.g. 994123, 1002451, 1010123:

sed '994123p;1002451p;1010123q;d' file

The q after the last line number tells sed to quit when it reaches line 1010123, instead of wasting time looping over the remaining lines we are not interested in. That is why it is efficient on large files.

How do I read a specific line on a text file?

As Barmar said, use file.readlines(). file.readlines() makes a list of lines, so use an index for the line you want to read. Keep in mind that the list is 0-indexed, so the first line is at index 0, not 1; to store the third line of a text document in a variable you would write line = file.readlines()[2].
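
For instance, a minimal sketch (the file name is illustrative):

with open("file.txt") as file:
    line = file.readlines()[2]  # index 2 is the third line
print(line)
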
Edit: also, if what copperfield said is your situation, you can do:

import os

def read_line_from_file(file_name, line_number):
    with open(file_name, 'r') as fil:  # the with-block closes the file automatically
        for line_no, line in enumerate(fil):
            if line_no == line_number:
                return line
    raise ValueError('line %s does not exist in file %s' % (line_number, file_name))

line = read_line_from_file('file.txt', 2)
print(line)
if os.path.isfile('file.txt'):
    os.remove('file.txt')

It's a fairly readable function, so you can take it apart and adapt it to your liking.

Is there an efficient way how to get line N of a large file in python?

How does islice() perform on your file?

from itertools import islice

def read_n_line(filename, line_num, encoding='utf-8'):
    with open(filename, encoding=encoding) as f:
        return next(islice(f, line_num - 1, line_num))
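
A hedged usage example (the file name is a placeholder; next() will raise StopIteration if the file has fewer lines than requested):

print(read_n_line("big.txt", 1000000))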

