Python - Read File from and to Specific Lines of Text

How to read specific lines from a file (by line number)?

If the file to read is big, and you don't want to read the whole file in memory at once:

fp = open("file")
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break
fp.close()

Note that i == n-1 for the nth line.


In Python 2.6 or later:

with open("file") as fp:
for i, line in enumerate(fp):
if i == 25:
# 26th line
elif i == 29:
# 30th line
elif i > 29:
break

python - Read file from and to specific lines of text

If you simply want the block of text between Start and End, you can do something simple like:

with open('test.txt') as input_data:
# Skips text before the beginning of the interesting block:
for line in input_data:
if line.strip() == 'Start': # Or whatever test is needed
break
# Reads text until the end of the block:
for line in input_data: # This keeps reading the file
if line.strip() == 'End':
break
print line # Line is extracted (or block_of_lines.append(line), etc.)

In fact, you do not need to manipulate line numbers in order to read the data between the Start and End markers.

The logic ("read until…") is repeated in both blocks, but it is quite clear and efficient (other methods typically involve checking some state [before block/within block/end of block reached], which incurs a time penalty).

How do I read a specific line on a text file?

as what Barmar said use the file.readlines(). file.readlines makes a list of lines so use an index for the line you want to read. keep in mind that the first line is 0 not 1 so to store the third line of a text document in a variable would be line = file.readlines()[2].
edit: also if what copperfield said is your situation you can do:

def read_line_from_file(file_name, line_number):
with open(file_name, 'r') as fil:
for line_no, line in enumerate(fil):
if line_no == line_number:
file.close()
return line
else:
file.close()
raise ValueError('line %s does not exist in file %s' % (line_number, file_name))

line = read_line_from_file('file.txt', 2)
print(line)
if os.path.isfile('file.txt'):
os.remove('file.txt')

it's a more readable function so you can disassemble it to your liking

How to extract specific lines from a text file and then from these extracted line, extract the values between parantheses and put them in another file

I suggest you use Regular Expressions library (re) which gives you all you need to extract the data from text files. I ran a simple code to solve your current problem:

import re
# Customize path as the file's address on your system
text_file = open('path/sample.txt','r')
# Read the file line by line using .readlines(), so that each line will be a continuous long string in the "file_lines" list
file_lines = text_file.readlines()

Depending on how your target is located in each line, detailed process from here on could be a little different but the overall approach is the same in every scenario.
I have assumed your only condition is that the line starts with "Id of the track" and we are looking to extract all the values between parentheses all in one place.

# A list to append extracted data
list_extracted_data = []
for line in list_lines:
# Flag is True if the line starts (special character for start: \A) with 'Id of the track'
flag = re.search('\AId of the track',line)
if flag:
searched_phrase = re.search(r'\B\(.*',line)
start_index, end_index = searched_phrase.start(), searched_phrase.end()
# Select the indices from each line as it contains our extracted data
list_extracted_data.append(line[start_index:end_index])

print(list_extracted_data)

['(0.8835006455995176, -0.07697617837544447)', '(0.8835006455995176, -0.07697617837544447)', '(0.8835006455995176, -0.07697617837544447)', '(0.8835006455995176, -0.07697617837544447)', '(0.8755597308669424, -0.23473345870373538)', '(0.8835006455995176, -0.07697617837544447)', '(0.8755597308669424, -0.23473345870373538)', '(6.4057079727806485, -0.6819141582566414)', '(1.1815888836384334,
-0.35535274681454954)']

you can do all sorts of things after you've selected the data from each line, including convert it to numerical type or separating the two numbers inside the parentheses.
I assume your intention was to add each of the numbers inside into a different column in a dataFrame:

final_df = pd.DataFrame(columns=['id','X','Y'])
for K, pair in enumerate(list_extracted_data):
# split by comma, select the left part, exclude the '(' at the start
this_X = float(pair.split(',')[0][1:])
# split by comma, select the right part, exclude the ')' at the end
this_Y = float(pair.split(',')[1][:-1])
final_df = final_df.append({'id':K,'X':this_X,'Y':this_Y},ignore_index=True)

Sample Image

Only read specific line numbers from a large file in Python?

Here are some options:

  1. Go over the file at least once and keep track of the file offsets of the lines you are interested in. This is a good approach if you might be seeking these lines multiple times and the file wont be changed.
  2. Consider changing the data format. For example csv instead of json (see comments).
  3. If you have no other alternative, use the traditional:
def get_lines(..., linenums: list):
with open(...) as f:
for lno, ln in enumerate(f):
if lno in linenums:
yield ln

On a 4GB file this took ~6s for linenums = [n // 4, n // 2, n - 1] where n = lines_in_file.

How to read a specific line from a text file in python

You're reading your line with the newline character at the end. Your line1 variable probably contains string '1\n'.

Try calling .strip() immediatelly after readline:

line1 = file.readline().strip()
line2 = file.readline().strip()
line3 = file.readline().strip()


Related Topics



Leave a reply



Submit