Extract Values Between Two Strings in a Text File Using Python

Extract Values between two strings in a text file using python

If your text file contains multiple "Start" and "End" markers, this copies all the data between them into one output file, excluding the "Start" and "End" lines themselves.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
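
The same flag logic can be tried without touching the filesystem. This is only a minimal sketch, assuming the markers sit alone on their own lines; the function name extract_between and the sample lines are made up for illustration:

```python
def extract_between(lines, start="Start", end="End"):
    """Return the lines found between start and end markers, markers excluded."""
    copy = False
    result = []
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True
        elif stripped == end:
            copy = False
        elif copy:
            result.append(line)
    return result


# Hypothetical sample input with two Start/End pairs
sample = ["header\n", "Start\n", "a\n", "b\n", "End\n", "skip\n", "Start\n", "c\n", "End\n"]
print(extract_between(sample))  # ['a\n', 'b\n', 'c\n']
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list.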

Extract Values between two strings in a text file

Great problem! This is a bucket problem where each 'Start' needs a matching 'End'.

You got that result because there are two consecutive 'Start' lines.

It's best to store the lines somewhere until 'End' is reached.

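with open('scores.txt', 'r') as infile, open('testt.txt', 'w') as outfile:
    copy = False
    bucket = []
    for line in infile:

        if line.strip() == "Start":
            # A new "Start" discards whatever was buffered since the previous one
            bucket = []
            copy = True

        elif line.strip() == "End":
            # Write the bucket only once a matching "End" is reached
            for strings in bucket:
                outfile.write(strings + '\n')
            bucket = []
            copy = False

        elif copy:
            bucket.append(line.strip())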
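
To see the bucket idea in isolation, here is a small in-memory sketch. It assumes that an unfinished block (a 'Start' followed by another 'Start') should be dropped entirely; the name complete_blocks and the sample data are hypothetical:

```python
def complete_blocks(lines, start="Start", end="End"):
    """Collect one list per Start...End pair, dropping blocks that never reach End."""
    blocks = []
    bucket = None  # None means we are not inside a block
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            bucket = []  # a repeated Start throws away the unfinished bucket
        elif stripped == end:
            if bucket is not None:
                blocks.append(bucket)
            bucket = None
        elif bucket is not None:
            bucket.append(stripped)
    return blocks


# Hypothetical input: the first Start has no End, so "lost" is discarded
sample = ["Start", "lost", "Start", "kept 1", "kept 2", "End", "outside", "End"]
print(complete_blocks(sample))  # [['kept 1', 'kept 2']]
```

Returning a list of lists keeps each block separate, which may be handy if the blocks need different handling later.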

Reading lines between two strings in text file using python

Just rearrange your if statements. Think about the order in which they run and when the flag check is evaluated. You can also use elif so that only one of the three conditions executes, but make sure the elif flag line is the last condition.

With the way your example is set up, it checks whether the line starts with START and then sets the flag. Immediately afterwards it checks whether the flag is set, so it prints the START line. It also prints every line first and only afterwards checks for END, so the END line gets printed too.

After rearranging, a line that starts with START sets the flag, but no statement below it prints that line. Similarly, the END check runs before the print, so the END line is skipped.

data = []
flag = False
with open('/tmp/test.txt', 'r') as f:
    for line in f:
        if line.strip().endswith('END'):
            flag = False
        if flag:
            data.append(line)
        if line.startswith('START'):
            flag = True

The elif version is probably the better way to go: it saves a few if checks, and because only one branch can run per iteration, a line that changes the flag is never appended to data.

data = []
flag = False
with open('/tmp/test.txt', 'r') as f:
    for line in f:
        if line.startswith('START'):
            flag = True
        elif line.strip().endswith('END'):
            flag = False
        elif flag:
            data.append(line)
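
One thing worth checking with endswith('END') is that any data line ending in those letters also clears the flag. The sketch below contrasts that test with an exact comparison; the helper collect, the io.StringIO stand-in for the file, and the sample text are assumptions made for the demonstration:

```python
import io

# Hypothetical file contents; note the data line that happens to end in "END"
text = "START\nplan for the WEEKEND\nbuy milk\nEND\n"


def collect(stream, is_end):
    """Collect lines after a START line until is_end(line) says to stop."""
    data, flag = [], False
    for line in stream:
        if line.startswith("START"):
            flag = True
        elif is_end(line):
            flag = False
        elif flag:
            data.append(line.rstrip("\n"))
    return data


loose = collect(io.StringIO(text), lambda line: line.strip().endswith("END"))
exact = collect(io.StringIO(text), lambda line: line.strip() == "END")
print(loose)  # []
print(exact)  # ['plan for the WEEKEND', 'buy milk']
```

If the marker in your file always stands alone on its line, the exact comparison is probably the safer choice.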

Python: extract values between two strings in text file

You can use:

def sort(path):
    with open(path) as f,\
         open('mom.txt', 'w') as mom,\
         open('dad.txt', 'w') as dad:
        curr = None  # keeps track of the current speaker
        for line in f:
            if 'Mom:' in line:
                curr = 'Mom'  # set the current speaker to Mom
            elif 'Dad:' in line:
                curr = 'Dad'  # set the current speaker to Dad
            else:
                if curr == 'Mom':
                    mom.write(line)
                elif curr == 'Dad':
                    dad.write(line)

The resulting mom.txt and dad.txt files should look like:

# mom.txt
Hi
Bye

# dad.txt
Hi
Bye
:)
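
If you would rather inspect the result in memory before writing files, the same speaker tracking can fill a dictionary. This is a rough sketch only: the function group_by_speaker and the sample conversation are invented here, chosen so that the grouping matches the two listings above:

```python
from collections import defaultdict


def group_by_speaker(lines, speakers=("Mom", "Dad")):
    """Map each speaker to the lines that follow their 'Name:' marker."""
    grouped = defaultdict(list)
    curr = None  # current speaker, None until the first marker is seen
    for line in lines:
        for name in speakers:
            if f"{name}:" in line:
                curr = name
                break
        else:
            # No speaker marker on this line, so it belongs to the current speaker
            if curr is not None:
                grouped[curr].append(line.rstrip("\n"))
    return dict(grouped)


# Hypothetical input file contents
sample = ["Mom:\n", "Hi\n", "Dad:\n", "Hi\n", "Mom:\n", "Bye\n", "Dad:\n", "Bye\n", ":)\n"]
print(group_by_speaker(sample))
# {'Mom': ['Hi', 'Bye'], 'Dad': ['Hi', 'Bye', ':)']}
```

Adding a third speaker then only means extending the speakers tuple rather than adding another elif branch.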

How to extract text between two substrings from a Python file

You are reading the file line by line, but your matches span across lines. You need to read the file in and process it with a regex that can match any chars across lines:

import re

start = '#*'
end = '#@'
rx = r'{}.*?{}'.format(re.escape(start), re.escape(end))  # Escape special chars, build pattern dynamically
with open('lorem.txt') as myfile:
    contents = myfile.read()  # Read file into a variable
for match in re.findall(rx, contents, re.S):  # Note re.S will make . match line breaks, too
    # Process each match individually
    print(match)  # placeholder, replace with your own handling

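
If only the text between the delimiters is wanted, without the delimiters themselves, a capture group is one way to get it. A small sketch, assuming the same '#*' and '#@' delimiters; the contents string is made-up sample data rather than the real lorem.txt:

```python
import re

start = '#*'
end = '#@'
# The capture group keeps only the text between the delimiters
rx = re.compile(r'{}(.*?){}'.format(re.escape(start), re.escape(end)), re.S)

# Hypothetical contents with one match spanning a line break
contents = "intro #* first\nblock #@ filler #* second #@ outro"
for inner in rx.findall(contents):
    print(repr(inner.strip()))
# 'first\nblock'
# 'second'
```

When the pattern has exactly one group, re.findall returns the group contents instead of the whole match.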

Python read specific lines of text between two strings

One slight modification which looks like it should cover your problem:

with open("filename.txt") as f:
    flist = f.readlines()

parsing = False
for line in flist:
    if line.startswith("\t**** Report 1"):
        parsing = True
    elif line.startswith("\t**** Report 2"):
        parsing = False
    if parsing:
        # Do stuff with data
        pass

If you want to avoid parsing the "**** Report 1" line itself, simply put the start condition after the if parsing check, i.e.

with open("filename.txt") as f:
    flist = f.readlines()

parsing = False
for line in flist:

    if line.startswith("\t**** Report 2"):
        parsing = False
    if parsing:
        # Do stuff with data
        pass
    if line.startswith("\t**** Report 1"):
        parsing = True
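
As an alternative to the flag, itertools can express "skip until the start line, then take until the stop line". This is only a sketch and handles just the first such region; the function lines_between and the sample lines are hypothetical:

```python
from itertools import dropwhile, takewhile


def lines_between(lines, start_prefix, stop_prefix):
    """Return the lines after the first start line, up to the next stop line."""
    # Skip everything before the start line
    after_start = dropwhile(lambda l: not l.startswith(start_prefix), lines)
    next(after_start, None)  # drop the start line itself, if there is one
    return list(takewhile(lambda l: not l.startswith(stop_prefix), after_start))


# Hypothetical report file contents
sample = ["preamble\n", "\t**** Report 1\n", "row a\n", "row b\n",
          "\t**** Report 2\n", "row c\n"]
print(lines_between(sample, "\t**** Report 1", "\t**** Report 2"))
# ['row a\n', 'row b\n']
```

Both dropwhile and takewhile are lazy, so this also works directly on an open file object without calling readlines().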

Extract text between two strings if a substring exists between the two strings using Regex in Python

You can fix the code using

pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(target1, target2, target3)

The resulting pattern is

StartString\s*((?:(?!StartString).)*?substring 1.*?)\s*EndString

Details

  • StartString - left-hand delimiter
  • \s* - 0+ whitespaces
  • ((?:(?!StartString).)*?substring 1.*?) - Group 1:
    • (?:(?!StartString).)*? - any char, 0 or more occurrences but as few as possible, as long as that char does not begin another left-hand delimiter
    • substring 1 - the substring that must appear between the two delimiters
    • .*? - any 0+ chars, as few as possible
  • \s*EndString - 0+ whitespaces and the right-hand delimiter.

See the Python demo:

import re
text_data='ghsauaigyssts twh\n\nghguy hja StartString I want this text (1) if substring 1 lies in between the two strings EndString bhghk [jhbn] xxzh StartString I want this text (2) as a different variable if substring 2 lies in between the two strings EndString ghjyjgu'
target1 = 'StartString'
target2 = 'substring 1'
target3 = 'EndString'
pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(target1, target2, target3)
pattern = re.compile(pat1, flags=re.DOTALL)
print(pattern.findall(text_data))
# => ['I want this text (1) if substring 1 lies in between the two strings']
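
If the three strings can contain regex metacharacters such as brackets or dots, it is probably safer to pass them through re.escape before formatting them into the pattern. A minimal sketch under that assumption; the function name between_with_substring and the sample text are made up:

```python
import re


def between_with_substring(text, left, required, right):
    """Find text between left and right that contains the required substring."""
    # re.escape makes each string match literally inside the pattern
    pattern = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(
        re.escape(left), re.escape(required), re.escape(right)
    )
    return re.findall(pattern, text, flags=re.DOTALL)


# Hypothetical input whose delimiters contain regex metacharacters
text = "x [BEGIN] skip me [END] y [BEGIN] keep (v1.0) here [END] z"
print(between_with_substring(text, "[BEGIN]", "(v1.0)", "[END]"))
# ['keep (v1.0) here']
```

Without the escaping, "[BEGIN]" would be read as a character class and "(v1.0)" as a group, so the pattern would match something quite different.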

How can I repeatedly parse text in a text file between two strings?

Here's how I would do it:

from pprint import pprint

file_contents = """\
---
Title of my file
Subtitle of my file
---

+------+-------------------+------+
|  a   |        aa         | aaa  |
|  b   |        bb         | bbb  |
|  c   |        cc         | ccc  |
|  d   |        dd         | ddd  | # Section 1
|  e   |        ee         | eee  |
|  f   |        ff         | fff  |
+======+===================+======+
|  g   |        gg         | ggg  |
|  h   |        hh         | hhh  |
|  i   |        ii         | iii  | # Section 2
|  j   |        jj         | jjj  |
|  k   |        kk         | kkk  |
|  l   |        ll         | lll  |
+------+-------------------+------+\
"""
lines = file_contents.split('\n')

# TODO update as needed
start_end_line_prefixes = ('+---', '+===')

sections = []
curr_section = None

for line in lines:
    if any(line.startswith(prefix) for prefix in start_end_line_prefixes):
        curr_section = []
        sections.append(curr_section)
    elif curr_section is not None:
        curr_section.append(line)

# Remove empty list in last index (if needed)
if sections and not sections[-1]:
    sections.pop()

pprint(sections)

Output:

[['|  a   |        aa         | aaa  |',
  '|  b   |        bb         | bbb  |',
  '|  c   |        cc         | ccc  |',
  '|  d   |        dd         | ddd  | # Section 1',
  '|  e   |        ee         | eee  |',
  '|  f   |        ff         | fff  |'],
 ['|  g   |        gg         | ggg  |',
  '|  h   |        hh         | hhh  |',
  '|  i   |        ii         | iii  | # Section 2',
  '|  j   |        jj         | jjj  |',
  '|  k   |        kk         | kkk  |',
  '|  l   |        ll         | lll  |']]
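
The same split can also be sketched with itertools.groupby, which groups consecutive lines by whether they are separators. This is an alternative illustration rather than the approach above; split_sections and the shortened sample table are hypothetical:

```python
from itertools import groupby


def split_sections(lines, prefixes=('+---', '+===')):
    """Return the runs of non-separator lines that follow a separator line."""
    def is_separator(line):
        return line.startswith(prefixes)  # str.startswith accepts a tuple

    seen_separator = False
    sections = []
    for sep, group in groupby(lines, key=is_separator):
        if sep:
            seen_separator = True
        elif seen_separator:
            # Lines before the first separator (e.g. a title) are ignored
            sections.append(list(group))
    return sections


# Hypothetical, shortened version of the table
sample = ["title", "+---+", "| a |", "| b |", "+===+", "| c |", "+---+"]
print(split_sections(sample))
# [['| a |', '| b |'], ['| c |']]
```

Since groupby never yields an empty run, no trailing empty section has to be removed afterwards.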

