Extract Values Between Two Strings in a Text File Using Python

Extract Values between two strings in a text file using python

If your text file contains multiple "Start" and "End" markers, this copies the lines between every pair into a single output file and leaves out the "Start" and "End" lines themselves.

with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
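
To try the same flag logic without touching the filesystem, here is a minimal sketch that runs it over an in-memory list. The helper name `lines_between` and the sample data are made up for illustration; they are not part of the original answer.

```python
def lines_between(lines, start="Start", end="End"):
    """Yield the lines that sit between a start marker line and an end marker line."""
    copy = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            copy = True        # start copying from the next line on
        elif stripped == end:
            copy = False       # stop copying; the marker itself is skipped
        elif copy:
            yield line


# Hypothetical sample input with two Start/End pairs
sample = ["junk\n", "Start\n", "a\n", "b\n", "End\n", "junk\n", "Start\n", "c\n", "End\n"]
result = list(lines_between(sample))
print(result)  # ['a\n', 'b\n', 'c\n']
```

Because the function is a generator, it can also be fed an open file object directly, which keeps memory use low for large files.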

Extract Values between two strings in a text file

Great problem! This is a bucket problem where each start needs an end.

You got that result because the file contains two consecutive 'Start' lines.

It's best to buffer the lines somewhere until an 'End' is reached, and only then write them out.

with open('scores.txt') as infile, open('testt.txt', 'w') as outfile:
    copy = False
    bucket = []
    for line in infile:
        if line.strip() == "Start":
            bucket = []   # a new Start discards any unfinished block
            copy = True
        elif line.strip() == "End":
            if copy:      # only write blocks that were opened by a Start
                for string in bucket:
                    outfile.write(string + '\n')
            copy = False
        elif copy:
            bucket.append(line.strip())
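
As a rough illustration of the bucket idea, the sketch below collects only the blocks that are properly closed by an 'End'. The function name `complete_blocks` and the sample lines are assumptions for the demo, not the asker's actual data.

```python
def complete_blocks(lines, start="Start", end="End"):
    """Return a list of blocks; a block counts only if Start is followed by End."""
    blocks = []
    bucket = None                      # None means "not inside a block"
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            bucket = []                # a repeated Start throws away the open bucket
        elif stripped == end:
            if bucket is not None:
                blocks.append(bucket)  # flush the bucket only when End arrives
            bucket = None
        elif bucket is not None:
            bucket.append(stripped)
    return blocks


# Hypothetical input: the first Start is never closed before the second one
sample = ["Start", "orphan", "Start", "a", "b", "End", "outside", "Start", "c", "End"]
blocks = complete_blocks(sample)
print(blocks)  # [['a', 'b'], ['c']]
```

The line "orphan" is dropped because a second 'Start' arrives before any 'End', which is the case the answer above is guarding against.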

Reading lines between two strings in text file using python

Just rearrange your if statements. Think about the order in which they run and at which point the flag is checked. You can also use elif so that only one of the three conditions executes per line, but make sure the elif flag branch comes last.

The way your example is set up, it checks whether the line starts with START and sets the flag. Immediately afterwards it checks whether the flag is set, so it prints the START line. It also prints every line first and only then checks for END, so the END line is printed too.

After rearranging, a line that starts with START sets the flag, but no statement below it prints the line. Similarly, the END check runs before the print, so the END line is not printed either.

data = []
flag = False
with open('/tmp/test.txt','r') as f:
    for line in f:
        if line.strip().endswith('END'):
            flag = False
        if flag:
            data.append(line)
        if line.startswith('START'):
            flag = True

The elif version is probably the better way to go: it saves a few condition checks, and only one branch can run per iteration, so a line that changes the flag is never stored.

data = []
flag = False
with open('/tmp/test.txt','r') as f:
    for line in f:
        if line.startswith('START'):
            flag = True
        elif line.strip().endswith('END'):
            flag = False
        elif flag:
            data.append(line)
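
To make the effect of the ordering visible, here is a small side-by-side sketch. The `buggy` function is a reconstruction of the ordering described above (the original question's code is not shown here, so treat it as an assumption), and `fixed` is the elif version. Both work on an in-memory list instead of a file.

```python
def buggy(lines):
    """Assumed original ordering: set flag, store, then check END."""
    data, flag = [], False
    for line in lines:
        if line.startswith("START"):
            flag = True
        if flag:
            data.append(line)          # runs right after the flag was set
        if line.strip().endswith("END"):
            flag = False               # too late, END was already stored
    return data


def fixed(lines):
    """elif ordering: a line that changes the flag is never stored."""
    data, flag = [], False
    for line in lines:
        if line.startswith("START"):
            flag = True
        elif line.strip().endswith("END"):
            flag = False
        elif flag:
            data.append(line)
    return data


sample = ["x", "START", "a", "b", "END", "y"]
print(buggy(sample))  # ['START', 'a', 'b', 'END']
print(fixed(sample))  # ['a', 'b']
```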

Python: extract values between two strings in text file

You can use:

def sort(path):
    with open(path) as f,\
            open('mom.txt', 'w') as mom,\
            open('dad.txt', 'w') as dad:
        curr = None  # keeps track of the current speaker
        for line in f:
            if 'Mom:' in line:
                curr = 'Mom' # set the current speaker to Mom
            elif 'Dad:' in line:
                curr = 'Dad' # set the current speaker to Dad
            else:
                if curr == 'Mom':
                    mom.write(line)
                elif curr == 'Dad':
                    dad.write(line)

The resulting mom.txt and dad.txt files should look like this:

# mom.txt
Hi
Bye

# dad.txt
Hi
Bye
:)
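
If there are more than two speakers, the same "current speaker" idea can be generalized. The following is only a sketch: the function name `split_by_speaker`, the `speakers` parameter, and the sample conversation are assumptions, and it collects lines in memory rather than writing files.

```python
from collections import defaultdict


def split_by_speaker(lines, speakers=("Mom", "Dad")):
    """Group lines under whichever speaker label was seen most recently."""
    spoken = defaultdict(list)
    curr = None                          # no speaker seen yet
    for line in lines:
        for name in speakers:
            if f"{name}:" in line:
                curr = name              # label line: switch speaker, store nothing
                break
        else:
            # no label matched, so this is content for the current speaker
            if curr is not None:
                spoken[curr].append(line)
    return dict(spoken)


# Hypothetical conversation that would produce the files shown above
sample = ["Mom:", "Hi", "Dad:", "Hi", "Mom:", "Bye", "Dad:", "Bye", ":)"]
result = split_by_speaker(sample)
print(result)  # {'Mom': ['Hi', 'Bye'], 'Dad': ['Hi', 'Bye', ':)']}
```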

How to extract text between two substrings from a Python file

You are reading the file line by line, but your matches span across lines. You need to read the file in and process it with a regex that can match any chars across lines:

import re
start = '#*'
end = '#@'
rx = r'{}.*?{}'.format(re.escape(start), re.escape(end)) # Escape special chars, build pattern dynamically
with open('lorem.txt') as myfile:
    contents = myfile.read()                     # Read file into a variable
    for match in re.findall(rx, contents, re.S): # Note re.S will make . match line breaks, too
        print(match)                             # Process each match individually

See the regex demo.
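
The pattern above returns each match including the delimiters. If only the text in between is wanted, one option is to add a capturing group. This is a small sketch on an in-memory string; the sample contents are invented for the demo.

```python
import re

start, end = "#*", "#@"

# Group 1 captures only what lies between the escaped delimiters.
# re.S lets "." match newlines so a block may span several lines.
rx = re.compile(r"{}(.*?){}".format(re.escape(start), re.escape(end)), re.S)

contents = "intro #* first\nblock #@ middle #* second #@ tail"

# With exactly one group, findall returns the group contents, not the full match.
matches = [m.strip() for m in rx.findall(contents)]
print(matches)  # ['first\nblock', 'second']
```

The lazy quantifier `.*?` matters here: a greedy `.*` would run from the first `#*` to the last `#@` and return a single oversized match.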

Python read specific lines of text between two strings

One slight modification which looks like it should cover your problem:

with open("filename.txt") as f:
    flist = f.readlines()

parsing = False
for line in flist:
    if line.startswith("\t**** Report 1"):
        parsing = True
    elif line.startswith("\t**** Report 2"):
        parsing = False
    if parsing:
        pass  # Do stuff with data

If you want to avoid processing the "**** Report 1..." line itself, simply move the start condition after the if parsing check, i.e.

with open("filename.txt") as f:
    flist = f.readlines()

parsing = False
for line in flist:
    if line.startswith("\t**** Report 2"):
        parsing = False
    if parsing:
        pass  # Do stuff with data
    if line.startswith("\t**** Report 1"):
        parsing = True
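
When only a single block is needed, a different technique from the flag loop is `itertools.dropwhile` plus `itertools.takewhile`. The sketch below is an alternative, not the approach used above; the function name `report_lines`, its `include_start` parameter, and the sample lines are assumptions.

```python
from itertools import dropwhile, takewhile


def report_lines(lines, start="\t**** Report 1", stop="\t**** Report 2",
                 include_start=True):
    """Return the first block that begins at `start` and ends before `stop`."""
    # Skip everything in front of the start marker.
    tail = dropwhile(lambda line: not line.startswith(start), lines)
    # Keep lines until the stop marker shows up.
    block = list(takewhile(lambda line: not line.startswith(stop), tail))
    # The start marker is the first element, so slicing drops it.
    return block if include_start else block[1:]


# Hypothetical report file contents
sample = ["header", "\t**** Report 1", "r1 a", "r1 b", "\t**** Report 2", "r2 a"]
print(report_lines(sample))                       # ['\t**** Report 1', 'r1 a', 'r1 b']
print(report_lines(sample, include_start=False))  # ['r1 a', 'r1 b']
```

Unlike the flag loop, this only returns the first block, so it is not a replacement when "Report 1" can occur several times.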

Extract text between two strings if a substring exists between the two strings using Regex in Python

You can fix the code using

pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(target1,target2,target3)

The pattern (see demo) is

StartString\s*((?:(?!StartString).)*?substring 1.*?)\s*EndString

Details

StartString - left-hand delimiter
\s* - 0+ whitespaces
((?:(?!StartString).)*?substring 1.*?) - Group 1:
- (?:(?!StartString).)*? - any char, 0 or more times but as few as possible, that does not start a left-hand delimiter sequence
- substring 1 - the substring that must appear between the two delimiters
- .*? - any 0+ chars, as few as possible
\s*EndString - 0+ whitespaces and the right-hand delimiter.

See the Python demo:

import re
text_data='ghsauaigyssts twh\n\nghguy  hja  StartString I want this text (1) if substring 1 lies in between the two strings EndString bhghk [jhbn] xxzh StartString I want this text (2) as a different variable if substring 2 lies in between the two strings EndString ghjyjgu'
target1 = 'StartString'
target2 = 'substring 1'
target3 = 'EndString'
pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(target1,target2,target3)
pattern = re.compile(pat1, flags=re.DOTALL)
print(pattern.findall(text_data))
# => ['I want this text (1) if substring 1 lies in between the two strings']
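
To store the result for each substring in its own variable, the pattern can be wrapped in a small helper. This is a sketch under assumptions: the name `between_with` and the shortened sample text are invented here, and `re.escape` is added so that delimiters containing regex metacharacters would still be matched literally.

```python
import re


def between_with(text, start, required, end):
    """Find text between start and end that contains `required`."""
    pattern = re.compile(
        # (?:(?!start).)*? keeps a match from running into the next start delimiter
        r"{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}".format(
            re.escape(start), re.escape(required), re.escape(end)
        ),
        flags=re.DOTALL,
    )
    return pattern.findall(text)


text = (
    "noise StartString keep me (1) with substring 1 inside EndString filler "
    "StartString keep me (2) with substring 2 inside EndString tail"
)
first = between_with(text, "StartString", "substring 1", "EndString")
second = between_with(text, "StartString", "substring 2", "EndString")
print(first)   # ['keep me (1) with substring 1 inside']
print(second)  # ['keep me (2) with substring 2 inside']
```

Without the negative lookahead, the search for "substring 2" would start at the first StartString and swallow the whole first block as well.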

How can I repeatedly parse text in a text file between two strings?

Here's how I would do it:

from pprint import pprint

file_contents = """\
---
Title of my file
Subtitle of my file
---

+------+-------------------+------+
|  a   |        aa         | aaa  |
|  b   |        bb         | bbb  |
|  c   |        cc         | ccc  |
|  d   |        dd         | ddd  |      # Section 1
|  e   |        ee         | eee  |
|  f   |        ff         | fff  |
+======+===================+======+
|  g   |        gg         | ggg  |
|  h   |        hh         | hhh  |
|  i   |        ii         | iii  |      # Section 2
|  j   |        jj         | jjj  |
|  k   |        kk         | kkk  |
|  l   |        ll         | lll  |
+------+-------------------+------+\
"""
lines = file_contents.split('\n')

# TODO update as needed
start_end_line_prefixes = ('+---', '+===')

sections = []
curr_section = None

for line in lines:
    if any(line.startswith(prefix) for prefix in start_end_line_prefixes):
        curr_section = []
        sections.append(curr_section)
    elif curr_section is not None:
        curr_section.append(line)

# Remove empty list in last index (if needed)
if sections and not sections[-1]:
    sections.pop()

pprint(sections)

Output:

[['|  a   |        aa         | aaa  |',
  '|  b   |        bb         | bbb  |',
  '|  c   |        cc         | ccc  |',
  '|  d   |        dd         | ddd  |      # Section 1',
  '|  e   |        ee         | eee  |',
  '|  f   |        ff         | fff  |'],
 ['|  g   |        gg         | ggg  |',
  '|  h   |        hh         | hhh  |',
  '|  i   |        ii         | iii  |      # Section 2',
  '|  j   |        jj         | jjj  |',
  '|  k   |        kk         | kkk  |',
  '|  l   |        ll         | lll  |']]
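
The same loop can be packaged as a reusable function. The sketch below is one possible variant, not the answer's exact code: the name `split_sections` and the tiny sample table are assumptions, it passes the prefix tuple straight to `str.startswith` (which accepts a tuple), and it drops every empty section rather than only the last one.

```python
def split_sections(lines, prefixes=("+---", "+===")):
    """Split lines into sections, using lines with the given prefixes as separators."""
    sections = []
    current = None                      # lines before the first separator are ignored
    for line in lines:
        if line.startswith(prefixes):   # a tuple matches any of its prefixes
            current = []
            sections.append(current)
        elif current is not None:
            current.append(line)
    # Drop empty sections, e.g. the one opened by the closing border
    return [section for section in sections if section]


# Hypothetical miniature version of the table above
table = ["title", "+---+", "| a |", "+===+", "| b |", "| c |", "+---+"]
sections = split_sections(table)
print(sections)  # [['| a |'], ['| b |', '| c |']]
```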