Extract Values between two strings in a text file using python
Just in case you have multiple "Start"s and "End"s in your text file, this will copy all the data between them into a single output file, excluding the "Start" and "End" lines themselves.
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    copy = False
    for line in infile:
        if line.strip() == "Start":
            copy = True
            continue
        elif line.strip() == "End":
            copy = False
            continue
        elif copy:
            outfile.write(line)
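If you would rather keep each Start/End block separate instead of merging everything into one output file, the same flag idea can collect blocks into lists. This is a minimal sketch, assuming the markers sit on their own lines as exactly "Start" and "End"; the function name extract_blocks and the sample text are hypothetical.

```python
import io
from typing import Iterable, List, Optional


def extract_blocks(lines: Iterable[str], start: str = "Start", end: str = "End") -> List[List[str]]:
    """Return the lines between each start/end marker pair, one list per block.

    Hypothetical helper: a marker must match a whole (stripped) line.
    """
    blocks: List[List[str]] = []
    current: Optional[List[str]] = None  # None means we are outside a block
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            current = []  # open a new block (a repeated start restarts it)
        elif stripped == end:
            if current is not None:
                blocks.append(current)  # close the block we were filling
            current = None
        elif current is not None:
            current.append(line.rstrip("\n"))
    return blocks


sample = io.StringIO("junk\nStart\na\nb\nEnd\nmore junk\nStart\nc\nEnd\n")
print(extract_blocks(sample))  # [['a', 'b'], ['c']]
```

Unlike the loop above, a second "Start" before an "End" restarts the current block here, and an "End" with no open block is ignored.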
Extract Values between two strings in a text file
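Great problem! This is a bucket problem: each start needs a matching end.
You got that result because there are two consecutive 'Start' lines, so the lines after the first 'Start' are still being copied when the second one appears.
It's best to store the lines somewhere until 'End' is reached, and throw them away if another 'Start' comes first.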
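with open('scores.txt', 'r') as infile, open('testt.txt', 'w') as outfile:
    copy = False
    bucket = []  # lines seen since the most recent 'Start'
    for line in infile:
        if line.strip() == "Start":
            bucket = []  # a new 'Start' discards any unfinished block
            copy = True
        elif line.strip() == "End":
            for strings in bucket:
                outfile.write(strings + '\n')
            bucket = []  # don't write the same block twice on a stray 'End'
            copy = False
        elif copy:
            bucket.append(line.strip())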
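To see why the bucket matters, here is a sketch that runs a plain flag approach and the bucket approach on the same in-memory input containing two consecutive 'Start' lines. The function names flag_copy and bucket_copy and the sample text are hypothetical, chosen only to isolate the difference.

```python
import io


def flag_copy(lines):
    """Flag approach: copy everything while a 'Start' is open."""
    out, copy = [], False
    for line in lines:
        s = line.strip()
        if s == "Start":
            copy = True
        elif s == "End":
            copy = False
        elif copy:
            out.append(s)
    return out


def bucket_copy(lines):
    """Bucket approach: each 'Start' empties the bucket, 'End' flushes it."""
    out, bucket, copy = [], [], False
    for line in lines:
        s = line.strip()
        if s == "Start":
            bucket, copy = [], True  # drop lines from an unmatched 'Start'
        elif s == "End":
            out.extend(bucket)  # keep only the lines since the last 'Start'
            bucket, copy = [], False
        elif copy:
            bucket.append(s)
    return out


text = "Start\nfirst\nStart\nsecond\nEnd\n"
print(flag_copy(io.StringIO(text)))    # ['first', 'second']
print(bucket_copy(io.StringIO(text)))  # ['second']
```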
Reading lines between two strings in text file using python
Just rearrange your if statements. Think about the order in which they run and when the if flag check is evaluated. You can also use elif so that only one of the three conditions executes per line, but make sure the elif flag branch is the last condition.
With the way your example is set up, it checks whether the line starts with START and then sets the flag. Immediately after that, you check whether the flag is set, so it prints the START line. Likewise, it prints every line first and only afterwards checks whether it has reached END, so the END line gets printed too.
With the order rearranged, if the line starts with START, no statement below it prints that line. Similarly, it checks whether it should stop before the END line can be printed.
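data = []     # lines collected between START and END
flag = False  # True while inside a START ... END block
with open('/tmp/test.txt', 'r') as f:
    for line in f:
        if line.strip().endswith('END'):
            flag = False  # stop before the END line can be appended
        if flag:
            data.append(line)
        if line.startswith('START'):
            flag = True   # start only after the START line was skipped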
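To make the effect of the ordering concrete, here is a sketch that runs the order described above (set the flag, then check it, then test for END) next to the rearranged order on an in-memory sample. The question's own code is not shown here, so naive_order is a reconstruction from the description; both function names and the sample lines are hypothetical.

```python
def naive_order(lines):
    # Order described above: set flag, append, then test for END
    data, flag = [], False
    for line in lines:
        if line.startswith('START'):
            flag = True
        if flag:
            data.append(line)
        if line.strip().endswith('END'):
            flag = False
    return data


def rearranged_order(lines):
    # END check first, append second, START check last
    data, flag = [], False
    for line in lines:
        if line.strip().endswith('END'):
            flag = False
        if flag:
            data.append(line)
        if line.startswith('START'):
            flag = True
    return data


sample = ['START\n', 'keep me\n', 'END\n', 'skip me\n']
print(naive_order(sample))       # ['START\n', 'keep me\n', 'END\n']
print(rearranged_order(sample))  # ['keep me\n']
```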
The elif version is probably the better way to go, since once one condition matches, the remaining checks are skipped. Only one branch runs per iteration, so a line that changes the flag (the START or END marker) is never appended itself.
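data = []     # lines collected between START and END
flag = False  # True while inside a START ... END block
with open('/tmp/test.txt', 'r') as f:
    for line in f:
        if line.startswith('START'):
            flag = True
        elif line.strip().endswith('END'):
            flag = False
        elif flag:
            data.append(line)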
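The elif version also wraps neatly into a reusable generator that works on any file-like object. This is a sketch, assuming the same startswith/endswith marker tests; the name between and its default arguments are hypothetical.

```python
import io
from typing import Iterable, Iterator


def between(lines: Iterable[str], start: str = 'START', end: str = 'END') -> Iterator[str]:
    """Yield the lines strictly between start/end marker lines (elif version)."""
    flag = False
    for line in lines:
        if line.startswith(start):
            flag = True   # the opening marker itself is not yielded
        elif line.strip().endswith(end):
            flag = False  # the closing marker is not yielded either
        elif flag:
            yield line.rstrip('\n')


f = io.StringIO('header\nSTART\nalpha\nbeta\nEND\nfooter\n')
print(list(between(f)))  # ['alpha', 'beta']
```

One caveat with endswith('END'): a data line such as "BACKEND" also ends with END and will close the block early. If the markers are whole lines, comparing line.strip() == 'END' is safer.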
Python: extract values between two strings in text file
You can use:
def sort(path):
    with open(path) as f,\
         open('mom.txt', 'w') as mom,\
         open('dad.txt', 'w') as dad:
        curr = None  # keeps track of the current speaker
        for line in f:
            if 'Mom:' in line:
                curr = 'Mom'  # set the current speaker to Mom
            elif 'Dad:' in line:
                curr = 'Dad'  # set the current speaker to Dad
            else:
                if curr == 'Mom':
                    mom.write(line)
                elif curr == 'Dad':
                    dad.write(line)
The resulting mom.txt and dad.txt files should look like:
# mom.txt
Hi
Bye
# dad.txt
Hi
Bye
:)
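The same "current speaker" idea can be tested without touching the disk by collecting lines into a dict instead of writing files. This is a sketch: the helper split_by_speaker is hypothetical, and since the original input file is not shown, the transcript below is an assumed sample chosen to reproduce the output above. As in sort(), a label anywhere in a line switches the speaker and the label line itself is dropped.

```python
import io
from collections import defaultdict


def split_by_speaker(lines, speakers=('Mom:', 'Dad:')):
    """Group the lines that follow each speaker label under that speaker."""
    result = defaultdict(list)
    curr = None  # no speaker seen yet, so leading lines are dropped
    for line in lines:
        label = next((s for s in speakers if s in line), None)
        if label is not None:
            curr = label.rstrip(':')  # 'Mom:' -> 'Mom'
        elif curr is not None:
            result[curr].append(line.rstrip('\n'))
    return dict(result)


transcript = io.StringIO('intro\nMom:\nHi\nDad:\nHi\nMom:\nBye\nDad:\nBye\n:)\n')
print(split_by_speaker(transcript))
# {'Mom': ['Hi', 'Bye'], 'Dad': ['Hi', 'Bye', ':)']}
```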
How to extract text between two substrings from a Python file
You are reading the file line by line, but your matches span multiple lines. You need to read the whole file in and process it with a regex that can match any character, including line breaks:
import re

start = '#*'
end = '#@'
rx = r'{}.*?{}'.format(re.escape(start), re.escape(end))  # Escape special chars, build pattern dynamically
with open('lorem.txt') as myfile:
    contents = myfile.read()  # Read file into a variable
for match in re.findall(rx, contents, re.S):  # Note re.S will make . match line breaks, too
    # Process each match individually
    print(match)  # placeholder: replace with your own processing
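Here is a self-contained sketch of the same idea on an in-memory string, so you can see what re.S changes. It adds a capture group, (.*?), so that findall returns only the text between the delimiters rather than the delimiters too; the sample string is made up for illustration.

```python
import re

start, end = '#*', '#@'
# Capture only the text between the delimiters with a lazy group
rx = r'{}(.*?){}'.format(re.escape(start), re.escape(end))

contents = 'intro #*first\nblock#@ middle #*second#@ outro'

with_dotall = re.findall(rx, contents, re.S)  # . may cross line breaks
without_dotall = re.findall(rx, contents)     # . stops at line breaks

print(with_dotall)     # ['first\nblock', 'second']
print(without_dotall)  # ['second']
```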
Python read specific lines of text between two strings
One slight modification which looks like it should cover your problem:
with open("filename.txt") as f:
    flist = f.readlines()
parsing = False
for line in flist:
    if line.startswith("\t**** Report 1"):
        parsing = True
    elif line.startswith("\t**** Report 2"):
        parsing = False
    if parsing:
        # Do stuff with data
        print(line, end='')  # placeholder for your own processing
If you want to avoid parsing the "**** Report 1..." line itself, simply put the start condition after the if parsing check, i.e.:
with open("filename.txt") as f:
    flist = f.readlines()
parsing = False
for line in flist:
    if line.startswith("\t**** Report 2"):
        parsing = False
    if parsing:
        # Do stuff with data
        print(line, end='')  # placeholder for your own processing
    if line.startswith("\t**** Report 1"):
        parsing = True
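A quick way to check the two placements is to run both on a small in-memory report. This sketch wraps each loop in a function that collects lines instead of "doing stuff"; the function names and the report lines are hypothetical, and the header text only mimics the "\t**** Report 1" prefix used above.

```python
def parse_including_header(lines):
    parsing, kept = False, []
    for line in lines:
        if line.startswith("\t**** Report 1"):
            parsing = True
        elif line.startswith("\t**** Report 2"):
            parsing = False
        if parsing:
            kept.append(line.rstrip("\n"))
    return kept


def parse_excluding_header(lines):
    parsing, kept = False, []
    for line in lines:
        if line.startswith("\t**** Report 2"):
            parsing = False
        if parsing:
            kept.append(line.rstrip("\n"))
        if line.startswith("\t**** Report 1"):
            parsing = True  # set after the check, so the header is skipped
    return kept


report = [
    "preamble\n",
    "\t**** Report 1 ****\n",
    "row 1\n",
    "row 2\n",
    "\t**** Report 2 ****\n",
    "row 3\n",
]
print(parse_including_header(report))  # ['\t**** Report 1 ****', 'row 1', 'row 2']
print(parse_excluding_header(report))  # ['row 1', 'row 2']
```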
Extract text between two strings if a substring exists between the two strings using Regex in Python
You can fix the code using
pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(target1,target2,target3)
The resulting pattern is
StartString\s*((?:(?!StartString).)*?substring 1.*?)\s*EndString
Details
StartString - left-hand delimiter
\s* - 0+ whitespaces
((?:(?!StartString).)*?substring 1.*?) - Group 1:
  (?:(?!StartString).)*? - any char, 0 or more times but as few as possible, that is not the start of the left-hand delimiter
  substring 1 - the third string (target2), which must occur between the delimiters
  .*? - any 0+ chars, as few as possible
\s*EndString - 0+ whitespaces and the right-hand delimiter.
See the Python demo:
import re
text_data='ghsauaigyssts twh\n\nghguy hja StartString I want this text (1) if substring 1 lies in between the two strings EndString bhghk [jhbn] xxzh StartString I want this text (2) as a different variable if substring 2 lies in between the two strings EndString ghjyjgu'
target1 = 'StartString'
target2 = 'substring 1'
target3 = 'EndString'
pat1 = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(target1,target2,target3)
pattern = re.compile(pat1, flags=re.DOTALL)
print(pattern.findall(text_data))
# => ['I want this text (1) if substring 1 lies in between the two strings']
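To pull out the second block as a different variable, you can reuse the same pattern with a different target2. This sketch wraps it in a hypothetical helper, between_if_contains, that also passes each target through re.escape so characters such as ( or . in a target are matched literally; that escaping step is an addition, not part of the code above.

```python
import re


def between_if_contains(text, start, needle, end):
    """Return each start...end block (without delimiters) that contains needle."""
    s, n, e = (re.escape(t) for t in (start, needle, end))  # match targets literally
    pattern = r'{0}\s*((?:(?!{0}).)*?{1}.*?)\s*{2}'.format(s, n, e)
    return re.findall(pattern, text, flags=re.DOTALL)


text_data = ('ghsauaigyssts twh\n\nghguy hja StartString I want this text (1) if '
             'substring 1 lies in between the two strings EndString bhghk [jhbn] xxzh '
             'StartString I want this text (2) as a different variable if substring 2 '
             'lies in between the two strings EndString ghjyjgu')

text_2 = between_if_contains(text_data, 'StartString', 'substring 2', 'EndString')
print(text_2)
# ['I want this text (2) as a different variable if substring 2 lies in between the two strings']
```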
How can I repeatedly parse text in a text file between two strings?
Here's how I would do it:
from pprint import pprint
file_contents = """\
---
Title of my file
Subtitle of my file
---
+------+-------------------+------+
| a | aa | aaa |
| b | bb | bbb |
| c | cc | ccc |
| d | dd | ddd | # Section 1
| e | ee | eee |
| f | ff | fff |
+======+===================+======+
| g | gg | ggg |
| h | hh | hhh |
| i | ii | iii | # Section 2
| j | jj | jjj |
| k | kk | kkk |
| l | ll | lll |
+------+-------------------+------+\
"""
lines = file_contents.split('\n')
# TODO update as needed
start_end_line_prefixes = ('+---', '+===')
sections = []
curr_section = None
for line in lines:
    if any(line.startswith(prefix) for prefix in start_end_line_prefixes):
        curr_section = []
        sections.append(curr_section)
    elif curr_section is not None:
        curr_section.append(line)
# Remove empty list in last index (if needed)
if sections and not sections[-1]:
    sections.pop()
pprint(sections)
Output:
[['| a | aa | aaa |',
'| b | bb | bbb |',
'| c | cc | ccc |',
'| d | dd | ddd | # Section 1',
'| e | ee | eee |',
'| f | ff | fff |'],
['| g | gg | ggg |',
'| h | hh | hhh |',
'| i | ii | iii | # Section 2',
'| j | jj | jjj |',
'| k | kk | kkk |',
'| l | ll | lll |']]
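If the table lives in a file, the same loop can run directly over the open file object instead of a pre-split string. This is a sketch under that assumption: split_sections is a hypothetical name, it strips the trailing newline from each line, and it drops every empty section (not just the last one), which also discards the gap between two adjacent border lines.

```python
import io
from typing import Iterable, List, Optional, Sequence


def split_sections(lines: Iterable[str],
                   border_prefixes: Sequence[str] = ('+---', '+===')) -> List[List[str]]:
    """Collect the rows between table border lines, one list per section."""
    sections: List[List[str]] = []
    current: Optional[List[str]] = None  # None until the first border line
    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith(tuple(border_prefixes)):
            current = []
            sections.append(current)
        elif current is not None:
            current.append(line)
    # Drop empty sections, e.g. the one opened by the closing border line
    return [s for s in sections if s]


sample = io.StringIO("title\n+---+\n| a |\n+===+\n| b |\n| c |\n+---+\n")
print(split_sections(sample))  # [['| a |'], ['| b |', '| c |']]
```

With a real file you would call it as split_sections(open(path)) inside a with block; path is a placeholder here.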