How to Extract Email Headers Extending on Multiple Lines from File

PHP Regex to get multiline value out of Email Header

Your second regex is close. This modified version should do the trick:

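/Subject: (.*)(\n[ \t]+(.*))*/i

By switching the * in the middle to a +, a following line is only grabbed if it starts with at least one space or tab, which is how a folded (continued) header line is marked. Matching [ \t] rather than \s matters here. \s also matches a newline, so \n\s+ would swallow the blank line that ends the headers and pull the first line of the body into the subject. The * at the end allows the regex to match any number of lines, as long as every line after the first starts with whitespace.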
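For comparison, here is a minimal sketch of the same idea with Python's re module, assuming raw_headers holds only the header block of the message. The function name get_subject is hypothetical. Unlike the PHP pattern, it anchors Subject: to the start of a line (so a header such as X-Original-Subject: is not matched) and also unfolds the value by removing each line break that is followed by whitespace.

```python
import re
from typing import Optional

# First line of the value, then any folded continuation lines (starting with space or tab).
# [^\r\n]* is used instead of .* so that the CR of CRLF line endings is not captured.
SUBJECT_RE = re.compile(
    r"^Subject:[ \t]*([^\r\n]*(?:\r?\n[ \t]+[^\r\n]*)*)",
    re.IGNORECASE | re.MULTILINE,
)


def get_subject(raw_headers: str) -> Optional[str]:
    """Return the unfolded Subject value, or None if there is no Subject header."""
    match = SUBJECT_RE.search(raw_headers)
    if match is None:
        return None
    # Unfold: drop every line break that is immediately followed by a space or tab.
    return re.sub(r"\r?\n(?=[ \t])", "", match.group(1))
```

For example, get_subject("Subject: This is a very\n long subject line\nTo: b@example.com\n") returns the single line "This is a very long subject line".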

how to extract multiple lines between header and footer in bash

Using bash parameter expansions

signature=${string#*$'\n'-----BEGIN PGP SIGNATURE-----$'\n'}
signature=${signature#*$'\n\n'}
signature=${signature%%$'\n'-----END PGP SIGNATURE-----*}

The first assignment removes everything from the beginning of the string up to and including the line consisting of -----BEGIN PGP SIGNATURE-----. The second removes everything up to and including the first blank line, which ends the armor header lines (such as Version:). The third removes everything from the -----END PGP SIGNATURE----- line to the end of the string. What remains is the base64 signature (including the =XXXX checksum line, if the armor has one).
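For comparison, a minimal Python sketch of the same three steps, working line by line instead of trimming one string. It assumes the whole message is already in memory and that the BEGIN and END lines appear exactly as shown; extract_pgp_signature is a hypothetical name, and list.index raises ValueError if a marker or the blank line is missing. Working on lines means a block whose blank line comes directly after the BEGIN line is handled the same way as one with armor headers.

```python
def extract_pgp_signature(text: str) -> str:
    """Return the lines between the armor's blank line and the END marker."""
    lines = text.splitlines()
    # Locate the BEGIN marker (the first parameter expansion strips up to here).
    start = lines.index("-----BEGIN PGP SIGNATURE-----")
    # Locate the END marker after it (the third expansion strips from here on).
    stop = lines.index("-----END PGP SIGNATURE-----", start + 1)
    block = lines[start + 1:stop]
    # Armor headers such as "Version: ..." end at the first blank line (second expansion).
    blank = block.index("")
    return "\n".join(block[blank + 1:])
```

Like the bash version, the result still contains the optional =XXXX checksum line.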

Explanation of parameter expansion forms used in the answer:

${var#pattern} is replaced by the content of the variable var with the shortest matching pattern deleted (from the beginning), if the pattern matches the leading portion of the content of the variable var.

${var%%pattern} is replaced by the content of the variable var with the longest matching pattern deleted (from the end), if the pattern matches the trailing portion of the content of the variable var.

For detailed information on all forms of bash parameter expansion, see the "Shell Parameter Expansion" section of the GNU Bash manual.
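To make the shortest-versus-longest distinction concrete, here is a small Python sketch that emulates ${var#pattern} and ${var%%pattern} with fnmatch.fnmatchcase. It is only an approximation: fnmatch glob syntax is close to, but not identical with, bash pattern matching (for example, there is no extglob support), and the function names are hypothetical. It tries every split point, which is fine for illustration but quadratic on long strings.

```python
from fnmatch import fnmatchcase


def strip_shortest_prefix(value: str, pattern: str) -> str:
    """Emulate ${var#pattern}: delete the shortest leading part matching pattern."""
    for i in range(len(value) + 1):  # try the shortest prefixes first
        if fnmatchcase(value[:i], pattern):
            return value[i:]
    return value  # no match: the value is left unchanged, as in bash


def strip_longest_suffix(value: str, pattern: str) -> str:
    """Emulate ${var%%pattern}: delete the longest trailing part matching pattern."""
    for i in range(len(value) + 1):  # starting at 0 tries the longest suffixes first
        if fnmatchcase(value[i:], pattern):
            return value[:i]
    return value
```

For example, strip_shortest_prefix("a.b.c", "*.") gives "b.c", while strip_longest_suffix("a.b.c", ".*") gives "a".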

Extract lines from multiple text filenames and then pull those lines from a textfile - Linux

A solution using a single find command:

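find . -type f -name "*.sh.e[0-9]*" -size +0c -exec sh -c 'fn=$1; n=${fn##*.e}; \
sed -n "${n}p" ../temp/files.2017-09-26.txt' _ {} \;

  • -size +0c - only non-empty files are processed

  • fn=$1 - fn is assigned the file name that find substitutes for {} (the _ placeholder fills $0 of the inner shell)

  • n=${fn##*.e} - strips everything up to and including the last .e, leaving the numeric suffix of the file name (e.g. 92, 102); ${fn##*.} would leave e92, which sed cannot use as a line number

  • sed -n "${n}p" - prints only line number n of ../temp/files.2017-09-26.txt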
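The same lookup can be sketched in Python with pathlib, assuming the list file fits in memory. pull_lines is a hypothetical name. Its regex only accepts names ending in .sh.e followed by digits, which is slightly stricter than the find pattern *.sh.e[0-9]*, and files are visited in sorted order rather than find's traversal order.

```python
import re
from pathlib import Path
from typing import List

# Matches names such as "job.sh.e92" and captures the trailing number.
SUFFIX_RE = re.compile(r"\.sh\.e(\d+)$")


def pull_lines(search_root: str, lines_file: str) -> List[str]:
    """For each job file under search_root, return the line of lines_file named by its suffix."""
    lines = Path(lines_file).read_text().splitlines()
    picked = []
    for path in sorted(Path(search_root).rglob("*.sh.e*")):
        match = SUFFIX_RE.search(path.name)
        # Roughly mirror find's -type f and -size +0c: regular, non-empty files only.
        if match is None or not path.is_file() or path.stat().st_size == 0:
            continue
        n = int(match.group(1))  # line numbers are 1-based, as in sed
        if 1 <= n <= len(lines):
            picked.append(lines[n - 1])
    return picked
```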

Extract lines between headings repeated through file

If you just want to extract the lines between *Nset headings, the following approach should work:

In [5]: with open("master.txt") as f:
   ...:     data = []
   ...:     gather = False
   ...:     for line in f:
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [6]: data
Out[6]:
['1, 2, 3, 4, 5, 6, 7,',
'12, 13, 14, 15, 16,',
'17, 52, 75, 86, 92,',
'90, 91, 92 93, 94, 95....',
'numbers',
'numbers']

If you also want the (0-based) line numbers of the *Nset headings and how many there are, it is simple enough to extend the above:

In [7]: with open("master.txt") as f:
   ...:     nset_lines = []
   ...:     nset_count = 0
   ...:     data = []
   ...:     gather = False
   ...:     for i, line in enumerate(f):
   ...:         line = line.strip()
   ...:         if line.startswith("*Nset"):
   ...:             gather = True
   ...:             nset_lines.append(i)
   ...:             nset_count += 1
   ...:         elif line.startswith("*"):
   ...:             gather = False
   ...:         elif line and gather:
   ...:             data.append(line)
   ...:

In [8]: nset_lines
Out[8]: [0, 14, 18]

In [9]: nset_count
Out[9]: 3

In [10]: data
Out[10]:
['1, 2, 3, 4, 5, 6, 7,',
'12, 13, 14, 15, 16,',
'17, 52, 75, 86, 92,',
'90, 91, 92 93, 94, 95....',
'numbers',
'numbers']
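If you would rather keep each *Nset block's lines separate instead of collecting them into one flat list, the same gathering loop can build (heading, lines) pairs. A minimal sketch; nset_blocks is a hypothetical name, and it assumes, like the code above, that any other line starting with * ends the current block.

```python
from typing import Iterable, List, Optional, Tuple


def nset_blocks(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Group the non-empty lines that follow each *Nset heading."""
    blocks: List[Tuple[str, List[str]]] = []
    current: Optional[List[str]] = None  # lines of the block being gathered, if any
    for raw in lines:
        line = raw.strip()
        if line.startswith("*Nset"):
            current = []
            blocks.append((line, current))
        elif line.startswith("*"):
            current = None  # any other keyword line stops gathering
        elif line and current is not None:
            current.append(line)
    return blocks
```

Usage: with open("master.txt") as f: blocks = nset_blocks(f).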

How do you extract multiple email addresses from an RFC 2822 mail header in python?

Pass all of the To lines through email.utils.getaddresses():

msg="""To: user1@company1.com, John Doe <user2@example.com>, "Public, John Q." <user3@example.com>
From: anotheruser@user.com
Subject: This is a subject

This is the message.
"""

import email
import email.utils  # the utils submodule is not guaranteed to load with "import email" alone

msg822 = email.message_from_string(msg)
for to in email.utils.getaddresses(msg822.get_all("To", [])):
    print("To:", to)

Note that I rewrote your To line. I believe your example wasn't a valid format.

Reference: https://docs.python.org/3/library/email.utils.html#email.utils.getaddresses
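Because getaddresses() takes a list of field values, the same call also covers several headers at once, and header values folded onto multiple lines. A minimal sketch; the recipients function and the sample message are hypothetical.

```python
import email
import email.utils
from typing import List, Tuple


def recipients(raw: str, fields: Tuple[str, ...] = ("To", "Cc")) -> List[Tuple[str, str]]:
    """Return (realname, address) pairs from every occurrence of the given headers."""
    msg = email.message_from_string(raw)
    values: List[str] = []
    for field in fields:
        # get_all returns every occurrence of the header, or the default if absent.
        values.extend(msg.get_all(field, []))
    return email.utils.getaddresses(values)
```

Here the To header below continues on an indented second line, and the quoted display name keeps its comma:
raw = 'To: a@example.com,\n  "Doe, John" <john@example.com>\nCc: c@example.com\n\nbody\n'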

Python Extracting strings from multiple lines of strings

Minor changes in your code:

def extractData():
    filename = "data.txt"
    header = ''  # avoids a NameError if the file does not start with a header line
    with open(filename, 'r') as infile:
        for x in infile:
            x = x.strip()
            if x.startswith(">"):
                header = x
            else:
                sequence = x
                # only checked after a sequence line has been read
                if header.startswith(">b22"):
                    print(header, sequence)
                    header = ''

extractData()

By the way, you can use a debugger to find out what is wrong with the flow of a program. If you are new to Python, I would recommend Eclipse with the PyDev plugin for interactive debugging.

Having said that, the issue appears because if header.startswith(">b22") was evaluated for every line read from the file. Once it is moved inside the else block, it is only evaluated after a sequence line has been read (and never for header lines).
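The fixed code prints only the first sequence line after each matching header. If your sequences wrap over several lines, a small generator that joins them per record may suit better. A minimal sketch, assuming FASTA-style input; read_records is a hypothetical name.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequence lines that wrap."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)  # continuation of the current sequence
    if header is not None:
        yield header, "".join(chunks)  # the last record has no following header
```

Usage: with open("data.txt") as f: matches = [(h, s) for h, s in read_records(f) if h.startswith(">b22")].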


