Python - How to Separate Paragraphs from Text

Split text into smaller paragraphs of a minimal length without breaking the sentences given a threshold

IIUC, you want to split the text on periods, but keep each chunk above a minimal length so that very short sentences don't end up on their own.

What you can do is split on the periods and join the pieces back together until a chunk reaches a threshold (here 200 characters):

out = []
threshold = 200
# text holds the input string; drop the final period first,
# otherwise the last chunk would end with two periods
for chunk in text.strip().rstrip('.').split('. '):
    if out and len(chunk) + len(out[-1]) < threshold:
        # still under the threshold: append to the previous chunk
        out[-1] += ' ' + chunk + '.'
    else:
        # start a new chunk
        out.append(chunk + '.')

Output:

['Marketing products and services is a demanding and tedious task in today’s overly saturated market. Especially if you’re in a B2B lead generation business.',
'As a business owner or part of the sales team, you really need to dive deep into understanding what strategies work best and how to appeal to your customers most efficiently.',
'Lead generation is something you need to master. Understanding different types of leads will help you sell your product or services and scale your business faster.',
'That’s why we’re explaining what warm leads are and how you can easily turn them into paying customers.']
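
The same idea can be wrapped in a function that also treats '!' and '?' as sentence endings. This is only a sketch: the name merge_sentences and the regular expression are my own assumptions, not part of the answer above, and the pattern will still mis-split abbreviations such as "e.g. this".

```python
import re


def merge_sentences(text, threshold=200):
    """Hypothetical helper: merge sentences while the combined length stays under threshold."""
    # Split on whitespace that follows '.', '!' or '?';
    # the lookbehind keeps the punctuation attached to its sentence.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    out = []
    for sentence in sentences:
        if not sentence:
            continue  # empty input yields one empty piece
        if out and len(out[-1]) + len(sentence) < threshold:
            out[-1] += ' ' + sentence
        else:
            out.append(sentence)
    return out


print(merge_sentences("One. Two! Three?", threshold=12))
```

Because the punctuation is never removed, nothing has to be added back, so the doubled period at the end cannot occur.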

How to split a txt into custom paragraphs (and then insert them into excel columns)?

This code will split your text correctly:

with open("address", "r", encoding="utf8") as file:
    sections = file.read()

sections = sections.split('\n\n')
for section in sections:
    print(section)

You can't split a string on two newlines after you have already split it on single newlines: once the text is split on '\n', none of the pieces contains '\n\n' anymore.
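
To make that point concrete, here is a small sketch. The sample string and the helper name split_paragraphs are made up for illustration, and the regular expression is my own addition that also tolerates blank lines containing stray spaces.

```python
import re


def split_paragraphs(text):
    """Hypothetical helper: split on blank lines, allowing whitespace on them."""
    pieces = re.split(r"\n\s*\n", text)
    return [p.strip() for p in pieces if p.strip()]


sample = "First paragraph,\nline two.\n\nSecond paragraph.\n \nThird paragraph."

# After splitting on single newlines, no piece contains a newline at all,
# so a later split on "\n\n" has nothing to find.
lines = sample.split("\n")
print(any("\n\n" in line for line in lines))

# Splitting the original string on blank lines keeps each paragraph whole.
print(split_paragraphs(sample))
```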

How to break text into paragraphs (python)

Use the newline escape \n as many times as you need, and I suggest using a triple-quoted string as follows:

print("""
First paragraph \n\n
Second paragraph \n\n
...
Last paragraph.
""")
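
Note that inside a triple-quoted string the literal line breaks already count as newlines, so each \n\n above adds two more on top of them. If the paragraphs are already in a list, an alternative is to join them with a blank line. A minimal sketch, with placeholder paragraph texts:

```python
paragraphs = ["First paragraph", "Second paragraph", "Last paragraph."]

# "\n\n" puts exactly one blank line between consecutive paragraphs
joined = "\n\n".join(paragraphs)
print(joined)
```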

