Split text into smaller paragraphs of a minimal length without breaking the sentences given a threshold
IIUC, you want to split the text on periods while keeping the chunks from getting too short, so you don't end up with very short one-sentence paragraphs.
What you can do is split on the periods and then merge consecutive sentences back together as long as the combined length stays below a threshold (here 200 characters):
out = []
threshold = 200  # max combined length (in characters) for merging sentences
for chunk in text.split('. '):  # text: the input string
    chunk = chunk.rstrip('.')  # avoid a doubled period on the last sentence
    if out and len(chunk) + len(out[-1]) < threshold:
        # still short enough: append this sentence to the previous paragraph
        out[-1] += ' ' + chunk + '.'
    else:
        # otherwise start a new paragraph
        out.append(chunk + '.')
Output:
['Marketing products and services is a demanding and tedious task in today’s overly saturated market. Especially if you’re in a B2B lead generation business.',
'As a business owner or part of the sales team, you really need to dive deep into understanding what strategies work best and how to appeal to your customers most efficiently.',
'Lead generation is something you need to master. Understanding different types of leads will help you sell your product or services and scale your business faster.',
'That’s why we’re explaining what warm leads are and how you can easily turn them into paying customers.']
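Splitting on '. ' only separates sentences that end with a period, so sentences ending in '!' or '?' stay glued to the next one. Below is a minimal variant of the same merge-until-threshold idea wrapped in a function. It assumes a simple regex boundary (terminal punctuation followed by whitespace) is good enough for your text; it is not a full sentence tokenizer, and the name merge_sentences is hypothetical.

```python
import re


def merge_sentences(text: str, threshold: int = 200) -> list[str]:
    """Greedily join consecutive sentences while a paragraph stays under `threshold` characters."""
    # Assumed boundary: split after '.', '!' or '?' followed by whitespace;
    # the lookbehind keeps the punctuation attached to its sentence.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    paragraphs: list[str] = []
    for sentence in sentences:
        # +1 accounts for the space inserted between the two sentences
        if paragraphs and len(paragraphs[-1]) + 1 + len(sentence) < threshold:
            paragraphs[-1] += ' ' + sentence
        else:
            paragraphs.append(sentence)
    return paragraphs
```

Call it as merge_sentences(text) with the same text as above. Because the punctuation is kept by the split, nothing has to be re-appended, and the length check measures the paragraph exactly as it will be stored.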
How to split a .txt file into custom paragraphs (and then insert them into Excel columns)?
This code splits the file contents into paragraphs at every blank line:
with open("address", "r", encoding="utf8") as file:
    sections = file.read()
sections = sections.split('\n\n')
for section in sections:
    print(section)
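Splitting on two newlines only works on the whole file contents. If you have already split the text on single newlines (e.g. with split('\n')), the blank lines that separate the paragraphs are gone and there is nothing left to split on.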
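A plain split('\n\n') misses paragraph breaks whose blank line contains stray spaces, or files saved with Windows line endings ('\r\n'). Here is a minimal sketch that tolerates both, assuming paragraphs are separated by one or more blank or whitespace-only lines; split_paragraphs is a hypothetical helper name.

```python
import re


def split_paragraphs(raw: str) -> list[str]:
    """Split text on blank lines, tolerating whitespace-only lines and CRLF endings."""
    # Normalize Windows and old Mac line endings so the pattern only handles '\n'.
    raw = raw.replace('\r\n', '\n').replace('\r', '\n')
    # A paragraph break is a newline, any whitespace (including more newlines), then a newline.
    parts = re.split(r'\n\s*\n', raw)
    # Strip each paragraph and drop empty leftovers from leading/trailing blank lines.
    return [p.strip() for p in parts if p.strip()]
```

Pass it the string returned by file.read() in the snippet above instead of calling sections.split('\n\n').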
How to break text into paragraphs (Python)
Use the newline escape sequence \n as many times as you need, and I suggest using triple quotes, as follows:
print("""
First paragraph \n\n
Second paragraph \n\n
...
Last paragraph.
""")
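Inside a triple-quoted string, each literal line break is already a newline, so the extra \n\n in the example above adds more blank lines than it seems (and leaves trailing spaces). If the paragraphs are stored in a list, a minimal alternative is to join them with exactly one blank line between each pair. The join_paragraphs name and the sample list are placeholders, not from the original answer.

```python
def join_paragraphs(paragraphs: list[str]) -> str:
    """Return the paragraphs as one string with a single blank line between each pair."""
    # '\n\n' ends the current line and then leaves one empty line before the next paragraph.
    return "\n\n".join(p.strip() for p in paragraphs)


paragraphs = ["First paragraph", "Second paragraph", "Last paragraph."]
print(join_paragraphs(paragraphs))
```

Joining keeps the separator in one place, so changing the spacing later means editing a single string rather than every line of the text.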