How to count the number of sentences in a paragraph in Python
If a line doesn't contain a period, split() will return a list with a single element, the line itself:
>>> "asdasd".split('.')
['asdasd']
So you're counting the number of lines plus the number of periods, because every line contributes one more list element than it has periods. Why are you splitting the file into lines at all?
with open('words.txt', 'r') as file:
    file_contents = file.read()
    print('Total words: ', len(file_contents.split()))
    print('total stops: ', file_contents.count('.'))
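The overcount is easy to reproduce. The following is a minimal sketch, assuming the original code summed len(line.split('.')) over the lines of the file; the function names and the sample text are made up for illustration.

```python
def count_by_splitting_lines(text):
    """Sum len(line.split('.')) per line: each line adds one extra element."""
    total = 0
    for line in text.splitlines():
        total += len(line.split('.'))
    return total


def count_periods(text):
    """Count periods across the whole text at once."""
    return text.count('.')


# Three lines, two periods (sample text is illustrative only).
sample = "First sentence. Second\nsentence continues here.\nNo period on this line\n"
print(count_by_splitting_lines(sample))  # lines + periods
print(count_periods(sample))             # periods only
```

With three lines and two periods, the first function reports 5 and the second reports 2, which matches the "lines plus periods" explanation.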
Count number of sentences in paragraph
You could count the number of periods in the string, if you want to rely on those as your sentence delimiter.
line.count('.')
Or using a regular expression like you are doing for the words:
len(re.findall(r'\.', line))
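If sentences may also end in '!' or '?', the regular expression can be widened. This is only a sketch: the function name count_sentences is hypothetical, and treating a run of terminators as a single sentence end is an assumption that may not suit every text.

```python
import re


def count_sentences(text):
    # A run of one or more terminators counts as one sentence end,
    # so "Wait..." and "Really?!" each count once.
    return len(re.findall(r'[.!?]+', text))


print(count_sentences("Hello there. How are you? Fine!"))
print(count_sentences("Wait... Really?!"))
```

Abbreviations such as "e.g." are still counted as sentence ends, so this remains a rough estimate.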
Determining the number of sentences, words and letters in a text file
Use num_chars += len(line.replace(' ', '')) instead, which removes all spaces from the line.
For sentences (assuming all sentences end with a period and there's no ellipsis in the sentence), you can use the count method: num_lines += line.count(".")
So in your code it would look like:
fname = "gettysburg.txt"
num_lines = 0  # sentence count (number of periods)
num_words = 0
num_chars = 0
with open(fname, 'r') as f:
    for line in f:
        words = line.split()
        num_lines += line.count(".")  # one sentence per period
        num_words += len(words)
        num_chars += len(line.replace(' ', ''))  # characters excluding spaces
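The same loop can be wrapped in a function so it is easy to try on sample text. This is a sketch under assumptions: count_text_stats is a hypothetical name, io.StringIO stands in for the opened file, and the character count here drops all whitespace. That last point differs from replace(' ', ''), which keeps the newline at the end of each line in the count.

```python
import io


def count_text_stats(lines):
    """Return (sentences, words, non-whitespace characters) for an iterable of lines."""
    num_sentences = num_words = num_chars = 0
    for line in lines:
        words = line.split()
        num_sentences += line.count('.')
        num_words += len(words)
        # Joining the split words drops spaces, tabs and the trailing newline.
        num_chars += len(''.join(words))
    return num_sentences, num_words, num_chars


sample = io.StringIO("Four score and seven.\nYears ago.\n")
print(count_text_stats(sample))
```

Any iterable of lines works, so an open file object can be passed in place of the StringIO.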
Getting the maximum number of words in a sentence of a paragraph Python
I'm not 100% certain of what your requirements are, but if I borrow Buoy Rina's input, here's a solution using regular expressions (pattern search strings):
#!/usr/bin/env python3
import re
text = "I will go school tomorrow. I eat apples. Here is a six word sentence."
max_words = 0
sentences = re.split("[.!?]", text)
for sentence in sentences:
    max_words = max(len(sentence.split()), max_words)
print(f"max_words: {max_words}")
re.split() breaks the text (or paragraph) into sentences based on some end-of-sentence punctuation. There are likely cases where splitting on a period '.' won't yield a complete sentence (abbreviations, for example), but we'll ignore that for simplicity.
The string method split() then breaks each sentence into words based on whitespace (the default for split()). We then take the length of the resulting list to get the word count.
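The same idea can be written as a small function. This is a sketch only; max_words_per_sentence is a hypothetical name, and splitting on runs of terminators rather than on single characters is an assumption.

```python
import re


def max_words_per_sentence(text):
    # Split on runs of '.', '!' or '?'. The piece after the final
    # terminator is an empty string and contributes 0 words.
    sentences = re.split(r'[.!?]+', text)
    return max((len(s.split()) for s in sentences), default=0)


text = "I will go school tomorrow. I eat apples. Here is a six word sentence."
print(max_words_per_sentence(text))
```

For the sample text above the longest sentence has six words, and an empty string gives 0.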
How to count the number of words in a paragraph and exclude some words (from a file)?
The first part, where you get the total number of words and print the result, is OK.
Where you fall down is here:
words_par = 0
for words_par in lines:
    if words_par.startswith("P1" or "P2" or "P3") & words_par.endswith("P1" or "P2" or "P3"):
        words_par = line.split()
    print len(words_par)
    print words_par.replace('P1', '') #doesn't display it but still counts
else:
    print 'No words'
words_par is at first a string containing a line from the file. Under a condition which will never be met, it is turned into a list by the line.split() expression. If the expression words_par.startswith("P1" or "P2" or "P3") & words_par.endswith("P1" or "P2" or "P3") were ever to return True, this would always split the last line of your file, because line was last assigned in the first part of your program, where you did a full count of the words in the file. It should really be words_par.split().
Also, words_par.startswith("P1" or "P2" or "P3") will always be equivalent to words_par.startswith("P1"), since "P1" or "P2" or "P3" always evaluates to the first operand that is truthy, which is the first string in this case. Read http://docs.python.org/reference/expressions.html if you want to know more.
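A quick check makes this visible. The sample line below is made up for illustration; passing a tuple of prefixes to str.startswith is the standard way to test several alternatives at once.

```python
line = "P2 some words here"  # illustrative sample line

# The or-expression is evaluated first and yields just "P1".
print("P1" or "P2" or "P3")

# So this only ever tests the "P1" prefix.
print(line.startswith("P1" or "P2" or "P3"))

# A tuple of prefixes matches if any one of them matches.
print(line.startswith(("P1", "P2", "P3")))
```

The same tuple form works for str.endswith.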
While we are at it, unless you want a bitwise operation, avoid writing something & something and write something and something instead. The first evaluates both expressions no matter what the result of the first is, whereas the second only evaluates the second expression if the first is true. If you do this your code will operate a little more efficiently.
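The difference in evaluation can be observed by recording which operands actually run. This is a small sketch; the check helper and the calls list are hypothetical names used only for the demonstration.

```python
calls = []


def check(name, result):
    """Record that this operand was evaluated, then return its result."""
    calls.append(name)
    return result


# 'and' short-circuits: the right operand is skipped when the left is false.
calls.clear()
check("left", False) and check("right", True)
and_calls = list(calls)

# '&' is an ordinary binary operator: both operands are always evaluated.
calls.clear()
check("left", False) & check("right", True)
amp_calls = list(calls)

print(and_calls)
print(amp_calls)
```

Only the left operand runs for and, while both run for &.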
The print len(words_par) on the next line is always going to count the number of characters in the line, since the if statement always evaluates to False and words_par never gets split into a list of words.
Also, the else clause on the for loop will always be executed, whether the sequence is empty or not, because the loop never exits through break. Have a look at http://docs.python.org/reference/compound_stmts.html#the-for-statement for more information.
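The behaviour of for ... else can be checked with a short sketch. The describe function and its stop_at parameter are hypothetical, chosen only to show when the else clause runs.

```python
def describe(items, stop_at=None):
    """Report whether the loop's else clause ran."""
    for item in items:
        if item == stop_at:
            result = "stopped early"
            break
    else:
        # Reached whenever the loop finishes without hitting break,
        # including when items is empty.
        result = "else ran"
    return result


print(describe([]))
print(describe([1, 2]))
print(describe([1, 2], stop_at=2))
```

The else clause is skipped only in the last call, where the loop exits through break.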
I wrote a version of what I think you are after, as an example. I tried to keep it simple and avoided things like list comprehensions, since you say you are just starting to learn, so it is not optimal, but hopefully it will be clear. Also note that I added no comments, so feel free to hassle me to explain things for you.
words = None
with open('data.txt') as f:
    words = f.read().split()
total_words = len(words)
print 'Total words:', total_words
in_para = False
para_count = 0
para_type = None
paragraph = list()
for word in words:
    if ('P1' in word or
        'P2' in word or
        'P3' in word ):
        if in_para == False:
            in_para = True
            para_type = word
        else:
            print 'Words in paragraph', para_type, ':', para_count
            print ' '.join(paragraph)
            para_count = 0
            del paragraph[:]
            para_type = word
    else:
        paragraph.append(word)
        para_count += 1
else:
    if in_para == True:
        print 'Words in last paragraph', para_type, ':', para_count
        print ' '.join(paragraph)
    else:
        print 'No words'
EDIT:
I actually just noticed some redundant code in the example. The variable para_count is not needed, since the words are being appended to the paragraph variable. So instead of
print 'Words in paragraph', para_type, ':', para_count
You could just do
print 'Words in paragraph', para_type, ':', len(paragraph)
One less variable to keep track of. Here is the corrected snippet.
in_para = False
para_type = None
paragraph = list()
for word in words:
    if ('P1' in word or
        'P2' in word or
        'P3' in word ):
        if in_para == False:
            in_para = True
            para_type = word
        else:
            print 'Words in paragraph', para_type, ':', len(paragraph)
            print ' '.join(paragraph)
            del paragraph[:]
            para_type = word
    else:
        paragraph.append(word)
else:
    if in_para == True:
        print 'Words in last paragraph', para_type, ':', len(paragraph)
        print ' '.join(paragraph)
    else:
        print 'No words'
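For readers on Python 3, the same grouping idea might look like the sketch below. It is not a line-for-line port: the function name words_per_paragraph and the markers parameter are hypothetical, the results are collected in a list instead of being printed inside the loop, and words that appear before the first marker are skipped rather than added to the first paragraph.

```python
def words_per_paragraph(words, markers=("P1", "P2", "P3")):
    """Return a list of (marker, words) pairs, one per marked paragraph."""
    paragraphs = []
    current = None  # word list of the paragraph being built, if any
    for word in words:
        if any(marker in word for marker in markers):
            # A marker starts a new paragraph and is not counted itself.
            current = []
            paragraphs.append((word, current))
        elif current is not None:
            current.append(word)
    return paragraphs


text = "P1 the quick brown fox P2 jumps over P3 the lazy dog"
for marker, para in words_per_paragraph(text.split()):
    print('Words in paragraph', marker, ':', len(para))
    print(' '.join(para))
```

An empty result list plays the role of the 'No words' branch in the original.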