How to Count the Amount of Sentences in a Paragraph in Python

how to count the amount of sentences in a paragraph in python

If a line doesn't contain a period, split will return a single element, the line itself:

>>> "asdasd".split('.')
['asdasd']

So you're counting the number of lines plus the number of periods. Why are you splitting the file into lines at all?

with open('words.txt', 'r') as file:
    file_contents = file.read()

print('Total words: ', len(file_contents.split()))
print('total stops: ', file_contents.count('.'))
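As a rough illustration of the over-count described above, the sketch below compares the two approaches on a small in-memory string. The sample text and the variable names are made up for the example and do not come from the original question.

```python
# Hypothetical sample text: three lines, three periods in total.
text = "First sentence. Second one\ncontinues here. Third.\nNo period on this line\n"

# Summing len(line.split('.')) per line yields periods + 1 for every line,
# so the total is the number of periods plus the number of lines.
per_line_total = sum(len(line.split('.')) for line in text.splitlines())

# Counting periods over the whole text avoids the extra one per line.
period_total = text.count('.')

print('per-line split total:', per_line_total)
print('periods in text:', period_total)
```

The difference between the two totals equals the number of lines, which is why reading the whole file at once is simpler here.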

count number of sentences in paragraph

You could count the number of periods in the string, if you want to rely on those as your sentence delimiter.

line.count('.')

Or using a regular expression like you are doing for the words:

len(re.findall(r'\.', line))
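If question marks and exclamation marks should also end a sentence, the regular expression can be widened. The following is only a sketch under that assumption; count_sentences is a hypothetical helper name, and treating a run of terminators such as "..." as a single sentence end is a design choice, not something the original answer specifies.

```python
import re


def count_sentences(text):
    """Count runs of '.', '!' or '?' as sentence endings (a heuristic)."""
    # '[.!?]+' matches one or more terminators in a row, so "..." counts once.
    return len(re.findall(r'[.!?]+', text))


print(count_sentences("Wait... what? Fine."))
```

This still miscounts abbreviations and decimal numbers, so it is a heuristic rather than a real sentence tokenizer.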

Determining the number of sentences, words and letters in a text file

Use num_chars += len(line.replace(' ', '')) instead, which removes all spaces from the line.

For sentences (assuming all sentences end with a period and there's no ellipsis in the sentence), you can use the count method: num_lines += line.count(".")

So in your code it would look like:

fname = "gettysburg.txt"

num_lines = 0
num_words = 0
num_chars = 0

with open(fname, 'r') as f:
    for line in f:
        words = line.split()

        num_lines += line.count(".")
        num_words += len(words)
        num_chars += len(line.replace(' ', ''))
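Note that a line read from a file keeps its trailing newline, and line.replace(' ', '') also keeps punctuation, so the character count above includes both. If "letters" is meant literally, one possible variant counts only alphabetic characters. This is a sketch: count_stats is a hypothetical helper that takes any iterable of lines (an open file works too), and the sample lines are invented for the example.

```python
def count_stats(lines):
    """Return (sentences, words, letters) for an iterable of text lines."""
    sentences = 0
    words = 0
    letters = 0
    for line in lines:
        sentences += line.count('.')          # assumes every sentence ends with '.'
        words += len(line.split())            # whitespace-separated tokens
        letters += sum(1 for ch in line if ch.isalpha())  # skips spaces, '\n', punctuation
    return sentences, words, letters


sample = ["Four score and seven.\n", "Years ago.\n"]
print(count_stats(sample))
```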

Getting the maximum number of words in a sentence of a paragraph Python

I'm not 100% certain of what your requirements are, but if I borrow Buoy Rina's input, here's a solution using regular expressions (pattern search strings):

#!/usr/bin/env python3
import re
text = "I will go school tomorrow. I eat apples. Here is a six word sentence."

max_words = 0
sentences = re.split("[.!?]", text)
for sentence in sentences:
    max_words = max(len(sentence.split()), max_words)


print(f"max_words: {max_words}")

The re.split() breaks the text (or paragraph) into sentences based on "some" end-of-sentence punctuation. There are cases where splitting on a period '.' won't yield a complete sentence (abbreviations such as "Dr." or decimal numbers like 3.14), but we'll ignore that for simplicity.

The string function split() then breaks up the sentence into words based on white space (the default of split()). We then get the length of the resultant list to find the word count.
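If the longest sentence itself is wanted rather than just its word count, the same two splitting steps can be combined with max() and a key function. This is a minimal sketch; longest_sentence is a hypothetical helper name, and it inherits the same simplification about end-of-sentence punctuation.

```python
import re


def longest_sentence(text):
    """Return the sentence with the most whitespace-separated words."""
    # Drop the empty pieces that re.split leaves after the final terminator.
    sentences = [s.strip() for s in re.split(r'[.!?]', text) if s.strip()]
    # default='' covers text that contains no sentences at all.
    return max(sentences, key=lambda s: len(s.split()), default='')


text = "I will go school tomorrow. I eat apples. Here is a six word sentence."
best = longest_sentence(text)
print(best, '->', len(best.split()), 'words')
```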

How to count the number of words in a paragraph and exclude some words (from a file)?

The first part is ok where you get the total words and print the result.

Where you fall down is here

words_par = 0
for words_par in lines:
    if words_par.startswith("P1" or "P2" or "P3") & words_par.endswith("P1" or "P2" or "P3"):
        words_par = line.split()
    print len(words_par)
    print words_par.replace('P1', '') #doesn't display it but still counts
else:
    print 'No words'

words_par is at first a string containing a line from the file. Under a condition which will never be met, it is turned into a list with the

line.split()

expression. This, if the expression

words_par.startswith("P1" or "P2" or "P3") & words_par.endswith("P1" or "P2" or "P3")

were ever to return True, would always split the last line in your file, because line was last assigned in the first part of your program, where you did a full count of the number of words in the file. That should really be

words_par.split()

Also

words_par.startswith("P1" or "P2" or "P3")

will always be

words_par.startswith("P1")

since

"P1" or "P2" or "P3"

always evaluates to the first operand that is truthy, which is the first string in this case, because any non-empty string is truthy. Read http://docs.python.org/reference/expressions.html if you want to know more.
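A short sketch of this behaviour, together with one common alternative: str.startswith and str.endswith accept a tuple of candidates. The sample line and the variable names below are assumptions made for the illustration.

```python
# 'or' returns its first truthy operand, so the chain collapses to "P1".
chained = "P1" or "P2" or "P3"
print(repr(chained))

line = "P2 some words P2"   # hypothetical line from the data file

# Passing the chained expression only ever tests for "P1".
print(line.startswith(chained))

# Passing a tuple tests every marker.
markers = ("P1", "P2", "P3")
print(line.startswith(markers) and line.endswith(markers))
```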

While we are at it, unless you actually want a bitwise operation, avoid doing

something & something

instead do

something and something

The first will evaluate both expressions no matter what the result of the first is, whereas the second will only evaluate the second expression if the first is True. If you do this your code will operate a little more efficiently.
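The difference in evaluation can be observed by recording which operands actually run. The check function below is a hypothetical helper written only for this demonstration.

```python
calls = []


def check(name, result):
    """Record that this operand was evaluated, then return its result."""
    calls.append(name)
    return result


# '&' evaluates both operands before combining them.
bitwise_result = check("left", False) & check("right", True)
bitwise_calls = list(calls)

calls.clear()

# 'and' stops as soon as the left operand is falsy.
logical_result = check("left", False) and check("right", True)
logical_calls = list(calls)

print(bitwise_calls, logical_calls)
```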

The

print len(words_par)

on the next line is always going to count the number of characters in the line, since the if statement is always going to evaluate to False and words_par never gets split into a list of words.

Also, the else clause on the for loop will always be executed, whether or not the sequence is empty, because the loop never exits through a break. Have a look at http://docs.python.org/reference/compound_stmts.html#the-for-statement for more information.
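To make the for/else rule concrete: the else block runs whenever the loop finishes without hitting break, including when the sequence is empty. The scan function is a hypothetical example, not part of the original code.

```python
def scan(items, target):
    """Report whether target occurs in items, using for/else."""
    for item in items:
        if item == target:
            result = "found"
            break              # leaving via break skips the else block
    else:
        # Reached when the loop runs to completion, even over an empty list.
        result = "not found"
    return result


print(scan([1, 2], 1))
print(scan([2, 3], 1))
print(scan([], 1))
```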

I wrote a version of what I think you are after as an example. I tried to keep it simple and avoid things like list comprehensions, since you say you are just starting to learn, so it is not optimal, but hopefully it will be clear. Also note I added no comments, so feel free to hassle me to explain things for you.

words = None
with open('data.txt') as f:
    words = f.read().split()
total_words = len(words)
print 'Total words:', total_words

in_para = False
para_count = 0
para_type = None
paragraph = list()
for word in words:
    if ('P1' in word or
        'P2' in word or
        'P3' in word ):
        if in_para == False:
            in_para = True
            para_type = word
        else:
            print 'Words in paragraph', para_type, ':', para_count
            print ' '.join(paragraph)
            para_count = 0
            del paragraph[:]
            para_type = word
    else:
        paragraph.append(word)
        para_count += 1
else:
    if in_para == True:
        print 'Words in last paragraph', para_type, ':', para_count
        print ' '.join(paragraph)
    else:
        print 'No words'

EDIT:

I actually just noticed some redundant code in the example. The variable para_count is not needed, since the words are being appended to the paragraph variable. So instead of

print 'Words in paragraph', para_type, ':', para_count

You could just do

print 'Words in paragraph', para_type, ':', len(paragraph)

One less variable to keep track of. Here is the corrected snippet.

in_para = False
para_type = None
paragraph = list()
for word in words:
    if ('P1' in word or
        'P2' in word or
        'P3' in word ):
        if in_para == False:
            in_para = True
            para_type = word
        else:
            print 'Words in paragraph', para_type, ':', len(paragraph)
            print ' '.join(paragraph)
            del paragraph[:]
            para_type = word
    else:
        paragraph.append(word)
else:
    if in_para == True:
        print 'Words in last paragraph', para_type, ':', len(paragraph)
        print ' '.join(paragraph)
    else:
        print 'No words'
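The snippets above use Python 2 print statements. For Python 3, the same idea might be sketched as a function that returns the counts instead of printing them. This is an illustration only: count_paragraph_words and MARKERS are hypothetical names, it assumes every marker word starts a new paragraph, and unlike the code above it ignores any words that appear before the first marker.

```python
MARKERS = ("P1", "P2", "P3")   # assumed paragraph markers, taken from the question


def count_paragraph_words(words):
    """Return a list of (marker, word_count) pairs, one per paragraph."""
    results = []
    para_type = None           # marker of the paragraph being collected
    paragraph = []
    for word in words:
        if any(marker in word for marker in MARKERS):
            # A marker closes the current paragraph (if any) and opens a new one.
            if para_type is not None:
                results.append((para_type, len(paragraph)))
            para_type = word
            paragraph = []
        elif para_type is not None:
            # Marker words themselves are never counted.
            paragraph.append(word)
    if para_type is not None:
        results.append((para_type, len(paragraph)))
    return results


for marker, count in count_paragraph_words("P1 a b c P2 d e".split()):
    print('Words in paragraph', marker, ':', count)
```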

