Python: How to Calculate the Average Word Length in a Sentence Using the .Split Command

Python: How can I calculate the average word length in a sentence using the .split command?

In Python 3 (which you appear to be using):

>>> sentence = "Hi my name is Bob"
>>> words = sentence.split()
>>> average = sum(len(word) for word in words) / len(words)
>>> average
2.6
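
The same calculation can be wrapped in a small function that also copes with an empty string, which would otherwise raise ZeroDivisionError. This is a minimal sketch: the name average_word_length and the choice to return 0.0 when there are no words are assumptions, not part of the original answer.

```python
def average_word_length(sentence):
    """Return the mean length of the whitespace-separated words in sentence."""
    words = sentence.split()
    if not words:
        # Assumption: a sentence with no words gets an average of 0.0
        return 0.0
    return sum(len(word) for word in words) / len(words)


print(average_word_length("Hi my name is Bob"))  # 2.6
print(average_word_length("   "))                # 0.0
```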

Python - Return Average of Word Length In Sentences

# text holds the full input string
sentences = [sentence.split() for sentence in text.split('.')]
sentences = [words for words in sentences if words]  # drop empty or whitespace-only sentences
averages = [sum(len(word) for word in words) / len(words) for words in sentences]

Issue with program that counts the average word length

The main issue:

The following statement assigns an integer (the word count) to words, so you cannot iterate over it.

words = len(sentence.split())

Given that you want to iterate over your list of words, try this instead:

words = sentence.split()
n_words = len(words)
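
To see the difference concretely, here is a small sketch that triggers the error on purpose and then applies the fix. The fixed example sentence and the error_message variable are only there for illustration.

```python
sentence = "Hi my name is Bob"

n_words = len(sentence.split())  # an int (5), not a list
try:
    for word in n_words:         # iterating over an int raises TypeError
        pass
except TypeError as exc:
    error_message = str(exc)
    print(error_message)         # 'int' object is not iterable

words = sentence.split()         # a list of strings, safe to loop over
print(words)
```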

In more detail:

Here is an updated and working version of your code, using the example above:

sentence = input("Give your sentence: ")
# Updated here -->
words = sentence.split()
n_words = len(words)
# <--
print(words)
characters = 0
for word in words:
    characters += len(word)
average_word_length = characters/n_words # <-- and here.

If you'd like to take this a step further using a syntax called list comprehension (which is very useful!), here is another example:

words = input("Give your sentence: ").split()
avg_len = sum([len(w) for w in words])/len(words)

print('Words:', words)
print('Average length:', avg_len)
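
If you would rather not write the division yourself, the standard library's statistics module can compute the mean. A sketch, assuming Python 3.8 or later (for fmean); the fixed sentence stands in for the input() call so the snippet runs unattended.

```python
from statistics import fmean

words = "Hi my name is Bob".split()
# fmean raises StatisticsError if the list is empty
avg_len = fmean([len(w) for w in words])

print('Words:', words)
print('Average length:', avg_len)
```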

Finding average word length in a string

Try this:

import re

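def avrg_count(x):
    # Count only letters and digits as characters
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    # Keep spaces so the cleaned text can still be split into words
    num_words = len(re.sub(r'[^a-zA-Z0-9 ]', '', x).split())
    # Python 3 print(); .12g keeps 12 significant digits in the average
    print("Characters:{0}\nWords:{1}\nAverage word length: {2:.12g}".format(total_chars, num_words, total_chars / num_words))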


phrase = '***The ?! quick brown cat: leaps over the sad boy.'

avrg_count(phrase)

Output:

Characters:34
Words:9
Average word length: 3.77777777778
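
A variant of the same idea that returns the numbers instead of printing them. It uses re.findall to pull out runs of letters and digits, so it behaves slightly differently from the function above: a word such as "don't" counts as two words here, whereas stripping the apostrophe counts it as one. The name word_stats and the 0.0 fallback for empty input are assumptions.

```python
import re


def word_stats(text):
    """Return (total_chars, num_words, average) for the alphanumeric runs in text."""
    words = re.findall(r'[a-zA-Z0-9]+', text)
    total_chars = sum(len(w) for w in words)
    num_words = len(words)
    # Assumption: report 0.0 when the text contains no words
    average = total_chars / num_words if num_words else 0.0
    return total_chars, num_words, average


total_chars, num_words, average = word_stats('***The ?! quick brown cat: leaps over the sad boy.')
print(total_chars, num_words, round(average, 2))  # 34 9 3.78
```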

Computes the average word length for all the words in a file?

w_i = length of word i

w_avg = (∑ w_i) / N, where N is the total number of words

with open(input('Enter file name: '), 'r') as f:
    # split() with no argument ignores blank lines and repeated spaces
    w = [len(word) for line in f for word in line.split()]
w_avg = sum(w) / len(w)
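
The same computation can be written as a function that accepts any iterable of lines, which makes it easy to try without a file on disk. A sketch: the name average_word_length_in_lines and the 0.0 result for input without words are assumptions.

```python
def average_word_length_in_lines(lines):
    """Mean word length over an iterable of text lines (for example an open file)."""
    lengths = [len(word) for line in lines for word in line.split()]
    if not lengths:
        # Assumption: no words at all means an average of 0.0
        return 0.0
    return sum(lengths) / len(lengths)


sample = ["Hi my name\n", "\n", "is Bob\n"]
print(average_word_length_in_lines(sample))  # 2.6
```

With a real file, pass the open file object: open the file in a with block and call average_word_length_in_lines(f).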

Python program for word count, average word length, word frequency and frequency of words starting with letters of the alphabet

There are many ways to achieve this. A more advanced approach would first gather the text and its words, then analyze the data with ML/DS tools to extract further statistics (things like "a new paragraph mostly starts with X words" or "X words are mostly preceded/succeeded by Y words").

If you just need very basic statistics, you can gather them while iterating over each word and do the calculations at the end, like:

stats = {
    'amount': 0,
    'length': 0,
    'word_count': {},
    'initial_count': {}
}

with open('lorem.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        for word in line.split():
            word = word.lower()
            initial = word[0]

            # Add word and length count
            stats['amount'] += 1
            stats['length'] += len(word)

            # Add initial count
            if initial not in stats['initial_count']:
                stats['initial_count'][initial] = 0
            stats['initial_count'][initial] += 1

            # Add word count
            if word not in stats['word_count']:
                stats['word_count'][word] = 0
            stats['word_count'][word] += 1

# Calculate average word length
stats['average_length'] = stats['length'] / stats['amount']
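
The manual "create the key if missing, then increment" pattern can be shortened with collections.Counter from the standard library. A sketch only: the function name text_stats is hypothetical, it takes any iterable of lines rather than opening lorem.txt itself, and it reports 0.0 as the average for empty input.

```python
from collections import Counter


def text_stats(lines):
    """Collect word count, word frequency, initial-letter frequency and average length."""
    word_count = Counter()
    initial_count = Counter()
    total_length = 0
    for line in lines:
        for word in line.lower().split():
            word_count[word] += 1
            initial_count[word[0]] += 1
            total_length += len(word)
    amount = sum(word_count.values())
    return {
        'amount': amount,
        'length': total_length,
        'word_count': word_count,
        'initial_count': initial_count,
        # Assumption: an input without words gets an average of 0.0
        'average_length': total_length / amount if amount else 0.0,
    }


stats = text_stats(["Lorem ipsum dolor\n", "\n", "lorem sit amet\n"])
print(stats['amount'], stats['average_length'])  # 6 4.5
print(stats['word_count'].most_common(1))        # [('lorem', 2)]
```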


Have number of words per sentence, how to get mean?

If you only want letters, this should work:

def sentence_num_and_mean(text):
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')

    output = []
    sentences = text.split(".")
    for sentence in sentences:
        words = [x for x in sentence.split(" ") if x]
        word_count = len(words)
        if word_count == 0:
            # Skip empty pieces, e.g. the one after a trailing period
            continue
        word_length = sum(map(len, words))
        word_mean = word_length / word_count
        output.append((word_count, word_mean))

    return output


split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)

Output:

First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 4.666666666666667), (7, 4.0), (2, 6.5), (2, 7.5), (3, 4.0)]
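
The replace loop can also be expressed with re.split, which splits on any of the three sentence terminators in one pass. A sketch: the name sentence_stats is hypothetical, and empty pieces (such as the one after a trailing period) are skipped.

```python
import re


def sentence_stats(text):
    """Return a (word_count, mean_word_length) tuple for each non-empty sentence."""
    output = []
    for sentence in re.split(r'[.!?]', text):
        words = sentence.split()
        if not words:
            continue  # nothing between two terminators, or after the last one
        output.append((len(words), sum(map(len, words)) / len(words)))
    return output


result = sentence_stats("Another period. Then exclamation! Blah blah blah.")
print(result)  # [(2, 6.5), (2, 7.5), (3, 4.0)]
```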

