Python: How can I calculate the average word length in a sentence using the .split command?
In Python 3 (which you appear to be using):
>>> sentence = "Hi my name is Bob"
>>> words = sentence.split()
>>> average = sum(len(word) for word in words) / len(words)
>>> average
2.6
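If the sentence may contain punctuation or be empty, the same idea can be wrapped in a small function. This is only a sketch: the name average_word_length, the choice to strip surrounding punctuation, and returning 0.0 for empty input are my assumptions, not part of the question.

```python
import string

def average_word_length(sentence):
    """Mean length of whitespace-separated words, ignoring surrounding punctuation."""
    # Strip leading/trailing punctuation from each token.
    words = [w.strip(string.punctuation) for w in sentence.split()]
    # Drop tokens that were punctuation only.
    words = [w for w in words if w]
    if not words:
        return 0.0  # assumption: report 0.0 instead of raising ZeroDivisionError
    return sum(len(w) for w in words) / len(words)

print(average_word_length("Hi my name is Bob"))    # 2.6
print(average_word_length("Hi, my name is Bob!"))  # 2.6
print(average_word_length(""))                     # 0.0
```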
Python - Return Average of Word Length In Sentences
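# words holds the full text; split it into sentences on periods
sentences = words.split('.')
sentences = [sentence.split() for sentence in sentences]
# Drop pieces with no words (e.g. the empty string after a final period)
sentences = [sentence for sentence in sentences if sentence]
averages = [sum(len(word) for word in sentence)/len(sentence) for sentence in sentences]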
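As a quick worked example of the per-sentence approach, here is a minimal sketch on a made-up two-sentence string (the sample text is my own). It filters out pieces that contain no words, so a trailing period does not cause a division by zero.

```python
text = "Hi my name is Bob. I like tea now."

# Split on periods, then split each piece into words.
sentences = [piece.split() for piece in text.split('.')]
# Keep only pieces that contain at least one word.
sentences = [s for s in sentences if s]

averages = [sum(len(w) for w in s) / len(s) for s in sentences]
print(averages)  # [2.6, 2.75]
```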
Issue with program that counts the average word length
The main issue:
The following statement assigns an integer (the number of words) to words, so you cannot iterate over it:
words = len(sentence.split())
Given that you want to iterate over your list of words, try this instead:
words = sentence.split()
n_words = len(words)
In more detail:
Here is an updated and working version of your code, using the example above:
sentence = input("Give your sentence: ")
# Updated here -->
words = sentence.split()
n_words = len(words)
# <--
print(words)
characters = 0
for word in words:
    characters += len(word)
average_word_length = characters/n_words # <-- and here.
If you'd like to take this a step further using a syntax called list comprehension (which is very useful!), here is another example:
words = input("Give your sentence: ").split()
avg_len = sum([len(w) for w in words])/len(words)
print('Words:', words)
print('Average length:', avg_len)
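Another option, if you would rather not write the sum and division yourself, is statistics.mean from the standard library. A minimal sketch, using a fixed sample sentence instead of input() so it runs unattended; note that mean raises StatisticsError on an empty list.

```python
from statistics import mean

words = "Hi my name is Bob".split()  # fixed sample instead of input()
avg_len = mean([len(w) for w in words])

print('Words:', words)
print('Average length:', avg_len)  # Average length: 2.6
```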
Finding average word length in a string
Try this:
import re
def avrg_count(x):
    # Count only letters and digits as characters
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    # Keep spaces so the cleaned text can still be split into words
    num_words = len(re.sub(r'[^a-zA-Z0-9 ]', '', x).split())
    print("Characters:{0}\nWords:{1}\nAverage word length: {2:.2f}".format(total_chars, num_words, total_chars/num_words))
phrase = '***The ?! quick brown cat: leaps over the sad boy.'
avrg_count(phrase)
Output:
Characters:34
Words:9
Average word length: 3.78
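A possible variation is to pull the words out directly with re.findall and return the numbers instead of printing them. This is a sketch, not a drop-in replacement: the function name word_stats is made up, and it treats an apostrophe as a word break (so "don't" would count as two words), which the version above does not.

```python
import re

def word_stats(text):
    """Return (total_chars, num_words, average) for runs of letters/digits."""
    # Each maximal run of ASCII letters or digits counts as one word.
    words = re.findall(r'[a-zA-Z0-9]+', text)
    total_chars = sum(len(w) for w in words)
    # Assumption: report 0.0 when the text contains no words.
    average = total_chars / len(words) if words else 0.0
    return total_chars, len(words), average

phrase = '***The ?! quick brown cat: leaps over the sad boy.'
chars, count, avg = word_stats(phrase)
print(chars, count, round(avg, 2))  # 34 9 3.78
```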
Computes the average word length for all the words in a file?
w_i = length of word i
w_avg = (∑ w_i) / N, where N is the total number of words
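with open(input('Enter file name: '), 'r') as f:
    # split() with no argument skips blank lines and repeated spaces,
    # so no empty "words" of length 0 are counted
    w = [len(word) for line in f for word in line.split()]
w_avg = sum(w)/len(w)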
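To check the formula without a real file, here is a small sketch that uses io.StringIO as a stand-in for the opened file (the sample contents are made up). It calls split() with no argument, so the blank line contributes no words.

```python
import io

# Stand-in for an open text file; note the blank line in the middle.
f = io.StringIO("the quick brown fox\n\njumps over a log\n")

# One length per word: 3, 5, 5, 3, 5, 4, 1, 3
lengths = [len(word) for line in f for word in line.split()]

# w_avg = (sum of w_i) / N = 29 / 8
w_avg = sum(lengths) / len(lengths)
print(w_avg)  # 3.625
```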
Python program for word count, average word length, word frequency and frequency of words starting with letters of the alphabet
There are many ways to achieve this. A more advanced approach would first gather the text and its words, then analyze the data with ML/DS tools to extract further statistics (for example, "a new paragraph mostly starts with X words" or "X words are mostly preceded/succeeded by Y words").
If you just need very basic statistics, you can gather them while iterating over each word and do the calculations at the end, like:
stats = {
    'amount': 0,
    'length': 0,
    'word_count': {},
    'initial_count': {}
}
with open('lorem.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        for word in line.split():
            word = word.lower()
            initial = word[0]
            # Add word and length count
            stats['amount'] += 1
            stats['length'] += len(word)
            # Add initial count
            if initial not in stats['initial_count']:
                stats['initial_count'][initial] = 0
            stats['initial_count'][initial] += 1
            # Add word count
            if word not in stats['word_count']:
                stats['word_count'][word] = 0
            stats['word_count'][word] += 1
# Calculate average word length
stats['average_length'] = stats['length'] / stats['amount']
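The same statistics can also be gathered with collections.Counter, which removes the manual "create the key if missing" steps. A rough sketch: the function name text_stats is hypothetical, it takes any iterable of lines (an open file works too), and returning 0.0 for empty input is my assumption.

```python
from collections import Counter

def text_stats(lines):
    """Word count, average word length, word and initial-letter frequencies."""
    # Lowercase every whitespace-separated word; blank lines yield nothing.
    words = [w.lower() for line in lines for w in line.split()]
    return {
        'amount': len(words),
        # Assumption: 0.0 when there are no words at all.
        'average_length': sum(map(len, words)) / len(words) if words else 0.0,
        'word_count': Counter(words),
        'initial_count': Counter(w[0] for w in words),
    }

stats = text_stats(["Lorem ipsum dolor", "", "lorem sit"])
print(stats['amount'])                     # 5
print(stats['average_length'])             # 4.6
print(stats['word_count'].most_common(1))  # [('lorem', 2)]
print(stats['initial_count']['l'])         # 2
```

With a real file you could pass the file object directly, e.g. text_stats(f) inside a with open(...) block.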
Have number of words per sentence, how to get mean?
If you only want letters, this should work:
def sentence_num_and_mean(text):
    # Replace ! and ? with . so every sentence ends the same way
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')
    output = []
    sentences = text.split(".")
    for sentence in sentences:
        words = [x for x in sentence.split(" ") if x]
        word_count = len(words)
        if word_count == 0:
            # Skip empty pieces, e.g. after a final period
            continue
        word_length = sum(map(len, words))
        word_mean = word_length / word_count
        output.append((word_count, word_mean))
    return output
split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)
Output:
First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 4.666666666666667), (7, 4.0), (2, 6.5), (2, 7.5), (3, 4.0)]