Count Frequency of Words in a List and Sort by Frequency

Finding frequency of words in Python and frequencies in a descending order

I agree with @lolu that you should use a dictionary but if you still want to use a list, here is a solution:

import re


def frequency_check(lines):
    print("Frequency of words in file")
    words = re.findall(r"\w+", lines)
    unique_words = set(words)
    item_list = []

    for item in unique_words:
        item_count = words.count(item)
        item_list.append((item, item_count))

    item_list.sort(key=lambda t: (t[1], t[0]), reverse=True)
    for item, item_count in item_list:
        print("{} : {} times".format(item, item_count))


with open("original-3.txt", 'r') as file1:
    lines = file1.read().lower()
frequency_check(lines)

And a much better implementation using collections.Counter:

import re
from collections import Counter


def frequency_check(lines):
    print("Frequency of words in file")
    words = re.findall(r"\w+", lines)
    word_counts = Counter(words)
    for item, item_count in word_counts.most_common():
        print("{} : {} times".format(item, item_count))


with open("original-3.txt", 'r') as file1:
    lines = file1.read().lower()
frequency_check(lines)
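
One difference between the two versions may matter: the list version breaks ties by word in reverse alphabetical order, while most_common() keeps words with equal counts in first-seen order (on Python 3.7+). A minimal sketch of this, using a made-up inline string instead of original-3.txt; the helper names list_based and counter_based are not from the answer:

```python
import re
from collections import Counter

# Hypothetical sample text; the answer reads original-3.txt instead.
text = "a b b a c"
words = re.findall(r"\w+", text.lower())


def list_based(words):
    # Same approach as the list version: count each unique word, then sort
    # by (count, word) in descending order.
    pairs = [(w, words.count(w)) for w in set(words)]
    pairs.sort(key=lambda t: (t[1], t[0]), reverse=True)
    return pairs


def counter_based(words):
    # most_common() sorts by count only; ties keep first-seen order.
    return Counter(words).most_common()


print(list_based(words))     # [('b', 2), ('a', 2), ('c', 1)]
print(counter_based(words))  # [('a', 2), ('b', 2), ('c', 1)]
```

If a fixed tie order is needed with Counter, the items can be sorted explicitly rather than relying on most_common().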

Counting Word Frequency in a list of lists

If you want a one liner:

from collections import Counter
counts = Counter(x for sublist in list1 for x in sublist)

or a multi-liner without any imports:

counts = {}
for sublist in list1:
    for x in sublist:
        if x in counts:
            counts[x] += 1
        else:
            # First occurrence: start the count at 1, not 0.
            counts[x] = 1

though I recommend using Counter for readability.
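
To check that the two approaches agree, here is a small sketch with a made-up list1 (the question's data is not shown here). The import-free loop uses dict.get so that a word seen for the first time starts from 0 and ends at 1:

```python
from collections import Counter

# Hypothetical input; any list of lists of hashable items should work.
list1 = [["a", "b"], ["b", "c"], ["b"]]

# One-liner with Counter.
counter_counts = Counter(x for sublist in list1 for x in sublist)

# Counting without imports (Counter above is only used for comparison).
plain_counts = {}
for sublist in list1:
    for x in sublist:
        plain_counts[x] = plain_counts.get(x, 0) + 1

print(plain_counts)                          # {'a': 1, 'b': 3, 'c': 1}
print(plain_counts == dict(counter_counts))  # True
```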

Count frequency of words inside a list in a dictionary

You can use a list comprehension along with collections.Counter, which does exactly what you want with the nested list:

from collections import Counter

[{'name': i.get('name'),
  'keywords': [dict(Counter([j for j in i.get('keywords')
                             if j in common_keywords]))]} for i in people]

Output:

[{'name': 'Bob', 'keywords': [{'dog': 2}]},
 {'name': 'Kate', 'keywords': [{'cat': 1}]},
 {'name': 'Sasha', 'keywords': [{'person': 1, 'cat': 1}]}]


  1. First, the list comprehension rebuilds the original list of dicts, defining each key explicitly and reading its value with i.get('key'). This lets you work with the nested list stored under keywords.
  2. Iterate over that nested list and keep only the items that are in common_keywords.
  3. Pass the filtered list to collections.Counter and convert the result to a dict.
  4. Wrap that dict in a single-element list, which is the shape you expect.
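
The steps above can be sketched end to end. The people and common_keywords values below are assumptions, chosen only so that the result matches the output shown above; the question's real data is not reproduced here:

```python
from collections import Counter

# Assumed inputs (not from the original question).
people = [
    {"name": "Bob", "keywords": ["dog", "dog", "car", "trampoline"]},
    {"name": "Kate", "keywords": ["cat", "jog", "tree", "flower"]},
    {"name": "Sasha", "keywords": ["cooking", "stove", "person", "cat"]},
]
common_keywords = ["dog", "person", "cat"]

result = [
    {
        # Step 1: rebuild each dict key by key.
        "name": person.get("name"),
        # Steps 2-4: filter the keywords, count them, wrap the dict in a list.
        "keywords": [dict(Counter(
            kw for kw in person.get("keywords", []) if kw in common_keywords
        ))],
    }
    for person in people
]

print(result)
```

If common_keywords is large, turning it into a set first would make each membership test cheaper.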

Sorted Word frequency count using python

You can use the same dictionary:

>>> d = { "foo": 4, "bar": 2, "quux": 3 }
>>> sorted(d.items(), key=lambda item: item[1])

The second line prints:

[('bar', 2), ('quux', 3), ('foo', 4)]

If you only want a sorted word list, do:

>>> [pair[0] for pair in sorted(d.items(), key=lambda item: item[1])]

That line prints:

['bar', 'quux', 'foo']
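
The examples above sort in ascending order. For a descending order, one option is to negate the count in the sort key, which also allows an alphabetical tie-break. A short sketch; the extra "baz" entry is added here only to show the tie-break and is not in the original dictionary:

```python
# Same dictionary as above, plus a hypothetical "baz" entry to create a tie.
d = {"foo": 4, "bar": 2, "quux": 3, "baz": 3}

# Highest count first; words with equal counts are ordered alphabetically.
by_freq_desc = sorted(d.items(), key=lambda item: (-item[1], item[0]))
print(by_freq_desc)
# [('foo', 4), ('baz', 3), ('quux', 3), ('bar', 2)]

# Only the words, in the same order.
words_desc = [word for word, _ in by_freq_desc]
print(words_desc)
# ['foo', 'baz', 'quux', 'bar']
```

If ties do not matter, sorted(d.items(), key=lambda item: item[1], reverse=True) is enough.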

Sorting a list based on frequency of words

First, a question to confirm:
Do you have all the words available before sorting, or do they arrive continuously as a stream?

(1) For the former case, you can store the words in a Set and then put them into a PriorityQueue. If you supply a comparator, the queue returns the words in sorted order as they are polled. I created a new class Pair to hold the text and frequency; see the code:

import java.util.Queue;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.HashSet;
import java.util.Comparator;

public class PriorityQueueTest {

    public static class Pair {
        private String text;
        private int frequency;

        @Override
        public int hashCode() {
            return text.hashCode();
        }

        @Override
        public String toString() {
            return text + ":" + frequency;
        }

        public Pair(String text, int frequency) {
            super();
            this.text = text;
            this.frequency = frequency;
        }

        public String getText() {
            return text;
        }
        public void setText(String text) {
            this.text = text;
        }
        public int getFrequency() {
            return frequency;
        }
        public void setFrequency(int frequency) {
            this.frequency = frequency;
        }
    }

    // Orders pairs by descending frequency.
    public static Comparator<Pair> idComparator = new Comparator<Pair>() {
        @Override
        public int compare(Pair o1, Pair o2) {
            if (o1.getFrequency() > o2.getFrequency()) {
                return -1;
            }
            else if (o1.getFrequency() < o2.getFrequency()) {
                return 1;
            }
            else {
                return 0;
            }
        }
    };

    public static void main(String[] args) {
        Set<Pair> data = new HashSet<Pair>();
        data.add(new Pair("haha", 3));
        data.add(new Pair("Hehe", 5));
        data.add(new Pair("Test", 10));

        // Typed queue (the raw PriorityQueue type causes an unchecked warning).
        Queue<Pair> queue = new PriorityQueue<Pair>(16, idComparator);

        for (Pair pair : data) {
            queue.add(pair);
        }

        // Test the order
        Pair temp = null;
        while ((temp = queue.poll()) != null) {
            System.out.println(temp);
        }

    }

}

(2) For the other case (the words arrive continuously), you may use a TreeMap to keep the order.
See ref: http://www.java-samples.com/showtutorial.php?tutorialid=370
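
For comparison with the Python answers on this page, here is a rough Python analogue of the streaming case. It is a swapped-in technique, not a TreeMap: running counts are kept in a collections.Counter, and the current top entries are selected with heapq.nsmallest whenever they are needed. The stream contents and the names observe and top_k are made up for illustration:

```python
import heapq
from collections import Counter

# Running counts for the words seen so far.
counts = Counter()


def observe(word):
    # Update the running count as each word arrives.
    counts[word] += 1


def top_k(k):
    # The k most frequent words so far; ties are broken alphabetically.
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical stream of incoming words.
for word in ["test", "hehe", "test", "haha", "test", "hehe"]:
    observe(word)

print(top_k(2))  # [('test', 3), ('hehe', 2)]
```

This recomputes the ranking on each query instead of keeping it sorted at all times, which is usually acceptable when queries are much rarer than updates.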

How to count the frequency of the elements in an unordered list?

Note: groupby only groups consecutive equal elements, so you should sort the list before using it.

You can use groupby from the itertools package if the list is sorted.

a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
from itertools import groupby
[len(list(group)) for key, group in groupby(a)]

Output:

[4, 4, 2, 1, 2]

update: Note that sorting takes O(n log(n)) time.
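
For a list that is actually unordered, a minimal sketch is to sort first and keep the key next to each count, so it is clear which element each number belongs to. The list below is a shuffled version of the one above:

```python
from itertools import groupby

# Unordered input with the same elements as the example above.
a = [5, 1, 2, 1, 3, 2, 1, 4, 5, 2, 1, 3, 2]

# sorted() puts equal elements next to each other so groupby can group them.
pairs = [(key, len(list(group))) for key, group in groupby(sorted(a))]
print(pairs)
# [(1, 4), (2, 4), (3, 2), (4, 1), (5, 2)]
```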

How to get frequency of words in list from a string?

You can split the sentence and pass the result to collections.Counter. You can then look up the counts for the words in your list. For example:

from collections import Counter

string = "Cup Noodles Chicken Vegetable Noodles"
listWords = ['Noodles', 'Instant', 'Flavour', 'Ramen', 'Chicken', 'Flavor', 'Spicy', 'Beef']

counts = Counter(string.split())
[counts[word] for word in listWords]
# [2, 0, 0, 0, 1, 0, 0, 0]

Without Counter()

You can, of course, do this without Counter(). You just need to handle the KeyError that happens when you try to access a key for the first time. Then you can use get(word, 0) to return a default of 0 when looking up words. Something like:

string = "Cup Noodles Chicken Vegetable Noodles"
listWords = ['Noodles', 'Instant', 'Flavour', 'Ramen', 'Chicken', 'Flavor', 'Spicy', 'Beef']

counts = {}

for word in string.split():
    try:
        counts[word] += 1
    except KeyError:
        counts[word] = 1


[counts.get(word, 0) for word in listWords]
# still [2, 0, 0, 0, 1, 0, 0, 0]
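
Another way to avoid the try/except, at the cost of one standard-library import, is collections.defaultdict. This is a sketch of an alternative, not part of the original answer; the variable name result is introduced here for illustration:

```python
from collections import defaultdict

string = "Cup Noodles Chicken Vegetable Noodles"
listWords = ['Noodles', 'Instant', 'Flavour', 'Ramen', 'Chicken', 'Flavor', 'Spicy', 'Beef']

# Missing keys start at int(), which is 0, so += 1 works on first sight.
counts = defaultdict(int)
for word in string.split():
    counts[word] += 1

# Use .get() for lookups: counts[word] would insert missing words with 0.
result = [counts.get(word, 0) for word in listWords]
print(result)  # [2, 0, 0, 0, 1, 0, 0, 0]
```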

