Word Frequency Count Java 8

Word frequency count Java 8

I want to share the solution I found because at first I expected to use map-and-reduce methods, but it was a bit different.

Map<String, Long> collect = wordsList.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

Or for Integer values:

Map<String, Integer> collect = wordsList.stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(e -> 1)));
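For anyone who wants to try both variants directly, here is a minimal self-contained sketch. The class name WordFrequencyDemo and its method names are made up for illustration; only the collectors come from the snippets above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class WordFrequencyDemo {

    // Groups equal words together; counting() produces Long values.
    static Map<String, Long> countAsLong(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Same grouping, but sums 1 per element so the values are Integer.
    static Map<String, Integer> countAsInt(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.summingInt(w -> 1)));
    }
}
```

For the list a, b, a, c, a, both methods map a to 3 and b to 1.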

EDIT

Here is how to sort the map by value:

LinkedHashMap<String, Long> countByWordSorted = collect.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (v1, v2) -> {
                    // map keys are unique, so a merge should never be needed
                    throw new IllegalStateException();
                },
                LinkedHashMap::new
        ));

Counting word occurrence with arrays stream in Java 8

String sentence = "The cat has black fur and black eyes";
String[] bites = sentence.trim().split("\\s+");
String in = "black cat";
// build the lookup list once instead of once per streamed element
List<String> wanted = Arrays.asList(in.split("\\s+"));
long i = Stream.of(bites).filter(wanted::contains).count();
System.out.println(i); // prints 3

Count frequency of each word from list of Strings using Java8

You need to extract all the words from listB and keep only those that are also listed in listA. Then you simply collect the word -> count pairs into a Map<String, Long>:

String[] listA = {"the", "you", "how"};
String[] listB = {"the dog ate the food", "how is the weather", "how are you"};

Set<String> qualified = new HashSet<>(Arrays.asList(listA)); // make searching easier

Map<String, Long> map = Arrays.stream(listB)         // stream the sentences
        .map(sentence -> sentence.split("\\s+"))     // split by words to Stream<String[]>
        .flatMap(words -> Arrays.stream(words)       // flatmap to Stream<String>
                .distinct())                         // ... as distinct words by sentence
        .filter(qualified::contains)                 // keep only the qualified words
        .collect(Collectors.groupingBy(              // collect to the Map
                Function.identity(),                 // ... the key is the word itself
                Collectors.counting()));             // ... the value is the number of sentences containing it

Output:

{the=2, how=2, you=1}

Calculating frequency of each word in a sentence in java

Use a map with the word as key and the count as value, something like this:

Map<String, Integer> map = new HashMap<>();
for (String w : words) {
    Integer n = map.get(w);
    n = (n == null) ? 1 : n + 1;
    map.put(w, n);
}
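On Java 8 and later the same loop can be written more compactly with Map.merge. This is only a sketch; the class name MergeCountDemo and the method name are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original answer.
class MergeCountDemo {

    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> map = new HashMap<>();
        for (String w : words) {
            // Stores 1 for a new word, otherwise adds 1 to the stored count.
            map.merge(w, 1, Integer::sum);
        }
        return map;
    }
}
```

merge takes care of the null check, so the explicit get and put are no longer needed.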

If you are not allowed to use java.util, you can sort arr with some sorting algorithm and then do this:

String[] words = new String[arr.length];
int[] counts = new int[arr.length];
words[0] = arr[0]; // seed with the first element of the sorted array
counts[0] = 1;
for (int i = 1, j = 0; i < arr.length; i++) {
    if (words[j].equals(arr[i])) {
        counts[j]++;
    } else {
        j++;
        words[j] = arr[i];
        counts[j] = 1;
    }
}
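Put together, the sorted-array approach could look roughly like the sketch below. It assumes insertion sort as the sorting algorithm (any other would do) and returns the counts as word=count strings. The class and method names are made up for illustration, and nothing from java.util is used.

```java
// Hypothetical helper class; uses no java.util types.
class SortedArrayCounter {

    // Insertion sort on a copy of the input, so the caller's array is untouched.
    static String[] sortedCopy(String[] input) {
        String[] arr = new String[input.length];
        for (int i = 0; i < input.length; i++) {
            arr[i] = input[i];
        }
        for (int i = 1; i < arr.length; i++) {
            String current = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j].compareTo(current) > 0) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = current;
        }
        return arr;
    }

    // Returns entries of the form "word=count", in sorted word order.
    static String[] count(String[] input) {
        if (input.length == 0) {
            return new String[0];
        }
        String[] arr = sortedCopy(input);
        String[] words = new String[arr.length];
        int[] counts = new int[arr.length];
        words[0] = arr[0];
        counts[0] = 1;
        int last = 0; // index of the most recent distinct word
        for (int i = 1; i < arr.length; i++) {
            if (words[last].equals(arr[i])) {
                counts[last]++;
            } else {
                last++;
                words[last] = arr[i];
                counts[last] = 1;
            }
        }
        String[] result = new String[last + 1];
        for (int i = 0; i <= last; i++) {
            result[i] = words[i] + "=" + counts[i];
        }
        return result;
    }
}
```

Because equal words end up next to each other after sorting, a single pass is enough to count them.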

An interesting solution with ConcurrentHashMap since Java 8

ConcurrentMap<String, Integer> m = new ConcurrentHashMap<>();
m.compute("x", (k, v) -> v == null ? 1 : v + 1);
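The point of ConcurrentHashMap here is that compute (and likewise merge) runs atomically per key, so several threads can update the counts without losing increments. A rough sketch using a parallel stream follows; the class name ConcurrentCountDemo is hypothetical.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class, not part of the original answer.
class ConcurrentCountDemo {

    static ConcurrentMap<String, Integer> count(List<String> words) {
        ConcurrentMap<String, Integer> m = new ConcurrentHashMap<>();
        // merge is atomic on ConcurrentHashMap, so parallel updates are not lost.
        words.parallelStream().forEach(w -> m.merge(w, 1, Integer::sum));
        return m;
    }
}
```

With a plain HashMap the same parallel loop would not be safe.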

Word count with java 8

The problem seems to be that you are in fact splitting by words, i.e. you are streaming over everything that is not a word, or that is in between words. Unfortunately, there seems to be no equivalent method for streaming the actual match results (hard to believe, but I did not find any; feel free to comment if you know one).

Instead, you could just split by non-words, using \W instead of \w. Also, as noted in comments, you can make it a bit more readable by using String::toLowerCase instead of a lambda and Collectors.summingInt.

public static Map<String, Integer> countJava8(String input) {
    return Pattern.compile("\\W+")
            .splitAsStream(input)
            .collect(Collectors.groupingBy(String::toLowerCase,
                    Collectors.summingInt(s -> 1)));
}

But IMHO this is still very hard to comprehend, not only because of the "inverse" lookup, and it's also difficult to generalize to other, more complex patterns. Personally, I would just go with the "old school" solution, maybe making it a bit more compact using the new getOrDefault.

public static Map<String, Integer> countOldschool(String input) {
    Map<String, Integer> wordcount = new HashMap<>();
    Matcher matcher = Pattern.compile("\\w+").matcher(input);
    while (matcher.find()) {
        String word = matcher.group().toLowerCase();
        wordcount.put(word, wordcount.getOrDefault(word, 0) + 1);
    }
    return wordcount;
}

The result seems to be the same in both cases.
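One caveat for the split-based version: if the input starts with a non-word character (a space or a quote, say), the split yields an empty first token, which then gets counted under the key "". The Matcher loop does not have this problem. A possible fix is to filter empty tokens out, as in this sketch (the class name SplitCountDemo is made up):

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class SplitCountDemo {

    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    static Map<String, Integer> count(String input) {
        return NON_WORD.splitAsStream(input)
                .filter(s -> !s.isEmpty()) // drop the empty token before a leading separator
                .collect(Collectors.groupingBy(String::toLowerCase,
                        Collectors.summingInt(s -> 1)));
    }
}
```

For the input "  Hello, hello world!" this gives hello=2 and world=1, with no entry for the empty string.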

Word frequency count in 2 files

Firstly, instead of using an array for unique keys, use a HashMap<String, Integer>. It's a lot more efficient.

Your best option is to run your processing over each line/file separately, and store these counts separately. Then merge the two counts to get the overall frequencies.

More Detail:

String[] keys = text.split("[!.?:;\\s]+");
HashMap<String, Integer> uniqueKeys = new HashMap<>();

for (String key : keys) {
    if (uniqueKeys.containsKey(key)) {
        // if the key is already in the map, increment its count
        uniqueKeys.put(key, uniqueKeys.get(key) + 1);
    } else {
        // if it isn't in the map yet, add it
        uniqueKeys.put(key, 1);
    }
}

// You now have the count of all unique keys in a given text
// To print them to console

for (Map.Entry<String, Integer> keyCount : uniqueKeys.entrySet()) {
    System.out.println(keyCount.getKey() + ": " + keyCount.getValue());
}

// To merge, if you're using Java 8

for (Map.Entry<String, Integer> keyEntry : uniqueKeys1.entrySet()) {
    uniqueKeys2.merge(keyEntry.getKey(), keyEntry.getValue(), Integer::sum);
}

// To merge, otherwise

for (Map.Entry<String, Integer> keyEntry : uniqueKeys1.entrySet()) {
    if (uniqueKeys2.containsKey(keyEntry.getKey())) {
        uniqueKeys2.put(keyEntry.getKey(),
                uniqueKeys2.get(keyEntry.getKey()) + keyEntry.getValue());
    } else {
        uniqueKeys2.put(keyEntry.getKey(), keyEntry.getValue());
    }
}
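As a compact end-to-end illustration of the "count separately, then merge" idea, here is a sketch that takes the two texts as plain strings (reading the files is left out). The class name TwoFileCountDemo and both method names are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original answer.
class TwoFileCountDemo {

    // Counts the words of one text, splitting on punctuation and whitespace.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : text.split("[!.?:;\\s]+")) {
            if (!key.isEmpty()) { // skip the empty token before a leading delimiter
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Combines two count maps into a new one without modifying the inputs.
    static Map<String, Integer> merge(Map<String, Integer> first, Map<String, Integer> second) {
        Map<String, Integer> total = new HashMap<>(first);
        second.forEach((key, value) -> total.merge(key, value, Integer::sum));
        return total;
    }
}
```

Keeping the per-file maps around means you can still report the counts for each file as well as the combined total.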

Java 8 - Count of words and then arrange in desc order

The most difficult part is the sorting. Since you want to keep only the first 7 elements of the result and sort the Map by its values, we need to build a Map of all the results, sort it, and then keep the first 7 entries.

In the following code, every word is lower-cased and grouped by itself, counting the number of occurrences. To sort this map, we create a Stream over its entries and sort them by value (in descending order) and then by key. The first 7 elements are retained, mapped to their key (which corresponds to the word) and collected into a List, thus keeping encounter order.

public static void main(String[] args) {
    String sentence = "Hello alan i am here where are you and what are you doing hello are you there";
    List<String> words = Arrays.asList(sentence.split(" "));

    List<String> result =
            words.stream()
                    .map(String::toLowerCase)
                    .collect(groupingBy(identity(), counting()))
                    .entrySet().stream()
                    .sorted(Map.Entry.<String, Long>comparingByValue(reverseOrder()).thenComparing(Map.Entry.comparingByKey()))
                    .limit(7)
                    .map(Map.Entry::getKey)
                    .collect(toList());

    System.out.println(result);
}

Output:

[are, you, hello, alan, am, and, doing]

Note that there is a mistake in your expected output: "are" appears 3 times, just like "you", and sorts before it alphabetically, so it should come first.

NB: this code assumes a lot of static imports, namely:

import static java.util.Comparator.reverseOrder;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

