Finding Repeated Words in a String and Counting the Repetitions

Finding repeated words in a string and counting the repetitions

You've got the hard work done. Now you can just use a Map to count the occurrences:

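import java.util.HashMap;
import java.util.Map;

// splitWords is the String[] you already produced by splitting the input
Map<String, Integer> occurrences = new HashMap<>();

for (String word : splitWords) {
    Integer oldCount = occurrences.get(word);
    if (oldCount == null) {
        oldCount = 0; // first time this word is seen
    }
    occurrences.put(word, oldCount + 1);
}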

Calling occurrences.get(word) tells you how many times a word occurred. You can construct a new list by iterating through occurrences.keySet():

for (String word : occurrences.keySet()) {
    // do something with word
}

Note that the order of what you get out of keySet is arbitrary. If you need the words to be sorted by when they first appear in your input String, you should use a LinkedHashMap instead.
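For reference, here is a self-contained sketch that combines both ideas. It assumes Java 8+ and swaps the explicit null check for Map.merge, and it uses a LinkedHashMap so the words come back in first-appearance order. Splitting on "\\s+" and the class and method names are assumptions for illustration; your splitWords array may have been produced differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OrderedWordCount {
    // Counts whitespace-separated words; LinkedHashMap keeps first-appearance order.
    static Map<String, Integer> countWords(String input) {
        Map<String, Integer> occurrences = new LinkedHashMap<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // "".split(...) yields [""], so skip that case
            }
            // merge() stores 1 for a new key, otherwise adds 1 to the old count
            occurrences.merge(word, 1, Integer::sum);
        }
        return occurrences;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the hat and the bat");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With a plain HashMap the same loop works, but the printed order would be unspecified.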

Check the repeated words in a string and keep a count in JavaScript

First, convert the given string to an array of words. To do that, use string.split(" ") (note the space: split("") would split the string into single characters instead).

Second, create a map (here a plain object) that stores each word as a key and its count as the value.

Then iterate through the array of words and store each word in the map, increasing its count every time the word is found.

See the code below.

const text = "I am not gonna live forever, but I wanna live while I am alive";

function countRepeatedWords(sentence) {
    const words = sentence.split(" ");
    const wordMap = {};

    for (let i = 0; i < words.length; i++) {
        // a word not seen yet gives undefined, which falls back to 0
        const count = wordMap[words[i]] || 0;
        wordMap[words[i]] = count + 1;
    }
    return wordMap;
}

console.log(countRepeatedWords(text));

I hope this helps you.
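For comparison with the Java answers on this page, the same split-then-count approach can be sketched with Java 8 streams. Collectors.groupingBy with Function.identity() as the key and Collectors.counting() as the value does the counting; the LinkedHashMap::new factory is an optional choice to keep first-appearance order. The class name is hypothetical. As in the JavaScript version, punctuation stays attached, so "forever," is counted as its own word.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamWordCount {
    // Split on single spaces, then group identical words and count each group.
    static Map<String, Long> countRepeatedWords(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .collect(Collectors.groupingBy(
                        Function.identity(),     // the word itself is the key
                        LinkedHashMap::new,      // keep first-appearance order
                        Collectors.counting())); // value = number of occurrences
    }

    public static void main(String[] args) {
        String text = "I am not gonna live forever, but I wanna live while I am alive";
        System.out.println(countRepeatedWords(text));
    }
}
```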

Skip duplicate words in counting a string

It would be better to use a single pass over the terms with a Counter first:

>>> from collections import Counter
>>> counter = Counter("Python is great but Java is also great".split())
>>> for word, count in counter.items():
...     print(word, count)
Python 1
is 2
great 2
but 1
Java 1
also 1

Order will be preserved: a Counter is a dict subclass, and dicts preserve insertion order (guaranteed since Python 3.7).

The reason this is better is that calling s.count(word) for each word rescans the whole text every time, which gives O(n^2) complexity, whereas the Counter needs only a single O(n) pass.
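To make the complexity argument concrete, here is a Java sketch of both approaches side by side. Collections.frequency stands in for Python's s.count(word) (an assumption about what the original code did): it rescans the whole list for every distinct word, so the total work grows quadratically in the worst case. The single-pass version updates one map entry per word, like Counter. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class CountComparison {
    // Quadratic: for every distinct word, Collections.frequency rescans the whole list.
    static Map<String, Integer> countNaive(List<String> words) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String w : new LinkedHashSet<>(words)) { // the set skips duplicate words
            result.put(w, Collections.frequency(words, w));
        }
        return result;
    }

    // Linear: one pass over the words, one map update per word.
    static Map<String, Integer> countSinglePass(List<String> words) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String w : words) {
            result.merge(w, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Python is great but Java is also great".split(" "));
        System.out.println(countNaive(words));
        System.out.println(countSinglePass(words));
    }
}
```

Both methods return the same counts in the same order; they differ only in how many times the input is scanned.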

Find repeated words in a string separated by "/"

I think the following could work for you. First, fixed = TRUE in strsplit() bypasses the regex engine and goes straight to exact matching, which makes the split faster. Next, anyDuplicated() returns a single integer: the index of the first duplicated element, or zero if there are no duplicates. So we can split the string with strsplit(), apply anyDuplicated() to each piece of the result, and compare the resulting vector with zero.

vapply(strsplit(x, "/", fixed = TRUE), anyDuplicated, 1L) > 0L
# [1] TRUE FALSE

To be safe, you may want to remove any leading /, since it produces an empty string as the first element of the strsplit() result and could produce misleading results in some cases (e.g. cases where the string begins with a / and irs//irs or similar occurs later in the string). You can remove a leading forward slash with sub("^/", "", x).

In summary, the ways to make your strsplit() idea faster are:

  • use fixed = TRUE in strsplit() to bypass the regex engine
  • use anyDuplicated() since it stops looking after it finds one match
  • use vapply() since we know what the result type and length will be

Find the most repeated word in a string

Split the string into an array, sort the array, then iterate over the sorted array, counting runs of equal strings and updating the maximum count as you go. Example:

import java.util.Arrays;

public class FrequentWord {
    public static void main(String[] args) {
        String myStr = "how can I find the most frequent word in an string how can I find how how how string";
        String[] splited = myStr.split(" ");
        Arrays.sort(splited); // equal words become adjacent
        System.out.println(Arrays.toString(splited));
        int max = 1;   // the first word occurs at least once
        int count = 1; // length of the current run of equal words
        String word = splited[0];
        String curr = splited[0];
        for (int i = 1; i < splited.length; i++) {
            if (splited[i].equals(curr)) {
                count++;
            } else {
                count = 1; // a new word starts a new run
                curr = splited[i];
            }
            if (max < count) {
                max = count;
                word = splited[i];
            }
        }
        System.out.println(max + " x " + word);
    }
}
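An alternative that avoids sorting (O(n log n)) is to count with a HashMap in one pass and then pick the entry with the largest value. This is a sketch using Collections.max with Map.Entry.comparingByValue() (both in the JDK since Java 8); the class and method names are hypothetical. On ties, the winner depends on HashMap iteration order, which is unspecified.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MostFrequentByMap {
    // Count every word once, then take the entry with the largest count.
    static Map.Entry<String, Integer> mostFrequent(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : text.split(" ")) {
            counts.merge(w, 1, Integer::sum);
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        String myStr = "how can I find the most frequent word in an string how can I find how how how string";
        Map.Entry<String, Integer> best = mostFrequent(myStr);
        System.out.println(best.getValue() + " x " + best.getKey());
    }
}
```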

Find the duplicate words in a sentence with counts using a for loop

You need to print the result only in the outer loop, after the inner loop has finished. Also, you need to avoid checking words that were already checked in a previous iteration:

// word is a String[], e.g. {"hi", "hello", "hi", "good", "morning", "hello"}
for (int i = 0; i < word.length; i++) {
    int count = 0; // reset the counter for each word

    for (int j = 0; j < word.length; j++) {

        if (word[i].equals(word[j])) {
            /* if the words are the same, but j < i, it was already calculated
               and printed earlier, so we can stop checking the current word
               and move on to another one */
            if (j < i) {
                break; // exit the inner loop, continue with the outer one
            }

            count++;
        }
    }

    if (count > 1) {
        System.out.println("the word " + word[i] + " occurred " + count + " times");
    }
}

UPDATE

Additional explanation of this part of the code: if (j < i) { break; }

i is the index of the word we are counting duplicates for, and j is the index of the word we compare it against. Since the inner loop always starts from the beginning, if the words are equal while j < i, that word was already processed in an earlier iteration of the outer loop.

In this case, using break, we interrupt the inner loop and the flow continues in the outer loop. As we didn't update count at all, it is still zero and thus the condition for printing the result if (count > 1) is not satisfied and the println is not executed.

Here is an example for the word "hello" in simple pseudo-code, using the sentence "hi hello hi good morning hello".

For its first occurrence:

count = 0
i = 1, j = 0 --> hello != hi --> do nothing
i = 1, j = 1 --> hello == hello, j is not < i --> count++
i = 1, j = 2 --> hello != hi --> do nothing
i = 1, j = 3 --> hello != good --> do nothing
i = 1, j = 4 --> hello != morning --> do nothing
i = 1, j = 5 --> hello == hello, j is not < i --> count++
count > 1 --> print the result

For its second occurrence:

count = 0
i = 5, j = 0 --> hello != hi --> do nothing
i = 5, j = 1 --> hello == hello, j < i --> break, we have seen this pair earlier
count is not > 1 --> result not printed

Hope I didn't make things more complicated with this example
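If you want to run the loop on the example sentence and check the result, here is a self-contained sketch that keeps the same nested-loop logic but collects the counts into a LinkedHashMap instead of printing inside the loop. The class and method names are hypothetical. Note that this approach is still O(n^2); a single pass with a map, as in the other answers on this page, scales better.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DuplicateWordsNestedLoop {
    // Same nested-loop idea as above, returning word -> count for repeated words.
    static Map<String, Integer> duplicates(String[] word) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < word.length; i++) {
            int count = 0; // reset the counter for each word
            for (int j = 0; j < word.length; j++) {
                if (word[i].equals(word[j])) {
                    if (j < i) {
                        break; // seen at an earlier index, already counted
                    }
                    count++;
                }
            }
            if (count > 1) {
                result.put(word[i], count);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] word = {"hi", "hello", "hi", "good", "morning", "hello"};
        for (Map.Entry<String, Integer> e : duplicates(word).entrySet()) {
            System.out.println("the word " + e.getKey() + " occurred " + e.getValue() + " times");
        }
    }
}
```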


