Count the Number of Times a Word Appears in a File

Count the number of times a word appears in a file

This will also count multiple occurrences of the word in a single line:

grep -o 'word' filename | wc -l

How to count the times a word appears in a file using a shell?

Many times I see people using the following to count words:

$ grep -o 'foo' file.txt | wc -l

Examples of this are easy to find in online answers, and even in a YouTube video.

This is really a bad way, for a few reasons:

  1. It shows you never read man grep, whether for BSD grep (NetBSD, OpenBSD, FreeBSD) or GNU grep.
  2. All of these implementations offer the -c option for counting.
     The NetBSD man page describes this option very clearly:

     -c, --count
         Suppress normal output; instead print a count of matching lines
         for each input file. With the -v, --invert-match option (see
         below), count non-matching lines.

Since -c counts matching lines, it gives the same result as the pipe whenever the word appears at most once per line, and then you can use just one command:

 $ grep foo -c file.txt 

Not only could you, you should, and you'll save yourself lots of searching time by reading man pages and understanding the tools you have at hand!

Speed bonus

You can also make your greps faster, because pipes are quite expensive.
On the short file shown above, the pipe is 2 times slower than using the -c option:

$ time grep foo -c file.txt 
4

real 0m0.001s
user 0m0.000s
sys 0m0.001s
$ time grep -o 'foo' file.txt | wc -l
4

real 0m0.002s
user 0m0.000s
sys 0m0.003s

On large files this can be even more significant. Here I built a larger file by appending my file to it repeatedly, interrupting the loop before it finished:

$ for i in `seq 1 300000`; do cat file.txt >> largefile.txt; done
^C
$ wc -l largefile.txt
1111744 largefile.txt

Now here is how slow using the pipe is:

$ time grep -o foo largefile.txt | wc -l
277936

real 0m0.216s
user 0m0.214s
sys 0m0.010s

And here is how fast using only grep is:

 $ time grep -c foo largefile.txt 
277936

real 0m0.032s
user 0m0.028s
sys 0m0.004s

These benchmarks were done on a machine with a Core i5 and plenty of RAM; it would have been significantly slower on an embedded device with little RAM and CPU resources.

To sum up, don't use pipes where you don't need them. UNIX tools often have overlapping functionality. Know your tools, and read how to use them!

To count the lines on which a word occurs (which equals the number of occurrences when the word appears at most once per line), it's enough to use:

$ grep -c <word> <filename>
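
Note that grep -c reports matching lines, while grep -o | wc -l reports individual matches, so the two agree only when no line contains the word more than once. The difference can be illustrated with a small sketch in Python. The sample text and the helper names count_matching_lines and count_occurrences are made up for this illustration; they only mimic what the two commands report for a fixed string and are not how grep is implemented.

```python
def count_matching_lines(text, word):
    """Number of lines containing word at least once (like grep -c)."""
    return sum(1 for line in text.splitlines() if word in line)


def count_occurrences(text, word):
    """Number of non-overlapping occurrences of word (like grep -o | wc -l)."""
    return sum(line.count(word) for line in text.splitlines())


# Hypothetical sample: the second line contains "foo" twice.
sample = "foo bar\nfoo baz foo\nno match here\n"

print(count_matching_lines(sample, "foo"))  # 2
print(count_occurrences(sample, "foo"))     # 3
```

If every line holds at most one match, as in the benchmark files above, both functions return the same number.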

Python: Count how many times a word occurs in a file


total = 0

with open('input.txt') as f:
    for line in f:
        # str.count tallies every non-overlapping occurrence on the line,
        # including one at the very start of the line
        total += line.count('California')

print(total)

output:

3
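
Substring searches also match inside longer words, so 'California' would be counted inside 'Californian' as well. If only whole-word matches are wanted, one possible sketch uses the re module with \b word boundaries. The function name count_word and the sample lines are hypothetical, chosen only for this illustration.

```python
import re


def count_word(lines, word):
    """Count whole-word occurrences of word across an iterable of lines."""
    # re.escape keeps any special characters in word literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(len(pattern.findall(line)) for line in lines)


# Hypothetical sample lines; a file object returned by open() can be passed the same way.
sample = [
    "California is a state.\n",
    "Californian wine comes from California.\n",
    "Nothing to see here.\n",
]

print(count_word(sample, "California"))  # 2
```

Because the function accepts any iterable of lines, it can be called as count_word(f, 'California') inside a with open(...) as f block.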

How to count how many times a word appears in a txt file


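#include <stdio.h>
#include <string.h>

/* Count occurrences of key in the file named in, reading up to buf - 1 chars at a time. */
int findKey(const char *in, const char *key, int buf){
    int count = 0;
    FILE *f = fopen(in, "r");
    if (f == NULL)
        return -1;  /* the file could not be opened */
    char temp[buf];
    while (fgets(temp, buf, f) != NULL) {
        char *p = temp;
        /* strstr returns the next match at or after p, or NULL */
        while ((p = strstr(p, key)) != NULL) {
            count++;
            ++p;  /* step past the match start, so overlapping matches count too */
        }
    }
    fclose(f);
    return count;
}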

How to count the number of times a word sequence appears in a file, using MapReduce in Python?

The mapper is applied on each line, and should count each 3-word sequence, i.e. yield the 3-word sequence along with a count of 1.

The reducer is called with key and values, where key is a 3-word sequence and values is a list of counts (which would be a list of 1s). The reducer can simply return a tuple of the 3-word sequence and the total number of occurrences, the latter obtained via sum.

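from mrjob.job import MRJob

class MR3Nums(MRJob):

    def mapper(self, _, line):
        # emit every run of 3 consecutive words on the line with a count of 1
        sequence_length = 3
        words = line.strip().split()
        for i in range(len(words) - sequence_length + 1):
            yield " ".join(words[i:(i+sequence_length)]), 1

    def reducer(self, key, values):
        # values holds the 1s emitted for this sequence; their sum is the total
        yield key, sum(values)

if __name__ == '__main__':
    MR3Nums.run()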
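
To see what the mapper and reducer produce without installing mrjob or running a cluster, the same logic can be simulated in plain Python. This is only a sketch: the functions mapper, reducer and run_local are stand-ins written for this illustration, and a dictionary takes the place of the framework's grouping step between map and reduce.

```python
from collections import defaultdict

SEQUENCE_LENGTH = 3


def mapper(line):
    """Yield (3-word sequence, 1) for every run of 3 consecutive words."""
    words = line.strip().split()
    for i in range(len(words) - SEQUENCE_LENGTH + 1):
        yield " ".join(words[i:i + SEQUENCE_LENGTH]), 1


def reducer(key, values):
    """Sum the counts collected for one sequence."""
    return key, sum(values)


def run_local(lines):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical input lines
counts = run_local(["a b c d", "a b c"])
print(counts)  # {'a b c': 2, 'b c d': 1}
```

Lines with fewer than three words produce no output from the mapper, because the range is empty. Sequences that span a line break are not counted, since each line is mapped on its own.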

Counting occurrences of a word in a text file without count function


total += word

is adding a string, word, to total. You probably are looking for total += 1, rather than total += word. You'll then need to add an if statement to check if the word you're currently examining is equivalent to the target word, like so:

def word_count(target_word):
    total = 0
    for word in split_poem:
        if word == target_word:
            total += 1
    return total

Count number of times string appears in a file

Your program is not "stuck", it is waiting for you to enter the "user input" (again).

You keep calling returnUserInput() inside the loop, and it calls getUserInput(), which will wait for you to type (another) string and press Enter.


As for your code, don't declare line as a field. It should be a local variable.

For repeatedly searching for text in a line, don't use contains() or a loop of every character of the line. Use indexOf(String str, int fromIndex).

Update

The code should be like this:

String word = returnUserInput();
this.counter = 0;
try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
    for (String line; (line = reader.readLine()) != null; ) {
        for (int i = 0; (i = line.indexOf(word, i)) != -1; i += word.length()) {
            this.counter++;
        }
    }
}
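
The same from-index scanning idea can be tried outside Java. In the Python sketch below, str.find(sub, start) plays the role of indexOf(str, fromIndex). The function name count_in_line is made up for this illustration, and the guard for an empty search string is an addition that the code above does not have.

```python
def count_in_line(line, word):
    """Count non-overlapping occurrences of word in line by scanning forward."""
    if not word:
        return 0  # an empty search string would never advance the scan
    count = 0
    i = line.find(word)
    while i != -1:
        count += 1
        # resume the search just past the match that was found
        i = line.find(word, i + len(word))
    return count


print(count_in_line("abcabcabc", "abc"))  # 3
print(count_in_line("aaaa", "aa"))        # 2
print(count_in_line("hello", "xyz"))      # 0
```

Advancing by the length of the word, as the loop above does, counts non-overlapping matches; advancing by one character instead would count overlapping ones.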

