Count the number of times a word appears in a file
This will also count multiple occurrences of the word in a single line:
grep -o 'word' filename | wc -l
How to count the times a word appears in a file using a shell?
Many times I see people using the following to count words:
$ grep -o 'foo' file.txt | wc -l
Examples of this are easy to find in blog posts, forum answers, and even video tutorials.
This is really a bad way, for a few reasons:
- It shows you never read man grep. Whether you use BSD grep (NetBSD, OpenBSD, FreeBSD) or GNU grep, all of these implementations offer the -c option to count things.
The NetBSD man page describes this option very clearly:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file. With the -v, --invert-match option (see
below), count non-matching lines.
So you can use just one command:
$ grep foo -c file.txt
Not only could you, you should: you'll save yourself a lot of time searching by reading the man pages and understanding the tools you have at hand!
Speed bonus
You can also make your greps faster, because pipes are quite expensive.
On the short file shown above, the pipe is about 2 times slower than using the -c option:
$ time grep foo -c file.txt
4
real 0m0.001s
user 0m0.000s
sys 0m0.001s
$ time grep -o 'foo' file.txt | wc -l
4
real 0m0.002s
user 0m0.000s
sys 0m0.003s
On large files this can be even more significant. Here I built a larger file by appending my file to it in a loop, which I interrupted partway through:
$ for i in `seq 1 300000`; do cat file.txt >> largefile.txt; done
^C
$ wc -l largefile.txt
1111744 largefile.txt
Now here is how slow using a pipe is:
$ time grep -o foo largefile.txt | wc -l
277936
real 0m0.216s
user 0m0.214s
sys 0m0.010s
And here is how fast using only grep is:
$ time grep -c foo largefile.txt
277936
real 0m0.032s
user 0m0.028s
sys 0m0.004s
These benchmarks were done on a machine with a Core i5 and plenty of RAM; the difference would likely have been significantly larger on an embedded device with little RAM and CPU resources.
To sum up, don't use pipes where you don't need them. UNIX tools often have overlapping functionality. Know your tools, and read how to use them!
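To count the lines in which a word occurs, it's enough to use:
$ grep -c <word> <filename>
Note that, as the man page says, -c counts matching lines, not individual matches. It equals the number of occurrences only when the word appears at most once per line, as in the files benchmarked above. When a line can contain the word several times, grep -o <word> <filename> | wc -l is still the way to count every occurrence.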
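Since the man page quoted above says -c prints a count of matching lines, it may help to see how a line count and an occurrence count can differ. The following is only a rough Python sketch of the two behaviours. The function names and the sample lines are made up for illustration, and the search term is treated as a fixed string rather than a grep regular expression.

```python
import re

def count_matching_lines(lines, word):
    """Roughly what `grep -c word` reports: lines with at least one match."""
    return sum(1 for line in lines if word in line)

def count_occurrences(lines, word):
    """Roughly what `grep -o word | wc -l` reports: every non-overlapping match."""
    pattern = re.compile(re.escape(word))
    return sum(len(pattern.findall(line)) for line in lines)

# Hypothetical sample input: two of the lines contain "foo" twice.
sample = ["foo bar", "foo foo baz", "nothing here", "a foo, another foo"]

print(count_matching_lines(sample, "foo"))  # 3
print(count_occurrences(sample, "foo"))     # 5
```

The two numbers agree only when no line contains the word more than once.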
Python: Count how many times a word occurs in a file
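total = 0
with open('input.txt') as f:
    for line in f:
        # str.find() returns the index of the first match, or -1 if absent.
        found = line.find('California')
        # Counts each matching line once; a match at index 0 (line start) is skipped.
        if found != -1 and found != 0:
            total += 1
print(total)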
output:
3
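The snippet above counts a line at most once and also matches the word inside longer words. If the goal is to count every whole-word occurrence, one possible approach is a regular expression with word boundaries. This is a minimal sketch, not the original answer's method; count_word is a hypothetical helper name, and UTF-8 encoding is an assumption.

```python
import re

def count_word(path, word):
    """Count whole-word occurrences of `word` in the text file at `path`."""
    # \b anchors the match at word boundaries, so "Californian" is not counted.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            # findall returns every non-overlapping match on the line.
            total += len(pattern.findall(line))
    return total
```

Calling count_word('input.txt', 'California') would then count repeated mentions on one line separately.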
How to count how many times a word appears in a txt file
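#include <stdio.h>
#include <string.h>

/* Count occurrences of key in the file named in; buf is the line buffer size. */
int findKey(char *in, char *key, int buf){
    int count = 0;
    FILE *f;
    f = fopen(in,"r");
    if (f == NULL)
        return -1;  /* file could not be opened */
    char temp[buf];
    while(fgets(temp,buf,f) != NULL){
        char *p = temp;
        /* strstr finds the next match; ++p steps one char past its start */
        while((p=(strstr(p,key)))!= NULL){
            count++;
            ++p;
        }
    }
    fclose(f);
    return count;
}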
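Because the inner loop advances by a single character (++p) after each hit, this function also counts matches that overlap one another. Whether that is wanted depends on the use case. The Python sketch below illustrates the difference between the two conventions. Both function names are hypothetical, and rejecting an empty key is a choice made here, not something the C code does.

```python
def count_overlapping(text, key):
    """Count matches, resuming one character after each match start (like ++p)."""
    if not key:
        raise ValueError("key must not be empty")
    count = 0
    pos = text.find(key)
    while pos != -1:
        count += 1
        pos = text.find(key, pos + 1)
    return count

def count_non_overlapping(text, key):
    """Count matches, resuming after the end of each match."""
    if not key:
        raise ValueError("key must not be empty")
    return text.count(key)

print(count_overlapping("aaaa", "aa"))      # 3
print(count_non_overlapping("aaaa", "aa"))  # 2
```

For ordinary words separated by spaces the two counts are the same; they differ only for keys that can overlap themselves.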
How to count the number of times a word sequence appears in a file, using MapReduce in Python?
The mapper is applied on each line, and should count each 3-word sequence, i.e. yield the 3-word sequence along with a count of 1.
The reducer is called with key and values, where key is a 3-word sequence and values is a list of counts (which would be a list of 1s). The reducer can simply return a tuple of the 3-word sequence and the total number of occurrences, the latter obtained via sum.
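from mrjob.job import MRJob

class MR3Nums(MRJob):
    def mapper(self, _, line):
        # Emit every run of 3 consecutive words on the line with a count of 1.
        sequence_length = 3
        words = line.strip().split()
        for i in range(len(words) - sequence_length + 1):
            yield " ".join(words[i:(i+sequence_length)]), 1

    def reducer(self, key, values):
        # Sum the 1s emitted for each distinct 3-word sequence.
        yield key, sum(values)

if __name__ == '__main__':
    MR3Nums.run()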
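To see what the mapper and reducer compute without installing mrjob or running a job, the same logic can be mimicked in plain Python. This is only a local stand-in for the map, shuffle, and reduce steps, not how mrjob executes a job. The names map_line and run_local and the sample lines are hypothetical.

```python
from collections import defaultdict

def map_line(line, sequence_length=3):
    """Yield (3-word sequence, 1) for every run of consecutive words on a line."""
    words = line.strip().split()
    for i in range(len(words) - sequence_length + 1):
        yield " ".join(words[i:i + sequence_length]), 1

def run_local(lines):
    """Group mapper output by key, then sum each group as the reducer would."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)  # stand-in for the shuffle step
    return {key: sum(values) for key, values in grouped.items()}

lines = ["the quick brown fox", "see the quick brown dog"]
counts = run_local(lines)
print(counts["the quick brown"])  # 2
print(counts["quick brown fox"])  # 1
```

Sequences are formed within a single line only, which matches the mapper being applied line by line.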
Counting occurrences of a word in a text file without count function
total += word is adding a string, word, to total. You probably are looking for total += 1, rather than total += word. You'll then need to add an if statement to check if the word you're currently examining is equivalent to the target word, like so:
def word_count(target_word):
    total = 0
    for word in split_poem:
        if word == target_word:
            total += 1
    return total
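The function above relies on a split_poem list defined elsewhere and compares words exactly, so "The" and "the," would not match "the". A self-contained variation, still without using a count function, might normalise each word first. This is a sketch under those assumptions; the sample text is made up, and stripping punctuation and case is a design choice that may not suit every input.

```python
import string

def word_count(text, target_word):
    """Count words equal to target_word, ignoring case and edge punctuation."""
    target = target_word.lower()
    total = 0
    for raw in text.split():
        # Drop leading/trailing punctuation such as "," or "." and lowercase.
        word = raw.strip(string.punctuation).lower()
        if word == target:
            total += 1
    return total

poem = "The rose is red. The violet is blue; the sky, THE sea."
print(word_count(poem, "the"))  # 4
```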
Count number of times string appears in a file
Your program is not "stuck", it is waiting for you to enter the "user input" (again).
You keep calling returnUserInput() inside the loop, and it calls getUserInput(), which will wait for you to type (another) string and press Enter.
As for your code, don't declare line as a field. It should be a local variable.
For repeatedly searching for text in a line, don't use contains() or a loop of every character of the line. Use indexOf(String str, int fromIndex).
Update
The code should be like this:
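// Read the search term once, before the file is scanned.
String word = returnUserInput();
this.counter = 0;
try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
    for (String line; (line = reader.readLine()) != null; ) {
        // Resume after the end of each match, so overlapping matches are not counted.
        // An empty word would never advance i, so validate the input beforehand.
        for (int i = 0; (i = line.indexOf(word, i)) != -1; i += word.length()) {
            this.counter++;
        }
    }
}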