Count occurrences of a list of words in a text file
You can use fgrep to do this efficiently:
fgrep -of f1.txt f2.txt | sort | uniq -c | awk '{print $2 " " $1}'
Gives this output:
apple 3
cat 1
dog 2
- fgrep -of f1.txt f2.txt extracts all the matching parts (-o option) of f2.txt based on the patterns in f1.txt
- sort | uniq -c counts the matching patterns
- finally, awk swaps the order of the words in the uniq -c output
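For reference, here is a minimal sketch of the same pipeline wrapped in a function. The name count_listed_words and the sample data are made up for illustration. It uses grep -F, which is equivalent to fgrep, and adds -w on the assumption that only whole-word matches should count (without it, apple would also match inside pineapple).

```shell
#!/bin/sh
# count_listed_words WORDLIST TEXT  (hypothetical helper name)
# Prints "word count" for each word of WORDLIST that occurs in TEXT.
count_listed_words() {
    # -F: fixed strings, -o: print only the matched part,
    # -w: whole words only (an assumption), -f: read patterns from a file
    grep -F -o -w -f "$1" "$2" | sort | uniq -c | awk '{print $2 " " $1}'
}

# Demo with throwaway sample data
tmp=$(mktemp -d)
printf 'apple\ncat\ndog\n' > "$tmp/f1.txt"
printf 'apple dog apple\ncat dog apple\npineapple\n' > "$tmp/f2.txt"
count_listed_words "$tmp/f1.txt" "$tmp/f2.txt"
# apple 3
# cat 1
# dog 2
rm -r "$tmp"
```

Drop -w to get back the substring behaviour of the original command.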
Calculate Word occurrences from file in bash
I'm not sure I understand exactly what you are trying to do,
but I would do it this way:
while read -r file
do
tr -cs "A-Za-z'" '\n' < "$file" | tr A-Z a-z | sort | uniq -c > "stat.$file"
done < file-list
Now you have statistics for each file, and you simply aggregate them:
while read -r file
do
cat "stat.$file"
done < file-list \
| sort -k2 \
| awk 'NR>1 && $2!=prev {print s" "prev; s=0} {s+=$1; prev=$2} END {print s" "prev}'
Example of usage:
$ for i in ls bash cp; do man $i > $i.txt ; done
$ cat <<EOF > file-list
> ls.txt
> bash.txt
> cp.txt
> EOF
$ while read file; do
> cat $file | tr -cs A-Za-z\' '\n'| tr A-Z a-z | sort | uniq -c > stat.$file
> done < file-list
$ while read file
> do
> cat stat.$file
> done < file-list \
> | sort -k2 \
> | awk '{if ($2!=prev) {print s" "prev; s=0;}s+=$1;prev=$2;}END{print s" "prev;}' | sort -rn | head
3875 the
1671 is
1137 to
1118 a
1072 of
793 if
744 and
533 command
514 in
507 shell
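If the intermediate stat.* files are not needed, the same totals can probably be obtained in one pass by concatenating every listed file before counting. This is only a sketch: it assumes file-list holds one file name per line, as above, and the function name count_all_words is hypothetical.

```shell
#!/bin/sh
# count_all_words LIST  (hypothetical helper name)
# LIST names a file that holds one input file name per line.
count_all_words() {
    while IFS= read -r file; do
        cat "$file"
    done < "$1" |
        tr -cs "A-Za-z'" '\n' |   # put each word on its own line
        tr A-Z a-z |              # fold everything to lowercase
        grep -v '^$' |            # drop the empty word, if any
        sort | uniq -c | sort -rn
}
```

Called as count_all_words file-list | head, it should list the most frequent words first, in the same "count word" layout as the output above.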
Shell Script to Count the Occurrence of a Word in a file
Using tr to split the text into words, then grep and wc to count them, seems possible:
tr -s ' ' '\n' < file.txt | grep file | wc -l
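One caveat: grep file also matches words that merely contain "file", such as "profile" or "files". A possible refinement, sketched here with a hypothetical function name count_word, splits on any whitespace and asks grep for exact whole-line matches, using -c in place of wc -l.

```shell
#!/bin/sh
# count_word WORD FILE  (hypothetical helper name)
# Prints how many whitespace-separated words in FILE equal WORD exactly.
count_word() {
    # tr puts each word on its own line; -s squeezes repeated separators.
    # grep: -c counts matching lines, -x requires the whole line to match,
    #       -F treats WORD as a fixed string rather than a regex.
    tr -s '[:space:]' '\n' < "$2" | grep -c -x -F -- "$1"
}
```

For example, count_word file file.txt would count "file" but neither "files" nor "profile".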
how to count occurrence of specific word in group of file by bash/shellscript
This alternative requires no pipelines:
$ awk -v RS='[[:space:]]+' '/^h/{i++} END{print i+0}' simple.txt simple1.txt
7
How it works
-v RS='[[:space:]]+'
This tells awk to treat each word as a record.
/^h/{i++}
For any record (word) that starts with h, we increment variable i by 1.
END{print i+0}
After we have finished reading all the files, we print out the value of i.
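A regular expression in RS is an extension (gawk and mawk accept it, but POSIX leaves a multi-character RS unspecified). Where that matters, a similar count can be sketched by looping over the fields of each line instead. The function name count_word_in_files is hypothetical, and this variant counts exact matches of one word rather than words starting with h.

```shell
#!/bin/sh
# count_word_in_files WORD FILE...  (hypothetical helper name)
# Prints how many whitespace-separated words equal WORD exactly.
count_word_in_files() {
    word=$1
    shift
    # Default field splitting already breaks each line on blanks,
    # so loop over the fields instead of changing RS.
    awk -v w="$word" '
        { for (i = 1; i <= NF; i++) if ($i == w) n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    ' "$@"
}
```

Usage would look like count_word_in_files hello simple.txt simple1.txt.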
How to create a frequency list of every word in a file?
Not sed and grep, but tr, sort, uniq, and awk:
% (tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}') <<EOF
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF
a@1
appear@2
file@1
is@1
many@1
more@1
of@2
once.@1
one@1
only@1
Some@2
than@1
the@2
This@1
time.@1
with@1
words@2
words.@1
In most cases you also want to remove numbers and punctuation, convert everything to lowercase (otherwise "THE", "The" and "the" are counted separately) and suppress the entry for the zero-length word. For ASCII text you can do all of this with the following modified command:
sed -e 's/[^A-Za-z]/ /g' text.txt | tr 'A-Z' 'a-z' | tr ' ' '\n' | grep -v '^$'| sort | uniq -c | sort -rn
How can I count the occurrences of a string within a file?
This will output the number of lines that contain your search string.
grep -c "echo" FILE
This won't, however, count the number of occurrences in the file (i.e., a line that contains echo several times is still counted once).
edit:
After playing around a bit, you could get the number of occurrences using this dirty little bit of code:
sed 's/echo/echo\n/g' FILE | grep -c "echo"
This basically adds a newline following every instance of echo so they're each on their own line, allowing grep to count those lines. You can refine the regex if you only want the word "echo", as opposed to "echoing", for example.
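An alternative that avoids rewriting the input is grep -o, which prints every match on its own line so the matches can be counted directly. The sketch below assumes a grep that supports -o (GNU and BSD grep do); both function names are hypothetical.

```shell
#!/bin/sh
# count_occurrences STRING FILE  (hypothetical helper name)
# Counts every occurrence of STRING, including inside longer words.
count_occurrences() {
    grep -o -F -- "$1" "$2" | wc -l
}

# count_whole_word WORD FILE  (hypothetical helper name)
# Counts only occurrences that stand alone as a word (-w).
count_whole_word() {
    grep -o -w -F -- "$1" "$2" | wc -l
}
```

With a file containing "echo echo echoing" on one line, count_occurrences echo FILE reports 3 and count_whole_word echo FILE reports 2.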
Print every word and its number of occurrences, using pure `bash`
You could use an associative array for counting the words, a bit like this:
$ cat foo.sh
#!/bin/bash
declare -A words
while read line
do
for word in $line
do
((words[$word]++))
done
done
for i in "${!words[@]}"
do
echo "$i:" "${words[$i]}"
done
Testing it:
$ echo this is a test is this | bash foo.sh
is: 2
this: 2
a: 1
test: 1
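Two details of foo.sh are worth knowing: the unquoted $line is subject to filename expansion (a word such as * expands to the files in the current directory), and read without -r interprets backslashes. A slightly hardened sketch, assuming bash 4 or later for associative arrays, might look like this; the function name count_words is hypothetical, and the final sort is only there to make the output order predictable.

```shell
#!/bin/bash
# count_words  (hypothetical helper name; needs bash 4+)
# Reads text on stdin and prints "word: count" lines, sorted by word.
count_words() {
    local -A counts=()
    local -a fields
    local word
    # read -r keeps backslashes; -a splits the line into an array
    # without performing filename expansion on the words.
    while read -r -a fields; do
        for word in "${fields[@]}"; do
            counts[$word]=$(( ${counts[$word]:-0} + 1 ))
        done
    done
    for word in "${!counts[@]}"; do
        printf '%s: %d\n' "$word" "${counts[$word]}"
    done | sort
}
```

For example, echo this is a test is this | count_words prints a: 1, is: 2, test: 1 and this: 2, one per line.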
Count occurrence of list of words in multiple files
Take the output from your script and pipe it to:
awk '{ arry[$1]+=$2 } END { for (i in arry) { print i" "arry[i] } }'
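To make the idea concrete, here is a small sketch that assumes the script emits one "word count" pair per line, possibly repeating a word once per input file. The function name sum_counts is hypothetical, and the trailing sort is added only because awk's for (i in ...) visits keys in no particular order.

```shell
#!/bin/sh
# sum_counts  (hypothetical helper name)
# Reads "word count" lines on stdin and prints one total per word.
sum_counts() {
    awk '{ total[$1] += $2 }
         END { for (w in total) print w " " total[w] }' | sort
}

# Demo: per-file counts for two files, fed in as one stream
printf 'apple 3\ncat 1\napple 2\ndog 2\n' | sum_counts
# apple 5
# cat 1
# dog 2
```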