Count Occurrences of Character Per Line/Field on Unix

To count the occurrences of a character (t in this example) on each line:

awk -F'|' 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"") "\t" NR}' file
count lineNum
4 1
3 2
6 3

To count the occurrences of a character in a single field (column), pass the field number in the variable fld:

column 2:

awk -F'|' -v fld=2 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
1 1
0 2
1 3

column 3:

awk -F'|' -v fld=3 'BEGIN{print "count", "lineNum"}{print gsub(/t/,"",$fld) "\t" NR}' file
count lineNum
2 1
1 2
4 3
  • gsub() returns the number of substitutions it made, so printing its return value gives the number of matches.
  • NR holds the current line number, so it is printed next to each count.
  • To count within a particular field, set the variable fld to that field's number; gsub() then operates on $fld instead of the whole line.
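The points above can be folded into one small helper. This is only a sketch: the function name count_in_field and the sample input are made up for illustration, and the character is passed to gsub() as a regular expression, so it assumes a plain character such as t rather than a metacharacter like . or *.

```shell
# count_in_field CHAR FIELD   (hypothetical helper name)
# Reads '|'-separated lines on stdin and prints "count<TAB>lineNum".
# FIELD=0 counts over the whole line.
count_in_field() {
    awk -F'|' -v ch="$1" -v fld="$2" '{
        s = $fld                       # work on a copy so the record is left untouched
        print gsub(ch, "", s) "\t" NR  # gsub returns the number of substitutions
    }'
}

# Made-up sample data: count the letter t in field 2 of each line.
printf 'tt|at|x\nabc|b|ttt\n' | count_in_field t 2
```

Working on a copy (s) rather than on $fld itself means the original line is still intact if you want to print it as well.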

UNIX - Count occurrences of character per line between two fields and add new column with result

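You can use awk for this kind of row- and column-based data. The command below counts the fields equal to 2 from field 7 through the last field, adds 1 to the count if it is below 2, and appends the result to each line as a new column: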

awk '{c=0; for(i=7; i<=NF; i++) if ($i==2) c++; if (c<2) c++; print $0, c}' file

ACS_D132 ACS_D132 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D140 ACS_D140 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2
ACS_D141 ACS_D141 0 0 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2
ACS_D147 ACS_D147 0 0 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 2
ACS_D155 ACS_D155 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D196 ACS_D196 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
ACS_D221 ACS_D221 0 0 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
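If the range really has an upper bound as well as a lower one, the same loop can take both limits as variables. The following is a sketch under that assumption; the function name count_between and the sample rows are hypothetical, and it prints the plain count without the "add 1 if below 2" adjustment used above.

```shell
# count_between FROM TO VALUE   (hypothetical helper name)
# Appends to each line the number of fields FROM..TO (inclusive) equal to VALUE.
count_between() {
    awk -v from="$1" -v to="$2" -v val="$3" '{
        c = 0
        for (i = from + 0; i <= to && i <= NF; i++)   # never read past the last field
            if ($i == val) c++
        print $0, c
    }'
}

# Made-up sample data: count the 2s in fields 2 through 4.
printf 'a 2 1 2 2\nb 1 1 1 2\n' | count_between 2 4 2
```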

Counting number of character occurrences per line

Every line in your sample data has six semicolons, so all of them are printed. Here is a one-line Perl solution that prints each line whose semicolon count is not 5 (tr/;// returns the number of semicolons in the line):

$ perl -ne'print if tr/;// != 5' aaa.csv
AAAA;BBBB;CCCCCCCC;DD;EEEEEEEE;FF;
AAA1;BBBBB;CCCC;DD;EEEEEEEE;FFFFF;
AAA3;BB;CCCC;DDDDDDDDD;EEEEEEE;FF;
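If Perl is not available, roughly the same filter can be written in awk by using the character as the field separator, since a line with N separators has N+1 fields. This is a sketch rather than a drop-in replacement; the function name lines_without_n and the sample lines are invented for illustration.

```shell
# lines_without_n SEP N   (hypothetical helper name)
# Prints only the lines whose number of SEP characters differs from N.
lines_without_n() {
    # An empty line has NF == 0, so treat it as containing zero separators.
    awk -F"$1" -v n="$2" '(NF ? NF - 1 : 0) != n'
}

# Made-up sample data: keep lines that do not have exactly 3 semicolons.
printf 'A;B;C;\nA;B;\nA;B;C;D;\n' | lines_without_n ';' 3
```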

Count occurrences of a char in a string using Bash

I would use the following awk command:

string="text,text,text,text"
char=","
awk -F"${char}" '{print NF-1}' <<< "${string}"

This splits the string on $char and prints the number of resulting fields minus 1.

If your shell does not support the <<< operator, use echo:

echo "${string}" | awk -F"${char}" '{print NF-1}'
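An alternative that avoids awk is to delete everything except the character and count what remains. This is a sketch that assumes a single-byte character with no special meaning to tr (a comma, for instance); the function name count_char is made up.

```shell
# count_char STRING CHAR   (hypothetical helper name)
count_char() {
    # tr -cd deletes every byte that is NOT CHAR; wc -c counts what is left.
    # The $(( ... )) wrapper strips the padding some wc implementations add.
    echo $(( $(printf '%s' "$1" | tr -cd "$2" | wc -c) ))
}

count_char "text,text,text,text" ","
```

Because wc -c counts bytes, this does not count multibyte characters correctly.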

unix - breakdown of how many lines with number of character occurrences


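#!/usr/bin/env perl

use strict; use warnings;

# First argument: the character sequence to count.
my $seq = shift @ARGV;
die "usage: $0 SEQUENCE [FILE...]\n" unless defined $seq;

# Maps an occurrence count to the number of lines with that count.
my %freq;

while ( my $line = <> ) {
    last unless $line =~ /\S/;    # stop at the first blank line
    # Count the matches by forcing list context, then taking the list's size.
    my $occurrences = () = $line =~ /(\Q$seq\E)/g;
    $freq{ $occurrences } += 1;
}

# Print the counts from highest to lowest.
for my $occurrences ( sort { $b <=> $a } keys %freq ) {
    print "$occurrences:\t$freq{$occurrences}\n";
}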

If you want short, you can always use:

#!/usr/bin/env perl
$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>
;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f;

or as a one-liner, with the sequence to count as the first argument:

perl -e '$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f' SEQUENCE inputfile

but now I am getting silly.
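For comparison, the same breakdown can be sketched with awk and sort. This is not a translation of the Perl above: the sequence is treated as a regular expression rather than a literal string, and blank lines are skipped instead of ending the input. The function name occurrence_histogram and the sample input are hypothetical.

```shell
# occurrence_histogram SEQ   (hypothetical helper name)
# Prints "count:<TAB>number of lines with that count", highest count first.
occurrence_histogram() {
    awk -v seq="$1" '
        NF {                          # skip blank lines
            n = gsub(seq, "&")        # replace each match with itself; returns the count
            freq[n]++
        }
        END { for (n in freq) print n ":\t" freq[n] }
    ' | sort -t: -k1,1nr
}

# Made-up sample data: how many lines contain X zero, one or two times?
printf 'aXbX\nX\nabc\nXX\n' | occurrence_histogram X
```

The sort step is needed because awk does not guarantee any order when looping over array keys.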

How can I use the UNIX shell to count the number of times a letter appears in a text file?


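grep -o prints every match on its own line, so wc -l counts the matches (char stands for the letter to search for):

grep -o 'char' filename | wc -l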
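If upper and lower case should be counted together, the same pipeline works with -i. A sketch, assuming a grep that supports -o (GNU and BSD grep do); the function name count_letter is made up, and -F is added so the letter is matched literally.

```shell
# count_letter LETTER [FILE...]   (hypothetical helper name)
# Counts case-insensitive occurrences of LETTER in the files, or in stdin.
count_letter() {
    letter=$1
    shift
    # One output line per match; $(( ... )) strips any padding from wc.
    echo $(( $(grep -o -i -F -- "$letter" "$@" | wc -l) ))
}

# Made-up sample data: count both a and A.
printf 'Banana\nABC\n' | count_letter a
```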

Awk: count occurrence of each character for every column and write it in define order

This awk script produces the output that you want:

$ awk 'BEGIN{c["H"];c["G"];c["I"];c["B"];c["b"];c["T"];c["0"]}
{for(i=1;i<=NF;++i)++a[i,$i]}
END{for(i=1;i<=NF;++i){
printf "%s ",i;
for(j in c)printf "%s=%d ",j,a[i,j];print ""}}' file.txt
1 B=0 G=0 T=0 H=0 b=0 I=0 0=5
2 B=1 G=1 T=1 H=1 b=0 I=0 0=1
3 B=1 G=2 T=1 H=0 b=1 I=0 0=0

Initialise the array c in the BEGIN block so that it contains a key for every character. For each line, loop through the fields and increment the element of the array a whose key consists of the field number and the character in that field. Once every record has been processed, loop through the field numbers and the keys of c, printing the counts stored in a.

The keys in an array are not ordered, so when you use a for x in y loop, you cannot rely on a specific ordering of the output. If you would like to print the keys in a certain order, you would have to specify that yourself. For example, you could do something like this:

$ awk '{for(i=1;i<=NF;++i)++a[i,$i]}
END{for(i=1;i<=NF;++i){
printf "%s ",i
printf "H=%d ", a[i,"H"]
printf "G=%d ", a[i,"G"]
printf "I=%d ", a[i,"I"]
printf "B=%d ", a[i,"B"]
printf "b=%d ", a[i,"b"]
printf "T=%d ", a[i,"T"]
printf "0=%d\n", a[i,"0"]
}}' file.txt
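Rather than writing one printf per character, the desired order can be kept in a numerically indexed array built with split(). This is a sketch of that idea, not the original answer's code: the function name count_by_column and the sample input are hypothetical, and the character list is passed in as a space-separated string.

```shell
# count_by_column "CHARS"   (hypothetical helper name)
# CHARS is a space-separated list giving both the characters and their order.
count_by_column() {
    awk -v chars="$1" '
        BEGIN { n = split(chars, order, " ") }    # order[1..n] fixes the print order
        {
            for (i = 1; i <= NF; i++) a[i, $i]++
            if (NF > maxnf) maxnf = NF            # remember the widest line
        }
        END {
            for (i = 1; i <= maxnf; i++) {
                printf "%s", i
                for (j = 1; j <= n; j++) printf " %s=%d", order[j], a[i, order[j]]
                print ""
            }
        }'
}

# Made-up sample data with three columns.
printf '0 H G\n0 G G\n' | count_by_column "H G I B b T 0"
```

Tracking the widest line in maxnf avoids relying on the value of NF inside the END block.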

Unix awk - count of occurrences for each unique value

I think you need a better sample input file, but I guess this is what you are looking for:

$ awk -F' \\| ' -v OFS=, '{k=substr($3,1,1); ks[k]; c[k,length($3)]++}
END {for(k in ks) print k": "c[k,6],c[k,10],c[k,15]}' file

A: 1,,
B: 1,,
a: 2,,
b: 2,,

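Note that since all lengths in the sample are 6, I printed the count for length 6 instead of 8. With the right data you should be able to get the output you expect. Note, however, that the order of the keys is not preserved, because a for (k in ks) loop visits them in an unspecified order.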


