Linux Command or Script Counting Duplicated Lines in a Text File

Linux command or script counting duplicated lines in a text file?

Send it through sort (to make duplicate lines adjacent) and then through uniq -c to prefix each distinct line with its count:

sort filename | uniq -c

and to get that list sorted by frequency, most common first, you can use:

sort filename | uniq -c | sort -nr
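As a quick sanity check, here is the pipeline run on a hypothetical file (the name fruits.txt and its contents are made up for illustration):

$ printf 'apple\nbanana\napple\ncherry\napple\nbanana\n' > fruits.txt
$ sort fruits.txt | uniq -c | sort -nr
      3 apple
      2 banana
      1 cherry

sort -n compares the leading counts numerically and -r reverses the order, so the most frequent line comes first.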

Find duplicate lines in a file and count how many times each line was duplicated?

Assuming there is one number per line:

sort <file> | uniq -c

With the GNU version (e.g., on Linux) you can also use the more verbose --count long option; other implementations may only accept -c:

sort <file> | uniq --count

Linux command or script counting duplicated bunches of lines in a text file?


awk -v RS=Separator '
NR>1 {count[$0]++}
END {for (bunch in count) print count[bunch], RS, bunch}
' file

Example output:

1 Separator
line31
line32
line33

2 Separator
line21
line22
line23

3 Separator
line11
line12
line13

Note that a multi-character RS is itself a GNU AWK (and mawk) extension; POSIX awk only uses the first character of RS. There is no inherent order to the output. If you want it sorted by count in descending order, and you're using GNU AWK:

awk -v RS=Separator '
NR>1 {count[$0]++}
END {
PROCINFO["sorted_in"] = "@val_num_desc"
for (bunch in count) print count[bunch], RS, bunch
}
' file
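To make the record-separator mechanics concrete, here is a minimal end-to-end sketch; the file blocks.txt and its contents are hypothetical, and without the sorted_in trick the output order is unspecified:

$ printf 'Separator\nA\nB\nSeparator\nA\nB\nSeparator\nC\n' > blocks.txt
$ awk -v RS=Separator 'NR>1 {count[$0]++} END {for (bunch in count) print count[bunch], RS, bunch}' blocks.txt
2 Separator 
A
B

1 Separator 
C

Each record is the text between occurrences of the separator string, so the two identical A/B blocks collapse into one entry with a count of 2.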

How to count the number of unique lines, duplicate lines, and lines that appear three times in a text file


$ echo 'Donald
Donald
Lisa
John
Lisa
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
1 1
1 2
1 3

The right column is the repetition count, and the left column is the number of unique names with that repetition count. E.g. “Donald” has a repetition count of 3.

Bigger example:

echo 'Donald
Donald
Rob
Lisa
WhatAmIDoing
John
Obama
Obama
Lisa
Washington
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
4 1
2 2
1 3

Four names (“Rob”, “WhatAmIDoing”, “John”, and “Washington”) each have a repetition count of 1. Two names (“Lisa” and “Obama”) each have a repetition count of 2. One name (“Donald”) has a repetition count of 3.
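If you would rather avoid the double sort | uniq -c round trip, the same histogram can be built in a single AWK pass. This is only a sketch, names.txt is a hypothetical file, and the output order is unspecified:

awk '{count[$0]++} END {for (line in count) freq[count[line]]++; for (f in freq) print freq[f], f}' names.txt

For the bigger example above this prints the same three pairs (4 1, 2 2, 1 3), just not necessarily in that order.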

Bash Script: count unique lines in file

You can use the uniq command to count repeated lines, as long as the input is sorted first:

sort ips.txt | uniq -c

To get the most frequent results at the top (thanks to Peter Jaric):

sort ips.txt | uniq -c | sort -bgr

Here -b ignores the leading blanks that uniq -c pads its counts with, -g sorts by general numeric value, and -r reverses the order so the highest counts come first.
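A short illustration, with a made-up ips.txt:

$ printf '10.0.0.1\n10.0.0.2\n10.0.0.1\n10.0.0.1\n' > ips.txt
$ sort ips.txt | uniq -c | sort -bgr
      3 10.0.0.1
      1 10.0.0.2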

How to find duplicate lines in a file?

Use this:

sort filename | uniq -d

See man uniq for the other options it supports.
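As a quick illustration of the difference between -d and GNU uniq's -D (which not every implementation has), with made-up, already-sorted input: -d prints one representative of each duplicated group, while -D prints every member:

$ printf 'a\na\nb\nc\nc\nc\n' | uniq -d
a
c
$ printf 'a\na\nb\nc\nc\nc\n' | uniq -D
a
a
c
c
c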

Count duplicates from several files

You can use one of these. The first prints every line that appears more than once, after all the files have been read:

awk '{count[$0]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file1 file2 file3 file4 file5

or, to print each duplicated line the moment it is seen for the second time (which preserves input order):

awk 'seen[$0]++ == 1' file1 file2 file3 file4 file5

You can also test for exact counts; for example, to print a only if it occurs exactly 3 times and b only if it occurs exactly 4 times across all the files:

awk '{count[$0]++} END {for (line in count) if ( count[line] == 3 && line == "a" || count[line] == 4 && line == "b" ) {print line} }' file1 file2 file3 file4 file5

Test (note that the last run below uses files named 1 through 5, with thresholds of 2 and 3 instead):

$ awk '{count[$0]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file1 file2 file3 file4 file5
a
b


$ awk 'seen[$0]++ == 1' file1 file2 file3 file4 file5
a
b

$ awk '{count[$0]++} END {for (line in count) if ( count[line] == 2 && line == "a" || count[line] == 3 && line == "b" ) {print line, count[line]} }' 1 2 3 4 5
a 2
b 3

Search for duplicate lines in a file, and report how often and at which locations (line numbers) they occur, without sorting?

You may use awk (note that this keys on the first field, $1; use $0 instead to compare whole lines):

awk '{if ($1 in a) printf "( %d dupe of %d ): %s\n", NR, a[$1], $1; else a[$1] = NR}' file

For an input file containing 123, 123, 234, 234, 123 (one per line), this prints:

( 2 dupe of 1 ): 123
( 4 dupe of 3 ): 234
( 5 dupe of 1 ): 123
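If you also want per-line totals at the end, a hedged extension of the same idea (first and count are just illustrative array names):

awk '{ if ($1 in first) { printf "( %d dupe of %d ): %s\n", NR, first[$1], $1; count[$1]++ }
       else { first[$1] = NR; count[$1] = 1 } }
     END { for (v in count) if (count[v] > 1) print v, "occurs", count[v], "times" }' file

For the input above, this appends lines such as 123 occurs 3 times and 234 occurs 2 times after the per-occurrence report.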

