Linux command or script counting duplicated lines in a text file?
Send it through sort (to put identical lines next to each other), then uniq -c to give counts, i.e.:
sort filename | uniq -c
To get that list sorted by frequency, most frequent first:
sort filename | uniq -c | sort -nr
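As a quick illustration of the pipeline above, here is a minimal sketch that feeds a few made-up lines through it. The sample data and the function name count_lines_by_frequency are assumptions for demonstration only, and the amount of padding uniq -c prints before each count varies between implementations.

```shell
# Hypothetical helper: read lines on stdin and print "count line" pairs,
# most frequent first.
count_lines_by_frequency() {
    sort | uniq -c | sort -nr
}

# Made-up sample input: "apple" three times, "pear" twice, "fig" once.
printf '%s\n' apple pear apple fig pear apple | count_lines_by_frequency
```

This should list apple with a count of 3, then pear with 2, then fig with 1.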
Find duplicate lines in a file and count how many times each line was duplicated?
Assuming there is one number per line:
sort <file> | uniq -c
With the GNU version (e.g., on Linux) you can also use the more verbose --count flag:
sort <file> | uniq --count
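The commands above list every line, including those that occur only once. If only the duplicated lines are wanted, one possible approach is to filter on the count column. The sketch below assumes one value per line; the function name duplicates_with_counts and the sample numbers are made up.

```shell
# Hypothetical helper: print "count line" only for lines that occur
# more than once. The awk filter keeps rows whose first field (the
# count added by uniq -c) is greater than 1.
duplicates_with_counts() {
    sort | uniq -c | awk '$1 > 1'
}

# Made-up input: 7 appears three times, 3 twice, 9 once.
printf '%s\n' 7 3 7 9 3 7 | duplicates_with_counts
```

Here 9 is dropped from the output because it appears only once.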
Linux command or script counting duplicated bunch of lines in a text file?
awk -v RS=Separator '
NR>1 {count[$0]++}
END {for (bunch in count) print count[bunch], RS, bunch}
' file
1 Separator
line31
line32
line33
2 Separator
line21
line22
line23
3 Separator
line11
line12
line13
The output has no inherent order. To sort it by count in descending order with GNU AWK, set PROCINFO["sorted_in"]:
awk -v RS=Separator '
NR>1 {count[$0]++}
END {
PROCINFO["sorted_in"] = "@val_num_desc"
for (bunch in count) print count[bunch], RS, bunch
}
' file
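Both commands above rely on a multi-character RS, which POSIX leaves unspecified. As a sketch of an alternative that avoids it, the function below collects the lines between separator lines by hand. It assumes the separator is a line consisting of exactly the word Separator, which is stricter than the RS version; the name count_bunches and the sample input are made up.

```shell
# Hypothetical helper: count identical blocks of lines, where each block
# is introduced by a line that is exactly "Separator". Reads stdin.
# Output order is unspecified, as with the original command.
count_bunches() {
    awk '
        $0 == "Separator" {
            if (started) count[bunch]++    # close the previous bunch
            bunch = ""; started = 1
            next
        }
        started { bunch = bunch $0 "\n" }  # collect lines of the current bunch
        END {
            if (started) count[bunch]++    # close the last bunch
            for (b in count) printf "%d Separator\n%s", count[b], b
        }
    '
}

# Made-up input: the bunch "a b" appears twice, the bunch "c" once.
printf '%s\n' Separator a b Separator c Separator a b | count_bunches
```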
How to count the amount of unique lines, duplicate lines and lines that appear three times in a text file
$ echo 'Donald
Donald
Lisa
John
Lisa
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
1 1
1 2
1 3
The right column is the repetition count, and the left column is the number of unique names with that repetition count. E.g. “Donald” has a repetition count of 3.
Bigger example:
echo 'Donald
Donald
Rob
Lisa
WhatAmIDoing
John
Obama
Obama
Lisa
Washington
Donald' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
4 1
2 2
1 3
Four names (“Rob”, “WhatAmIDoing”, “John”, and “Washington”) each have a repetition count of 1. Two names (“Lisa” and “Obama”) each have a repetition count of 2. One name (“Donald”) has a repetition count of 3.
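If only the three figures from the question are needed (lines appearing once, twice, and three times), the second sort | uniq -c stage could be replaced by a small awk summary. This is a sketch under that assumption; the function name summarize_repeats and the output format are made up.

```shell
# Hypothetical helper: read lines on stdin and report how many distinct
# lines occur exactly once, exactly twice, and exactly three times.
summarize_repeats() {
    sort | uniq -c | awk '
        $1 == 1 { once++ }
        $1 == 2 { twice++ }
        $1 == 3 { thrice++ }
        END { printf "once=%d twice=%d thrice=%d\n", once, twice, thrice }
    '
}

# Same names as the bigger example above.
printf '%s\n' Donald Donald Rob Lisa WhatAmIDoing John Obama Obama Lisa \
    Washington Donald | summarize_repeats
# prints: once=4 twice=2 thrice=1
```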
Bash Script: count unique lines in file
You can use the uniq command to get counts of sorted repeated lines:
sort ips.txt | uniq -c
To get the most frequent results at top (thanks to Peter Jaric):
sort ips.txt | uniq -c | sort -bgr
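When only the top few entries matter, the sorted list can be cut short with head. The sketch below uses sort -nr, which is sufficient for the plain integer counts that uniq -c produces; the function name top_lines, the default of 3, and the sample addresses are assumptions for illustration.

```shell
# Hypothetical helper: print the N most frequent lines of a file.
# $1 = file to read, $2 = number of entries to show (defaults to 3 here).
top_lines() {
    sort "$1" | uniq -c | sort -nr | head -n "${2:-3}"
}

# Made-up stand-in for ips.txt.
tmp=$(mktemp)
printf '%s\n' 192.0.2.1 192.0.2.7 192.0.2.1 192.0.2.9 192.0.2.7 192.0.2.1 > "$tmp"
top_lines "$tmp" 2
rm -f "$tmp"
```

With this sample, 192.0.2.1 (3 occurrences) comes first and 192.0.2.7 (2 occurrences) second.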
How to find duplicate lines in a file?
Use sort with uniq -d, which prints one copy of each line that occurs more than once:
sort filename | uniq -d
See man uniq for the other options.
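To make the effect of -d concrete, the sketch below contrasts it with -u, which keeps only the lines that are not repeated. The sample data and variable names are made up.

```shell
# Made-up sample: "a" twice, "b" three times, "c" once.
sample=$(printf '%s\n' b a b c a b)

# -d: one copy of every line that occurs more than once.
repeated=$(printf '%s\n' "$sample" | sort | uniq -d)

# -u: only the lines that occur exactly once.
single=$(printf '%s\n' "$sample" | sort | uniq -u)

printf 'repeated:\n%s\nsingle:\n%s\n' "$repeated" "$single"
```

Here repeated holds a and b, and single holds c.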
Count duplicates from several files
You can use one of these:
awk '{count[$0]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file1 file2 file3 file4 file5
or
awk 'seen[$0]++ == 1' file1 file2 file3 file4 file5
To print a line only when it has a specific count, for example "a" occurring exactly 3 times and "b" exactly 4 times:
awk '{count[$0]++} END {for (line in count) if ( count[line] == 3 && line == "a" || count[line] == 4 && line == "b" ) {print line} }' file1 file2 file3 file4 file5
Test (the last command checks for counts of 2 and 3):
$ awk '{count[$0]++}END{for (a in count) {if (count[a] > 1 ) {print a}}}' file1 file2 file3 file4 file5
a
b
$ awk 'seen[$0]++ == 1' file1 file2 file3 file4 file5
a
b
$ awk '{count[$0]++} END {for (line in count) if ( count[line] == 2 && line == "a" || count[line] == 3 && line == "b" ) {print line, count[line]} }' 1 2 3 4 5
a 2
b 3
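The contents of the test files are not shown, so the following sketch builds three small stand-in files to compare the first two commands. The file contents and variable names are assumptions. The END-based version prints in whatever order awk walks the array, while the seen[$0]++ == 1 version prints each repeated line as soon as its second copy is read.

```shell
# Hypothetical stand-ins for the input files.
dir=$(mktemp -d)
printf '%s\n' a x > "$dir/file1"
printf '%s\n' b a > "$dir/file2"
printf '%s\n' b y > "$dir/file3"

# Counts everything first, then prints repeated lines at END;
# the for-in order is unspecified, so sort makes the result stable.
at_end=$(awk '{count[$0]++} END {for (l in count) if (count[l] > 1) print l}' \
    "$dir/file1" "$dir/file2" "$dir/file3" | sort)

# Prints each repeated line once, the moment its second copy is read.
streaming=$(awk 'seen[$0]++ == 1' "$dir/file1" "$dir/file2" "$dir/file3")

printf 'END-based:\n%s\nstreaming:\n%s\n' "$at_end" "$streaming"
rm -rf "$dir"
```

Both variables end up holding a and b; x and y occur only once and are not reported.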
Find duplicate lines in a file, with their counts and line numbers, without sorting it?
You may use awk:
awk '{if ($1 in a) printf "( %d dupe of %d ): %s\n", NR, a[$1], $1; else a[$1] = NR}' file
( 2 dupe of 1 ): 123
( 4 dupe of 3 ): 234
( 5 dupe of 1 ): 123
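The command above reports where each duplicate sits but not how many there are. As a sketch of one way to add that, the function below also prints a per-line total of extra copies at the end. It compares whole lines ($0) rather than the first field ($1); the name report_dupes, the summary format, and the sample input are made up.

```shell
# Hypothetical helper: for every repeated line, print its line number and
# the line number of its first occurrence, then a total of extra copies.
report_dupes() {
    awk '
        $0 in first {
            printf "( %d dupe of %d ): %s\n", NR, first[$0], $0
            extra[$0]++
            next
        }
        { first[$0] = NR }   # remember where each line was first seen
        END {
            # Summary order is unspecified.
            for (line in extra) printf "%s: %d duplicate(s)\n", line, extra[line]
        }
    '
}

# Made-up input, read in its original order without sorting.
printf '%s\n' 123 123 234 234 123 | report_dupes
```

For this input the first three lines of output match the listing above, followed by a summary showing 2 duplicates of 123 and 1 duplicate of 234.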