How to Catch Duplicate Entries in Text File in Linux

How to catch duplicate entries in a text file in Linux

Your question is not quite clear, but you can filter out duplicate lines with uniq:

sort file.txt | uniq

or simply

sort -u file.txt

(thanks RobEarl)

You can also print only repeating lines with

sort file.txt | uniq -d

Find duplicate entries in a text file using shell

awk shines for these kinds of tasks, but here is a non-awk solution (an awk sketch follows the two variants below):

$ sed 's|.*/|& |' file | sort -k2 -u | sed 's|/ |/|'

/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh

or, if your paths are balanced (the same directory depth for all files):

$ sort -t/ -k5 -u file

/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
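
Since the answer mentions that awk shines here, a minimal awk sketch as well, assuming the goal is to keep one entry per file name (the part after the last /). It keeps the first occurrence in the original order, so the output order may differ from the sorted versions above:

$ awk -F/ '!seen[$NF]++' file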

Find duplicate lines in a file and count how many times each line was duplicated?

Assuming there is one number per line:

sort <file> | uniq -c

You can use the more verbose --count flag too with the GNU version, e.g., on Linux:

sort <file> | uniq --count
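
If you also want the most frequently repeated lines listed first, you can pipe the counts back through sort; a small sketch (-rn sorts numerically in reverse):

sort <file> | uniq -c | sort -rn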

How to find duplicate lines in a file?

Use this:

sort filename | uniq -d
man uniq
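
If you would rather not sort the file first (for example, to keep the original line order), an awk sketch that prints each duplicated line once, at its second occurrence:

awk 'seen[$0]++ == 1' filename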

Detecting duplicate entries in a tab-separated file using bash and shell commands

This is pretty easy with awk:

$ awk 'BEGIN { FS = "\t" }
($3,$6) in seen { printf("Line %d is a duplicate of line %d\n", NR, seen[$3,$6]); next }
{ seen[$3,$6] = NR }' input.tsv

It saves each bookid, authorid pair in a hash table and warns if that pair already exists.
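
As a quick check, with a made-up input.tsv (hypothetical data; tab-separated, bookid in column 3 and authorid in column 6) where line 3 repeats the pair from line 1, the script reports the duplicate:

$ printf 'r1\td1\tB001\tt1\tx\tA42\nr2\td2\tB002\tt2\tx\tA17\nr3\td3\tB001\tt3\tx\tA42\n' > input.tsv
$ awk 'BEGIN { FS = "\t" } ($3,$6) in seen { printf("Line %d is a duplicate of line %d\n", NR, seen[$3,$6]); next } { seen[$3,$6] = NR }' input.tsv
Line 3 is a duplicate of line 1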

Print duplicate entries in a file using Linux commands

Awk is your friend:

sort foo.txt | uniq --count --repeated | awk '{print($2" appears "$1" times")}'
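
For example, with a made-up foo.txt containing three copies of "apple" and two of "banana" (hypothetical data, just to show the output format):

$ printf 'apple\nbanana\napple\napple\ncherry\nbanana\n' > foo.txt
$ sort foo.txt | uniq --count --repeated | awk '{print($2" appears "$1" times")}'
apple appears 3 times
banana appears 2 times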

Find duplicate records in file

With GNU awk you can do:

$ awk -F'[@,]' '{a[$2]++}END{for(k in a) print a[k],k}' file
1 domainz.com
2 domainx.com
1 domainy.de

You can use sort to order the output, e.g. in ascending numerical order with -n:

$ awk -F'[@,]' '{a[$2]++}END{for(k in a) print a[k],k}' file | sort -n 
1 domainy.de
1 domainz.com
2 domainx.com

Or just to print duplicate domains:

$ awk -F'[@,]' '{a[$2]++}END{for(k in a)if (a[k]>1) print k}' file
domainx.com

How to get unique values (lines) in a txt file that has duplicated lines, using Unix?

Just use uniq. Note that uniq only collapses adjacent duplicate lines, and uniq -u prints only the lines that are not repeated at all:

$ cat file
aaaaaa
bbbbbb
cccccc
ababab
ababab
ababab
ababab

$ cat file | uniq
aaaaaa
bbbbbb
cccccc
ababab

$ sort file | uniq -u
aaaaaa
bbbbbb
cccccc
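
If the duplicates are not adjacent and you also want to keep the original line order, a small awk sketch that deduplicates without sorting (the first occurrence of each line is kept):

$ awk '!seen[$0]++' file
aaaaaa
bbbbbb
cccccc
ababab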

How to print the full names of the duplicate values from a text file?


ls -1 *.ts | sort -V | awk -F'[_.]' '
{
map[$5]+=1;
map1[$5][$0]
}
END {
for (i in map)
{
if(map[i]>1)
{
for (j in map1[i])
{
print "DUPLICATE---:> "j
}
}
}
}' | sort

One-liner:

ls -1 *.ts | sort -V | awk -F'[_.]' '{ map[$5]+=1;map1[$5][$0] } END { for (i in map) { if(map[i]>1) { for (j in map1[i]) { print "DUPLICATE---:> "j } } } }' | sort

Using awk, set the field separator to either _ or . and then create two arrays. The first (map) holds a count for each number found in the file name. The second (map1) is a multidimensional array with the number as the first index and the complete line (file path) as the second. At the end we loop through map and check for any counts greater than one. If we find any, we loop through the corresponding entries in map1 and print the lines (the second index) along with the additional text. Finally, we run the result through sort again to get the required ordering.
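
Note that map1[$5][$0] is an array of arrays, which requires GNU awk 4.0 or later. As a quick illustration with made-up file names (assuming, hypothetically, that the fifth _/.-separated field is the number that identifies duplicates):

$ ls -1 *.ts
show_s01_e01_720p_12345.ts
show_s01_e02_720p_12345.ts
show_s01_e03_720p_67890.ts

$ ls -1 *.ts | sort -V | awk -F'[_.]' '{ map[$5]+=1;map1[$5][$0] } END { for (i in map) { if(map[i]>1) { for (j in map1[i]) { print "DUPLICATE---:> "j } } } }' | sort
DUPLICATE---:> show_s01_e01_720p_12345.ts
DUPLICATE---:> show_s01_e02_720p_12345.ts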


