How to 'Uniq' by Column

Is there a way to 'uniq' by column?

sort -u -t, -k1,1 file
  • -u for unique
  • -t, so comma is the delimiter
  • -k1,1 for the key field 1
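
The input file is not shown in the original question; an assumed input like the one below (where the first address appears twice) would be collapsed to one line per first field by the command above:

overflow@domain2.example,2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
overflow@domain2.example,2009-11-27 00:59:02.646465785,xx3.net,255.255.255.0
stack2@domain.example,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1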

Test result:

overflow@domain2.example,2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0
stack2@domain.example,2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1

Use uniq -d on a particular column?

Here's one way using awk. It reads the input file twice, but avoids the need to sort:

awk -F, 'FNR==NR { a[$2]++; next } a[$2] > 1' file file

Results:

john,3
tom,3
junior,5
tony,5

Brief explanation:

FNR==NR is a common awk idiom that is true only while the first file in the argument list is being read. During that first pass, column two is used as an array key and its count is incremented; the next statement then skips the rest of the program for that line. On the second pass over the same file, a line is printed whenever the count stored for its column-two value is greater than one.
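
As a self-contained illustration, assume an input file (the question's data is not shown) where only the values 3 and 5 appear more than once in column two:

$ cat file
john,3
sally,4
tom,3
junior,5
linda,6
tony,5

$ awk -F, 'FNR==NR { a[$2]++; next } a[$2] > 1' file file
john,3
tom,3
junior,5
tony,5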

Sorting unique by column - sort command?

This might work for you:

sort -uk1,1 file

This sorts the file on the first field only and removes duplicate lines based on the first field.
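
A quick illustration with made-up, whitespace-separated data (with GNU sort, the first of the duplicate lines is the one that survives):

$ cat file
apple 10
banana 4
apple 7

$ sort -uk1,1 file
apple 10
banana 4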

How to sort based on one column but uniq based on another column?

You can use a pipe; note, however, that it does not modify the file in place.

Example :

$ cat initial.txt
1,3,4
2,3,1
1,2,3
2,3,4
1,4,1
3,1,3
4,2,4

$ cat initial.txt | sort -u -t, -k1,1 | sort -t, -k2,2
3,1,3
4,2,4
1,3,4
2,3,1

The result is sorted by key 2 and unique by key 1. Note that the result is printed to the console; if you want it in a file, just use a redirect (> newFile.txt).

Another solution for this kind of more complex operation is to rely on a different tool (depending on your preferences (and age): awk, perl or python).
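
For illustration, an awk variant of the same idea (keep one line per value of column 1, then sort by column 2) could be:

$ awk -F, '!seen[$1]++' initial.txt | sort -t, -k2,2
3,1,3
4,2,4
1,3,4
2,3,1

Here awk keeps the first occurrence of each column-1 value in file order, so which of the duplicate lines survives may differ from the sort -u version.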

EDIT:
If I understood the new requirement correctly, it is sorted by column 2, and column 1 is unique for a given column 2:

$ cat initial.txt | sort -u -t, -k1,2 | sort -t, -k2,2
3,1,3
1,2,3
4,2,4
1,3,4
2,3,1
1,4,1

Is that what you expect? Otherwise, I did not understand :-)

Change the order of columns in the output of `uniq -c`

Append to your command:

| awk '{print $2,$1}'
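
For example, in a typical counting pipeline (the data here is made up), appending it swaps the count and the item:

$ printf 'apple\napple\nbanana\napple\n' | sort | uniq -c | awk '{print $2,$1}'
apple 3
banana 1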

Bash - is it possible to use uniq for only one column of a line?

Try this:

sort -rnk3 myfile | awk -F"[. ]" '!a[$2]++'

awk removes the duplicates based on the 2nd column. This is a well-known awk idiom for removing duplicates: an array records each value of the 2nd field as it is seen. Before a record is printed, its 2nd field is looked up in the array; if it has not been seen before, the record is printed, otherwise it is discarded as a duplicate. This works because of the post-increment ++: the first time a key is encountered, a[$2]++ evaluates to 0 (post-increment returns the old value), so the negation !0 is true and the line is printed. Subsequent occurrences yield a non-zero value, which becomes false when negated.
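
A stripped-down illustration of just the !a[$2]++ part (the data is made up); only the first line seen for each value of the second column is printed:

$ printf 'a 1\nb 2\nc 1\n' | awk '!a[$2]++'
a 1
b 2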

awk 'uniq' on a range of columns

You can use sort:

history | sort -u -k4
  • -u for unique
  • -k4 to sort on all columns starting from the fourth

Running this on

1102 2017-10-27 09:05:07 cd /tmp/
1109 2017-10-27 09:07:03 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1127 2017-12-29 11:13:26 cd /tmp/
1144 2018-06-21 13:04:26 cd /etc/init.d/
1161 2018-06-28 09:53:21 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/
1179 2018-07-10 15:54:32 cd /etc/init.d/

yields:

1124 2017-11-07 16:38:50 cd /etc/init.d/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1102 2017-10-27 09:05:07 cd /tmp/
1169 2018-07-09 16:33:52 cd /var/log/

EDIT: if you want to keep the original order, you can apply a second sort:

history | sort -u -k4 | sort -n
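
With the sample history above, that restores the numeric order:

1102 2017-10-27 09:05:07 cd /tmp/
1112 2017-10-27 09:07:15 cd nagent-rhel_64/
1124 2017-11-07 16:38:50 cd /etc/init.d/
1169 2018-07-09 16:33:52 cd /var/log/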

Uniq a column and print out number of rows in that column

Well, you changed the functionality since the original post, but this should get you the unique names in your file (assuming it is named data), unsorted:

#!/bin/bash
sed "1 d" data | awk -F"," '!_[$1]++ { print $1 }'

If you need to sort, append | sort to the command line above.

And append | wc -l to the command line to count lines.
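
Putting both together, counting the unique names (same assumed data file as above) would look like:

sed "1 d" data | awk -F"," '!_[$1]++ { print $1 }' | wc -l

(The | sort step can be dropped when only counting, since wc -l does not care about order.)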

Sort by column and uniq by another column

In GNU awk:

$ gawk '
BEGIN { FS=OFS=";" }
# for each value of column 6, keep the line with the largest column 7
# { k=$2 FS $3 FS $4 FS $6 }
$7>t[$6] {                  # $7>t[k] {
    t[$6]=$7                # t[k]=$7
    r[$6]=$0                # r[k]=$0
}
END {
    PROCINFO["sorted_in"]="@val_num_desc"
    for(i in t)
        print r[i]
}' file
2013;D;153;RIZE;;LT2;88
1999;D;153;RIZE;;LT1;86
2011;D;153;RIZE;;LT4;81
2008;D;153;RIZE;;LT3;77

If you don't have GNU awk available, sort the output with sort:

$ awk '
BEGIN { FS=OFS=";" }
$7>t[$6] {
    t[$6]=$7
    r[$6]=$0
}
END {
    for(i in t)
        print r[i]
}' file |
sort -s -t \; -k7nr
2013;D;153;RIZE;;LT2;88
1999;D;153;RIZE;;LT1;86
2011;D;153;RIZE;;LT4;81
2008;D;153;RIZE;;LT3;77

