Sorting with Multiple Keys with Linux Sort Command

Sorting multiple keys with Unix sort

Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can carry its own ordering options (such as n for a numeric sort).

sorting with multiple keys with Linux sort command

I found this caution in the GNU sort docs:

Sort numerically on the second field and resolve ties by sorting
alphabetically on the third and fourth characters of field five. Use
‘:’ as the field delimiter.

      sort -t : -k 2,2n -k 5.3,5.4

Note that if you had written -k 2n instead of -k 2,2n sort would have
used all characters beginning in the second field and extending to the
end of the line as the primary numeric key. For the large majority of
applications, treating keys spanning more than one field as numeric
will not do what you expect.

I'm not sure what it ends up with when it evaluates '1001 3' as a numeric key, but "will not do what you expect" is accurate. It seems clear that the Right Thing to do is to specify each key independently.
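To see the documented command in action, here is a minimal sketch. The three colon-separated records are made up for illustration, and LC_ALL=C is set only to make the ordering reproducible.

```shell
# Hypothetical colon-delimited records; field 2 is numeric, field 5 is text.
data='a:10:p:q:xxBB
b:2:p:q:xxZZ
c:10:p:q:xxAA'

# Primary key: field 2 only, numeric. Tie-breaker: characters 3-4 of field 5.
result=$(printf '%s\n' "$data" | LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4)
printf '%s\n' "$result"
# b:2:p:q:xxZZ
# c:10:p:q:xxAA
# a:10:p:q:xxBB
```

The two records with 10 in field two tie on the primary key, so characters 3-4 of field five ("AA" versus "BB") decide their order.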

The same web page says this about resolving "ties".

Finally, as a last resort when all keys compare equal, sort compares
entire lines as if no ordering options other than --reverse (-r) were
specified.

I'll confess I'm a little mystified about how to interpret that.
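One way to read that passage is to try it. The sketch below uses made-up three-line input and assumes GNU sort; LC_ALL=C is only there to make the whole-line comparison byte-wise and reproducible.

```shell
# Hypothetical input: "a 1" and "b 1" tie on the numeric key in field 2.
data='b 1
a 1
c 0'

# The keys tie for "a 1" and "b 1", so whole lines are compared: a before b.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2n)

# The key has its own option (n), so it does not inherit the global -r and
# still sorts ascending; only the last-resort comparison is reversed.
reversed=$(printf '%s\n' "$data" | LC_ALL=C sort -r -k2,2n)

printf 'plain:\n%s\nwith -r:\n%s\n' "$plain" "$reversed"
```

With GNU sort I would expect "c 0" first in both runs, followed by "a 1" then "b 1" in the plain run and "b 1" then "a 1" with -r. In other words, the last-resort comparison ignores the key options but still honors a global -r.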

Sorting multiple keys with Unix sort -- Bug?

-k2 uses all the characters from the beginning of the 2nd field to the end of the line, because you did not specify where the key ends. So the lines

0.322_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat    0.000110687417806       0.0346076270248
0.3_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000111161259827 0.0358869210331

are correctly sorted, because both keys begin with _rsrc:15 and 0.000110 sorts before 0.000111. The key phrase in the manual page is:

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.
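The effect of the defaulted stop position is easy to check on a tiny example. This is a sketch with made-up two-line input, not the data from the question; LC_ALL=C keeps the comparison byte-wise.

```shell
# Hypothetical input: both lines have "x" in field 2.
data='b x 1
a x 2'

# -k2: the key runs from field 2 to the end of the line ("x 1" vs "x 2"),
# so field 3 effectively decides the order.
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

# -k2,2: the key is field 2 alone; both are "x", so the tie falls back to
# comparing whole lines.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

printf 'with -k2:\n%s\nwith -k2,2:\n%s\n' "$open_ended" "$bounded"
```

Here -k2 should print "b x 1" before "a x 2", while -k2,2 should print "a x 2" first.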

unix sorting, with primary and secondary keys

The manual shows some examples.

In accordance with zseder's comment, this works:

sort -t"<TAB>" -k1,1d -k3,3g

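Note that sort -t"\t" does not work: the shell passes the two characters \ and t, and GNU sort rejects a multi-character delimiter. In bash, ksh, or zsh you can write sort -t$'\t' instead.

If your shell lacks $'...' quoting, put a literal tab in a variable:

TAB=$(printf '\t')
sort -t"$TAB"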
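Putting the pieces together, a minimal sketch of a tab-delimited sort with a dictionary-order primary key and a general-numeric secondary key might look like this. The records are made up, and -g is an extension found in GNU and BSD sort rather than something POSIX requires.

```shell
# Store a literal tab in a variable; printf is more portable than echo -e.
TAB=$(printf '\t')

# Hypothetical tab-separated records: name, tag, value.
result=$(printf 'b\tx\t2.5\na\tx\t1e1\na\ty\t3\n' |
  LC_ALL=C sort -t"$TAB" -k1,1d -k3,3g)
printf '%s\n' "$result"
```

The two "a" records tie on the first key, and -g reads 1e1 as 10, so the record with 3 should come before the one with 1e1.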

Sorting a file by multiple columns using bash sort

You're missing the -n/--numeric-sort option, which compares by numeric value rather than lexicographically (you need it at least for the second and third fields):

$ sort -k1,1 -k2,2n -k3,3n file.txt
word01.1 5 8
word01.1 10 20
word01.1 10 30
word01.1 40 50
word01.2 10 25
word01.2 30 50
word01.2 40 50

Note that you can give -n globally, in which case it applies to every key that has no ordering options of its own, or attach it to individual keys. The key format is -k KEYDEF, where KEYDEF is F[.C][OPTS][,F[.C][OPTS]] and OPTS is one or more ordering options, such as n (numeric), r (reverse), g (general numeric), or h (human numeric).
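The difference the n option makes shows up as soon as the numbers have different lengths. A small sketch, using three made-up rows in the same shape as the output above:

```shell
# Hypothetical rows; field 2 has one- and two-digit numbers.
data='word01.1 10 20
word01.1 5 8
word01.1 40 50'

# Without n, fields 2 and 3 are compared as text, so "10" and "40" sort
# ahead of "5".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2 -k3,3)

# With n on those keys, they are compared by numeric value.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n)

printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"
```

The lexical run should put the "5 8" row last, while the numeric run should put it first.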

unix sort multiple fields

You need one of:

sort --key=1,1 --key=2,2r --key=3,3 --key=4,4r
sort -k1,1 -k2,2r -k3,3 -k4,4r

as in the following transcript:

pax$ echo '5 3 2 9
3 4 1 7
5 2 3 1
6 1 3 6
1 2 4 5
3 1 2 3
5 2 2 3' | sort --key=1,1 --key=2,2r --key=3,3 --key=4,4r

1 2 4 5
3 4 1 7
3 1 2 3
5 3 2 9
5 2 2 3
5 2 3 1
6 1 3 6

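Remember to add the n option if you want the fields treated as proper numbers (variable length). Attach it to each key, because a global -n is not inherited by keys that carry their own options (such as r):

sort -k1,1n -k2,2nr -k3,3n -k4,4nr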
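One caveat worth checking for yourself: global ordering options are only inherited by keys that have no ordering options of their own. The sketch below uses two made-up lines to compare a global -n with n spelled out on each key.

```shell
# Hypothetical input: field 1 ties, field 2 differs in length.
data='1 9
1 10'

# Global -n reaches -k1,1 (no options of its own) but not -k2,2r, which
# therefore compares field 2 as text, reversed: "9" sorts ahead of "10".
global_n=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k1,1 -k2,2r)

# Spelling out n on every key makes field 2 numeric and descending.
per_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1n -k2,2nr)

printf 'global -n:\n%s\nper-key n:\n%s\n' "$global_n" "$per_key"
```

Only the per-key form should put "1 10" ahead of "1 9", which is what a numeric descending sort on the second field calls for.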

sort alphanumerically with priority for numbers in linux

So, basically, you're asking to sort the first field numerically in descending order, but if the numeric keys are the same, you want the second field to be ordered in natural, or ascending, order.

I tried a few things, but here's the way I managed to make it work:

   sort -nk2 file.txt  | sort -snrk1

Explanation:

  • The first command sorts the whole file on the second, alphanumeric field in natural order. The second command then sorts that output on the first, numeric field in reverse order and requests a "stable" sort.
  • -n is for numeric sort, in which 6 comes before 10; a plain alphanumeric sort would put 10 first.
  • -r is for reversed order, so from highest to lowest. If unspecified, sort assumes natural, or ascending, order.
  • -k selects which key, or field, to use for sorting.
  • -s is for stable ordering. This option keeps records that have an equal key in their original order.
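The same ordering can usually be had from a single sort call with two keys. The sketch below compares both approaches on made-up input, with bounded keys (-k2,2 and -k1,1) in place of the open-ended ones above; -s is a GNU/BSD extension, so treat the two-pass form as an assumption about your sort.

```shell
# Hypothetical input: numeric field 1, text field 2.
data='60 b
6 a
60 a'

# Two passes: order by field 2 first, then a stable numeric reverse sort on
# field 1 keeps the field-2 order among equal numbers.
two_pass=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2 | LC_ALL=C sort -s -k1,1nr)

# One pass: field 1 numeric descending, ties broken by field 2 ascending.
one_pass=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1nr -k2,2)

printf 'two passes:\n%s\none pass:\n%s\n' "$two_pass" "$one_pass"
```

Both should print "60 a", "60 b", "6 a". The single call avoids sorting the data twice and does not depend on -s.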

unix sort by single column only

From the POSIX description of sort:

Except when the -u option is specified, lines that otherwise compare equal shall be ordered as if none of the options -d, -f, -i, -n, or -k were present (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison. The order in which lines that still compare equal are written is unspecified.

So in your case, when two lines have the same value in the second column and thus are equal, the entire lines are then compared to get the final ordering.

GNU sort (and possibly other implementations; it is not mandated by POSIX) has the -s option for a stable sort, where lines with keys that compare equal appear in the same order as in the original. That appears to be what you want:

$ sort -t, -s -k2,2n chris.num
1,4,3
1,4,1
1,5,2
1,7,2
1,7,1
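For comparison, here is a sketch that runs the same key with and without -s. The input is an assumption: the original chris.num is not shown, so these five lines were chosen only because they reproduce the output above.

```shell
# Hypothetical contents for chris.num.
data='1,7,2
1,4,3
1,7,1
1,4,1
1,5,2'

# Stable: lines with equal field 2 keep their input order.
stable=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -s -k2,2n)

# Default: lines with equal field 2 are ordered by comparing whole lines.
fallback=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2n)

printf 'with -s:\n%s\nwithout -s:\n%s\n' "$stable" "$fallback"
```

Without -s, the pairs that tie on field two should come out as 1,4,1 before 1,4,3 and 1,7,1 before 1,7,2, because the whole-line comparison takes over.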

