Sorting multiple keys with Unix sort
Use the -k option (or --key=POS1[,POS2]). It can appear multiple times, and each key can take the same ordering options that are otherwise given globally (such as n for numeric sort).
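Here is a minimal sketch of a key carrying its own ordering option. The sample lines are made up for illustration, and LC_ALL=C is an assumption added so that byte-wise collation does not depend on the locale. The first key compares field 1 as text, and the second compares field 2 numerically:

```shell
# Made-up sample data.
# Key 1: field 1 only, compared as text.
# Key 2: field 2 only, compared numerically (the per-key "n").
printf '%s\n' 'b 10' 'a 9' 'b 2' 'a 10' | LC_ALL=C sort -k1,1 -k2,2n
```

This prints a 9, a 10, b 2 and b 10, one per line. Without the n, field 2 would be compared as text and 10 would sort before 2 and 9.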
sorting with multiple keys with Linux sort command
I find this caution in the GNU sort docs.
Sort numerically on the second field and resolve ties by sorting
alphabetically on the third and fourth characters of field five. Use
‘:’ as the field delimiter.
sort -t : -k 2,2n -k 5.3,5.4
Note that if you had written -k 2n instead of -k 2,2n sort would have
used all characters beginning in the second field and extending to the
end of the line as the primary numeric key. For the large majority of
applications, treating keys spanning more than one field as numeric
will not do what you expect.
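The pitfall is easy to reproduce. This sketch uses made-up data, '.' as the field separator, and the C locale so that '.' is also the decimal point. With -k2n the key runs to the end of the line, so "2.5" and "2.10" are read as the decimals 2.5 and 2.1. With -k2,2n both keys are just "2", and the tie falls through to a whole-line comparison:

```shell
# Made-up data. Without an end position the key is "2.5" / "2.10" (parsed as 2.5 and 2.1).
printf '%s\n' 'x.2.5' 'y.2.10' | LC_ALL=C sort -t . -k2n
# With -k2,2n both keys are "2"; the tie is broken by comparing the whole lines.
printf '%s\n' 'x.2.5' 'y.2.10' | LC_ALL=C sort -t . -k2,2n
```

The first command puts y.2.10 first. The second puts x.2.5 first.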
I'm not sure what it ends up with when it evaluates '1001 3' as a numeric key, but "will not do what you expect" is accurate. It seems clear that the Right Thing to do is to specify each key independently.
The same web page says this about resolving "ties".
Finally, as a last resort when all keys compare equal, sort compares
entire lines as if no ordering options other than --reverse (-r) were
specified.
I'll confess I'm a little mystified about how to interpret that.
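One way to read it: when every key ties, sort falls back to comparing the whole lines byte by byte, ignoring options such as n (only r still applies). The sketch below uses made-up data to show this with a numeric key. It also shows the -s (--stable) option of GNU sort, which turns the last-resort comparison off so that tied lines keep their input order:

```shell
# Made-up data: both lines have key 1 equal to 1, so the key comparison ties.
# Last resort: whole lines compared as text, so "1 10" sorts before "1 9".
printf '%s\n' '1 9' '1 10' | LC_ALL=C sort -n -k1,1
# With -s the last-resort comparison is skipped; tied lines stay in input order.
printf '%s\n' '1 9' '1 10' | LC_ALL=C sort -s -n -k1,1
```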
Sorting multiple keys with Unix sort -- Bug?
-k2 uses all the characters from the beginning of the 2nd field to the end of the line, because you did not specify where the key ends. So the lines
0.322_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000110687417806 0.0346076270248
0.3_rsrc:15_phi:0.5_abr:1_prof:gauss_diff:lap2.dat 0.000111161259827 0.0358869210331
are correctly sorted: both keys begin with _rsrc:15 and are identical up to the numeric column, where 0.000110 sorts before 0.000111. The key phrase in the manual page is
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end.
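The F.C form is what the GNU example quoted earlier relies on (sort -t : -k 2,2n -k 5.3,5.4). The following sketch applies that command to a few made-up colon-separated lines. Field 2 is ordered numerically, and ties are broken by characters 3 to 4 of field 5:

```shell
# Made-up data. Field 5 is "abZZ", "abYY", "abAA"; 5.3,5.4 selects its 3rd and 4th characters.
printf '%s\n' 'u:2:x:x:abZZ' 'v:1:x:x:abYY' 'w:2:x:x:abAA' |
  LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4
```

This prints v:1:x:x:abYY, then w:2:x:x:abAA, then u:2:x:x:abZZ. The two lines with 2 in field 2 are separated by "AA" versus "ZZ".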
unix sorting, with primary and secondary keys
The manual shows some examples.
In accordance with zseder's comment, this works:
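sort -t"<TAB>" -k1,1d -k3,3g
where <TAB> stands for a literal tab character (in bash, type Ctrl-V then Tab). Note that sort -t"\t" does not work: the shell passes the two characters \ and t, and sort rejects a multi-character separator. In bash, ksh and zsh, sort -t $'\t' passes a real tab.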
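If none of the above work to delimit by tab, this is an ugly but portable workaround (printf is safer than echo -e, which some shells, such as dash, print literally):
TAB=$(printf '\t')
sort -t"$TAB" -k1,1d -k3,3g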
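Below is a sketch of the tab-delimited sort using the key options from the command above. The data is made up, and the tab is produced with printf. Here d compares field 1 in dictionary order, and g compares field 3 as a general numeric value. Unlike n, g understands exponents such as 1e3:

```shell
TAB=$(printf '\t')   # a real tab character
# Made-up tab-separated data: name, label, value.
printf 'b\tx\t1e3\nb\ty\t20\na\tz\t5.5\n' |
  LC_ALL=C sort -t "$TAB" -k1,1d -k3,3g
```

The b lines come out as 20 before 1e3 (1000). With n instead of g, 1e3 would be read as 1 and sort first.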
unix sort multiple fields
You need one of:
sort --key=1,1 --key=2,2r --key=3,3 --key=4,4r
sort -k1,1 -k2,2r -k3,3 -k4,4r
as in the following transcript:
pax$ echo '5 3 2 9
3 4 1 7
5 2 3 1
6 1 3 6
1 2 4 5
3 1 2 3
5 2 2 3' | sort --key=1,1 --key=2,2r --key=3,3 --key=4,4r
1 2 4 5
3 4 1 7
3 1 2 3
5 3 2 9
5 2 2 3
5 2 3 1
6 1 3 6
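Remember to provide the n option if you want them treated as proper numbers (variable length). Put it on every key: a key that has its own ordering options (such as r) does not inherit a global -n, so sort -n -k1,1 -k2,2r -k3,3 -k4,4r would still compare fields 2 and 4 as text. Use:
sort -k1,1n -k2,2nr -k3,3n -k4,4nr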
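A quick check with made-up multi-digit values shows how a global -n interacts with a key that has its own r. LC_ALL=C is assumed for byte-wise text comparison:

```shell
# Made-up data.
# -k2,2r has its own option, so the global -n does not apply to it: text order, reversed.
printf '%s\n' '1 9' '1 10' '1 100' | LC_ALL=C sort -n -k1,1 -k2,2r
# -k2,2nr compares field 2 numerically, in reverse.
printf '%s\n' '1 9' '1 10' '1 100' | LC_ALL=C sort -k1,1n -k2,2nr
```

The first command prints 9, 100, 10 in field 2 (reverse text order). The second prints 100, 10, 9.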
unix - how to sort on a specific key
You can do a decorate, sort, undecorate pipe like so:
$ awk -F",|:" '{printf "%s\t%s\n", $3,$0}' file | sort -n | awk -F"\t" '{print $2}'
Or, if using a \t or other unique delimiter, you can use cut:
$ awk -F",|:" '{printf "%s\t%s\n", $3,$0}' file | sort -n | cut -f 2
In either case, the output is:
10.1.3.100:{ range_start" : 30, "range_end" : 30 }
10.1.3.27:{ "range_start" : 33, "range_end" : 33 }
10.1.2.161:{ "range_start" : 44, "range_end" : 44 }
10.1.4.239:{ "range_start" : 53, "range_end" : 53 }
10.1.2.233:{ "range_start" : 78, "range_end" : 78 }
10.1.3.39:{ "range_start" : 96, "range_end" : 96 }
Change the \t to another delimiter if there are tabs in the text data.
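Here is a self-contained variant of the decorate, sort, undecorate pipe, offered as a sketch. The input is three of the lines shown above, in shuffled order. sort is restricted to the decorated first field with -t and -k1,1n. cut -f 2- is used instead of -f 2, so everything after the first tab is passed through and a tab inside an original line would survive the undecorate step:

```shell
TAB=$(printf '\t')
# Sample input: three of the lines shown above, shuffled.
input='10.1.2.161:{ "range_start" : 44, "range_end" : 44 }
10.1.3.39:{ "range_start" : 96, "range_end" : 96 }
10.1.3.27:{ "range_start" : 33, "range_end" : 33 }'

# Decorate: prepend field 3 (the range_start value) and a tab.
# Sort: numerically on that first tab-separated field only.
# Undecorate: drop the first field and keep the rest of the line intact.
printf '%s\n' "$input" |
  awk -F',|:' '{ printf "%s\t%s\n", $3, $0 }' |
  LC_ALL=C sort -t "$TAB" -k1,1n |
  cut -f 2-
```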
unix sort for 2 fields numeric order
There's a fascinating article on re-engineering the Unix sort ('Theory and Practice in the Construction of a Working Sort Routine', J P Linderman, AT&T Bell Labs Tech Journal, Oct 1984) which is not, unfortunately, available on the internet, AFAICT (I looked a year or so ago and did not find it; I looked again just now, and can find references to it, but not the article itself). Amongst other things, the article demonstrated that for Unix sort, the comparison time far outweighs the cost of moving data (not very surprising when you consider that the comparison has to compare fields determined per row, but moving 'data' is simply a question of switching pointers around). One upshot of that was that they recommended doing what danfuzz suggests: mapping keys to make comparisons easy. They showed that even a simple scripted solution could save time compared with making sort work really hard.
So, you could think in terms of using a character that's unlikely to appear in the data file naturally (such as Control-A) as the key field separator.
sed 's/^\([^.]*\)[.]\([^.]*\)[.]\([^ ]*\) Step \([0-9]*\):.*/\1^A\2^A\3^A\4^A&/' file |
sort -t'^A' -k1,1n -k2,2n -k3,3n -k4,4n |
sed 's/^.*^A//'
The first command is the hard one. It identifies the 4 numeric fields and outputs them separated by the chosen character (written ^A above, typed as Control-A), followed by a copy of the original line. The sort then works on the first four fields numerically, and the final sed command strips off the front of each line up to and including the last Control-A, giving you the original line back again.
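If typing Control-A is awkward, the separator can be produced with printf instead. This sketch assumes lines of the form "A.B.C Step N: ...", which is what the sed pattern above expects. The three sample lines are made up:

```shell
CA=$(printf '\001')   # a real Control-A character

# Made-up sample lines in the "A.B.C Step N: ..." shape the sed pattern expects.
# Decorate with the four numbers, sort on them numerically, then strip the prefix.
printf '%s\n' '1.10.2 Step 3: x' '1.9.2 Step 3: y' '1.9.2 Step 12: z' |
  sed "s/^\([^.]*\)[.]\([^.]*\)[.]\([^ ]*\) Step \([0-9]*\):.*/\1$CA\2$CA\3$CA\4$CA&/" |
  LC_ALL=C sort -t "$CA" -k1,1n -k2,2n -k3,3n -k4,4n |
  sed "s/^.*$CA//"
```

This prints 1.9.2 Step 3: y, then 1.9.2 Step 12: z, then 1.10.2 Step 3: x. A plain text sort would have put 1.10.2 first and Step 12 before Step 3.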
Sorting a file by multiple columns using bash sort
You're missing the -n/--numeric-sort option, which sorts according to string numerical value rather than lexicographically (needed at least for the second and third fields):
$ sort -k1,1 -k2,2n -k3,3n file.txt
word01.1 5 8
word01.1 10 20
word01.1 10 30
word01.1 40 50
word01.2 10 25
word01.2 30 50
word01.2 40 50
Note that you can provide a global -n flag, to sort all fields as numerical values, or set it per key. The format for a key is -k KEYDEF, where KEYDEF is F[.C][OPTS][,F[.C][OPTS]] and OPTS is one or more ordering options, like n (numerical), r (reverse), g (general numeric), h (human numeric), etc. Options given on a key override the global ordering options for that key.
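As one example of these options, here is a sketch of the h option in GNU sort, applied to a made-up list of sizes. It orders by sign, then by SI suffix (none, K, M, G, ...), then by numeric value:

```shell
# Made-up data: 512 (no suffix) < 1K < 2M when compared as human-readable sizes.
printf '%s\n' 'logs 2M' 'cache 512' 'tmp 1K' | LC_ALL=C sort -k2,2h
```

This prints cache 512, tmp 1K, logs 2M. A plain text comparison would put 1K first and 512 last.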