Sort a Find Command to Respect a Custom Order in Unix

Using a variant of my answer to your original question:

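./your-script | awk -v keysInOrder='rp,alpha,beta-ri,beta-rs,RC' '
  BEGIN {
    FS = OFS = "-"
    # Map each key to its 1-based position in the desired order.
    keyCount = split(keysInOrder, a, ",")
    for (i = 1; i <= keyCount; ++i) keysToOrdinal[a[i]] = i
  }
  {
    # The sort key is field 2, or fields 2 and 3 joined, minus the numeric suffix.
    sortKey = $2
    if (NF == 3) sortKey = sortKey FS $3
    sub(/[0-9]+$/, "", sortKey)
    # Insert "|" plus padding separators before the number so it always lands in field 5.
    auxFieldPrefix = "|" FS
    if (NF == 2) auxFieldPrefix = auxFieldPrefix FS
    sub(/[0-9]/, auxFieldPrefix "&", $NF)
    # Keys missing from the list sort after all listed keys.
    sortOrdinal = sortKey in keysToOrdinal ? keysToOrdinal[sortKey] : keyCount + 1
    print sortOrdinal, $0
  }
' | sort -t- -k1,1n -k3,3 -k5,5n | sed 's/^[^-]*-//; s/|-\{1,2\}//'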

./your-script represents whatever command produces the output you want to sort.

Note that an auxiliary character, |, is used to facilitate sorting, on the assumption that this character doesn't appear in the input. That should be reasonably safe, given that filesystem paths usually don't contain pipe characters.

Any field 2 values (minus the numeric suffix) that aren't in the list of sort keys sort after the field 2/3 values that are, in alphabetical order among themselves.
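
As a rough illustration of the pipeline above, the sketch below feeds it a handful of made-up names. The app- prefix, the zeta key, and the sample_input function are all hypothetical stand-ins for ./your-script; the sketch assumes every line has two or three "-"-separated fields and ends in a number.

```shell
# Hypothetical stand-in for ./your-script; the names are invented for this sketch.
sample_input() {
  printf '%s\n' app-RC1 app-beta-rs2 app-rp10 app-alpha3 app-beta-ri1 app-rp2 app-zeta1
}

sorted=$(sample_input | awk -v keysInOrder='rp,alpha,beta-ri,beta-rs,RC' '
  BEGIN {
    FS = OFS = "-"
    keyCount = split(keysInOrder, a, ",")
    for (i = 1; i <= keyCount; ++i) keysToOrdinal[a[i]] = i
  }
  {
    sortKey = $2
    if (NF == 3) sortKey = sortKey FS $3
    sub(/[0-9]+$/, "", sortKey)
    auxFieldPrefix = "|" FS
    if (NF == 2) auxFieldPrefix = auxFieldPrefix FS
    sub(/[0-9]/, auxFieldPrefix "&", $NF)
    sortOrdinal = (sortKey in keysToOrdinal) ? keysToOrdinal[sortKey] : keyCount + 1
    print sortOrdinal, $0
  }
' | sort -t- -k1,1n -k3,3 -k5,5n | sed 's/^[^-]*-//; s/|-\{1,2\}//')

# One name per line, in the custom order.
printf '%s\n' "$sorted"
```

With this input the result should list app-rp2, app-rp10, app-alpha3, app-beta-ri1, app-beta-rs2, app-RC1, and finally app-zeta1, whose key is not in the list.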

Sort alphanumeric string in Unix using custom sort order

You can use an auxiliary awk command as follows:

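awk -F- -v keysInOrder="alpha,beta,gamma,delta" '
  BEGIN {
    # split() returns the element count; length(array) is not supported by every awk.
    keyCount = split(keysInOrder, a, ",")
    for (i = 1; i <= keyCount; ++i) keysToOrdinal[a[i]] = i
  }
  # Prepend the ordinal of the key as an auxiliary first column.
  { print keysToOrdinal[$1] "-" $0 }
' numbers.txt | sort -t- -k1,1n -k3,3n | cut -d- -f2-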
  • The awk command is used to:

    • map the custom keys onto numbers that reflect the desired sort order; note that the full list of keys must be passed via variable keysInOrder, in order.

    • prepend the numbers to the input as an auxiliary column, using separator - too; e.g., beta-3 becomes 2-beta-3, because beta is in position 2 in the ordered list of sort keys.

  • sort then sorts awk's output by the mapped numbers and then by the original number (the 2nd column of the input, now the 3rd), yielding the desired custom sort order.

  • cut then removes the auxiliary mapped numbers again.
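
The three steps above can be sketched end to end on a small sample. The input lines here are invented for illustration and are piped in rather than read from numbers.txt, so the snippet leaves no files behind.

```shell
# Hypothetical contents of numbers.txt, one key-number pair per line.
sample=$(printf '%s\n' beta-3 alpha-10 gamma-1 alpha-2 delta-5 beta-1)

sorted=$(printf '%s\n' "$sample" | awk -F- -v keysInOrder="alpha,beta,gamma,delta" '
  BEGIN {
    keyCount = split(keysInOrder, a, ",")
    for (i = 1; i <= keyCount; ++i) keysToOrdinal[a[i]] = i
  }
  { print keysToOrdinal[$1] "-" $0 }   # e.g. beta-3 becomes 2-beta-3
' | sort -t- -k1,1n -k3,3n | cut -d- -f2-)

printf '%s\n' "$sorted"
```

The result should be alpha-2, alpha-10, beta-1, beta-3, gamma-1, delta-5: keys follow the custom order, and numbers within each key are ascending.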

Sorting data based on second column of a file

You can use the key option of the sort command, which takes a "field number", so if you wanted the second column:

sort -k2 -n yourfile

-n, --numeric-sort compare according to string numerical value

For example:

$ cat ages.txt 
Bob 12
Jane 48
Mark 3
Tashi 54

$ sort -k2 -n ages.txt
Mark 3
Bob 12
Jane 48
Tashi 54
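
To see what -n changes, here is a small sketch that sorts the same data with and without it. LC_ALL=C is an assumption added only to keep the comparison byte-based and reproducible across systems.

```shell
ages=$(printf '%s\n' 'Bob 12' 'Jane 48' 'Mark 3' 'Tashi 54')

# Without -n the key is compared as text, so "12" sorts before "3".
lexical=$(printf '%s\n' "$ages" | LC_ALL=C sort -k2)

# With -n the key is compared by numeric value.
numeric=$(printf '%s\n' "$ages" | LC_ALL=C sort -k2 -n)

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
```

The lexical run should put Bob 12 ahead of Mark 3, while the numeric run puts Mark 3 first.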

Confused about GNU `sort(1)` of a numerical sub field

The answer is that the leading space(s) are counted as part of the field, unless you use -b:

sort -b -n -k2.4 table

or curiously:

LC_ALL=C sort -t" " -n -k2.4 table

which also yields the correct result, because with an explicit -t separator the delimiter is not counted as part of the field.


... and one more thing ...

It seems better to use:

sort -b -n -k2.4,2 table

and thus limit the sort key to the end of the 2nd field.
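
A minimal sketch of the effect, using an invented three-line table in which the number starts at the 4th character of the 2nd field. The data and variable names are hypothetical, and LC_ALL=C is added to keep tie-breaking predictable.

```shell
table=$(printf '%s\n' 'z abc100' 'x abc10' 'y abc9')

# No -b: field 2 is " abc100" including the leading blank, so character 4 is "c".
# Every key is non-numeric (value 0) and ties fall back to whole-line order.
without_b=$(printf '%s\n' "$table" | LC_ALL=C sort -n -k2.4,2)

# -b: leading blanks are skipped first, so character 4 is the first digit.
with_b=$(printf '%s\n' "$table" | LC_ALL=C sort -b -n -k2.4,2)

# Explicit separator: the space delimits fields and is not part of field 2.
with_t=$(printf '%s\n' "$table" | LC_ALL=C sort -t' ' -n -k2.4,2)

printf '%s\n\n%s\n\n%s\n' "$without_b" "$with_b" "$with_t"
```

Only the last two variants should order the lines by 9, 10, 100. The first one ends up sorted by the letter in column 1.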

Issue with unix sort

This line should work for you:

sort -t, -n  -k2,2 test
  • You don't need cat test | sort; just run sort on the file directly.
  • The default end position of -k is the end of the line, so sort -k2 sorts from the 2nd field to the end of the line. What you need is to sort by exactly the 2nd field. This also explains why your sort worked once you removed the 3rd column.

Tested with your example:

kent$  sort -t, -n  -k2,2 file
class||sw sw-explr bot|results|id,3,72805487-72c3-4173-947f-e5abed6ea1e4,20130324,/html/body/div/div[3]/div[2]/div[2]/div[2]/div/div/div/div/div/div[3]/div/div/div[2]/ul/li[20]/div/img
class||sw sw-explr bot|results|id,23,0a522b36-556f-4116-b485-adcf132b6cad,20130325,/html/body/div/div[3]/div[2]/div[2]/div[3]/div/div/div/div/div/div[2]/div/div/ul/li[4]/div/img
class||sw sw-explr bot|results|id,40,30cefa2c-6ebf-485e-b49c-3a612fe3fd73,20130323,/html/body/div/div[3]/div[2]/div[3]/div[3]/div/div/div/div/div[3]/div/div/ul/li[8]/div/img
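
The difference between -k2 and -k2,2 can be seen on a tiny made-up input. This is only a sketch: -n is left out so that the extent of the key is visible, and LC_ALL=C is assumed for reproducible byte ordering.

```shell
rows=$(printf '%s\n' 'a,1,z' 'b,1,y')

# -k2: the key runs from field 2 to the end of the line ("1,z" vs "1,y"),
# so the 3rd column influences the order.
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2)

# -k2,2: the key is exactly field 2 ("1" vs "1"); the tie falls back to the whole line.
exact_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k2,2)

printf '%s\n\n%s\n' "$open_key" "$exact_key"
```

Here -k2 should put b,1,y first, whereas -k2,2 keeps a,1,z first.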

Sort CSV file by multiple columns using the sort command

You need to use two options for the sort command:

  • --field-separator (or -t)
  • --key=<start,end> (or -k), to specify the sort key, i.e. which range of columns (start through end index) to sort by. Since you want to sort on 3 columns, you'll need to specify -k 3 times, for columns 2,2, 1,1, and 3,3.

To put it all together,

sort -t ';' -k 2,2 -k 1,1 -k 3,3

Note that sort can't handle the situation in which fields contain the separator, even if it's escaped or quoted.
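
A short sketch of the multi-key command on invented semicolon-separated rows. The data is hypothetical, and LC_ALL=C is added only to make the ordering reproducible.

```shell
rows=$(printf '%s\n' 'b;2;x' 'a;2;y' 'a;1;z' 'b;1;w' 'a;2;x')

# Sort by column 2 first, then column 1, then column 3.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2,2 -k 1,1 -k 3,3)

printf '%s\n' "$sorted"
```

The rows with 1 in column 2 should come first (a;1;z, b;1;w), followed by a;2;x, a;2;y, b;2;x.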

Also note: this is an old question, which belongs on UNIX.SE, and was also asked there a year later.


Old answer: depending on your system's version of sort, the following might also work:

sort --field-separator=';' --key=2,1,3

Or, you might get "stray character in field spec".

According to the sort manual, if you don't specify the end column of the sort key, it defaults to the end of the line.

Issue with sorting in Linux/Unix

You are sorting on the first column. When many records have the same value in the first column, sort does not keep them in their input order; by default, ties are broken by comparing the whole line. For big files, the algorithm depends on the available memory (http://vkundeti.blogspot.fr/2008/03/tech-algorithmic-details-of-unix-sort.html).

If you want to preserve the input order of records with equal keys, add the -s (stable) option.
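
A small sketch of what -s changes, using invented two-column records. It assumes a sort that supports -s, as GNU and BSD sort do; LC_ALL=C keeps the default tie-breaking predictable.

```shell
records=$(printf '%s\n' '1 c' '0 z' '1 a' '0 b')

# Default: records with equal keys are reordered by a whole-line comparison.
default_order=$(printf '%s\n' "$records" | LC_ALL=C sort -k1,1)

# Stable: records with equal keys keep their input order.
stable_order=$(printf '%s\n' "$records" | LC_ALL=C sort -s -k1,1)

printf '%s\n\n%s\n' "$default_order" "$stable_order"
```

The stable run should print 0 z before 0 b and 1 c before 1 a, matching the input order within each key.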


