Sorting on the Last Field of a Line

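Here's a Perl one-liner. The double quotes suit the Windows command prompt; in a Unix shell, escape each $ or switch the outer quotes to single quotes so that the shell does not expand $a and $b itself: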

perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>"

Just pipe the list into it or, if the list is in a file, put the filename at the end of the command line.

Note that split is only used to compute the sort key; the lines themselves are printed unchanged, so the choice of delimiter cannot corrupt the data.

Here's sample output:


>perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>" files.txt
/a/e/f/g/h/01-do-this-first
/a/b/c/10-foo
/a/b/c/20-bar
/a/d/30-bob
/a/b/c/50-baz
/a/e/f/g/h/99-local
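
On a Unix shell the same one-liner can be wrapped in single quotes, so that $a and $b reach Perl untouched. A minimal sketch, assuming a POSIX shell with perl on the PATH; the function name sort_by_last_path_component is made up for illustration:

```shell
# Hypothetical wrapper (the name is illustrative, not from the original answer).
# Single quotes stop the shell from expanding $a and $b before Perl sees them.
sort_by_last_path_component() {
  # Reads the named files, or standard input when no file is given.
  perl -e 'print sort { (split "/", $a)[-1] <=> (split "/", $b)[-1] } <>' "$@"
}
```

It can be called as sort_by_last_path_component files.txt, or placed at the end of a pipeline.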

Awk: sort by last column and then print the whole line

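Here is a single GNU awk (gawk) command that does this. It skips the header row, keeps for each value of the last column the record with the largest second column, and prints the kept records in descending order of that last-column value: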

awk 'NR > 1 && (!($NF in map) || $2 > map[$NF]) {map[$NF] = $2; rec[$NF] = $0}
END {PROCINFO["sorted_in"]="@ind_str_desc"; for (i in rec) print rec[i]}' file

ov0002 1.40 Feb 05 2019 I42 v2.04 (04/18/2019)  ov0002
ov0001 1.46 Jul 25 2019 I42 v2.14 (09/05/2019)  ov0001

You can keep the header row as well if you want:

awk 'NR == 1 {print; next} !($NF in map) || $2 > map[$NF] {map[$NF] = $2; rec[$NF] = $0}
END {PROCINFO["sorted_in"]="@ind_str_desc"; for (i in rec) print rec[i]}' file

column1                 COlumn2                 column3
ov0002 1.40 Feb 05 2019 I42 v2.04 (04/18/2019)  ov0002
ov0001 1.46 Jul 25 2019 I42 v2.14 (09/05/2019)  ov0001
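
PROCINFO["sorted_in"] is specific to gawk; in other awk implementations the order of for (i in rec) is unspecified. A possible portable sketch lets awk pick the records and leaves the ordering to sort. The function name, the array name best, and the tab-decorated intermediate format are assumptions of this example, and $2 + 0 forces a numeric comparison of the second column:

```shell
# Hypothetical helper: for each distinct last field, keep the line with the
# largest second field, then list the kept lines in descending key order.
latest_per_key() {
  # awk skips the header (NR > 1) and emits "key<TAB>record" for each kept record.
  awk 'NR > 1 && (!($NF in best) || $2 + 0 > best[$NF]) { best[$NF] = $2 + 0; rec[$NF] = $0 }
       END { for (k in rec) printf "%s\t%s\n", k, rec[k] }' "$1" |
    LC_ALL=C sort -t "$(printf '\t')" -r -k1,1 |  # descending by the prepended key
    cut -f2-                                      # drop the prepended key again
}
```

Unlike the second gawk command above, this sketch drops the header row.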

Bash: sort text file by last field value

Use awk to put the numeric key up front ($NF is the last field of the current record; -F, makes the comma the field separator), sort numerically on that key, then use sed to strip the prepended key again.

awk -F, '{ print $NF, $0 }' yourfile | sort -n -k1 | sed 's/^[0-9][0-9]* //'
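
The sed pattern above only strips a key made of plain digits, so a negative or decimal key such as -2.5 would be left in the output. One possible variant, sketched under the assumption that the last field never contains a tab, joins key and line with a tab and removes the key with cut. The function name is hypothetical:

```shell
# Hypothetical helper: sort comma-separated lines numerically by their last field.
sort_by_last_csv_field() {
  # Decorate: prepend the last field and a tab to each line.
  awk -F, '{ printf "%s\t%s\n", $NF, $0 }' "$@" |
    LC_ALL=C sort -t "$(printf '\t')" -n -k1,1 |  # sort on the prepended key only
    cut -f2-                                      # undecorate: drop key and tab
}
```

Because cut removes everything up to the first tab regardless of its content, signed and fractional keys are handled the same way as plain integers.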

How do I sort input with a variable number of fields by the second-to-last field?

Note: There are several, potentially separate questions:

Question A: As implied by the question's title only: how can you use the tab character (\t) as the field separator?
Question B: How can you sort input by the second-to-last field, without knowing that field's specific index up front, given a fixed number of fields?
Question C: How can you sort input by the second-to-last field, without knowing that field's respective index up front, given a variable number of fields?

Update: Question C was the relevant one.

Answer to question A:

sort's -t option allows you to specify a field separator.
By default, sort uses any run of line-interior whitespace as the separator.

Assuming Bash, Ksh, or Zsh, you can use an ANSI C-quoted string ($'...') to specify a single tab as the field separator ($'\t'):

sort -t $'\t' -n -k8,8 file # -n sorts numerically; omit for lexical sorting
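
In a shell without $'...' quoting, a literal tab can be produced with printf instead. A small sketch; the variable TAB and the helper name sort_by_tab_field are invented for this example:

```shell
# A literal tab character; command substitution keeps it because only
# trailing newlines are stripped.
TAB=$(printf '\t')

# Hypothetical helper: numeric sort on one tab-separated field.
# Usage: sort_by_tab_field FIELD_NUMBER [FILE...]
sort_by_tab_field() {
  n=$1
  shift
  sort -t "$TAB" -n -k "$n,$n" "$@"
}
```

With this in place, sort_by_tab_field 8 file would correspond to the command above.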

Answer to question B:

Note: This assumes that all input lines have the same number of fields, and that input comes from a file named file:

 # Determine the index of the next-to-last column, based on the first
 # line, using Awk:
 nextToLastColNdx=$(head -n 1 file | awk -F '\t' '{ print NF - 1 }')

 # Sort numerically by the next-to-last column (omit -n to sort lexically):
 sort -t $'\t' -n -k$nextToLastColNdx,$nextToLastColNdx file

Note: To sort by a single field, always specify it as the end field too (e.g., -k8,8), as above, because sort, given only a start field index (e.g., -k8), sorts from the specified field through the remainder of the line.
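
The difference is easiest to see with lexical sorting on two lines whose second fields are equal. A small illustration with made-up data; the helper names are hypothetical, and LC_ALL=C pins byte-wise collation so the result does not depend on the locale:

```shell
# Hypothetical helpers to contrast the two key specifications.
sort_from_field2() { LC_ALL=C sort -k2 "$@"; }    # key: field 2 through end of line
sort_only_field2() { LC_ALL=C sort -k2,2 "$@"; }  # key: field 2 alone

# "b 1 y" comes first: the third field is part of the key, and y sorts before z.
printf 'a 1 z\nb 1 y\n' | sort_from_field2

# "a 1 z" comes first: the keys tie, so sort falls back to comparing whole lines.
printf 'a 1 z\nb 1 y\n' | sort_only_field2
```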

Answer to question C:

Note: This assumes that input lines may have a variable number of fields, and that on each line it is that line's second-to-last field that should act as the sort field; input comes from a file named file:

awk '{ printf "%s\t%s\n", $(NF-1), $0 }' file |
  sort -n -k1,1 | # omit -n to perform lexical sorting
    cut -f2-

The awk command extracts each line's second-to-last field and prepends it to the input line on output, separated by a tab.
The result is sorted by the first field (i.e., each input line's second-to-last field).
Finally, the artificially prepended sort field is removed again, using cut.
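
The awk call above splits on runs of whitespace. If the fields are tab-separated and may themselves contain spaces, a variant along these lines might be closer to what is wanted. It is a sketch that assumes every line has at least two fields, and the function name is hypothetical:

```shell
# Hypothetical helper: numeric sort by the second-to-last tab-separated field.
sort_by_penultimate_tab_field() {
  # -F '\t' makes awk split on single tabs, so fields may contain spaces.
  awk -F '\t' '{ printf "%s\t%s\n", $(NF-1), $0 }' "$@" |
    LC_ALL=C sort -t "$(printf '\t')" -n -k1,1 |  # sort on the prepended key only
    cut -f2-                                      # remove the prepended key
}
```

On a line with a single field, $(NF-1) is $0, so the whole line would become the key.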

Sorting lines and removing all but one line based on the last string?

$ sort -t_ -u -k2 file

www.site.com/324242_1234
www.site.com/6545_2345
www.site.com/87745_456

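This assumes the only underscore on each line is the one directly before the last field, because -k2 takes everything after the first underscore as the key.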

An awk solution could be:

$ awk -F_ '!a[$NF]++' file

www.site.com/324242_1234
www.site.com/6545_2345
www.site.com/87745_456

Explanation: After setting the field delimiter, $NF refers to the last field. a[$NF]++ counts the occurrences of each value, starting from zero. !a[$NF]++ negates the count as it was before the increment, so the expression is true only when the count is zero, that is, the first time a given key value is seen. This site has many examples of this awk idiom.
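
Spelled out step by step, the idiom could look like the following sketch; the function name and the array name seen are chosen for this example only:

```shell
# Hypothetical helper: keep the first line for each distinct last "_"-separated field.
first_per_last_field() {
  awk -F_ '{
    key = $NF                  # the last field is the deduplication key
    if (!(key in seen)) {      # first time this key appears
      seen[key] = 1            # remember it
      print                    # and print the whole line
    }
  }' "$@"
}
```

Unlike sort -u, this keeps the surviving lines in their original input order.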
