Sorting on the last field of a line
Here's a Perl command line (note that your shell may require you to escape the $ signs):
perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>"
Just pipe the list into it or, if the list is in a file, put the filename at the end of the command line.
Note that this script does not actually change the data, so you don't have to be careful about what delimiter you use.
Here's sample output:
>perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>" files.txt
/a/e/f/g/h/01-do-this-first
/a/b/c/10-foo
/a/b/c/20-bar
/a/d/30-bob
/a/b/c/50-baz
/a/e/f/g/h/99-local
Awk sort by last column and then print the whole line
Here is a GNU awk command that does this in a single command:
awk 'NR > 1 && (!($NF in map) || $2 > map[$NF]) {map[$NF] = $2; rec[$NF] = $0}
END {PROCINFO["sorted_in"]="@ind_str_desc"; for (i in rec) print rec[i]}' file
ov0002 1.40 Feb 05 2019 I42 v2.04 (04/18/2019) ov0002
ov0001 1.46 Jul 25 2019 I42 v2.14 (09/05/2019) ov0001
You can print the header row as well if you want:
awk 'NR == 1 {print; next} !($NF in map) || $2 > map[$NF] {map[$NF] = $2; rec[$NF] = $0}
END {PROCINFO["sorted_in"]="@ind_str_desc"; for (i in rec) print rec[i]}' file
column1 COlumn2 column3
ov0002 1.40 Feb 05 2019 I42 v2.04 (04/18/2019) ov0002
ov0001 1.46 Jul 25 2019 I42 v2.14 (09/05/2019) ov0001
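PROCINFO["sorted_in"] is specific to GNU awk. Where only a POSIX awk is available, the same idea (keep, for each distinct last field, the line with the largest second field, then list the keys in descending order) can be sketched with an external sort. The function name and the tab-decoration step are assumptions for illustration, not part of the original answer; $2 + 0 is used here to force a numeric comparison.

```shell
# Hypothetical helper: for each distinct last field, keep the line whose
# second field is numerically largest, then print the kept lines in
# descending order of that last field. The header line (NR == 1) is skipped.
max_by_last_field() {
  awk 'NR > 1 && (!($NF in max) || $2 + 0 > max[$NF]) {
         max[$NF] = $2 + 0; rec[$NF] = $0
       }
       # Decorate each kept line with its key so sort can order the output.
       END { for (k in rec) printf "%s\t%s\n", k, rec[k] }' "$@" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1r |
    cut -f2-
}
```

It is called like the original command: max_by_last_field file.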
Bash: sort text file by last field value
Use awk to put the numeric key up front ($NF is the last field of the current record), sort on it, then use sed to remove the duplicated key.
awk -F, '{ print $NF, $0 }' yourfile | sort -n -k1 | sed 's/^[0-9][0-9]* //'
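The sed pattern above only strips a key made of digits, so it leaves decimal or negative keys in place. A variant, sketched here under the assumption that the last field contains no tab characters, separates the prepended key with a tab and removes it with cut; the function name is hypothetical.

```shell
# Hypothetical helper: sort comma-separated lines numerically by their last
# field. Assumes the last field contains no tab characters.
sort_csv_by_last_field() {
  tab=$(printf '\t')
  awk -F, '{ printf "%s\t%s\n", $NF, $0 }' "$@" |  # prepend key + tab
    LC_ALL=C sort -t "$tab" -k1,1n |               # numeric sort on the key only
    cut -f2-                                       # drop the prepended key
}

# Illustrative input with a decimal and a negative key.
printf '%s\n' 'a,b,3.5' 'c,-2' 'd,e,f,10' | sort_csv_by_last_field
```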
How do I sort input with a variable number of fields by the second-to-last field?
Note: There are several, potentially separate questions:
Question A: As implied by the question's title only: how can you use the tab character (\t) as the field separator?
Question B: How can you sort input by the second-to-last field, without knowing that field's specific index up front, given a fixed number of fields?
Question C: How can you sort input by the second-to-last field, without knowing that field's respective index up front, given a variable number of fields?
Update: Question C was the relevant one.
Answer to question A:
sort's -t option allows you to specify a field separator.
By default, sort uses any run of line-interior whitespace as the separator.
Assuming Bash, Ksh, or Zsh, you can use an ANSI C-quoted string ($'...') to specify a single tab as the field separator ($'\t'):
sort -t $'\t' -n -k8,8 file # -n sorts numerically; omit for lexical sorting
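In shells without ANSI C quoting (plain POSIX sh, for example), a literal tab can be obtained from printf instead. A small sketch; the function name and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: numerically sort tab-separated input by one field.
# Usage: sort_tsv_by_field FIELD_NUMBER [FILE...]
sort_tsv_by_field() {
  field=$1
  shift
  tab=$(printf '\t')   # command substitution keeps the tab itself
  LC_ALL=C sort -t "$tab" -n -k"$field,$field" "$@"
}

# Illustrative input: sort by the second tab-separated field.
printf 'a b\t3\nc\t1\nd e\t2\n' | sort_tsv_by_field 2
```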
Answer to question B:
Note: This assumes that all input lines have the same number of fields, and that input comes from a file named file:
# Determine the index of the next-to-last column, based on the first
# line, using Awk:
nextToLastColNdx=$(head -n 1 file | awk -F '\t' '{ print NF - 1 }')
# Sort numerically by the next-to-last column (omit -n to sort lexically):
sort -t $'\t' -n -k$nextToLastColNdx,$nextToLastColNdx file
Note: To sort by a single field, always specify it as the end field too (e.g., -k8,8), as above, because sort, given only a start field index (e.g., -k8), sorts from the specified field through the remainder of the line.
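The difference between -k2 and -k2,2 shows up when two lines agree on the chosen field but differ in later ones. A short demonstration on made-up, tab-separated data (the function names are hypothetical; LC_ALL=C is set only to make the comparison order predictable):

```shell
tab=$(printf '\t')

# Key runs from field 2 through the end of the line.
sort_from_field2() { LC_ALL=C sort -t "$tab" -k2 "$@"; }

# Key is field 2 only; ties fall back to comparing whole lines.
sort_only_field2() { LC_ALL=C sort -t "$tab" -k2,2 "$@"; }

# Both lines have "b" in field 2 but differ in field 3.
printf 'x\tb\t2\ny\tb\t1\n' | sort_from_field2   # the "y" line sorts first
printf 'x\tb\t2\ny\tb\t1\n' | sort_only_field2   # the "x" line sorts first
```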
Answer to question C:
Note: This assumes that input lines may have a variable number of fields, and that on each line it is that line's second-to-last field that should act as the sort field; input comes from a file named file:
awk '{ printf "%s\t%s\n", $(NF-1), $0 }' file |
sort -n -k1,1 | # omit -n to perform lexical sorting
cut -f2-
- The awk command extracts each line's second-to-last field and prepends it to the input line on output, separated by a tab.
- The result is sorted by the first field (i.e., each input line's second-to-last field).
- Finally, the artificially prepended sort field is removed again, using cut.
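The pipeline above lets awk split on any whitespace. If the input is strictly tab-separated and fields may contain spaces, a variant can pass the tab separator to both awk and sort. This is a sketch under those assumptions, with a hypothetical function name; it also assumes every line has at least two fields.

```shell
# Hypothetical helper: sort tab-separated lines numerically by each line's
# second-to-last field, whatever the number of fields on the line.
sort_by_second_to_last() {
  tab=$(printf '\t')
  awk -F '\t' '{ printf "%s\t%s\n", $(NF-1), $0 }' "$@" |
    LC_ALL=C sort -t "$tab" -n -k1,1 |
    cut -f2-
}

# Illustrative input: lines with 3, 4 and 3 fields.
printf 'a b\t30\tx\nc\td e\t10\ty\nf\t20\tz\n' | sort_by_second_to_last
```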
Sorting lines and removing all but one line based on the last string?
$ sort -t_ -u -k2 file
www.site.com/324242_1234
www.site.com/6545_2345
www.site.com/87745_456
This assumes there are no earlier underscores in the line.
An awk solution could be:
$ awk -F_ '!a[$NF]++' file
www.site.com/324242_1234
www.site.com/6545_2345
www.site.com/87745_456
Explanation: After setting the field delimiter, $NF refers to the last field, and a[$NF]++ counts the occurrences of each value, starting at zero. !a[$NF]++ negates that value, so it is true only when the count is zero, which is the first time a given key value is seen. This site has many examples of this awk idiom.
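Spelled out step by step, the idiom is equivalent to the following sketch (the function name and the array name seen are chosen here for illustration):

```shell
# Hypothetical helper: print only the first line for each distinct value of
# the last underscore-separated field, keeping the input order.
first_per_last_field() {
  awk -F_ '{
    if (!($NF in seen)) {   # key not encountered before
      seen[$NF] = 1         # remember it
      print                 # and print the whole line
    }
  }' "$@"
}

# Illustrative input: the second line repeats the key 1234 and is dropped.
printf '%s\n' www.site.com/324242_1234 www.site.com/999_1234 www.site.com/6545_2345 |
  first_per_last_field
```

Unlike sort -u, this keeps the surviving lines in their original order.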