Sorting in Bash

Use:

cut -f <col_num> <filename> |
  sort |
  uniq -c |
  sort -r -k1 -n |
  awk '{print $2" "$1}'

The sort -r -k1 -n sorts in reverse order, using the first field as a numeric value. The awk simply reverses the order of the columns. You can test the added pipeline commands thus (with nicer formatting):

pax> echo '105 Linux
55 MacOS
500 Windows' | sort -r -k1 -n | awk '{printf "%-10s %5d\n",$2,$1}'
Windows      500
Linux        105
MacOS         55
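
As a minimal end-to-end sketch of the whole pipeline, the following assumes a hypothetical tab-separated file (created here as a temporary file) whose second column holds an OS name. The sample rows and the variable names tmp and counts are illustrative only.

```bash
# Hypothetical sample data: tab-separated, OS name in column 2.
tmp=$(mktemp)
printf 'u1\tLinux\nu2\tWindows\nu3\tLinux\nu4\tMacOS\nu5\tLinux\nu6\tWindows\n' > "$tmp"

# Count how often each value of column 2 occurs, most frequent first,
# then swap the columns so the name comes before the count.
counts=$(cut -f 2 "$tmp" |
  sort |
  uniq -c |
  sort -r -k1 -n |
  awk '{print $2" "$1}')

printf '%s\n' "$counts"
rm -f "$tmp"
```

With this sample data the pipeline prints Linux 3, Windows 2 and MacOS 1, one per line.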

Terminal: SORT command; how to sort correctly?

If I understand the problem correctly, you want the "natural sort order" as described in Natural sort order - Wikipedia, Sorting for Humans : Natural Sort Order, and macos - How does finder sort folders when they contain digits and characters?.

Using Linux sort(1) you need the -V (--version-sort) option for "natural" sort. You also need the -f (--ignore-case) option to disregard the case of letters. So, assuming that the file names are stored one-per-line in a file called files.txt you can produce a list (mostly) sorted in the way that you want with:

sort -Vf files.txt

However, sort -Vf sorts underscores after digits and letters on my system. I've tried using different locales (see How to set locale in the current terminal's session?), but with no success. I can't see a way to change this with sort options (but I may be missing something).

The characters . and ~ seem to consistently sort before numbers and letters with sort -V. A possible hack to work around the problem is to swap underscore with one of them, sort, and then swap again. For example:

tr '_~' '~_' <files.txt | LC_ALL=C sort -Vf |  tr '_~' '~_'

seems to do what you want on my system. I've explicitly set the locale for the sort command with LC_ALL=C ... so it should behave the same on other systems. (See Why doesn't sort sort the same on every machine?.)
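
To see what -V changes, here is a small sketch that compares plain and version sort on three made-up names. It assumes a sort that supports -V (GNU coreutils does); the names and variable names are illustrative.

```bash
# Hypothetical file names, one per line.
names=$(printf '%s\n' file10 file2 file1)

# Plain byte-wise sort compares character by character, so "file10" sorts before "file2".
plain=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort compares runs of digits as numbers, so "file2" sorts before "file10".
natural=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

printf 'plain:\n%s\n\nnatural:\n%s\n' "$plain" "$natural"
```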

sort by datetime format in bash

sort can handle month names, thanks to the -M option.

No need to change , into !. Use the white space as delimiter and just issue:

LC_ALL=en sort -k7nr -k5Mr -k6nr -k2r sample

If you use this as content of the file sample:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2

you will get this as output:

ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Apr 1 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2021, foo=moshe2, bar=haim2
ip=2.3.4.5, setup_time=05:59:30.260 GMT Tue Mar 17 2021, foo=moshe2, bar=haim2
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Mar 16 2021, foo=moshe, bar=haim
ip=1.2.3.4, setup_time=06:58:38.617 GMT Tue Feb 28 2021, foo=moshe, bar=haim
ip=2.3.4.5, setup_time=06:50:30.260 GMT Tue Mar 18 2020, foo=moshe2, bar=haim2

Specifying -k7 means to sort on the seventh field. The r option reverses the order of sorting to descending. The M option sorts according to the name of the month. The n option sorts numerically. To sort on the time, just consider the whole second field (beginning with the string setup_time=) as a fixed-length string using -k2.

LC_ALL=en at the beginning of the command line tells the system to use the English names of the months.
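
A reduced sketch of the month handling on its own may help. It uses made-up "month day" lines and LC_ALL=C (an assumption here, chosen because the C locale always provides English month abbreviations) rather than the full log format above.

```bash
# Sort by month name (field 1, -M), then by day number (field 2, -n).
sorted_dates=$(printf '%s\n' 'Mar 18' 'Feb 28' 'Apr 1' 'Mar 16' |
  LC_ALL=C sort -k1,1M -k2,2n)

printf '%s\n' "$sorted_dates"
```

Limiting each key with "1,1" and "2,2" keeps the month comparison from spilling into the day field.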

How to sort an array in Bash

You don't really need all that much code:

IFS=$'\n' sorted=($(sort <<<"${array[*]}"))
unset IFS

Supports whitespace in elements (as long as it's not a newline), and works in Bash 3.x.

e.g.:

$ array=("a c" b f "3 5")
$ IFS=$'\n' sorted=($(sort <<<"${array[*]}")); unset IFS
$ printf "[%s]\n" "${sorted[@]}"
[3 5]
[a c]
[b]
[f]

Note: @sorontar has pointed out that care is required if elements contain wildcards such as * or ?:

The sorted=($(...)) part is using the "split and glob" operator. You should turn glob off with set -f, set -o noglob, or shopt -so noglob; otherwise an element of the array like * will be expanded to a list of files.
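
A minimal sketch of the glob-safe variant, assuming it is acceptable to switch globbing back on afterwards with set +f (a script that already runs with noglob would want to skip that step). LC_ALL=C is added only to make the order predictable.

```bash
array=("a c" "*" b)

set -f                                   # disable pathname expansion
IFS=$'\n' sorted=($(LC_ALL=C sort <<<"${array[*]}"))
unset IFS
set +f                                   # re-enable pathname expansion

# The "*" element survives as a literal string instead of becoming a file list.
printf '[%s]\n' "${sorted[@]}"
```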

What's happening:

The result is the culmination of six things that happen in this order:

  1. IFS=$'\n'
  2. "${array[*]}"
  3. <<<
  4. sort
  5. sorted=($(...))
  6. unset IFS

First, the IFS=$'\n'

This is an important part of our operation that affects the outcome of 2 and 5 in the following way:

Given:

  • "${array[*]}" expands to every element delimited by the first character of IFS
  • sorted=() creates elements by splitting on every character of IFS

IFS=$'\n' sets things up so that elements are expanded using a new line as the delimiter, and then later created in a way that each line becomes an element. (i.e. Splitting on a new line.)

Delimiting by a new line is important because that's how sort operates (sorting per line). Splitting by only a new line is not as important, but is needed to preserve elements that contain spaces or tabs.

The default value of IFS is a space, a tab, followed by a new line, and would be unfit for our operation.
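
The effect of IFS on "${array[*]}" can be checked in isolation. This sketch assumes IFS still has its default value when it starts; the variable names joined_default and joined_newline are illustrative.

```bash
array=("a c" b f "3 5")

# Default IFS: elements are joined with a space, so element boundaries are lost.
joined_default="${array[*]}"

# IFS set to a newline: each element ends up on its own line.
IFS=$'\n'
joined_newline="${array[*]}"
unset IFS

printf '%s\n---\n%s\n' "$joined_default" "$joined_newline"
```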

Next, the sort <<<"${array[*]}" part

<<<, called here strings, takes the expansion of "${array[*]}", as explained above, and feeds it into the standard input of sort.

With our example, sort is fed this following string:

a c
b
f
3 5

Since sort sorts, it produces:

3 5
a c
b
f

Next, the sorted=($(...)) part

The $(...) part, called command substitution, causes its content (sort <<<"${array[*]}") to run as a normal command, while taking the resulting standard output as the literal that goes wherever $(...) was.

In our example, this produces something similar to simply writing:

sorted=(3 5
a c
b
f
)

sorted then becomes an array that's created by splitting this literal on every new line.

Finally, the unset IFS

Unsetting IFS makes the shell fall back to its default word-splitting behavior (space, tab, and newline), and is just good practice.

It's to ensure we don't cause trouble with anything that relies on IFS later in our script. (Otherwise we'd need to remember that we've switched things around--something that might be impractical for complex scripts.)
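
One way to avoid touching the global IFS at all is to wrap the technique in a function and declare IFS local. This is a sketch rather than part of the original answer: the function name sort_into_sorted is hypothetical, and it writes its result to the global array sorted.

```bash
# Hypothetical helper: sorts its arguments into the global array "sorted".
sort_into_sorted() {
    local IFS=$'\n'   # scoped to this function; the caller's IFS is restored on return
    set -f            # avoid glob expansion of elements such as * or ?
    sorted=($(LC_ALL=C sort <<<"$*"))
    set +f            # note: this re-enables globbing even if the caller had it off
}

array=("a c" b f "3 5")
sort_into_sorted "${array[@]}"
printf '[%s]\n' "${sorted[@]}"
```

Because IFS is local, no unset IFS is needed after the call.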

bash: sort applied to a file returns right results as terminal output, but does not change the file itself

SOLVED

From this thread it turns out that redirecting the output of sort into the same file from which sort reads as source will not work since

the shell makes the redirections (not the sort(1) program) and the
input file (as being the output also) will be erased just before
giving the sort(1) program the opportunity of reading it.

So I have split my command into two

sort -k1 -n source-g5.txt > tmp-source-g5.txt
mv tmp-source-g5.txt source-g5.txt
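
An alternative worth knowing is sort's own -o option, which POSIX allows to name one of the input files, so no temporary file or shell redirection is involved. The sketch below uses a throwaway file from mktemp with made-up numbers rather than source-g5.txt.

```bash
# Hypothetical input file with unsorted numbers.
tmp=$(mktemp)
printf '%s\n' 30 4 200 1 > "$tmp"

# sort reads all input before it opens the -o file for writing,
# so using the same name for input and output is safe.
sort -n -o "$tmp" "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```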

Bash : sort command do not treat dots

When sorting, your current locale is influencing the order. If you want locale independent order, use the C locale:

IFS=$'\n'; echo "${a[*]}" | LC_ALL=C sort -d; unset IFS

Setting LC_COLLATE should be enough, in fact.
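
A small sketch of what the C locale does with punctuation, using three made-up strings. It sets LC_ALL rather than LC_COLLATE because LC_ALL, when present in the environment, overrides LC_COLLATE; with LC_ALL unset, LC_COLLATE=C would have the same effect on ordering.

```bash
a=("a_b" "aa" "a.b")

# In the C locale, strings are compared byte by byte:
# '.' (0x2E) sorts before '_' (0x5F), which sorts before 'a' (0x61).
sorted_c=$(printf '%s\n' "${a[@]}" | LC_ALL=C sort)

printf '%s\n' "$sorted_c"
```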

Sorting files in bash

For this dataset, only sort on the first field.

$: printf "%s\n" V0.1__file_a.sql V0.2__file_b.sql V0__file_c.sql | sort -t _ -k 1,1
V0__file_c.sql
V0.1__file_a.sql
V0.2__file_b.sql

Using -k 1,2 fails for me also unless I use a dictionary sort with it (-d).

$: printf "%s\n" V0.1__file_a.sql V0.2__file_b.sql V0__file_c.sql | sort -t _ -k 1,2
V0.1__file_a.sql
V0.2__file_b.sql
V0__file_c.sql

but works with -d

$: printf "%s\n" V0.1__file_a.sql V0.2__file_b.sql V0__file_c.sql | sort -d -t _ -k 1,2
V0__file_c.sql
V0.1__file_a.sql
V0.2__file_b.sql

Dictionary sort will "consider only blanks and alphanumeric characters", so the dots and underscores are ignored, making all the filenames single strings of alphanumerics, and numbers as characters sort to the top.

-d alone still fails though - you need to establish fields.

$: printf "%s\n" V0.1__file_a.sql V0.2__file_b.sql V0__file_c.sql | sort -d
V0.1__file_a.sql
V0.2__file_b.sql
V0__file_c.sql

Using -t _ sets underscore as the delimiter, but sort is ignoring it on my implementation as well if I don't explicitly tell it to use a key field.

The combination forces V0 to be compared to V01 and V02 without comparing underscores to dots, so you get the order you wanted.

How to sort data according to the date in bash?

The relevant field must be rendered suitable for sorting, that is, in the form of YYYY-MM-DD, using a utility such as sed or awk. For example, with GNU sed:

sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' employees.txt |
sort -r -t'|' -k5,5 | head -n1 | cut -d'|' -f2
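
The layout of employees.txt is not shown here, so the following sketch assumes a hypothetical pipe-delimited record of the form id|name|department|salary|DD-MM-YYYY and feeds three made-up rows through the same pipeline.

```bash
# Hypothetical records: id|name|department|salary|DD-MM-YYYY
latest=$(printf '%s\n' \
  '1|Alice|IT|5000|15-03-2019' \
  '2|Bob|HR|4000|02-11-2021' \
  '3|Carol|IT|6000|28-07-2020' |
  sed -E 's/([0-9]{2})-([0-9]{2})-([0-9]{4})/\3-\2-\1/' |  # DD-MM-YYYY -> YYYY-MM-DD
  sort -r -t'|' -k5,5 |                                     # newest date first
  head -n1 |
  cut -d'|' -f2)                                            # keep only the name

printf '%s\n' "$latest"
```

Once the date is in YYYY-MM-DD form, a plain string comparison orders it chronologically, which is why no numeric or month option is needed on the sort key.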

Linux bash scripting: sorting a list to use

$@ is an array (of all script parameters), so you can sort using

OIFS="$IFS" # save IFS
IFS=$'\n' sorted=($(sort -n <<<"$*"))
IFS="$OIFS" # restore IFS

and then use the result like so:

for I in "${sorted[@]}"; do
...
done

Explanation:

  • IFS is an internal shell variable (internal field separator) which tells the shell which character separates words (default is space, tab and newline).
  • $'\n' expands to a single newline. When the shell expands $*, it will now put a new line between each element.
  • sort -n <<< feeds the "one argument per line" text to sort via a here string, and sort orders it numerically (-n)
  • sorted=($(...)) creates a new array with the result of the command ...
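
To try the steps above without writing a script file, the positional parameters can be simulated with set --. The four numbers are made up for illustration.

```bash
# Simulate script parameters; in a real script these would come from the command line.
set -- 10 9 100 1

OIFS="$IFS"                              # save IFS
IFS=$'\n' sorted=($(sort -n <<<"$*"))    # "$*" joins the parameters with newlines
IFS="$OIFS"                              # restore IFS

for I in "${sorted[@]}"; do
  echo "$I"
done
```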

See also:

  • How to sort an array in BASH

