Sort Not Sorting as Expected (Space and Locale)

sort uses the system locale to determine the sorting order of characters. My guess is that your locale ignores whitespace when comparing lines.

$ cat foo.txt 
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011

Bash: sort command does not treat dots

Your current locale influences the sort order. If you want a locale-independent order, use the C locale:

IFS=$'\n'; echo "${a[*]}" | LC_ALL=C sort -d; unset IFS

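In fact, setting LC_COLLATE=C should be enough, as long as LC_ALL is not already set to something else (a non-empty LC_ALL overrides LC_COLLATE).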
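
A minimal sketch of the LC_COLLATE variant, with hypothetical sample lines standing in for the array a. LC_ALL is emptied for the one command so that LC_COLLATE actually takes effect:

```shell
# Hypothetical sample lines (assumption: they stand in for the array contents).
input=$(printf '%s\n' 'v10 1' 'v 1011' 'v 1006')

# Only the collation category is switched to C. LC_ALL is emptied for this
# command because a non-empty LC_ALL would take precedence over LC_COLLATE.
sorted=$(printf '%s\n' "$input" | LC_ALL= LC_COLLATE=C sort)
printf '%s\n' "$sorted"
```

Both variables are set only for the sort invocation, so the rest of the shell session keeps its usual locale.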

Why does the sort command sort differently if there are trailing fields?

The man page for my version of sort says:

***  WARNING  *** The locale specified by the environment affects sort order.  
Set LC_ALL=C to get the traditional sort order that uses native byte values.

And indeed, if I set LC_ALL=C and run sort on your second example, I get:

$ LC_ALL=C sort < tosort 
a 12
a01 7
a02 42

Your default locale is probably something other than C.
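
To see what "native byte values" means here, it can help to look at the bytes involved. The sketch below dumps the bytes of the string "a 0" with od and then sorts the three lines from the example in the C locale. The variable names are only illustrative:

```shell
# Dump 'a', space and '0' as hex bytes; tr strips od's padding and newline.
bytes=$(printf 'a 0' | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$bytes"

# Sort the example lines by byte value. The space (0x20) is smaller than
# the digit '0' (0x30), so "a 12" comes before "a01 7".
sorted=$(printf '%s\n' 'a01 7' 'a 12' 'a02 42' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

The first command prints 612030, that is 0x61 for a, 0x20 for the space and 0x30 for 0.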

SciTE sort selection tool : numbers with leading spaces are not sorted as expected

Your expectation is wrong. You said the algorithm is supposed to sort the texts alphabetically and that is exactly what it does.

For Lua, "11" is smaller than "2" because strings are compared character by character.
I think you would agree that "aa" should come before "b", which is pretty much the same thing.

If you want to change how texts are sorted you have to provide your own function.

The Lua reference manual says:

table.sort (list [, comp])

Sorts list elements in a given order, in-place, from list[1] to
list[#list]. If comp is given, then it must be a function that
receives two list elements and returns true when the first element
must come before the second in the final order (so that, after the
sort, i < j implies not comp(list[j],list[i])). If comp is not given,
then the standard Lua operator < is used instead.

Note that the comp function must define a strict partial order over
the elements in the list; that is, it must be asymmetric and
transitive. Otherwise, no valid sort may be possible.

The sort algorithm is not stable: elements considered equal by the
given order may have their relative positions changed by the sort.

So you are free to implement your own comp function to change the sorting.

By default, table.sort(list) sorts list in ascending order.
To make it sort in descending order, you call:

table.sort(list, function(a,b) return a > b end)

If you want to treat numbers differently you can do something like this:

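local t = {"111", "11", "3", "2", "a", "b"}

-- Numeric strings are ordered by value and placed before non-numeric
-- strings; everything else falls back to plain string comparison.
-- The function always returns a boolean, as table.sort requires.
local function myCompare(a, b)
  local a_number = tonumber(a)
  local b_number = tonumber(b)
  if a_number and b_number then
    return a_number < b_number
  elseif a_number or b_number then
    return a_number ~= nil
  end
  return a < b
end

table.sort(t, myCompare)

for i, v in ipairs(t) do
  print(v)
end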

which would give you the output

2
3
11
111
a
b

Of course this is just a quick and simple example. A nicer implementation is up to you.
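
The string-versus-number distinction from this answer is not specific to Lua. As a rough sketch, the sort command from the other answers on this page shows the same behaviour: plain sort compares the lines as text, while the -n option compares them as numbers. The sample values are taken from the Lua table above:

```shell
# Text comparison: "11" and "111" come before "2", just like in Lua.
as_text=$(printf '%s\n' 111 11 3 2 | LC_ALL=C sort)
printf '%s\n' "$as_text"

# Numeric comparison with -n: the lines are ordered by their value.
as_numbers=$(printf '%s\n' 111 11 3 2 | LC_ALL=C sort -n)
printf '%s\n' "$as_numbers"
```

The first list comes out as 11, 111, 2, 3 and the second as 2, 3, 11, 111.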

UNIX sort ignores whitespaces

Solved by:

export LC_ALL=C

From the sort documentation:

WARNING: The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

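(This works for ASCII at least. For UTF-8 input, the C locale still compares raw bytes, which gives Unicode code point order rather than any language's alphabetical order.)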
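
A small sketch of what byte-value ordering does to non-ASCII text, assuming UTF-8 input. The letter é is written as its two UTF-8 bytes (octal 303 251) so that the example does not depend on the encoding of the script file:

```shell
# Three lines: "z", "é" (UTF-8 bytes 0xC3 0xA9) and "a".
# In the C locale, sort compares bytes, so é (first byte 0xC3) lands
# after z (0x7A) instead of next to e.
sorted=$(printf 'z\n\303\251\na\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

The result is a, z, é. This order is stable and reproducible, but it is not what a reader of an accented language would call alphabetical.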


