UNIX sort ignores whitespaces
Solved by:
export LC_ALL=C
From the sort documentation:
WARNING: The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
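(This works for ASCII at least. For UTF-8 input, byte order matches Unicode code point order, which is not the dictionary order of any particular language.)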
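As a minimal sketch of the effect, the variable can also be set for a single command instead of being exported for the whole session. The three input lines are made-up sample data.

```shell
# Made-up sample data: one line starts with a space.
# LC_ALL=C applies only to this sort invocation; nothing is exported.
sorted=$(printf '%s\n' 'b' ' c' 'a' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

In the C locale the space (byte 0x20) compares lower than any letter, so the line with the leading space comes first.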
Why does the UNIX sort utility ignore leading spaces without the option -b?
It depends on the locale. With
LC_COLLATE=en_US.utf8 sort myfile
I get your unexpected result, and with
LC_COLLATE=C sort myfile
I get your expected result. Also see the related question "bash sort unusual order. Problem with spaces?".
(I don't know why sort handles -b and -t like this.)
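One caveat when relying on LC_COLLATE: if LC_ALL is set in the environment, it takes precedence over every individual LC_* variable, so the LC_COLLATE setting is ignored. A small sketch with made-up input:

```shell
# Made-up input. In the C locale the space in 'a b' sorts before the
# second 'a' in 'aa', so 'a b' comes first.
# LC_ALL wins over LC_COLLATE, so the en_US.utf8 setting has no effect here.
result=$(printf '%s\n' 'aa' 'a b' | LC_ALL=C LC_COLLATE=en_US.utf8 sort)
printf '%s\n' "$result"
```

If changing LC_COLLATE seems to do nothing, checking whether LC_ALL is set is a reasonable first step.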
How to sort and ignore spaces?
Bash replaces $'\t' with a real tab:
LC_ALL=C sort -t $'\t' -k 2 file
Output:
5816470687 aa a dissertation for the 933 2 2 2
742550111 aaa aaa aaa aaa aaa 2008 3 1 1
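The $'\t' quoting is a bash feature. In a shell that lacks it, a tab can be produced with printf instead. A sketch with made-up tab-separated data, sorting on everything from the second field onward:

```shell
# Portable way to get a literal tab for shells without $'\t'.
tab=$(printf '\t')

# Made-up data: an ID, a tab, then a phrase containing spaces.
sorted=$(printf '2\tb c\n1\tb a\n3\ta z\n' | LC_ALL=C sort -t "$tab" -k 2)
printf '%s\n' "$sorted"
```

With no end position, -k 2 makes the key run from field 2 to the end of the line, and because -t is a tab, the spaces inside the phrase stay part of the key.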
unix command - ignore The while sorting
ls | sed -e 's/^The \(.*\)/\1, The/' | sort | sed -e 's/\(.*\), The$/The \1/'
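To try the pipeline without depending on the contents of a directory, ls can be swapped for printf with a few made-up titles. LC_ALL=C is added here only to make the order reproducible.

```shell
# Move a leading "The " to the end of the line, sort, then move it back.
titles=$(printf '%s\n' 'The Zebra' 'Apple' 'The Banana' |
  sed -e 's/^The \(.*\)/\1, The/' |
  LC_ALL=C sort |
  sed -e 's/\(.*\), The$/The \1/')
printf '%s\n' "$titles"
```

The titles come out ordered by the word after "The", with the article restored at the front.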
unix sort -n -t , gives unexpected result
I'm not sure this is entirely correct, but it's close.
sort -n -t,
will try to sort numerically by the given key(s). With no -k option, the key is the whole line, which in this case is a pair consisting of an integer and a float separated by a comma. Such a pair cannot be compared as a single number.
If you explicitly specify which single keys to sort on with
sort -k1,1n -k2,2n -t,
it should work. Now you are explicitly telling sort to first sort on the first field (numerically), then on the second field (also numerically).
I suspect that -n is useful as a global option only if each line of the input consists of a single numerical value. Otherwise, you need to use the -n option in conjunction with the -k option to specify exactly which fields are numbers.
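A small sketch of the per-key form on made-up integer,float pairs. LC_ALL=C is an addition here, so that "." is the decimal point and "," is never treated as a thousands separator.

```shell
# Made-up input: an integer field and a float field, comma-separated.
# -k1,1n sorts numerically on field 1 only; -k2,2n breaks ties on field 2.
sorted=$(printf '%s\n' '10,2.5' '2,3.1' '2,1.7' | LC_ALL=C sort -t, -k1,1n -k2,2n)
printf '%s\n' "$sorted"
```

The two lines starting with 2 come before the line starting with 10, and among them 1.7 precedes 3.1.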
Why doesn't sort sort the same on every machine?
The man-page on OS X says:
******* WARNING ******* The locale specified by the environment affects sort order. Set LC_ALL=C to get
the traditional sort order that uses native byte values.
which might explain things.
If some of your systems have no locale support, they default to the C locale, so you don't have to set anything on those. If others do support locales and you want the same behavior everywhere, set LC_ALL=C on those systems. As far as I know, that is the way to get the same order on as many systems as possible.
If you don't have any locale-less systems, just making sure they share locale would probably be enough.
For more canonical information, see The Single UNIX ® Specification, Version 2 description of locale, environment variables, setlocale() and the description of the sort(1) utility.
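To compare machines, it may help to look at what each one reports for its collation setting. This sketch assumes the POSIX locale utility is installed, which is not the case on every minimal system.

```shell
# Show the collation setting that sort would pick up from the current environment.
locale | grep '^LC_COLLATE='

# For comparison: with LC_ALL=C the same query reports the C locale.
collate=$(LC_ALL=C locale | grep '^LC_COLLATE=')
printf '%s\n' "$collate"
```

If two machines print different values for the first command, differing sort output between them is to be expected.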
How can I diff 2 files while ignoring leading white space
diff has some options that can be useful to you:
-E, --ignore-tab-expansion
ignore changes due to tab expansion
-Z, --ignore-trailing-space
ignore white space at line end
-b, --ignore-space-change
ignore changes in the amount of white space
-w, --ignore-all-space
ignore all white space
-B, --ignore-blank-lines
ignore changes whose lines are all blank
So diff -w old new should ignore all spaces and thus report only substantially different lines.
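A quick way to see the difference, using two throwaway files whose only difference is leading whitespace. The file names and contents are made up for the illustration.

```shell
# Two made-up files that differ only in the indentation of the second line.
dir=$(mktemp -d)
printf 'alpha\n  beta\n' > "$dir/old"
printf 'alpha\nbeta\n' > "$dir/new"

# diff exits 0 when it finds no differences and 1 when it finds some.
if diff "$dir/old" "$dir/new" >/dev/null; then plain=same; else plain=different; fi
if diff -w "$dir/old" "$dir/new" >/dev/null; then ignoring=same; else ignoring=different; fi
echo "plain: $plain, with -w: $ignoring"

rm -r "$dir"
```

Note that -w ignores whitespace everywhere on the line, not only at the start, so it is broader than the question strictly asks for.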
sort not sorting as expected (space and locale)
It uses the system locale to determine the sorting order of letters. My guess is that with your locale, it ignores whitespace.
$ cat foo.txt
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011
Treatment of spaces in sort command. Difference between LC_COLLATE=C and LC_COLLATE=en_US.UTF-8
Punctuation is ignored when ordering in the en_US locale.
Note that sort can explicitly skip whitespace with the -b option, but that option is tricky to use, so I'd advise using the sort --debug option alongside it.
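A minimal sketch of -b on made-up input, run in the C locale so that leading spaces would otherwise sort first. The option is paired with an explicit key, since it is defined in terms of where a key starts and ends.

```shell
# Made-up input with varying amounts of leading space.
# -b skips leading blanks when locating the start of the key.
sorted=$(printf '%s\n' '  b' 'a' ' c' | LC_ALL=C sort -b -k 1,1)
printf '%s\n' "$sorted"

# With GNU sort, --debug annotates each line with the key actually compared:
#   printf '%s\n' '  b' 'a' ' c' | LC_ALL=C sort --debug -b -k 1,1
```

The lines are ordered by their first non-blank character, while the original indentation is kept in the output.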