Different versions of UNIX sort handle case differently
Try using the POSIX locale: 'export LANG=POSIX'. Note that LC_ALL or LC_COLLATE, if already set, take precedence over LANG.
linux sort unexpected output
The solution provided by @cnicutar is correct, but the reason needs an explanation, which is why I'm posting a new answer.
After a discussion with @cnicutar, at the end of which I suspected a bug in coreutils' sort, I found that this sorting behavior is expected:
At that point sort appears broken because case is folded and punctuation is ignored because ‘en_US.UTF-8’ specifies this behavior.
So to sort, your input seems to be mapped as follows:
ABC -> ABC
AB-C -> ABC
ABCDEFG-HI -> ABCDEFGHI
If you want pure ASCII sorting, call LC_ALL=C sort. This temporarily sets the locale to C for that one invocation of sort, which means "standard" behavior without localization; you can also use POSIX instead of C.
On other Unixes this behavior seems to be different (tested on Mac OS X, whose userland tools are derived from FreeBSD), but LC_ALL=C sort should yield the same behavior across all POSIX systems.
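As a minimal sketch of the difference, the three strings from the mapping above can be sorted in the C locale. The variable names are only for illustration, and the original question's full input is not reproduced here. In the C locale, sort compares raw byte values, so the hyphen is not ignored and case is not folded:

```shell
# Sample input taken from the mapping above.
input='ABC
AB-C
ABCDEFG-HI'

# Byte-wise (C locale) order: '-' (0x2d) sorts before 'C' (0x43).
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# AB-C
# ABC
# ABCDEFG-HI

# Case is not folded either: uppercase letters precede lowercase ones.
case_sorted=$(printf '%s\n' b A a B | LC_ALL=C sort)
printf '%s\n' "$case_sorted"
# A
# B
# a
# b
```

Running the same pipelines without LC_ALL=C shows what your current locale does with the same data.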
Why does every text editor write an additional byte (UTF-8)?
You are seeing a newline character (often written as \n in programming languages; in ASCII it is hex 0a, decimal 10):
$ echo 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f0a foo.
The hex-dump tool xxd shows that the file consists of 4 bytes: hex 66 (ASCII lowercase f), two times hex 6f (lowercase letter o), and the newline (hex 0a).
You can use the -n command-line switch to disable adding the newline:
$ echo -n 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f foo
or you can use printf instead (which is more portable, because POSIX leaves the behavior of echo -n implementation-defined):
$ printf 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f foo
Also see 'echo' without newline in a shell script.
Most text editors will also add a newline to the end of a file; how to prevent this depends on the exact editor (often you can just press delete at the end of the file before saving). There are also various command-line options to remove the newline after the fact; see How can I delete a newline if it is the last character in a file?.
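One such after-the-fact approach, sketched here under the assumption that the file is plain text without NUL bytes, relies on the fact that shell command substitution strips trailing newlines. Note that it removes all trailing newlines, not just one. The temporary file names are only for illustration:

```shell
f=$(mktemp)
printf 'foo\n' > "$f"

# Command substitution drops every trailing newline; printf '%s' adds none back.
printf '%s' "$(cat "$f")" > "$f.stripped"

# Arithmetic expansion normalizes any padding some wc implementations print.
size_before=$(( $(wc -c < "$f") ))
size_after=$(( $(wc -c < "$f.stripped") ))
echo "$size_before -> $size_after"
# 4 -> 3

rm -f "$f" "$f.stripped"
```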
Text editors generally add a newline because they deal with text lines, and the POSIX standard defines that text lines end with a newline:
3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
Also see Why should text files end with a newline?
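To check whether a given file ends with a complete line in this sense, a small helper along the following lines may be useful. The function name ends_with_newline is made up for this sketch, and it assumes a text file (an empty file is also reported as ending with a newline):

```shell
# Succeeds if the last byte of the file is a newline.
# Command substitution strips trailing newlines, so the result is empty
# exactly when the final byte is a newline (or the file is empty).
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

f=$(mktemp)

printf 'foo\n' > "$f"
ends_with_newline "$f" && echo "complete last line"

printf 'foo' > "$f"
ends_with_newline "$f" || echo "no trailing newline"

rm -f "$f"
```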
Unix sort treatment of underscore character
You can set LC_COLLATE to the traditional sort order just for your command:
env LC_COLLATE=C sort tmp
This won't change the current environment, only the one in which the sort command executes.
This way you should get the same behaviour regardless of the system's default locale.
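One caveat: if LC_ALL is set in your environment, it overrides LC_COLLATE, so the setting above would have no effect. The following sketch clears LC_ALL inside a subshell first. The sample data is hypothetical and only chosen to show where the underscore lands in byte order:

```shell
# Hypothetical sample data with an underscore and mixed case.
sample=$(mktemp)
printf '%s\n' ab a_b aB > "$sample"

# LC_ALL, when set, overrides LC_COLLATE, so clear it in the subshell first.
collate_sorted=$(unset LC_ALL; LC_COLLATE=C sort "$sample")
printf '%s\n' "$collate_sorted"
# aB
# a_b
# ab

rm -f "$sample"
```

In byte order, 'B' (0x42) comes before '_' (0x5f), which comes before 'b' (0x62), hence the result shown in the comments.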
Different ORDER BY behavior on localhost and production
This is because Ubuntu sorts differently than macOS and Windows: it ignores the ! (exclamation mark) and sorts the strings by their second character. You can search for "sort ubuntu exclamation":
- https://ubuntuforums.org/showthread.php?t=1564233
- https://askubuntu.com/questions/422708/how-to-show-some-files-at-the-top-of-the-list-in-ubuntu
It seems that PostgreSQL bases its ordering on the sorting rules defined by the operating system.