Stable sort in Linux
You forgot to constrain the key fields. By default, a key given with only a start field (for example -k1) extends to the end of the line.
sort -k1,1 -s t.txt
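The difference between an open-ended key and a constrained key can be seen with a small experiment. This is a minimal sketch: the three input lines are made up for illustration, and LC_ALL=C is set only so that the comparison is bytewise and reproducible.

```shell
# Hypothetical three-line input (not from the original question).
input=$(printf '%s\n' 'b 0' 'a 2' 'a 1')

# -k1 means "field 1 to the end of the line", so "a 2" and "a 1" are
# different keys and get reordered even though -s is given.
open_key=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1)

# -k1,1 stops the key at the end of field 1, so both "a" lines tie
# and -s keeps them in input order.
closed_key=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1,1)

printf 'with -k1:\n%s\n\nwith -k1,1:\n%s\n' "$open_key" "$closed_key"
```

With -k1 the two "a" lines come out as "a 1" then "a 2"; with -k1,1 they stay as "a 2" then "a 1".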
Stable sort a huge file
I would split the file into chunks (you might be able to do that on the command line, but it depends on the data; you might need a program to do that). The chunk size is up to you (a few megabytes is fine; make sure unix sort is fast with one chunk).
Then sort each chunk using unix sort (sort -s -k...). If you have multiple machines, you can do that in parallel.
Then merge all sorted chunks using unix sort (sort -m -s -k...; keep the -s here too, otherwise ties are broken by comparing whole lines). This should be stable as well if you specify the file list in the right order. If it is not (I didn't test that and didn't find any info, but most likely it is stable), then you might need to write your own merge program, which shouldn't be very complicated.
If you have too many chunks to merge efficiently, you could merge chunks 1..10 together to chunk a, then merge chunks 11..20 to chunk b (again you can do that on multiple machines in parallel), and finally merge chunks a..z. But I doubt this is really needed.
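The split, sort, and merge steps can be sketched as follows. This is only an illustration under assumptions: the sample data, the chunk size of 3 lines, and the file names are made up, and the stability of the merge relies on the behaviour of GNU sort with -m -s when the chunks are listed in their original order. The script compares the merged result against a direct stable sort so you can check that assumption on your own system.

```shell
# Work in a scratch directory (hypothetical file names throughout).
work=$(mktemp -d)

# Sample input: field 1 is the sort key, field 2 only marks input order.
printf '%s\n' 'b 9' 'a 8' 'b 7' 'a 6' 'c 5' 'a 4' 'b 3' > "$work/input.txt"

# 1. Split into chunks of 3 lines: chunk.aa, chunk.ab, chunk.ac, ...
split -l 3 "$work/input.txt" "$work/chunk."

# 2. Stable-sort each chunk on the first field only.
for f in "$work"/chunk.??; do
    LC_ALL=C sort -s -k1,1 -o "$f.sorted" "$f"
done

# 3. Merge the sorted chunks; the glob lists them in chunk order.
merged=$(LC_ALL=C sort -m -s -k1,1 "$work"/chunk.??.sorted)

# Reference result: one stable sort over the whole file.
direct=$(LC_ALL=C sort -s -k1,1 "$work/input.txt")

rm -rf "$work"

if [ "$merged" = "$direct" ]; then
    echo "merge matches direct stable sort"
else
    echo "merge differs from direct stable sort"
fi
```

For a real file, the chunk size would be much larger, and step 2 is the part that can be distributed across machines.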
How can I sort by the first column in bash but keep the original order of the other columns
Use the stable sort:
sort -nsk1,1
-n sorts numerically
-k1,1 sorts by the first column ("from the first to the first")
-s means "stable", i.e. keep the input order in case of a draw
Note that not all implementations of sort support -s, as it's not mentioned in the POSIX specification.
GNU `sort` command fails to sort with stable and (general) numeric sorting turned on
Numeric sort sorts by the longest numeric prefix of the sort field, ignoring leading whitespace. The numeric prefix is allowed to be empty: "An empty digit string shall be treated as zero".
Stable sort retains the original order for lines whose keys compare equal, so if you stable numeric sort lines not starting with numbers, the output will be identical to the input.
The quote above is from the POSIX standard; the full documentation for GNU sort can be found with info sort if the documentation is correctly installed on your machine, or via the URL at the bottom of the sort manpage.
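The effect described above, where a stable numeric sort of non-numeric lines returns the input unchanged, can be reproduced with a few made-up lines. This is a small sketch using plain -n; the input words are hypothetical, and LC_ALL=C is set so the fallback comparison is bytewise.

```shell
# Hypothetical input: no line starts with a number, so every numeric key is 0.
input=$(printf '%s\n' 'pear' 'apple' 'fig')

# With -s all keys tie and the input order is kept.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -n -s)

# Without -s the tie is broken by comparing the whole lines.
unstable=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

printf 'sort -n -s:\n%s\n\nsort -n:\n%s\n' "$stable" "$unstable"
```

The first result is identical to the input; the second is in alphabetical order because of the last-resort comparison.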
When and why do I need the 's' flag for sorting by a column
From man sort:
-s, --stable
stabilize sort by disabling last-resort comparison
In other words, if sort finds that two lines have equal keys, it will compare the entire lines to sort them. You disabled this last comparison with -s, so you kept the original order.
https://en.wikipedia.org/wiki/Sorting_algorithm#Stability
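To see the last-resort comparison at work, the same key can be sorted with and without -s. This is a minimal sketch with made-up input lines; LC_ALL=C is an assumption chosen to make the whole-line comparison bytewise.

```shell
# Hypothetical input: two lines share the key 1.
input=$(printf '%s\n' '1 b' '1 a' '0 c')

# Without -s, lines with equal keys are ordered by comparing whole lines.
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n)

# With -s, lines with equal keys stay in input order.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n -s)

printf 'without -s:\n%s\n\nwith -s:\n%s\n' "$default_order" "$stable_order"
```

Without -s the tied lines come out as "1 a" then "1 b"; with -s they keep the input order "1 b" then "1 a".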
How to sort a text file by column and keep the original order
Option -s is what you need (equivalent to --stable):
sort -k11,11 -d -s myfile.txt > sortedfile
The option -k works with a range of fields, so you should probably add ,11 as I did above; otherwise the sort key will span from column 11 to the end of the line (the default).
Preserve original order if numeric value is equal in coreutils sort?
Just add the -s (stable sort) flag; this disables the last-resort comparison:
echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth' | sort -k 1,1n -s
2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d
Natural sorting in reverse?
I think you just need to specify that the numeric reverse sort only applies to the first field:
$ sort -k1,1nr file
6 aaa
4 bbb
2 ccc
2 ddd
-k1,1[OPTS] means that OPTS apply only to the key that runs from the 1st field to the 1st field. Lines whose keys tie are then ordered by comparing the whole line according to the global ordering options. In this case, since no other options were passed, this means the default lexicographic sort.
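The contrast between global and per-key options can be sketched like this. The input order and the sort -nr command are assumptions made for illustration (the original question's command is not shown here); the four lines themselves are the ones from the output above.

```shell
# The four lines from the answer, in a made-up input order.
input=$(printf '%s\n' '2 ccc' '6 aaa' '2 ddd' '4 bbb')

# Global -n and -r: the reverse flag also applies to the whole-line
# comparison used to break ties, so "2 ddd" lands before "2 ccc".
global_opts=$(printf '%s\n' "$input" | LC_ALL=C sort -nr)

# Per-key options: only field 1 is compared numerically in reverse;
# ties fall back to the default ascending whole-line comparison.
key_opts=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1nr)

printf 'sort -nr:\n%s\n\nsort -k1,1nr:\n%s\n' "$global_opts" "$key_opts"
```

Adding -s to the second command would instead keep the tied lines in input order.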