Stable Sort in Linux

stable sort in linux

You forgot to constrain the key fields. By default, a key given as -k1 runs from the first field to the end of the line, so the whole line is compared. Limit it to the first field:

sort -k1,1 -s t.txt
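To see the difference, here is a small sketch with made-up data (the four sample lines and the variable names are only for illustration; LC_ALL=C is set so the comparison is plain byte order):

```shell
# Hypothetical sample data: two whitespace-separated fields per line.
input='b 2
a 9
b 1
a 3'

# Key limited to field 1; -s keeps the input order among equal keys.
limited=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1,1)

# Key runs from field 1 to the end of the line, so field 2 affects the order too.
unlimited=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1)

printf 'limited:\n%s\n\nunlimited:\n%s\n' "$limited" "$unlimited"
```

With -k1,1 the "a" lines stay in their input order (a 9, then a 3); with -k1 they are reordered because the second field is part of the key.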

Stable sort a huge file

I would split the file into chunks (you might be able to do that on the command line, but it depends on the data; you might need a program to do that). The chunk size is up to you (a few megabytes is fine; make sure unix sort is fast with one chunk).

Then sort each chunk using unix sort (sort -s -k...). If you have multiple machines, you can do that in parallel.

Then merge all sorted chunks using unix sort (sort -m -s -k...). This should be stable as well if you pass -s again and specify the file list in the original chunk order. If it is not (I didn't test that and didn't find any info, but most likely it is stable), then you might need to write your own merge program, which shouldn't be very complicated.

If you have too many chunks to merge efficiently, you could merge chunks 1..10 together to chunk a, then merge chunks 11..20 to chunk b (again you can do that on multiple machines in parallel), and finally merge chunks a..z. But I doubt this is really needed.
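The steps above could be sketched roughly like this. The file name big.txt, the chunk_ prefix, the sample lines and the 3-line chunk size are all made up for the demo, and the check at the end only shows that the merge matched a direct stable sort on this input; test it with your own sort implementation before relying on it.

```shell
# Sketch only: big.txt, chunk_ and the 3-line chunk size are made-up values.
workdir=$(mktemp -d)
printf '%s\n' '2 first' '1 a' '2 second' '1 b' '3 x' '2 third' '1 c' > "$workdir/big.txt"

# 1. Split on line boundaries (3 lines per chunk here; use far more in practice).
split -l 3 "$workdir/big.txt" "$workdir/chunk_"

# 2. Stable-sort each chunk on the first field only.
for f in "$workdir"/chunk_*; do
    LC_ALL=C sort -s -n -k1,1 -o "$f.sorted" "$f"
done

# 3. Merge; the glob lists chunk_aa.sorted, chunk_ab.sorted, ... in the original chunk order.
merged=$(LC_ALL=C sort -m -s -n -k1,1 "$workdir"/chunk_*.sorted)

# Reference result: one stable sort over the whole file.
direct=$(LC_ALL=C sort -s -n -k1,1 "$workdir/big.txt")

[ "$merged" = "$direct" ] && echo "merge matches a direct stable sort"
rm -rf "$workdir"
```

The file order given to sort -m matters: lines with equal keys should come out in the order of the files they came from, which is why the chunks are named so that the glob keeps them in their original sequence.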

how can i sort in bash the first column but ignore any other column in order

Use the stable sort:

sort -nsk1,1
  • -n sort numerically
  • -k1,1 sorts by the first column ("from the first to the first")
  • -s means "stable", i.e. keep the input order in case of a draw

Note that not all implementations of sort support -s, as it's not mentioned in the POSIX specification.

GNU `sort` command fails to sort with stable and (general) numeric sorting turned on

Numeric sort sorts by the longest numeric prefix of the sort field, ignoring leading whitespace. The numeric prefix is allowed to be empty: "An empty digit string shall be treated as zero".

Stable sort retains the original order for lines whose keys compare equal, so if you stable numeric sort lines not starting with numbers, the output will be identical to the input.
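A quick way to check this, using made-up lines that contain no numbers at all (the sample words and variable names are just for illustration):

```shell
# Hypothetical input: no line starts with a digit, so every numeric key is zero.
input='pear
apple
zebra
mango'

# Numeric + stable: all keys compare equal, so the input order is kept.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -n -s)

# Numeric without -s: equal keys fall back to comparing the whole line.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

printf 'with -s:\n%s\n\nwithout -s:\n%s\n' "$stable" "$plain"
```

The -s run prints the lines exactly as given, while the run without -s ends up in plain byte order because of the fallback comparison.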

The quote above is from the POSIX standard; the full documentation for GNU sort can be found with info sort if the documentation is correctly installed on your machine, or via the URL at the bottom of the sort manpage, which leads to the description of the -n option.

When and why do i need a 's' flag for sorting by a column

From man sort:

-s, --stable
stabilize sort by disabling last-resort comparison

In other words, if sort finds that two lines have equal keys, it will compare the entire lines to sort them. You disabled this last comparison with -s so you kept the original order.

https://en.wikipedia.org/wiki/Sorting_algorithm#Stability
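The effect of the last-resort comparison shows up as soon as two lines share a key. A minimal sketch with invented data (the three sample lines are not from the question):

```shell
# Hypothetical input: two lines share the key 2 in field 1.
input='2 zebra
1 x
2 apple'

# Without -s: equal keys are ordered by comparing the entire lines.
default_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n)

# With -s: equal keys keep their input order.
stable_order=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n -s)

printf 'default:\n%s\n\nstable:\n%s\n' "$default_order" "$stable_order"
```

In the default run "2 apple" moves ahead of "2 zebra"; with -s they stay in input order.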

How to sort a text file by column and keep the original order

Option -s is what you need (equivalent to --stable):

sort -k11,11 -d -s myfile.txt > sortedfile

The option -k works with a range of fields, so you should probably add ,11 as I did above, otherwise the sorting will use keys spanning from column 11 to the end of line (default).

Preserve original order if numeric value is equal in coreutils sort?

Just add the -s (stable sort) flag; it disables the last-resort comparison:

echo '7 a
3 c
3 b
2 first
2 second
2 third
2 fourth
2 fifth
9 d
2 sixth
' | sort -k 1,1n -s

2 first
2 second
2 third
2 fourth
2 fifth
2 sixth
3 c
3 b
7 a
9 d

Natural sorting in reverse?

I think you just need to specify that the numeric reverse sort only applies to the first field:

$ sort -k1,1nr file
6 aaa
4 bbb
2 ccc
2 ddd

-k1,1[OPTS] means that OPTS apply only to the key that starts and ends at the 1st field. Lines whose keys compare equal are then ordered by comparing the whole line according to the global ordering options. In this case, since no other options were passed, this means the default lexicographic sort.
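To compare the two forms side by side, here is a sketch that reuses the four lines from the output above in a shuffled order (the shuffled order and the variable names are my own). It assumes sort follows the POSIX rule that a global -r stays in effect for lines that otherwise compare equal.

```shell
# The four lines from the example output, in a made-up input order.
input='2 ddd
6 aaa
2 ccc
4 bbb'

# n and r attached to the key: ties fall back to an ascending whole-line comparison.
keyed=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1nr)

# n and r given globally: the reversal also applies to the tie-breaking comparison.
global=$(printf '%s\n' "$input" | LC_ALL=C sort -nr)

printf 'keyed:\n%s\n\nglobal:\n%s\n' "$keyed" "$global"
```

The only difference between the two results is the order of the two lines starting with 2.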


