Natural sort supporting big numbers
It works like @clemens suggested. Use numeric
(= decimal
) in the composite type:
CREATE TYPE ai AS (a text, i numeric);
db<>fiddle here
The reason I used int
in the referenced answer is performance.
Natural sorting when characters and numbers mixed
You need to normalise the format of the numeric portion of the text. You can do that by splitting the string into the AB
prefix and the numeric part, then left-padding the numeric part to a consistent length with zeroes.
For example: AB11a becomes AB00011a.
Apply this to all the items you've listed and they'll sort in the order you want.
You can do this with
... ORDER BY concat(substring(`code`,1,2),lpad(substr(`code`,3),6,'0')) ...
where `code` is the name of the column that contains the data you want to sort.
Note - this assumes that the prefix is always 2 characters.
Natural Sort on data mixing text and numbers (and then more text *sometimes*)
Through pure SQL? Not easy!
We did this in code on my project, and basically, you apply a regular expression that matches runs of digits or non-digits (something like ([0-9]+)|([^0-9]+)
), turns a string into a tuple of such runs, converts digit runs into integers, then sorts the tuples with a pairwise comparison. For example:
"Lot 1" -> ("Lot ", 1)
"Lot 2" -> ("Lot ", 2)
"Lot 3" -> ("Lot ", 3)
"Lot 3a" -> ("Lot ", 3, "a")
"Lot 100" -> ("Lot ", 100)
The pairwise comparison then (a) sorts integers before strings, (b) sorts integers in natural order, and (c) sorts strings in natural order.
If you were able to apply regular expressions to columns for sorting, you could use one to 'normalise' the numbers into digit strings of a fixed length, padded with zeroes. Like this:
"Lot 1" -> "Lot 0001"
"Lot 2" -> "Lot 0002"
"Lot 3" -> "Lot 0003"
"Lot 3a" -> "Lot 0003a"
"Lot 100" -> "Lot 0100"
Which should sort correctly. However, whilst you can do this in Oracle and some other databases, in MySQL, you need to use a user-defined function.
However, you could do it outside the database. Add a column to the lot table to store the normalised lot name, and whenever you insert a row from your code, generate the normalised form and store it. You can then sort using that column.
F# - Natural Sort Of Strings Containing Numbers
Based on @matekus' comment, possibly the most correct solution is to port the AlphaNum sort to F#, so:
let len = String.length
let isnum (s: string) i =
let c = s.Chars i
c >= '0' && c <= '9'
let chunk s f t = (f < len s) && (t < len s) && (isnum s f) = (isnum s t)
let chunkto s f =
let rec to_ s f e = if chunk s f e then to_ s f (e + 1) else e in to_ s f f
let int_of_string str =
let v = ref 0
if System.Int32.TryParse(str, v) then !v else 0
let alphanumcmp a b =
let rec chunkcmp a ai b bi =
let al, bl = len a, len b
if ai >= al || bi >= bl then compare al bl else
let ae, be = chunkto a ai, chunkto b bi
let sa, sb = a.Substring(ai, (ae-ai)), b.Substring(bi, (be-bi))
let cmp = if isnum a ai && isnum b bi then compare (int_of_string sa) (int_of_string sb) else compare sa sb
if cmp = 0 then chunkcmp a ae b be else cmp
in chunkcmp a 0 b 0
type AlphanumComparer() =
interface System.Collections.IComparer with
member this.Compare(x, y) =
alphanumcmp (x.ToString()) (y.ToString())
Test:
let names = [ "1000X Radonius Maximus"; "10X Radonius"; "200X Radonius"; "20X Radonius"; "20X Radonius Prime"; "30X Radonius"; "40X Radonius"; "Allegia 50 Clasteron"; "Allegia 500 Clasteron"; "Allegia 51 Clasteron"; "Allegia 51B Clasteron"; "Allegia 52 Clasteron"; "Allegia 60 Clasteron"; "Alpha 100"; "Alpha 2"; "Alpha 200"; "Alpha 2A"; "Alpha 2A-8000"; "Alpha 2A-900"; "Callisto Morphamax"; "Callisto Morphamax 500"; "Callisto Morphamax 5000"; "Callisto Morphamax 600"; "Callisto Morphamax 700"; "Callisto Morphamax 7000"; "Callisto Morphamax 7000 SE";"Callisto Morphamax 7000 SE2"; "QRS-60 Intrinsia Machine"; "QRS-60F Intrinsia Machine"; "QRS-62 Intrinsia Machine"; "QRS-62F Intrinsia Machine"; "Xiph Xlater 10000"; "Xiph Xlater 2000"; "Xiph Xlater 300"; "Xiph Xlater 40"; "Xiph Xlater 5"; "Xiph Xlater 50"; "Xiph Xlater 500"; "Xiph Xlater 5000"; "Xiph Xlater 58" ];;
names |> List.sortWith alphanumcmp |> printf "%A"
Results:
["10X Radonius"; "20X Radonius"; "20X Radonius Prime"; "30X Radonius";
"40X Radonius"; "200X Radonius"; "1000X Radonius Maximus";
"Allegia 50 Clasteron"; "Allegia 51 Clasteron"; "Allegia 51B Clasteron";
"Allegia 52 Clasteron"; "Allegia 60 Clasteron"; "Allegia 500 Clasteron";
"Alpha 2"; "Alpha 2A"; "Alpha 2A-900"; "Alpha 2A-8000"; "Alpha 100";
"Alpha 200"; "Callisto Morphamax"; "Callisto Morphamax 500";
"Callisto Morphamax 600"; "Callisto Morphamax 700"; "Callisto Morphamax 5000";
"Callisto Morphamax 7000"; "Callisto Morphamax 7000 SE";
"Callisto Morphamax 7000 SE2"; "QRS-60 Intrinsia Machine";
"QRS-60F Intrinsia Machine"; "QRS-62 Intrinsia Machine";
"QRS-62F Intrinsia Machine"; "Xiph Xlater 5"; "Xiph Xlater 40";
"Xiph Xlater 50"; "Xiph Xlater 58"; "Xiph Xlater 300"; "Xiph Xlater 500";
"Xiph Xlater 2000"; "Xiph Xlater 5000"; "Xiph Xlater 10000"]val it : unit = ()
Humanized or natural number sorting of mixed word-and-number strings
Building on your test data, but this works with arbitrary data. This works with any number of elements in the string.
Register a composite type made up of one text
and one integer
value once per database. I call it ai
:
CREATE TYPE ai AS (a text, i int);
The trick is to form an array of ai
from each value in the column.
regexp_matches()
with the pattern (\D*)(\d*)
and the g
option returns one row for every combination of letters and numbers. Plus one irrelevant dangling row with two empty strings '{"",""}'
Filtering or suppressing it would just add cost. Aggregate this into an array, after replacing empty strings (''
) with 0
in the integer
component (as ''
cannot be cast to integer
).
NULL
values sort first - or you have to special case them - or use the whole shebang in a STRICT
function like @Craig proposes.
Postgres 9.4 or later
SELECT data
FROM alnum
ORDER BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai
FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)
, data;
db<>fiddle here
Postgres 9.1 (original answer)
Tested with PostgreSQL 9.1.5, where regexp_replace()
had a slightly different behavior.
SELECT data
FROM (
SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x
FROM alnum
) x
GROUP BY ctid, data -- ctid as stand-in for a missing pk
ORDER BY regexp_replace (left(data, 1), '[0-9]', '0')
, array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)
, data -- for special case of trailing 0
Add regexp_replace (left(data, 1), '[1-9]', '0')
as first ORDER BY
item to take care of leading digits and empty strings.
If special characters like {}()"',
can occur, you'd have to escape those accordingly.
@Craig's suggestion to use a ROW
expression takes care of that.
BTW, this won't execute in sqlfiddle, but it does in my db cluster. JDBC is not up to it. sqlfiddle complains:
Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is
not yet implemented.
This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1
Natural Sort in MySQL
I think this is why a lot of things are sorted by release date.
A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title which conforms to a pattern you create for easy sorting or a counter.
Natural sorting in pandas
Use str.extract
, sort_values
, then use the index to reindex df
.
idx = (df.assign(ID2=df.ID.str.extract(r'(\d+)$').astype(int))
.sort_values(['ID2', 'Time'])
.index)
df.iloc[idx]
ID Time oneMissing singleValue empty oneEmpty
0 CS1-1 1 10000.0 NaN None 0.0
2 CS1-1 2 30000.0 NaN None 0.0
3 CS1-2 1 10000.0 NaN None NaN
1 CS1-2 2 20000.0 0.0 None 0.0
5 CS1-2 3 30000.0 NaN None NaN
4 CS1-11 1 NaN 0.0 None NaN
This is under the assumption that your ID column follows the pattern "XXX-NUMBER".
A fool-proof solution will involve the use of the natsort
module, which excels at fast natural sorting. With a little elbow-grease, we can argsort your data.
from natsort import natsorted
idx, *_ = zip(*natsorted(
zip(df.index, df.ID, df.Time), key=lambda x: (x[1], x[2])))
df.iloc[list(idx)]
ID Time oneMissing singleValue empty oneEmpty
0 CS1-1 1 10000.0 NaN None 0.0
2 CS1-1 2 30000.0 NaN None 0.0
3 CS1-2 1 10000.0 NaN None NaN
1 CS1-2 2 20000.0 0.0 None 0.0
5 CS1-2 3 30000.0 NaN None NaN
4 CS1-11 1 NaN 0.0 None NaN
Use PyPi to install: pip install natsort
.
Is there a built in function for string natural sort?
There is a third party library for this on PyPI called natsort (full disclosure, I am the package's author). For your case, you can do either of the following:
>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE) # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
You should note that natsort
uses a general algorithm so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the natsort
documentation's How It Works page, in particular the Special Cases Everywhere! section.
If you need a sorting key instead of a sorting function, use either of the below formulas.
>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
Update November 2020
Given that a popular request/question is "how to sort like Windows Explorer?" (or whatever is your operating system's file system browser), as of natsort
version 7.1.0 there is a function called os_sorted
to do exactly this. On Windows, it will sort in the same order as Windows Explorer, and on other operating systems it should sort like whatever is the local file system browser.
>>> from natsort import os_sorted
>>> os_sorted(list_of_paths)
# your paths sorted like your file system browser
For those needing a sort key, you can use os_sort_keygen
(or os_sort_key
if you just need the defaults).
Caveat - Please read the API documentation for this function before you use to understand the limitations and how to get best results.
Related Topics
How to Sum Multiple Lines in SQL
Date Column Arithmetic in Postgresql Query
Ssis Hidden Sheets as Excel Destination
Row_Number Simulation in SQL Server 2000
Pyodbc Execute Command Not Accepting? Parameters Correctly
How to Check Column Structure in Ssis
Rows to Columns in SQL Server 2000
Sqlite Equivalent of Postgresql's Greatest Function
How to Use a Reserved Word in SQL as a Table Name
How to Run the Same Query Multiple Times Using Loop in Pl/Sql
SQL Server: Self-Reference Fk, Trigger Instead of on Delete Cascade
How to Count the Number of Words in a String in Oracle
Insert Deleted Values into a Table Before Delete with a Delete Trigger