Humanized or natural number sorting of mixed word-and-number strings
This builds on your test data, but it works with arbitrary data and with any number of elements in the string.
Register a composite type made up of one text and one integer value once per database. I call it ai:
CREATE TYPE ai AS (a text, i int);
The trick is to form an array of ai from each value in the column.
regexp_matches() with the pattern (\D*)(\d*) and the g option returns one row for every combination of letters and numbers, plus one irrelevant dangling row with two empty strings '{"",""}'. Filtering or suppressing it would just add cost. Aggregate this into an array, after replacing empty strings ('') with 0 in the integer component (as '' cannot be cast to integer).
NULL values sort first, unless you special-case them or wrap the whole shebang in a STRICT function like @Craig proposes.
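To see the decomposition outside SQL, here is a minimal Python sketch of the same idea. The name natural_key and the sample data are hypothetical, and re.findall only stands in for regexp_matches(); it illustrates the logic, not how Postgres evaluates the query.

```python
import re

def natural_key(value):
    """Split a string into (text, int) pairs, mirroring the (a text, i int) type."""
    # (\D*)(\d*) yields one pair per run of non-digits followed by digits,
    # plus one trailing pair of empty strings, much like regexp_matches(..., 'g').
    pairs = re.findall(r"(\D*)(\d*)", value)
    # Empty digit parts become 0, because '' cannot be converted to an integer.
    return [(text, int(digits) if digits else 0) for text, digits in pairs]

data = ["a10", "a2", "b1", "a1"]  # hypothetical sample values
# The original string is the tie-breaker, like the trailing ", data" in ORDER BY.
print(sorted(data, key=lambda s: (natural_key(s), s)))
```

The list of pairs compares element by element, which is what sorting by an array of composite values does in the query.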
Postgres 9.4 or later
SELECT data
FROM alnum
ORDER BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai
FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)
, data;
Postgres 9.1 (original answer)
Tested with PostgreSQL 9.1.5, where regexp_replace() had a slightly different behavior.
SELECT data
FROM (
SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x
FROM alnum
) x
GROUP BY ctid, data -- ctid as stand-in for a missing pk
ORDER BY regexp_replace (left(data, 1), '[0-9]', '0')
, array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)
, data -- for special case of trailing 0
Add regexp_replace (left(data, 1), '[0-9]', '0') as first ORDER BY item to take care of leading digits and empty strings.
If special characters like {}()"', can occur, you'd have to escape those accordingly.
@Craig's suggestion to use a ROW expression takes care of that.
BTW, this originally would not execute in sqlfiddle, though it did in my db cluster. JDBC was not up to it. sqlfiddle complained:
Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is not yet implemented.
This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1
Natural sort supporting big numbers
It works like @clemens suggested. Use numeric (= decimal) in the composite type:
CREATE TYPE ai AS (a text, i numeric);
The reason I used int in the referenced answer is performance.
Is there a way for sorting numbers with leading zeros like strings otherwise like numbers?
You can sort by the number of leading zeros, then by the numeric value:
ORDER BY
length(col)
- length(trim(LEADING '0' FROM col))
DESC,
col COLLATE natural_coll
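The two sort keys can be sketched in Python as follows. This assumes the values are non-empty strings of digits only; the name leading_zero_key and the sample values are hypothetical, and int() stands in for the natural collation.

```python
def leading_zero_key(value):
    """Sort key: more leading zeros first, then by numeric value."""
    zeros = len(value) - len(value.lstrip("0"))
    # Negate the count so that more leading zeros sort first (DESC in the SQL).
    return (-zeros, int(value))

values = ["10", "2", "01", "002", "1"]  # hypothetical sample values
print(sorted(values, key=leading_zero_key))
```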
Sorting strings containing numbers in a user friendly way
Jeff wrote up an article about this on Coding Horror. This is called natural sorting, where you effectively treat a group of digits as a single "character". There are implementations out there in every language under the sun, but strangely it's not usually built-in to most languages' standard libraries.
Compare strings by natural order but ignoring string's prefix
You're answering your own question: With a comparator.
Comparator<String> marcosPrefixIgnoringComparison =
(a, b) -> a.substring(4).compareTo(b.substring(4));
That's assuming that the prefix is defined as 'the first 4 characters'. If it's more 'The string let, and then any number of digits', you'd have to do something else. Possibly regexes:
Comparator<String> marcosPrefixIgnoringComparison =
(a, b) -> a.replaceFirst("^let\\d+\\s+", "").compareTo(
b.replaceFirst("^let\\d+\\s+", ""));
Your question is not particularly clear about what 'prefix' means here.
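For comparison, the regex variant can be sketched in Python with a sort key instead of a comparator. The helper name without_prefix and the sample strings are hypothetical; the pattern is the same one used in the Java snippet.

```python
import re

# Same pattern as the Java version: "let", one or more digits, then whitespace.
PREFIX = re.compile(r"^let\d+\s+")

def without_prefix(value):
    """Return the string with a leading 'let<digits> ' prefix removed, if present."""
    return PREFIX.sub("", value, count=1)

items = ["let2 zebra", "let10 apple", "let1 mango"]  # hypothetical sample values
print(sorted(items, key=without_prefix))
```

Strings that do not start with the prefix are compared unchanged.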
Collation for the natural order of strings by a number they contain
If your file names follow the format name###file, you can sort them using:
SELECT * FROM @Table ORDER BY LEN(Name), Name
This sort is simple: first by the length of Name, then by Name itself. The text part of your file names is constant and only the number part changes, so "1", "2" and "5" come before "10" based on length. The second sort key then gives the correct order among numbers of the same magnitude (0-9, 10-99, 100-999, and so on).
Keep in mind that this is not a perfect general solution; for example, "z" sorts before "aa".
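A small Python sketch of the same two-level sort, including the limitation just mentioned. The helper name length_then_name and the file names are made up for illustration.

```python
def length_then_name(value):
    """Sort key equivalent to ORDER BY LEN(Name), Name."""
    return (len(value), value)

names = ["file10", "file2", "file1", "file100"]  # hypothetical sample values
print(sorted(names, key=length_then_name))

# The limitation: a shorter string always wins, regardless of its content.
print(sorted(["aa", "z"], key=length_then_name))
```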
PostgreSQL ORDER BY issue - natural sort
The reason is that the string sorts alphabetically (instead of numerically, as you would want) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits". 'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
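The same replacement can be tried in Python, where re.sub replaces every occurrence by default, so no 'g' switch is needed. The helper name digits_only and the sample codes are hypothetical.

```python
import re

def digits_only(value):
    """Remove every non-digit character, leaving only the digits."""
    return re.sub(r"\D", "", value)

codes = ["EM1", "EM10", "EM9"]  # hypothetical sample values
# Sort by the numeric part, descending, as in the ORDER BY ... DESC query.
print(sorted(codes, key=lambda code: int(digits_only(code)), reverse=True))
```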
Combine alphabetical and natural order (aka. User sane sorting)
If you use the Comparator suggested by @millimoose (http://www.davekoelle.com/alphanum.html), modify it to accept a Collator:
public class AlphanumComparator implements Comparator
{
private Collator collator;
public AlphanumComparator(Collator collator) {
this.collator = collator;
}
.....
public int compare(Object o1, Object o2)
{
......
// result = thisChunk.compareTo(thatChunk); should become:
result = collator.compare(thisChunk, thatChunk);
....
This code seems to have a problem: for example, "01" is greater than "2". Whether that matters depends on your preference; if it is important, modify it to skip the leading zeros before comparing numbers.
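One way to skip the leading zeros is sketched below in Python rather than Java. This is not the linked AlphanumComparator: all function names are hypothetical, only ASCII digits count as numbers, and plain string comparison stands in for the Collator.

```python
import re
from functools import cmp_to_key

def chunks(value):
    """Split a string into alternating runs of ASCII digits and non-digits."""
    return re.findall(r"[0-9]+|[^0-9]+", value)

def compare_chunks(a, b):
    """Compare two chunks; digit chunks are compared with leading zeros skipped."""
    if "0" <= a[0] <= "9" and "0" <= b[0] <= "9":
        a, b = a.lstrip("0"), b.lstrip("0")
        if len(a) != len(b):
            return len(a) - len(b)  # fewer significant digits means a smaller number
    # Stand-in for collator.compare(thisChunk, thatChunk).
    return (a > b) - (a < b)

def alphanum_compare(left, right):
    """Compare chunk by chunk; the string with fewer chunks wins a tie."""
    left_chunks, right_chunks = chunks(left), chunks(right)
    for a, b in zip(left_chunks, right_chunks):
        result = compare_chunks(a, b)
        if result != 0:
            return result
    return len(left_chunks) - len(right_chunks)

print(sorted(["a2", "a01", "a10"], key=cmp_to_key(alphanum_compare)))
```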
Ordering VARCHARs that contain numbers
Use a regex to separate names from numbers
SELECT *
FROM Shelves
ORDER BY
regexp_replace(name , '[^a-zA-Z]*', '', 'g') ,
regexp_replace(name , '[^0-9]*', '', 'g')::INT
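The two sort keys can be sketched in Python as well. The name shelf_key and the sample values are hypothetical. Note one deliberate difference: the sketch falls back to 0 when a name contains no digits, whereas casting an empty string to INT raises an error in Postgres.

```python
import re

def shelf_key(name):
    """Sort key: the letters of the name first, then its digits as a number."""
    letters = re.sub(r"[^a-zA-Z]", "", name)
    digits = re.sub(r"[^0-9]", "", name)
    # Assumption of this sketch: names without digits sort as number 0.
    return (letters, int(digits) if digits else 0)

shelves = ["A10", "B1", "A2"]  # hypothetical sample values
print(sorted(shelves, key=shelf_key))
```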