Humanized or Natural Number Sorting of Mixed Word-And-Number Strings

This builds on your test data but works with arbitrary data, and with any number of elements in the string.

Register a composite type made up of one text and one integer value once per database. I call it ai:

CREATE TYPE ai AS (a text, i int);

The trick is to form an array of ai from each value in the column.

regexp_matches() with the pattern (\D*)(\d*) and the g option returns one row for every combination of letters and numbers, plus one irrelevant dangling row with two empty strings: '{"",""}'. Filtering or suppressing it would just add cost. Aggregate the rows into an array, after replacing empty strings ('') with 0 in the integer component (as '' cannot be cast to integer).

NULL values sort first, unless you special-case them or wrap the whole shebang in a STRICT function like @Craig proposes.

Postgres 9.4 or later

SELECT data
FROM alnum
ORDER BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai
FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)
, data;
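
To make the chunking idea concrete outside the database, here is a minimal Java sketch of the same technique: split each string with the pattern (\D*)(\d*) into (text, number) pairs and compare the pairs one by one. The names NaturalChunks, split and NATURAL are made up for illustration. Using BigInteger for the numeric part is my assumption, so that long digit runs cannot overflow; the query above uses int.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class NaturalChunks {
    // Same pattern as in the SQL: a run of non-digits followed by a run of digits.
    private static final Pattern CHUNK = Pattern.compile("(\\D*)(\\d*)");

    /** Splits s into [text, digits] pairs; an empty digit run becomes "0". */
    static List<String[]> split(String s) {
        List<String[]> chunks = new ArrayList<>();
        Matcher m = CHUNK.matcher(s);
        while (m.find()) {
            if (m.group().isEmpty()) {
                break; // the dangling empty match at the end of the string
            }
            String digits = m.group(2).isEmpty() ? "0" : m.group(2);
            chunks.add(new String[] {m.group(1), digits});
        }
        return chunks;
    }

    /** Compares chunk by chunk: text first, then numeric value. */
    static final Comparator<String> NATURAL = (a, b) -> {
        List<String[]> x = split(a);
        List<String[]> y = split(b);
        int n = Math.min(x.size(), y.size());
        for (int i = 0; i < n; i++) {
            int c = x.get(i)[0].compareTo(y.get(i)[0]);
            if (c == 0) {
                c = new BigInteger(x.get(i)[1]).compareTo(new BigInteger(y.get(i)[1]));
            }
            if (c != 0) {
                return c;
            }
        }
        // A string whose chunks are a prefix of the other's sorts first.
        int bySize = Integer.compare(x.size(), y.size());
        // Final tie-breaker on the raw string, like the trailing "data" in ORDER BY.
        return bySize != 0 ? bySize : a.compareTo(b);
    };
}
```

Sorting a list with `list.sort(NaturalChunks.NATURAL)` places "a2" before "a10", because the second component of each pair is compared as a number rather than as text.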


Postgres 9.1 (original answer)

Tested with PostgreSQL 9.1.5, where regexp_replace() had a slightly different behavior.

SELECT data
FROM (
SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x
FROM alnum
) x
GROUP BY ctid, data -- ctid as stand-in for a missing pk
ORDER BY regexp_replace (left(data, 1), '[0-9]', '0')
, array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)
, data -- for special case of trailing 0

Add regexp_replace (left(data, 1), '[0-9]', '0') as the first ORDER BY item, as in the query above, to take care of leading digits and empty strings.

If special characters like {}()"', can occur, you'd have to escape those accordingly.

@Craig's suggestion to use a ROW expression takes care of that.


BTW, this won't execute in sqlfiddle, but it does in my db cluster. JDBC is not up to it. sqlfiddle complains:

Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is
not yet implemented.

This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1

Natural sort supporting big numbers

It works like @clemens suggested. Use numeric (= decimal) in the composite type:

CREATE TYPE ai AS (a text, i numeric);


The reason I used int in the referenced answer is performance.

Is there a way for sorting numbers with leading zeros like strings otherwise like numbers?

You can sort by the number of leading zeros, then by the numeric value:

ORDER BY
length(col)
- length(trim(LEADING '0' FROM col))
DESC,
col COLLATE natural_coll
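
The same two-key ordering can be sketched in Java. This is only an illustration under the assumption that every value consists of digits only; the names LeadingZeroOrder, leadingZeros and ORDER are hypothetical. BigInteger stands in for the numeric comparison that the natural collation provides in the query above.

```java
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical helper; assumes every input string consists of digits only.
final class LeadingZeroOrder {
    /** Counts the '0' characters at the start of s. */
    static int leadingZeros(String s) {
        int n = 0;
        while (n < s.length() && s.charAt(n) == '0') {
            n++;
        }
        return n;
    }

    /** More leading zeros first (DESC), then ascending numeric value. */
    static final Comparator<String> ORDER =
        Comparator.<String>comparingInt(LeadingZeroOrder::leadingZeros)
            .reversed()
            .thenComparing((String s) -> new BigInteger(s));
}
```

With this comparator, "002" sorts before "01", and both sort before values without leading zeros such as "1", "2" and "10".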

Sorting strings containing numbers in a user friendly way

Jeff wrote up an article about this on Coding Horror. This is called natural sorting, where you effectively treat a group of digits as a single "character". There are implementations out there in every language under the sun, but strangely it's not usually built into most languages' standard libraries.

Compare strings by natural order but ignoring string's prefix

You're answering your own question: With a comparator.

Comparator<String> marcosPrefixIgnoringComparison =
(a, b) -> a.substring(4).compareTo(b.substring(4));

That's assuming that the prefix is defined as 'the first 4 characters'. If it's more 'The string let, and then any number of digits', you'd have to do something else. Possibly regexes:

Comparator<String> marcosPrefixIgnoringComparison =
(a, b) -> a.replaceFirst("^let\\d+\\s+", "").compareTo(
b.replaceFirst("^let\\d+\\s+", ""));

Your question is not particularly clear about what 'prefix' means here.

Collation for the natural order of strings by a number they contain

If your file names follow the format name###file, you can sort them using:

SELECT * FROM @Table ORDER BY LEN(Name), Name

The sort is simple: first by the length of Name, then by Name. Because the surrounding file name is constant and only the number part changes, "1", "2" and "5" come before "10" based on length. The second sort key then gives the correct order among numbers of the same magnitude (0-9, 10-99, 100-999, and so on).

Keep in mind that this is not a perfect general solution. For example, it yields "z" < "aa".
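
For comparison, the same length-then-name ordering can be written as a Java comparator. This is a minimal sketch; the class name LengthThenName is made up for illustration.

```java
import java.util.Comparator;

// Hypothetical helper mirroring ORDER BY LEN(Name), Name.
final class LengthThenName {
    /** Shorter strings first; strings of equal length in plain string order. */
    static final Comparator<String> ORDER =
        Comparator.<String>comparingInt(String::length)
            .thenComparing(Comparator.<String>naturalOrder());
}
```

It sorts "name2file" before "name10file", and it shows the same limitation as the SQL version: "z" sorts before "aa".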

PostgreSQL ORDER BY issue - natural sort

The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:

SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;

It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.

Answer to question in comment

To strip any and all non-digits from a string:

SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;

\D is the regular expression class-shorthand for "non-digits".

'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.

After replacing every non-digit with the empty string, only digits remain.
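
If the codes are sorted in application code instead, the same two steps look roughly like this in Java. This is a sketch under the assumption that every code contains at least one digit and that the number fits into an int; the names EmployeeCodes, digitsOnly, number and BY_NUMBER_DESC are hypothetical.

```java
import java.util.Comparator;

// Hypothetical helper; assumes codes like "EM12" with at least one digit.
final class EmployeeCodes {
    /** Removes every non-digit, like regexp_replace(em_code, E'\\D', '', 'g'). */
    static String digitsOnly(String code) {
        return code.replaceAll("\\D", "");
    }

    /** Numeric value of the digits in the code; throws NumberFormatException if there are none. */
    static int number(String code) {
        return Integer.parseInt(digitsOnly(code));
    }

    /** Largest number first, like ORDER BY ... DESC. */
    static final Comparator<String> BY_NUMBER_DESC =
        Comparator.<String>comparingInt(EmployeeCodes::number).reversed();
}
```

Sorting "EM1", "EM10" and "EM9" with BY_NUMBER_DESC gives "EM10", "EM9", "EM1".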

Combine alphabetical and natural order (aka. User sane sorting)

If you use the Comparator suggested by @millimoose (http://www.davekoelle.com/alphanum.html), modify it to accept a Collator:

public class AlphanumComparator implements Comparator
{
private Collator collator;
public AlphanumComparator(Collator collator) {
this.collator = collator;
}
.....
public int compare(Object o1, Object o2)
{
......
result = collator.compare(thisChunk, thatChunk); // was: thisChunk.compareTo(thatChunk)
....

This code seems to have a problem: for example, "01" is greater than "2". Whether that matters depends on your preference. If it is important, modify the code to skip leading zeros before comparing numbers.
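
One way to skip leading zeros is sketched below. It is a standalone helper, not code from the linked AlphanumComparator, and the names DigitChunks, stripLeadingZeros and compareNumeric are made up for illustration. It assumes both arguments are digit-only chunks.

```java
// Hypothetical helper; both arguments are assumed to contain only digits.
final class DigitChunks {
    /** Drops leading zeros but keeps a single "0" for an all-zero chunk. */
    static String stripLeadingZeros(String digits) {
        int i = 0;
        while (i < digits.length() - 1 && digits.charAt(i) == '0') {
            i++;
        }
        return digits.substring(i);
    }

    /** Compares two digit chunks by numeric value without parsing them. */
    static int compareNumeric(String a, String b) {
        String x = stripLeadingZeros(a);
        String y = stripLeadingZeros(b);
        // After stripping, the chunk with more digits is the larger number.
        int byLength = Integer.compare(x.length(), y.length());
        // Same length: plain string order equals numeric order.
        return byLength != 0 ? byLength : x.compareTo(y);
    }
}
```

With this comparison, "01" sorts before "2", and "010" compares equal to "10". Because nothing is parsed into an int or long, arbitrarily long digit runs do not overflow.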

Ordering VARCHARs that contain numbers

Use a regex to separate names from numbers

SELECT *
FROM Shelves
ORDER BY
regexp_replace(name , '[^a-zA-Z]*', '', 'g') ,
regexp_replace(name , '[^0-9]*', '', 'g')::INT

