PostgreSQL ORDER BY issue - natural sort
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9.
You could solve it like this:
SELECT * FROM employees
ORDER BY substring(em_code, 3)::int DESC;
It would be more efficient to drop the redundant 'EM' from your em_code - if you can - and save an integer number to begin with.
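To see the difference between the two orderings outside the database, here is a minimal Python sketch. The sample codes are invented for illustration, and the slice `code[2:]` stands in for `substring(em_code, 3)`.

```python
# Hypothetical sample values; the real em_code column is not shown in the question.
codes = ["EM1", "EM10", "EM9", "EM2"]

# Plain string comparison goes character by character, so "EM9" compares
# greater than "EM10" (because "9" > "1").
as_text = sorted(codes, reverse=True)

# Cutting off the two-letter prefix and converting the rest to int gives
# numeric order, mirroring ORDER BY substring(em_code, 3)::int DESC.
as_number = sorted(codes, key=lambda code: int(code[2:]), reverse=True)

print(as_text)    # ['EM9', 'EM2', 'EM10', 'EM1']
print(as_number)  # ['EM10', 'EM9', 'EM2', 'EM1']
```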
Answer to question in comment
To strip any and all non-digits from a string:
SELECT regexp_replace(em_code, E'\\D','','g')
FROM employees;
\D is the regular expression class-shorthand for "non-digits". 'g' as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
After replacing every non-digit with the empty string, only digits remain.
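The same replacement can be tried outside the database. This is only a sketch in Python, and the function name `digits_only` is made up for illustration. Python's `re.sub` replaces every match by default, so no 'g' switch is needed.

```python
import re

def digits_only(value):
    # \D matches any single non-digit character; each one is replaced
    # with the empty string, so only the digits remain.
    return re.sub(r"\D", "", value)

print(digits_only("EM-0042x"))  # 0042
```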
Postgres natural order by
Postgres allows you to sort by arrays, which is essentially what the version number represents. Hence, you can use this syntax:
order by string_to_array(version, '.')::int[] desc
Here is a full example:
select *
from (values ('1'), ('2.1'), ('1.2.3'), ('1.10.6'), ('1.9.4')) v(version)
order by string_to_array(version, '.')::int[] desc;
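For comparison, Python lists of integers compare element by element in the same way, with a shorter list sorting before a longer one that it prefixes. A minimal sketch using the same sample values (the helper name `version_key` is an assumption, not part of the answer):

```python
versions = ["1", "2.1", "1.2.3", "1.10.6", "1.9.4"]

def version_key(version):
    # Counterpart of string_to_array(version, '.')::int[]
    return [int(part) for part in version.split(".")]

ordered = sorted(versions, key=version_key, reverse=True)
print(ordered)  # ['2.1', '1.10.6', '1.9.4', '1.2.3', '1']
```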
Alphanumeric Sorting in PostgreSQL
When sorting character data types, collation rules apply - unless you work with locale "C" which sorts characters by their byte values. Applying collation rules may or may not be desirable. It makes sorting more expensive in any case. If you want to sort without collation rules, don't cast to bytea, use COLLATE "C" instead:
SELECT * FROM table ORDER BY column COLLATE "C";
However, this does not yet solve the problem with numbers in the string you mention. Split the string and sort the numeric part as number.
SELECT *
FROM table
ORDER BY split_part(column, '-', 2)::numeric;
Or, if all your numbers fit into bigint or even integer, use that instead (cheaper).
I ignored the leading part because you write:
... the basis for ordering is the last whole number of the string, regardless of what the character before that number is.
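The split-and-cast idea can be sketched in Python as follows. The sample values and the 'prefix-number' shape are assumptions for illustration, since the question's data is not shown here. Unlike the SQL cast, this sketch raises an IndexError when a value contains no '-'.

```python
# Hypothetical values with a leading part, a '-' separator and a whole number.
rows = ["abc-10", "xyz-9", "abc-2"]

def trailing_number(value):
    # split("-")[1] is the second field, like split_part(column, '-', 2).
    return int(value.split("-")[1])

ordered = sorted(rows, key=trailing_number)
print(ordered)  # ['abc-2', 'xyz-9', 'abc-10']
```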
Related:
- Alphanumeric sorting with PostgreSQL
- Split comma separated column data into additional columns
- What is the impact of LC_CTYPE on a PostgreSQL database?
Typically, it's best to save distinct parts of a string in separate columns as proper respective data types to avoid any such confusion.
And if the leading string is identical for all columns, consider just dropping the redundant noise. You can always use a VIEW
to prepend a string for display, or do it on-the-fly, cheaply.
Natural sort supporting big numbers
It works like @clemens suggested. Use numeric (= decimal) in the composite type:
CREATE TYPE ai AS (a text, i numeric);
The reason I used int in the referenced answer is performance.
Humanized or natural number sorting of mixed word-and-number strings
This builds on your test data, but it works with arbitrary data and with any number of elements in the string.
Register a composite type made up of one text and one integer value once per database. I call it ai:
CREATE TYPE ai AS (a text, i int);
The trick is to form an array of ai from each value in the column.
regexp_matches() with the pattern (\D*)(\d*) and the g option returns one row for every combination of letters and numbers, plus one irrelevant dangling row with two empty strings '{"",""}'. Filtering or suppressing it would just add cost. Aggregate this into an array, after replacing empty strings ('') with 0 in the integer component (as '' cannot be cast to integer).
NULL values sort first - or you have to special case them - or use the whole shebang in a STRICT function like @Craig proposes.
Postgres 9.4 or later
SELECT data
FROM alnum
ORDER BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai
FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)
, data;
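The array-of-pairs idea can be illustrated outside the database with a Python sketch. The function name `natural_key` and the sample values are assumptions. Python compares the text parts by code point, whereas Postgres applies the column's collation, so results can differ for non-ASCII or mixed-case data.

```python
import re

def natural_key(value):
    # Each match is a (non-digits, digits) pair. The last match is the
    # empty pair ('', ''), the same dangling row regexp_matches() returns.
    pairs = re.findall(r"(\D*)(\d*)", value)
    # An empty digit part becomes 0, as in the CASE expression above.
    return [(text, int(digits) if digits else 0) for text, digits in pairs]

# Hypothetical sample data.
data = ["a10", "a9", "b1", "a1b2"]

# The original string is the tie-breaker, like the trailing ", data".
ordered = sorted(data, key=lambda v: (natural_key(v), v))
print(ordered)  # ['a1b2', 'a9', 'a10', 'b1']
```

A list of tuples compares element by element, which plays the role of the ai[] array comparison in the query.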
Postgres 9.1 (original answer)
Tested with PostgreSQL 9.1.5, where regexp_replace() had a slightly different behavior.
SELECT data
FROM (
SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x
FROM alnum
) x
GROUP BY ctid, data -- ctid as stand-in for a missing pk
ORDER BY regexp_replace (left(data, 1), '[0-9]', '0')
, array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)
, data -- for special case of trailing 0
Add regexp_replace (left(data, 1), '[0-9]', '0') as first ORDER BY item to take care of leading digits and empty strings.
If special characters like {}()"', can occur, you'd have to escape those accordingly. @Craig's suggestion to use a ROW expression takes care of that.
BTW, this won't execute in sqlfiddle, but it does in my db cluster. JDBC is not up to it. sqlfiddle complains:
Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is
not yet implemented.
This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1
What is the best way to replicate PostgreSQL sorting results in JavaScript?
Sorting strings is always done using a certain collation. If you are not conscious of using a collation in your programming language, you are probably using the POSIX collation, which compares strings character by character according to their code point (the numeric value in the encoding).
In PostgreSQL, that would look like this:
ORDER BY name COLLATE "POSIX";
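As an illustration of code point ordering in a programming language, Python's default string comparison works this way. The sample names below are made up. Note how all uppercase ASCII letters sort before lowercase ones, and accented letters sort after both.

```python
# Hypothetical sample values.
names = ["banana", "Apple", "apple", "Zebra", "école"]

# Default str comparison in Python is by Unicode code point:
# 'A' (65) < 'Z' (90) < 'a' (97) < 'b' (98) < 'é' (233).
by_code_point = sorted(names)
print(by_code_point)  # ['Apple', 'Zebra', 'apple', 'banana', 'école']
```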
So to solve your problem, you'd have to find out the collation of the column.
If there is no special collation specified in the column definition, it will use the database's collation, which can be found with
SELECT datcollate FROM pg_database WHERE datname = 'my_database';
That will be an operating system collation from the C library.
So all you have to do is to use that collation in your program.
If your program is written in C, you can directly use the C library. Otherwise, refer to the documentation of your programming language.
PostgreSQL ORDER BY clause on numerical portion of text column
Your id column is obviously of some text data type so the ordering is alphabetical, not by the number. To get it to work, strip the 'G' from the id column when ordering:
SELECT * FROM mytable
ORDER BY right(id, -1)::integer;
Natural sorting when characters and numbers mixed
You need to normalise the format of the numeric portion of the text. You can do that by splitting the string into the AB prefix and the numeric part, then left-padding the numeric part to a consistent length with zeroes.
For example: AB11a becomes AB00011a.
Apply this to all the items you've listed and they'll sort in the order you want.
You can do this with
... ORDER BY concat(substring(`code`,1,2),lpad(substr(`code`,3),6,'0')) ...
where `code` is the name of the column that contains the data you want to sort.
Note - this assumes that the prefix is always 2 characters.
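The padding step can be checked with a small Python sketch. The function name `padded_key` and the sample codes other than AB11a are assumptions. One difference to keep in mind: `rjust` never shortens a string, whereas lpad() cuts the value down to the given length when it is longer.

```python
def padded_key(code, width=6):
    # First two characters are the prefix, the rest is the numeric part
    # (possibly with a trailing letter), left-padded with zeroes.
    prefix, rest = code[:2], code[2:]
    return prefix + rest.rjust(width, "0")

print(padded_key("AB11a"))  # AB00011a

# Hypothetical sample codes.
codes = ["AB11a", "AB2", "AB10"]
print(sorted(codes, key=padded_key))  # ['AB2', 'AB10', 'AB11a']
```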
How to order by the alphabetical sort order of a title (ignoring The, An, etc.) and use an index
You could add an index on an expression:
create index on yourtable (natural_sort(title));
Postgres will then use the index when appropriate, and won't actually calculate natural_sort(title) when it does, unless you select that too.
That being said (and much like with tsvector fields), you'll get better performance if you actually store the pre-calculated result. If, in the above case, Postgres decides not to use that index for any reason, having to calculate it for each and every row considered will be a big drag on your query.
In either case, don't forget numbers:
http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html
Here are two functions to get you started on natural sorting:
/**
* @param text _str The input string.
* @return text The output string for consumption in natural sorting.
*/
CREATE OR REPLACE FUNCTION natsort(text)
RETURNS text
AS $$
DECLARE
_str text := $1;
_pad int := 15; -- Maximum precision for PostgreSQL floats
BEGIN
-- Bail if the string is empty
IF trim(_str) = ''
THEN
RETURN '';
END IF;
-- Strip accents and lower the case
_str := lower(unaccent(_str));
-- Replace nonsensical characters
_str := regexp_replace(_str, E'[^a-z0-9$¢£¥₤€@&%\\(\\)\\[\\]\\{\\}_:;,\\.\\?!\\+\\-]+', ' ', 'g');
-- Trim the result
_str := trim(_str);
-- @todo we'd ideally want to strip leading articles/prepositions ('a', 'the') at this stage,
-- but to_tsvector()'s default dictionary also strips stop words (e.g. 'all').
-- We're done if the string contains no numbers
IF _str !~ '[0-9]'
THEN
RETURN _str;
END IF;
-- Force spaces between numbers, so we can use regexp_split_to_table()
_str := regexp_replace(_str, E'((?:[0-9]+|[0-9]*\\.[0-9]+)(?:e[+-]?[0-9]+\\M)?)', E' \\1 ', 'g');
-- Pad zeros to obtain a reasonably natural looking sort order
RETURN array_to_string(ARRAY(
SELECT CASE
WHEN val !~ E'^\\.?[0-9]'
-- Not a number; return as is
THEN val
-- Do our best after expanding the number...
ELSE COALESCE(lpad(substring(val::numeric::text from '^[0-9]+'), _pad, '0'), '') ||
COALESCE(rpad(substring(val::numeric::text from E'\\.[0-9]+'), _pad, '0'), '')
END
FROM regexp_split_to_table(_str, E'\\s+') as val
WHERE val <> ''
), ' ');
END;
$$ IMMUTABLE STRICT LANGUAGE plpgsql COST 1;
COMMENT ON FUNCTION natsort(text) IS
'Rewrites a string so it can be used in natural sorting.
It''s by no means bullet proof, but it works properly for positive integers,
reasonably well for positive floats, and it''s fast enough to be used in a
trigger that populates an indexed column, or in an index directly.';
/**
* @param text[] _values The potential values to use.
* @return text The output string for consumption in natural sorting.
*/
CREATE OR REPLACE FUNCTION sort(text[])
RETURNS text
AS $$
DECLARE
_values alias for $1;
_sort text;
BEGIN
SELECT natsort(value)
INTO _sort
FROM unnest(_values) as value
WHERE value IS NOT NULL
AND value <> ''
AND natsort(value) <> ''
LIMIT 1;
RETURN COALESCE(_sort, '');
END;
$$ IMMUTABLE STRICT LANGUAGE plpgsql COST 1;
COMMENT ON FUNCTION sort(text[]) IS
'Returns natsort() of the first significant input argument.';
Sample output from the first function's unit tests:
public function testNatsort()
{
$this->checkInOut('natsort', array(
'<NULL>' => null,
'' => '',
'ABCde' => 'abcde',
'12345 12345' => '000000000012345 000000000012345',
'12345.12345' => '000000000012345.123450000000000',
'12345e5' => '000001234500000',
'.12345e5' => '000000000012345',
'1e10' => '000010000000000',
'1.2e20' => '120000000000000',
'-12345e5' => '- 000001234500000',
'-.12345e5' => '- 000000000012345',
'-1e10' => '- 000010000000000',
'-1.2e20' => '- 120000000000000',
'+-$¢£¥₤€@&%' => '+-$¢£¥₤€@&%',
'ÀÁÂÃÄÅĀĄĂÆ' => 'aaaaaeaaaaaae',
'ÈÉÊËĒĘĚĔĖÐ' => 'eeeeeeeeee',
'ÌÍÎÏĪĨĬĮİIJ' => 'iiiiiiiiiij',
'ÒÓÔÕÖØŌŐŎŒ' => 'oooooeoooooe',
'ÙÚÛÜŪŮŰŬŨŲ' => 'uuuueuuuuuu',
'ÝŶŸ' => 'yyy',
'àáâãäåāąăæ' => 'aaaaaeaaaaaae',
'èéêëēęěĕėð' => 'eeeeeeeeee',
'ìíîïīĩĭįıij' => 'iiiiiiiiiij',
'òóôõöøōőŏœ' => 'oooooeoooooe',
'ùúûüūůűŭũų' => 'uuuueuuuuuu',
'ýÿŷ' => 'yyy',
'ÇĆČĈĊ' => 'ccccc',
'ĎĐ' => 'dd',
'Ƒ' => 'f',
'ĜĞĠĢ' => 'gggg',
'ĤĦ' => 'hh',
'Ĵ' => 'j',
'Ķ' => 'k',
'ŁĽĹĻĿ' => 'lllll',
'ÑŃŇŅŊ' => 'nnnnn',
'ŔŘŖ' => 'rrr',
'ŚŠŞŜȘſ' => 'sssssss',
'ŤŢŦȚÞ' => 'ttttt',
'Ŵ' => 'w',
'ŹŽŻ' => 'zzz',
'çćčĉċ' => 'ccccc',
'ďđ' => 'dd',
'ƒ' => 'f',
'ĝğġģ' => 'gggg',
'ĥħ' => 'hh',
'ĵ' => 'j',
'ĸķ' => 'kk',
'łľĺļŀ' => 'lllll',
'ñńňņʼnŋ' => 'nnnnnn',
'ŕřŗ' => 'rrr',
'śšşŝșß' => 'sssssss',
'ťţŧțþ' => 'ttttt',
'ŵ' => 'w',
'žżź' => 'zzz',
'-_aaa--zzz--' => '-_aaa--zzz--',
'-:àáâ;-žżź--' => '-:aaa;-zzz--',
'-.à$â,-ž%ź--' => '-.a$a,-z%z--',
'--à$â--ž%ź--' => '--a$a--z%z--',
'-$à(â--ž)ź%-' => '-$a(a--z)z%-',
'#-à$â--ž?!ź-' => '-a$a--z?!z-',
));
}
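The core zero-padding idea behind natsort() can be sketched in a few lines of Python. This is a heavily simplified illustration, not a port: the name `natsort_key` is made up, it handles only unsigned integers, and it skips unaccenting, character filtering, floats, exponents and the extra spaces natsort() inserts around numbers.

```python
import re

PAD = 15  # same width as _pad in natsort()

def natsort_key(value):
    # Lower the case and trim, then left-pad every run of digits with
    # zeroes so that plain string comparison orders the numbers by value.
    lowered = value.lower().strip()
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(PAD), lowered)

print(natsort_key("12345 12345"))  # 000000000012345 000000000012345
print(sorted(["item 10", "item 9"], key=natsort_key))  # ['item 9', 'item 10']
```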