Natural Sort with SQL Server

Natural (human alpha-numeric) sort in Microsoft SQL 2005

Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.

Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:

  • will sort the first 1,000 characters or so correctly (easily modified in code or made into a parameter)
  • properly sorts decimals, so 123.333 comes before 123.45
  • because of above, will likely NOT sort things like IP addresses correctly; if you wish different behaviour, modify the code
  • supports sorting a string with an arbitrary number of numbers within it
  • will correctly sort numbers up to 25 digits long (easily modified in code or made into a parameter)

The code is here:

using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public class UDF
{
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
public static SqlString Naturalize(string val)
{
if (String.IsNullOrEmpty(val))
return val;

while(val.Contains(" "))
val = val.Replace(" ", " ");

const int maxLength = 1000;
const int padLength = 25;

bool inNumber = false;
bool isDecimal = false;
int numStart = 0;
int numLength = 0;
int length = val.Length < maxLength ? val.Length : maxLength;

//TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
var sb = new StringBuilder();
for (var i = 0; i < length; i++)
{
int charCode = (int)val[i];
if (charCode >= 48 && charCode <= 57)
{
if (!inNumber)
{
numStart = i;
numLength = 1;
inNumber = true;
continue;
}
numLength++;
continue;
}
if (inNumber)
{
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
inNumber = false;
}
isDecimal = (charCode == 46);
sb.Append(val[i]);
}
if (inNumber)
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));

var ret = sb.ToString();
if (ret.Length > maxLength)
return ret.Substring(0, maxLength);

return ret;
}

static string PadNumber(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}

To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:

CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(@val as nvarchar(max)) RETURNS nvarchar(1000)
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go

Then, you can use it like so:

select *
from MyTable
order by dbo.Naturalize(MyTextField)

Note: If you get an error in SQL Server along the lines of Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option., follow the instructions here to enable it. Make sure you consider the security implications before doing so. If you are not the db admin, make sure you discuss this with your admin before making any changes to the server configuration.

Note2: This code does not properly support internationalization (e.g., assumes the decimal marker is ".", is not optimized for speed, etc. Suggestions on improving it are welcome!

Edit: Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.

Natural sort for SQL Server?

IsNumeric is "broken", ISNUMERIC(CHAR(13)) returns 1 and CAST will fail.

Use ISNUMERIC(textval + 'e0'). Final code:

ORDER BY
PropertyName,
CASE ISNUMERIC(MixedField + 'e0') WHEN 1 THEN 0 ELSE 1 END, -- letters after numbers
CASE ISNUMERIC(MixedField + 'e0') WHEN 1 THEN CAST(MixedField AS INT) ELSE 0 END,
MixedField

You can mix order parameters...

Natural Sorting SQL ORDER BY

This will do it:

SELECT value
FROM Table1
ORDER BY value REGEXP '^[A-Za-z]+$'
,CAST(value as SIGNED INTEGER)
,CAST(REPLACE(value,'-','')AS SIGNED INTEGER)
,value

The 4 levels of the ORDER BY:

  1. REGEXP assigns any alpha line a 1 and non-alphas a 0
  2. SIGNED INT Sorts all of the numbers by the portion preceding the dash.
  3. SIGNED INT after removing the dash sorts any of the items with the same value before the dash by the portion after the dash. Potentially could replace number 2, but wouldn't want to treat 90-1 the same as 9-01 should the case arise.
  4. Sorts the letters alphabetically.

Demo: SQL Fiddle

Natural sort supporting big numbers

It works like @clemens suggested. Use numeric (= decimal) in the composite type:

CREATE TYPE ai AS (a text, i numeric);

db<>fiddle here

The reason I used int in the referenced answer is performance.

Natural sort with SQL Server

After great trial, I succeeded to solve it with the following way..

SELECT ptnt_vst_csno 
FROM table_name
ORDER BY Substring(ptnt_vst_csno, 1, Charindex('P', ptnt_vst_csno)),
CONVERT(INT, Substring(Substring(ptnt_vst_csno,
Charindex('P', ptnt_vst_csno),
Len(
ptnt_vst_csno)), 2, Len(
ptnt_vst_csno)))

Collation for the natural order of strings by a number they contain

If your files have the name in format name###file you can sort it using

SELECT * FROM @Table ORDER BY LEN(Name), Name

This sorting is simple, first sort by length of Name then by Name. Your file name is constant and only number part changed, so "5", "1" and "2" are "before "10" based on length. Second ordering gives correct order between number in the same magnitude (0-9) (10-99) (100-999) and so on.

Keep in mind that it is not perfect general solution for example: "z" < "aa".

How to natural sort “X-Y” string data, first by X and then by Y?

Assuming the parts on either side of the - are limited to 2 digits and 5 digits respectively, you can extract the two numeric values using SUBSTR (and LOCATE to find the - between the two numbers) and then LPAD to pad each of those values out to 2 and 5 digits to allow them to be sorted numerically:

SELECT *
FROM data
ORDER BY LPAD(SUBSTR(id, 2, LOCATE('-', id) - 2), 2, '0') DESC,
LPAD(SUBSTR(id, LOCATE('-', id) + 1), 5, '0') DESC

Output (for my expanded sample):

id
W20-12457
W20-4617
W20-100
W19-1040
W18-40461
W4-2017

Demo on db-fiddle

If the values can have more than 2 or 5 digits respectively, just change the second parameters to LPAD to suit.



Related Topics



Leave a reply



Submit