SQL Queries on String Columns - Sorting According to Language

Sql queries on string columns - sorting according to language

This would sort any name with a Hebrew character on top, and in Hebrew order. The other names would be sorted in English order:

select  *
from YourTable
order by
case
when name like '%[... all hebrew letters here ...]%' then name
end collate Hebrew_CI_AS
, name collate Latin1_General_CI_AS

In a collation, CI means Case Insensitive. AS means Accent Sensitive.

SQL Order By Specific String in a Column

You can try something like this:

Example data:

select * from test;
+--------+----------------------------------------+
| pid | picture |
+--------+----------------------------------------+
| 169799 | 2017/March/18/169799_4_iMbw7.jpg |
| 169799 | 2017/February/18/169799_2_jadsflkjasdf |
| 169799 | 2017/June/06/169799_1_jasfd;ads |
| 169799 | 2017/May/18/169799_5_jasfd;ads |
| 169799 | 2017/June/12/169799_10_jasfd;ads |
| 169799 | 2017/January/12/169799_3_iMbw7.jpg |
+--------+----------------------------------------+

Sort by date

select * from test order by str_to_date(substring_index(picture,'/',3), '%Y/%M/%d') desc;
+--------+----------------------------------------+
| pid | picture |
+--------+----------------------------------------+
| 169799 | 2017/June/12/169799_10_jasfd;ads |
| 169799 | 2017/June/06/169799_1_jasfd;ads |
| 169799 | 2017/May/18/169799_5_jasfd;ads |
| 169799 | 2017/March/18/169799_4_iMbw7.jpg |
| 169799 | 2017/February/18/169799_2_jadsflkjasdf |
| 169799 | 2017/January/12/169799_3_iMbw7.jpg |
+--------+----------------------------------------+

Explanation:

  • substring_index(picture,'/',3) will split text by / and output the first 3 item
  • str_to_date(..., '%Y/%M/%d') converts the result to a date based on the given format. %M is month name

EDIT

Sort by number

select * from test
order by cast(
substring_index(substring_index(picture, '_', 2), '_', -1)
as unsigned
);
+--------+----------------------------------------+
| pid | picture |
+--------+----------------------------------------+
| 169799 | 2017/June/06/169799_1_jasfd;ads |
| 169799 | 2017/February/18/169799_2_jadsflkjasdf |
| 169799 | 2017/January/12/169799_3_iMbw7.jpg |
| 169799 | 2017/March/18/169799_4_iMbw7.jpg |
| 169799 | 2017/May/18/169799_5_jasfd;ads |
| 169799 | 2017/June/12/169799_10_jasfd;ads |
+--------+----------------------------------------+

Explanation:

  • substring_index(picture, '_', 2) will split text by _ and output the first 2 item
  • substring_index(..., '_', -1) picks the last item, which is the number

References:

  • https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_substring-index
  • https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_str-to-date
  • https://dev.mysql.com/doc/refman/5.7/en/date-and-time-functions.html#function_date-format

How to sort the words of a single cell in an SQL table?

So there's nothing that you can do natively. If you want to sort the values just as a return value, i.e. not update the database itself, you can transform the results with either a stored procedure or perhaps a view.

So let's construct an answer.

Let's just assume you want to do it visually, for a single row. If you have SQL 2016 you can use STRING_SPLIT but SQL Fiddle doesn't, so I used a common UDF fnSplitString

http://sqlfiddle.com/#!6/7194d/2

SELECT value
FROM fnSplitString('Pillars 101 in an apartment', ' ')
WHERE RTRIM(value) <> '';

That gives me each word, split out. What about ordering it?

SELECT value
FROM fnSplitString('Pillars 101 in an apartment', ' ')
WHERE RTRIM(value) <> ''
ORDER BY value;

And if I want to do it for each row in the DB table I have? http://sqlfiddle.com/#!6/7194d/8


SELECT split.value
FROM [Data] d
CROSS APPLY dbo.fnSplitString(IsNull(d.Value,''), ' ') AS split
WHERE RTRIM(split.value) <> ''
ORDER BY value;

That's sort of helpful, except now all my words are jumbled. Let's go back to our original query and identify each row. Each row probably has an Identity column on it. If so, you've got your grouping there. If not, you can use ROW_NUMBER, such as:

SELECT
ROW_NUMBER() OVER(ORDER BY d.Value) AS [Identity] -- here, use identity instead of row_number
, d.Value
FROM [Data] d

If we then use this query as a subquery in our select, we get:

http://sqlfiddle.com/#!6/7194d/21

SELECT d.[Identity], split.value
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.Value) AS [Identity] -- here, use identity instead of row_number
, d.Value
FROM [Data] d
) d
CROSS APPLY dbo.fnSplitString(IsNull(d.Value,''), ' ') AS split
WHERE RTRIM(split.value) <> ''
ORDER BY d.[Identity], value;

This query now sorts all rows within each identity. But now you need to reconstruct those individual words back into a single string, right? For that, you can use STUFF. In my example I use a CTE because of SQL Fiddle limitations but you could use a temp table, too.


WITH tempData AS (
SELECT d.[Identity], split.value
FROM
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.Value) AS [Identity] -- here, use identity instead of row_number
, d.Value
FROM [Data] d
) d
CROSS APPLY dbo.fnSplitString(IsNull(d.Value,''), ' ') AS split
WHERE RTRIM(split.value) <> ''
)
SELECT grp.[Identity]
, STUFF((SELECT N' ' + [Value] FROM tempData WHERE [Identity] = grp.[Identity] ORDER BY Value FOR XML PATH(N''))
, 1, 1, N'')
FROM (SELECT DISTINCT [Identity] FROM tempData) AS grp

Here's the end result fiddle: http://sqlfiddle.com/#!6/7194d/27

As expressed in comments already, this is not a common case for SQL. It's an unnecessary burden on the server. I would recommend pulling data out of SQL and sorting it through your programming language of choice; or making sure it's sorted as you insert it into the DB. I went through the exercise because I had a few minutes to kill :)

SQL Server query order by column containing string

To do it in SQL you could use

SELECT *
FROM myTable
ORDER BY CASE
WHEN mycolumn LIKE '%XYZ%' THEN 0
ELSE 1
END,
mycolumn

ORDER BY in SQL where column is synthetic string with embedded integer

Fixed width rep and it uses only functions available in both H2 (not tagged) and SQLS (tagged):

SELECT 
CONCAT(
CAST(name as CHAR(10)), --right pad to 10,

YEAR(date_of_entry),
RIGHT(CONCAT('0',MONTH(date_of_entry)),2),
RIGHT(CONCAT('0',DAY(date_of_entry)),2), --yyyymmdd

CAST(flag1 as CHAR(1)), --rpad to 1, doesn't need cast if never null/0 length

CAST(flag2 as CHAR(1)), --maybe doesn't need cast, see above

RIGHT(CONCAT('0000000000', CAST(salary AS INT)),10), --lpad with 0 to 10 wide

CAST(flag3 as CHAR(1)), --maybe doesn't need cast, see above

RIGHT(CONCAT('0000000000', id), 10) --lpad with 0 to 10 wide

) AS SYNTHETIC_ORDER
FROM
TEST
ORDER BY
SYNTHETIC_ORDER DESC

Points of note:

  • Your CREATE TABLE statement doesn't mention ID, but your query does; included ID

  • Your query doesn't mention NAME but your example data output does; included NAME

  • You might not need to pad the ID or salary so much

  • Come of the casts to chars (e.g. on flag columns) can be dropped (if the flag column is 100% guaranteed to always be 1 char long)

  • If salary max value in table is larger than an int can hold, consider a cast to something else

  • By padding the salary with leading zeroes, the sort will work out. Normalising it to between 0 and 1 could also work, if all the values were padded out to the same width but you possibly then get the problem that loss of precision (dividing a 10 digit salary down to eg 0.123456) will cause two different salaries to merge because there aren't enough digits to fully represent. With any division-that-quantizes-to-lower precision you then risk the original values sorting wrongly (e.g. If salaries of 1000000000 and 1000000001 with id of 2 and 1 respectively both normalise to 0.123456 they would end up sorted wrongly. To guard against this you probably need as many digits for the division answer as the salary had in the first place, padded to a fixed width, but if you've gone that far you might as well just pad all the salaries out either to the width of the widest or to some width that will contain them all. Here utilising a cast to an int might be handy, if the int will overflow. You can make a decision to pad to one digit wider than an int will hold and then if someone inserts a large value in future and your query starts failing because of overflow it at least won't silently deliver wrong results because the pad is chopping digits off the left hand edge. In addressing the cast to bit you can choose whether to add some logic that pads out to the LENGTH() of the string form of the SELECT MAX salary

CONCAT is nice cos you can pass most types to it without first casting to varchar, and it doesn't null the whole thing if you concat a null on, unlike regular string concat ops with + or ||

Wrong sort order string with ک letter in SQL Server sort order

It's possible that your fields have different collations than you think. Run this query to see what do you have in your DB. Are they exactly Persian_100_CI_AS?

DECLARE @defaultCollation NVARCHAR(1000)
SET @defaultCollation = CAST(
DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS NVARCHAR(1000)
)

SELECT C.Table_Name,
Column_Name,
Collation_Name,
@defaultCollation DefaultCollation
FROM Information_Schema.Columns C
INNER JOIN Information_Schema.Tables T
ON C.Table_Name = T.Table_Name
WHERE T.Table_Type = 'Base Table'
AND RTRIM(LTRIM(Collation_Name)) <> RTRIM(LTRIM(@defaultCollation))
AND COLUMNPROPERTY(OBJECT_ID(C.Table_Name), Column_Name, 'IsComputed') = 0
ORDER BY
C.Table_Name,
C.Column_Name

If the collations are different, run this query which gives you Alter statements to run them and fix the incompatible collations issue.

DECLARE @defaultCollation NVARCHAR(1000)
SET @defaultCollation = CAST(
DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS NVARCHAR(1000)
)

select 'ALTER TABLE '
+ QUOTENAME(C.TABLE_SCHEMA)
+'.'+ QUOTENAME(C.Table_Name)
+' ALTER COLUMN ' +' [' +Column_Name+'] '
+ DATA_TYPE+'('+cast(character_maximum_length as varchar(10))+')' +' COLLATE Persian_100_CI_AS '
+(case IS_NULLABLE when 'YES' then 'NULL' else 'NOT NULL' end )+';'

FROM Information_Schema.Columns C
INNER JOIN Information_Schema.Tables T
ON C.Table_Name = T.Table_Name
WHERE T.Table_Type = 'Base Table'
AND RTRIM(LTRIM(Collation_Name)) <> RTRIM(LTRIM(@defaultCollation))
--AND DATA_TYPE='nvarchar'
AND character_maximum_length>0
ORDER BY
C.Table_Name,
C.Column_Name


Related Topics



Leave a reply



Submit