SQL : find rows and sort according to number of matching columns?
There are probably a few ways to optimise the sub-queries, but without using case
statements or sub-optimal join clauses:
select
*
from
(
select
selection.CarId,
selection.Colour,
selection.Weight,
selection.Type,
3 as Relevance
from
tblCars as selection
where
selection.Colour = 'black' and selection.Weight = 'light' and selection.Type = 'van'
union all
select
cars.CarId,
cars.Colour,
cars.Weight,
cars.Type,
count(*) as Relevance
from
tblCars as cars
inner join
(
select
byColour.CarId
from
tblCars as cars
inner join
tblCars as byColour
on
cars.Colour = byColour.Colour
where
cars.Colour = 'black' and cars.Weight = 'light' and cars.Type = 'van'
and
byColour.CarId <> cars.CarId
union all
select
byWeight.CarId
from
tblCars as cars
inner join
tblCars as byWeight
on
cars.Weight = byWeight.Weight
where
cars.Colour = 'black' and cars.Weight = 'light' and cars.Type = 'van'
and
byWeight.CarId <> cars.CarId
union all
select
byType.CarId
from
tblCars as cars
inner join
tblCars as byType
on
cars.Type = byType.Type
where
cars.Colour = 'black' and cars.Weight = 'light' and cars.Type = 'van'
and
byType.CarId <> cars.CarId
) as matches
on
cars.CarId = matches.CarId
group by
cars.CarId,
cars.Colour,
cars.Weight,
cars.Type
) as results
order by
Relevance desc
Output:
CarId  Colour  Weight  Type  Relevance
1      black   light   van   3
3      white   light   van   2
4      blue    light   van   2
5      black   medium  van   2
6      white   medium  van   1
7      blue    medium  van   1
8      black   heavy   limo  1
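For comparison, the same ranking can be sketched by adding up one comparison per column. This is an alternative to the query above, not the same method: it assumes the search values are known literals, and it relies on SQLite treating a comparison as 0 or 1. Rows 1 and 3 to 8 are taken from the output above; row 2 is a hypothetical non-matching car added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblCars (CarId INTEGER PRIMARY KEY, Colour TEXT, Weight TEXT, Type TEXT)"
)
conn.executemany(
    "INSERT INTO tblCars VALUES (?, ?, ?, ?)",
    [
        (1, "black", "light", "van"),
        (2, "white", "heavy", "limo"),  # hypothetical row that matches nothing
        (3, "white", "light", "van"),
        (4, "blue", "light", "van"),
        (5, "black", "medium", "van"),
        (6, "white", "medium", "van"),
        (7, "blue", "medium", "van"),
        (8, "black", "heavy", "limo"),
    ],
)

# Each comparison evaluates to 0 or 1 in SQLite, so the sum is the
# number of matching columns. Rows with no match at all are dropped.
rows = conn.execute(
    """
    SELECT *
    FROM (
        SELECT CarId, Colour, Weight, Type,
               (Colour = 'black') + (Weight = 'light') + (Type = 'van') AS Relevance
        FROM tblCars
    )
    WHERE Relevance > 0
    ORDER BY Relevance DESC, CarId
    """
).fetchall()

for row in rows:
    print(row)
```

On engines where a comparison is not a number (SQL Server, for example), each term would have to be wrapped in a CASE expression instead.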
Sort column values to match order of values in another table column
So you need to update Column2 with the row number according to Column1?
You can use ROW_NUMBER and a CTE:
WITH CTE AS
(
SELECT Column1, Column2, RN = ROW_NUMBER() OVER (ORDER BY Column1)
FROM MyTable
)
UPDATE CTE SET Column2 = RN;
This updates the table MyTable and works because the CTE selects from a single table. If it contains more than one table, you have to JOIN the UPDATE with the CTE.
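Updating directly through a CTE is a SQL Server feature. As a rough sketch of the join-back variant on another engine, the following uses Python's sqlite3 module and correlates the numbered rows with the table through rowid. The sample values are made up, and window functions are assumed to be available (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Column1 TEXT, Column2 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable (Column1) VALUES (?)",
    [("pear",), ("apple",), ("fig",)],  # hypothetical sample data
)

# Number the rows by Column1 in a CTE, then write each number back
# by matching the CTE row to the table row via rowid.
conn.execute(
    """
    WITH numbered AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (ORDER BY Column1) AS RN
        FROM MyTable
    )
    UPDATE MyTable
    SET Column2 = (SELECT RN FROM numbered WHERE numbered.rid = MyTable.rowid)
    """
)

result = conn.execute(
    "SELECT Column1, Column2 FROM MyTable ORDER BY Column2"
).fetchall()
print(result)
```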
Sort by first matching number then by second matching number and so on in SQL
Assuming you would have 2 number blocks at most and each number would be 10 digits at most, I created a sample CLR UDF like this for you (DbProject - SQL CLR Database project):
using System.Collections.Generic;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString CustomStringParser(SqlString str)
{
int depth = 2; // 2 numbers at most
int width = 10; // 10 digits at most
List<string> numbers = new List<string>();
var matches = Regex.Matches((string)str, @"\d+");
foreach (Match match in matches)
{
numbers.Add(long.Parse(match.Value).ToString().PadLeft(width, '0')); // long: a 10-digit value can overflow int
}
return string.Join("", numbers.ToArray()).PadRight(depth*width);
}
}
I added this to the 'test' database as follows:
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[ufn_MyCustomParser]') AND
type IN ( N'FN', N'IF', N'TF', N'FS', N'FT' ) )
DROP FUNCTION [dbo].[ufn_MyCustomParser]
GO
IF EXISTS ( SELECT *
FROM sys.[assemblies] AS [a]
WHERE [a].[name] = 'DbProject' AND
[a].[is_user_defined] = 1 )
DROP ASSEMBLY DbProject;
GO
CREATE ASSEMBLY DbProject
FROM 'C:\SQLCLR\DbProject\DbProject\bin\Debug\DbProject.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE FUNCTION ufn_MyCustomParser ( @csv NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME
DbProject.[UserDefinedFunctions].CustomStringParser;
GO
Note: this was done on SQL Server 2012. SQL Server 2017 enables "clr strict security" by default, which you would need to handle before creating the assembly.
Finally tested with this T-SQL:
declare @MyTable table (col1 varchar(50));
insert into @MyTable values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_01'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_BwwwQAZtc0605'),
('Btc_Bwwwwe12541edddddtc0605'),
('QARTa1b2');
SELECT * FROM @MyTable
ORDER BY dbo.ufn_MyCustomParser(col1);
Output:
col1
QARTa1b2
Btc0004_Shd_4
Btc0007_Shd_01
Btc0007_Shd_6
Btc0007_Shd_7
Btc0504
MR_Tst_Btc0565
Btc_BwwwQAZtc0605
Btc_Bwwwwe12541edddddtc0605
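To make the sort key easier to follow outside SQL CLR, here is a small Python sketch of the same idea: every run of digits is zero-padded to a fixed width, the blocks are concatenated, and the result is right-padded with spaces. The function name custom_string_parser is only an illustrative stand-in for the CLR function; the depth and width limits are the same assumptions as above.

```python
import re

DEPTH = 2   # at most 2 number blocks
WIDTH = 10  # at most 10 digits per block


def custom_string_parser(text):
    """Build a fixed-width sort key from the digit runs in text."""
    # int() drops leading zeros, rjust() pads back to a fixed width.
    numbers = [str(int(run)).rjust(WIDTH, "0") for run in re.findall(r"\d+", text)]
    # Keys with fewer blocks are padded with spaces, which sort before digits.
    return "".join(numbers).ljust(DEPTH * WIDTH)


values = [
    "Btc0504",
    "Btc0007_Shd_7",
    "Btc0007_Shd_01",
    "Btc0007_Shd_6",
    "MR_Tst_Btc0565",
    "Btc0004_Shd_4",
    "Btc_BwwwQAZtc0605",
    "Btc_Bwwwwe12541edddddtc0605",
    "QARTa1b2",
]

ordered = sorted(values, key=custom_string_parser)
for value in ordered:
    print(value)
```

With the sample values this gives the same order as the T-SQL output above. A value with more than two number blocks would produce a longer key, so the depth limit still matters.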
SQL multiple words search, ordered by number of matches
ORDER BY
(
CASE
WHEN col LIKE '%red%' THEN 1
ELSE 0
END
+
CASE
WHEN col LIKE '%green%' THEN 1
ELSE 0
END
+
CASE
WHEN col LIKE '%blue%' THEN 1
ELSE 0
END
) DESC
If your DB vendor has IF, you can use it instead of CASE (e.g., for MySQL you can write IF(col LIKE '%red%', 1, 0) + IF(...)).
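A quick way to try the ordering is an in-memory SQLite database from Python. The table name items and its rows are hypothetical; only the ORDER BY expression follows the approach described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (col TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [
        ("yellow bus",),           # matches 0 words
        ("red car",),              # matches 1 word
        ("red green blue flag",),  # matches 3 words
        ("green and blue kite",),  # matches 2 words
    ],
)

# One CASE per search word; the sum is the number of words found.
rows = conn.execute(
    """
    SELECT col
    FROM items
    ORDER BY (
        CASE WHEN col LIKE '%red%'   THEN 1 ELSE 0 END
      + CASE WHEN col LIKE '%green%' THEN 1 ELSE 0 END
      + CASE WHEN col LIKE '%blue%'  THEN 1 ELSE 0 END
    ) DESC
    """
).fetchall()

ranked = [row[0] for row in rows]
print(ranked)
```

Rows with the same number of matches come back in no guaranteed order, so a second sort column is worth adding if that matters.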
Sort SQL records based on matched conditions
.... ORDER BY CASE
WHEN key LIKE '1,2,3,%' THEN 1
WHEN key LIKE '1,2,%' THEN 2
ELSE 3
END
How can I return the best matched row first in sort order from a set returned by querying a single search term against multiple columns in Postgres?
Use greatest():
greatest(similarity('12345', foo_text), similarity('12345', bar_text), similarity('12345', foobar_text)) desc
SQL query to find rows with the most matching keywords
Like @a_horse commented: this would be simpler with a normalized design (besides making other tasks simpler and cleaner), but still not trivial.
Also, a PK column of data type character varying(36) is highly suspicious (and inefficient) and should most probably be an integer type or at least a uuid instead.
Here is one possible solution based on your design as is:
WITH cte AS (
SELECT id, string_to_array(a.keywords, ',') AS keys
FROM article a
)
SELECT id, string_agg(b_id, ',') AS best_matches
FROM (
SELECT a.id, b.id AS b_id
, row_number() OVER (PARTITION BY a.id ORDER BY ct.ct DESC, b.id) AS rn
FROM cte a
LEFT JOIN cte b ON a.id <> b.id AND a.keys && b.keys
LEFT JOIN LATERAL (
SELECT count(*) AS ct
FROM (
SELECT * FROM unnest(a.keys)
INTERSECT ALL
SELECT * FROM unnest(b.keys)
) i
) ct ON TRUE
ORDER BY a.id, ct.ct DESC, b.id -- b.id as tiebreaker
) sub
WHERE rn < 4
GROUP BY 1;
sqlfiddle (using an integer id instead).
The CTE cte converts the string into an array. You could even have a functional GIN index like that ...
If multiple rows tie for the top 3 picks, you need to define a tiebreaker. In my example, rows with smaller id come first.
Detailed explanation in this recent related answer:
- Query and order by number of matches in JSON array
The comparison there is between a JSON array and an SQL array, but it's basically the same problem and boils down to the same solution(s). It also compares a couple of similar alternatives.
To make this fast, you should at least have a GIN index on the array column (instead of the comma-separated string) and the query wouldn't need the CTE step. A completely normalized design has other advantages, but won't necessarily be faster than an array with GIN index.
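The matching logic itself can be sketched in plain Python, which may help when checking what the query should return. This is only an illustration with hypothetical article ids and keywords, not a replacement for the SQL: it counts shared keywords as a multiset intersection (similar in spirit to INTERSECT ALL), keeps the three best matches per article, and breaks ties by the smaller id.

```python
from collections import Counter

# Hypothetical data: article id -> comma-separated keywords.
articles = {
    "a": "sql,postgres,index",
    "b": "sql,postgres,json",
    "c": "sql,mysql",
    "d": "python,pandas",
    "e": "postgres,index,sql",
}


def best_matches(articles, limit=3):
    """Return, per article, the ids of the articles sharing the most keywords."""
    keys = {aid: Counter(kw.split(",")) for aid, kw in articles.items()}
    result = {}
    for aid, a_keys in keys.items():
        scored = []
        for bid, b_keys in keys.items():
            if bid == aid:
                continue
            shared = sum((a_keys & b_keys).values())  # multiset intersection size
            if shared:
                scored.append((-shared, bid))  # most shared first, then smaller id
        scored.sort()
        result[aid] = [bid for _, bid in scored[:limit]]
    return result


matches = best_matches(articles)
for aid, ids in matches.items():
    print(aid, ",".join(ids))
```

Articles without any overlap end up with an empty list here, which corresponds to the NULL that the LEFT JOIN produces in the query.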
SQL Find most rows that match between two tables
We really could do with some expected output to help clarify the question.
If I understand you correctly however, this query will get you close to the results you require:
;with cte as
( SELECT t1a.[group] AS Group1
, t2a.[Group] AS Group2
, RANK() OVER(PARTITION BY t1a.[group]
ORDER BY COUNT(t2a.[Group]) DESC) AS MatchRank
FROM Table1 t1a
JOIN Table2 t2a
ON t1a.member = t2a.member
GROUP BY t1a.[group], t2a.[Group])
SELECT *
FROM cte
WHERE MatchRank=1
The query doesn't flag ties explicitly, but because RANK() gives tied groups the same rank, any tied results are all displayed.
If you are new to common table expressions (the ;with statement), there is a useful description here.
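To see how ties come out, the ranking can be mimicked in a few lines of Python. The (group, member) pairs below are hypothetical, and the function is only a sketch of the same counting and "keep everything with the top count" logic, not a translation of the T-SQL.

```python
from collections import Counter, defaultdict

# Hypothetical (group, member) rows for the two tables.
table1 = [("G1", "ann"), ("G1", "bob"), ("G1", "cat"), ("G2", "dan"), ("G2", "eve")]
table2 = [("X", "ann"), ("X", "bob"), ("Y", "cat"), ("Y", "dan"), ("Z", "eve")]


def top_matches(table1, table2):
    """For each Table1 group, return the Table2 group(s) sharing the most members."""
    groups_by_member = defaultdict(list)
    for group, member in table2:
        groups_by_member[member].append(group)

    # Count shared members per (Table1 group, Table2 group) pair, like the JOIN + GROUP BY.
    counts = Counter()
    for g1, member in table1:
        for g2 in groups_by_member.get(member, []):
            counts[(g1, g2)] += 1

    # Highest count per Table1 group; every pair reaching it has rank 1.
    best = defaultdict(int)
    for (g1, _), n in counts.items():
        best[g1] = max(best[g1], n)

    return sorted((g1, g2, n) for (g1, g2), n in counts.items() if n == best[g1])


top = top_matches(table1, table2)
for row in top:
    print(row)
```

In this sample G2 shares one member each with Y and Z, so both pairs are returned, which is the tie behaviour described above.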