SQL Server Freetext match - how do I sort by relevance
If you are using FREETEXTTABLE, it returns a column named Rank, so ORDER BY Rank should work. CONTAINSTABLE returns the same column; the FREETEXT and CONTAINS predicates only filter rows and do not expose a rank.
FREETEXT search - ordering results according to how close they match
Use FREETEXTTABLE instead of FREETEXT.
FREETEXTTABLE will return a table of keys with Rank information. You can sort on this Rank information to find the items that are the closest matches.
Microsoft FREETEXTTABLE documentation
The following example shows how this works:
SELECT
    t.TableID
  , t.TextData
  , ft.[Rank]
FROM
    [Table] t   -- placeholder name; TABLE is a reserved word, so it must be bracketed
    INNER JOIN FREETEXTTABLE ( [Table] , * , 'car park' ) ft ON ( t.TableID = ft.[Key] )
ORDER BY
    ft.[Rank] DESC
SQL SELECT FREETEXT order by Rank
You should use FREETEXTTABLE instead of FREETEXT:
SELECT TOP 1000 Q.*, QI.*
FROM Quotes Q
    INNER JOIN QuoteImages QI
        ON Q.Id = QI.QuoteId
    INNER JOIN FREETEXTTABLE(Quotes, QuoteText, 'some text') FT
        ON Q.Id = FT.[Key]
ORDER BY FT.[RANK] DESC
How to sort the results from full text index search based on relevance
You can select the match as part of your query, and it comes back as a "score". You can then sort on that, highest score first. For example:
SELECT
    author,
    MATCH(author,title,series) AGAINST('Anna Selby' IN BOOLEAN MODE) AS score
FROM
    search
WHERE
    MATCH(author,title,series) AGAINST('Anna Selby' IN BOOLEAN MODE)
ORDER BY
    score DESC   -- DESC puts the most relevant rows first
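UPDATE: Here is a weighted example, where a match on author counts for more than a match on title or series:
SELECT
    author,
    (
        -- depending on the storage engine, each single-column MATCH may need its own FULLTEXT index
        (MATCH(author) AGAINST('Anna Selby' IN BOOLEAN MODE) * 20) +
        (MATCH(title)  AGAINST('Anna Selby' IN BOOLEAN MODE) * 10) +
        (MATCH(series) AGAINST('Anna Selby' IN BOOLEAN MODE) * 5)
    ) AS score
FROM
    search
WHERE
    MATCH(author,title,series) AGAINST('Anna Selby' IN BOOLEAN MODE)
ORDER BY
    score DESC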
Search with relevance ranking using containstable and freetext
Does this article help?
MSDN : Limiting Ranked Result Sets (Full-Text Search)
It implies, in part, that using an additional parameter will allow you to limit the result to the ones with the greatest relevance (which you can influence using WEIGHT) and also order by that relevance (RANK).
top_n_by_rank is an integer value, n, that specifies that only the n
highest ranked matches are to be returned, in descending order.
The doc doesn't have an example for FREETEXT; it only references CONTAINSTABLE. But it definitely implies that CONTAINSTABLE outputs a RANK column that you could use to ORDER BY.
I don't know if there is any way to enforce your own definition of relevance. It may make sense to pull out the top 10 relevant matches according to FTS, then apply your own ranking on the output, e.g. you can split up the search terms using a function, and order by how many of the words matched. For simplicity and easy repro in the following example I am not using Full-Text in the subquery but you can replace it with whatever you're actually doing. First create the function:
IF OBJECT_ID('dbo.SplitStrings') IS NOT NULL
    DROP FUNCTION dbo.SplitStrings;
GO
CREATE FUNCTION dbo.SplitStrings(@List NVARCHAR(MAX))
RETURNS TABLE
AS
    RETURN ( SELECT Item FROM
        ( SELECT Item = x.i.value('(./text())[1]', 'nvarchar(max)')
          FROM ( SELECT [XML] = CONVERT(XML, '<i>'
                 + REPLACE(@List, ' ', '</i><i>') + '</i>').query('.')
          ) AS a CROSS APPLY [XML].nodes('i') AS x(i) ) AS y
        WHERE Item IS NOT NULL
    );
GO
Then a simple script that shows how to perform the matching:
DECLARE @foo TABLE
(
    id INT,
    [description] NVARCHAR(450)
);
INSERT @foo VALUES
    (1,N'McDonalds fast food'),
    (2,N'healthy food'),
    (3,N'fast food restaurant'),
    (4,N'Italian restaurant'),
    (5,N'Spike''s Junkyard Dogs');
DECLARE @searchstring NVARCHAR(255) = N'fast food restaurant';
SELECT x.id, x.[description]--, MatchCount = COUNT(s.Item)
FROM
(
    SELECT f.id, f.[description]
    FROM @foo AS f
    -- pretend this actually does full-text search:
    --where (FREETEXT(description,@searchstring))
    -- and ignore how I actually matched:
    INNER JOIN dbo.SplitStrings(@searchstring) AS s
        ON CHARINDEX(s.Item, f.[description]) > 0
    GROUP BY f.id, f.[description]
) AS x
INNER JOIN dbo.SplitStrings(@searchstring) AS s
    ON CHARINDEX(s.Item, x.[description]) > 0
GROUP BY x.id, x.[description]
-- rows that contain more of the search words sort first
ORDER BY COUNT(s.Item) DESC, x.[description];
Results:
id description
-- -----------
3 fast food restaurant
1 McDonalds fast food
2 healthy food
4 Italian restaurant
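The same re-ranking idea can be sketched outside the database. The following is a minimal Python illustration, not the answer's method: the function name rank_by_matched_terms is hypothetical, and plain case-insensitive substring matching stands in for CHARINDEX under a case-insensitive collation.

```python
def rank_by_matched_terms(rows, search_string):
    """Order (id, description) rows by how many search terms each one contains."""
    # Split on spaces and drop empty items, like the SplitStrings function above.
    terms = [term.lower() for term in search_string.split(" ") if term]
    scored = []
    for row_id, description in rows:
        text = description.lower()
        match_count = sum(1 for term in terms if term in text)
        if match_count > 0:  # the INNER JOIN drops rows that match no term
            scored.append((match_count, row_id, description))
    # Most matched terms first, then alphabetically by description.
    scored.sort(key=lambda item: (-item[0], item[2].lower()))
    return [(row_id, description, count) for count, row_id, description in scored]


rows = [
    (1, "McDonalds fast food"),
    (2, "healthy food"),
    (3, "fast food restaurant"),
    (4, "Italian restaurant"),
    (5, "Spike's Junkyard Dogs"),
]
ranked = rank_by_matched_terms(rows, "fast food restaurant")
for row_id, description, count in ranked:
    print(row_id, description, count)
```

Row 5 matches none of the words and is dropped, which is why it is missing from the results above as well.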
MySQL fulltext search and sort by relevance + TIME
You could change to an aggregate score... something like this:
SELECT *,
(
MATCH(title, content) AGAINST('search string')
-
(ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
) AS score
FROM news_items
WHERE
MATCH(title, content) AGAINST('search string') > 4
ORDER BY score DESC LIMIT 4
In that there's one kinda funky addition, which you'd want to clean up:
- (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
This is the age component of the score, currently scaled so that one year = 1 point.
To get that, we start by getting the number of days between the timestamp field and now (absolute value):
ABS(DATEDIFF(`timestampfield`, NOW()))
Then we scale...
I decided you probably didn't want to lose score based on the raw number of days, because something 30 days old would lose 30 points, which seems too harsh. So I chose years. DATEDIFF returns days, so if you want to scale by weeks instead, divide by 7 instead of 365, and so forth.
This scaling factor is how you control the balance between the match score and age.
So it ends up being something like: <match score> - <yearsAgo>
If you do that:
- 5 (match score) - 0.1 (<1 year ago) = 4.9 (ok match, but recent)
- 5 (match score) - 0.01 (<1 year ago) = 4.99
- 5 (match score) - 1 (1 year ago) = 4
- 6 (match score) - 2 (2 years ago) = 4
- 9 (match score) - 5 (5 years ago) = 4 (best match, but old)
- 7 (match score) - 10 (10 years ago) = -3
NOTE: this assumes your timestamp field is a DATE or DATETIME column. If it stores a Unix timestamp instead, you will need to convert it to a date first (for example with FROM_UNIXTIME), or do the arithmetic on the Unix timestamp directly.
And here's a debugging version of the query:
SELECT
`created`,
MATCH(title, content) AGAINST('awesome') as match_score,
(ABS(DATEDIFF(`created`, NOW())) / 365) as years_ago,
(
MATCH(title, content) AGAINST('awesome')
-
(ABS(DATEDIFF(`created`, NOW())) / 365)
) AS score
FROM news_items
WHERE
MATCH(title, content) AGAINST('awesome') > 4
ORDER BY score DESC LIMIT 4
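To see how the scaling factor trades match quality against age, here is a small Python sketch of the same arithmetic. The function name age_weighted_score, the days_per_point parameter, and the sample items are all made up for illustration; the match scores are hand-picked numbers, not real MATCH() output.

```python
from datetime import date, timedelta


def age_weighted_score(match_score, created, today, days_per_point=365):
    """Return match_score minus one point per `days_per_point` days of age."""
    days_old = abs((today - created).days)  # same role as ABS(DATEDIFF(...))
    return match_score - days_old / days_per_point


today = date(2024, 1, 1)  # fixed date so the example is reproducible
items = [
    ("ok match, brand new", 5.0, today),
    ("ok match, 1 year old", 5.0, today - timedelta(days=365)),
    ("best match, 5 years old", 9.0, today - timedelta(days=5 * 365)),
    ("good match, 10 years old", 7.0, today - timedelta(days=10 * 365)),
]

scored = [
    (label, age_weighted_score(match_score, created, today))
    for label, match_score, created in items
    if match_score > 4  # mirrors the WHERE ... > 4 filter
]
top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:4]
for label, score in top:
    print(f"{score:6.2f}  {label}")
```

Passing days_per_point=7 makes each week of age cost one point, which penalises old items much more heavily.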
Database search engine - Sort by relevance according to specific relevance rules
There are the CONTAINSTABLE and FREETEXTTABLE functions; they return a RANK column that is the "relevance ranking". Probably these functions plus some more complex ordering by non-text columns will do the job.
If you decide to implement FTS in your app, have a look at third-party solutions. Lucene (or Lucene.NET) is probably good to start with.
How to display relevant record first using FULLTEXT in SQL?
You have no Order By clause, which frees the database to return records in any order. Usually it is the order in which it encounters them as it processes the where clause, so it is easy to believe there is a "natural" order that the database uses. If you want a specific order you either need to add an Order By clause that will use some scoring method you create or you must order them in the program that receives the records.
(I would have to guess that the other records you are pulling also contain the search criteria, just farther into the text field.)
See here for an example of using Rank to order your records: "SQL Server Freetext match - how do I sort by relevance". The Rank column is generated by the text matching call.
How can I return the best matched row first in sort order from a set returned by querying a single search term against multiple columns in Postgres?
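Use greatest() in the ORDER BY clause, so each row is ranked by its best-matching column (similarity() comes from the pg_trgm extension):
ORDER BY greatest(similarity('12345', foo_text), similarity('12345', bar_text), similarity('12345', foobar_text)) DESC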
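The "best column wins" logic can be illustrated in Python. This is only a sketch: difflib.SequenceMatcher is used as a stand-in similarity measure and does not produce the same numbers as pg_trgm's trigram similarity, and the rows and the best_column_similarity helper are invented for the example.

```python
from difflib import SequenceMatcher


def best_column_similarity(term, columns):
    """Return the highest similarity between `term` and any of the column values."""
    # max() plays the role of greatest(); SequenceMatcher is NOT pg_trgm.
    return max(SequenceMatcher(None, term, value).ratio() for value in columns)


# (id, foo_text, bar_text, foobar_text)
rows = [
    (1, "99999", "abcde", "12abc"),
    (2, "zzzzz", "12345", "qwert"),
    (3, "12340", "xxxxx", "yyyyy"),
]

ordered = sorted(
    rows,
    key=lambda row: best_column_similarity("12345", row[1:]),
    reverse=True,  # best match first, like ORDER BY ... DESC
)
for row in ordered:
    print(row[0], round(best_column_similarity("12345", row[1:]), 2))
```

Row 2 comes first because one of its columns matches the term exactly, regardless of how poorly its other columns match.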