Sql Server Freetext Match - How to Sort by Relevance

SQL Server Freetext match - how do I sort by relevance

If you are using FREETEXTTABLE, it returns a column named RANK, so ORDER BY RANK should work. CONTAINSTABLE returns the same column; the FREETEXT and CONTAINS predicates only filter rows and do not expose a rank.

FREETEXT search - ordering results according to how close they match

Use FREETEXTTABLE instead of FREETEXT.

FREETEXTTABLE will return a table of keys with Rank information. You can sort on this Rank information to find the items that are the closest matches.

Microsoft FREETEXTTABLE documentation

The following example shows how this works:

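SELECT
    t.TableID
  , t.TextData
  , ft.[RANK]
FROM
    [Table] AS t  -- placeholder name; TABLE is a reserved word, so it must be bracketed
    INNER JOIN FREETEXTTABLE([Table], *, 'car park') AS ft
        ON t.TableID = ft.[KEY]
ORDER BY
    ft.[RANK] DESC  -- highest rank (closest match) first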

SQL SELECT FREETEXT order by Rank

You should use FREETEXTTABLE instead of FREETEXT:

SELECT TOP 1000 Q.*, QI.*
FROM Quotes Q
INNER JOIN QuoteImages QI
    ON Q.Id = QI.QuoteId
INNER JOIN FREETEXTTABLE(Quotes, QuoteText, 'some text') FT
    ON Q.Id = FT.[KEY]
ORDER BY FT.[RANK] DESC  -- qualify RANK so it refers to the full-text rank column

How to sort the results from full text index search based on relevance

In MySQL, you can select the MATCH ... AGAINST expression as part of your query, and it comes back as a "score". You can then sort on that. For example:

SELECT
    author,
    MATCH(author, title, series) AGAINST('Anna Selby' IN BOOLEAN MODE) AS score
FROM
    search
WHERE
    MATCH(author, title, series) AGAINST('Anna Selby' IN BOOLEAN MODE)
ORDER BY
    score DESC  -- highest score first; the default ASC would list the weakest matches first

UPDATE: Here is a weighted example

SELECT
    author,
    (
        (MATCH(author) AGAINST('Anna Selby' IN BOOLEAN MODE) * 20) +  -- author matches weigh most
        (MATCH(title)  AGAINST('Anna Selby' IN BOOLEAN MODE) * 10) +
        (MATCH(series) AGAINST('Anna Selby' IN BOOLEAN MODE) * 5)
    ) AS score
FROM
    search
WHERE
    MATCH(author, title, series) AGAINST('Anna Selby' IN BOOLEAN MODE)
ORDER BY
    score DESC  -- highest weighted score first
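The weighting idea can be checked outside the database. The following is a minimal Python sketch, not the MySQL implementation: the per-column match scores are made-up stand-ins for what MATCH ... AGAINST would return, and the names `WEIGHTS` and `weighted_score` are hypothetical.

```python
# Hypothetical per-column weights mirroring the SQL above (author 20, title 10, series 5).
WEIGHTS = {"author": 20, "title": 10, "series": 5}


def weighted_score(column_scores, weights=WEIGHTS):
    """Sum each column's match score multiplied by that column's weight."""
    return sum(weights[col] * column_scores.get(col, 0.0) for col in weights)


# Made-up match scores standing in for what MATCH ... AGAINST would return.
rows = [
    {"id": 1, "scores": {"author": 1.0, "title": 0.0, "series": 0.0}},
    {"id": 2, "scores": {"author": 0.0, "title": 1.0, "series": 1.0}},
    {"id": 3, "scores": {"author": 0.0, "title": 0.0, "series": 1.0}},
]

# Sort by weighted score, highest first (the equivalent of ORDER BY score DESC).
ranked = sorted(rows, key=lambda r: weighted_score(r["scores"]), reverse=True)
ranked_ids = [r["id"] for r in ranked]
```

With these sample numbers a single author match (20) outranks a combined title and series match (15), which is the effect the weights are meant to produce.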

Search with relevance ranking using containstable and freetext

Does this article help?

MSDN : Limiting Ranked Result Sets (Full-Text Search)

It implies, in part, that using an additional parameter will allow you to limit the result to the ones with the greatest relevance (which you can influence using WEIGHT) and also order by that relevance (RANK).

top_n_by_rank is an integer value, n, that specifies that only the n
highest ranked matches are to be returned, in descending order.

The doc doesn't have an example for FREETEXT; it only references CONTAINSTABLE. But it definitely implies that CONTAINSTABLE outputs a RANK column that you could use to ORDER BY. FREETEXTTABLE accepts the same top_n_by_rank argument and returns the same RANK column.

I don't know if there is any way to enforce your own definition of relevance. It may make sense to pull out the top 10 relevant matches according to FTS and then apply your own ranking to that output; for example, you can split up the search terms using a function and order by how many of the words matched.

For simplicity and easy repro, the following example does not use Full-Text in the subquery, but you can replace it with whatever you're actually doing. First create the function:

IF OBJECT_ID('dbo.SplitStrings') IS NOT NULL
DROP FUNCTION dbo.SplitStrings;
GO
CREATE FUNCTION dbo.SplitStrings(@List NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN ( SELECT Item FROM
( SELECT Item = x.i.value('(./text())[1]', 'nvarchar(max)')
FROM ( SELECT [XML] = CONVERT(XML, '<i>'
+ REPLACE(@List, ' ', '</i><i>') + '</i>').query('.')
) AS a CROSS APPLY [XML].nodes('i') AS x(i) ) AS y
WHERE Item IS NOT NULL
);
GO

Then a simple script that shows how to perform the matching:

DECLARE @foo TABLE
(
id INT,
[description] NVARCHAR(450)
);

INSERT @foo VALUES
(1,N'McDonalds fast food'),
(2,N'healthy food'),
(3,N'fast food restaurant'),
(4,N'Italian restaurant'),
(5,N'Spike''s Junkyard Dogs');

DECLARE @searchstring NVARCHAR(255) = N'fast food restaurant';

SELECT x.id, x.[description]--, MatchCount = COUNT(s.Item)
FROM
(
SELECT f.id, f.[description]
FROM @foo AS f

-- pretend this actually does full-text search:
-- WHERE FREETEXT(f.[description], @searchstring)

-- and ignore how I actually matched:
INNER JOIN dbo.SplitStrings(@searchstring) AS s
ON CHARINDEX(s.Item, f.[description]) > 0

GROUP BY f.id, f.[description]
) AS x
INNER JOIN dbo.SplitStrings(@searchstring) AS s
ON CHARINDEX(s.Item, x.[description]) > 0
GROUP BY x.id, x.[description]
ORDER BY COUNT(s.Item) DESC, [description];

Results:

id  description
--  --------------------
3   fast food restaurant
1   McDonalds fast food
2   healthy food
4   Italian restaurant
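The same counting logic can be reproduced without a database. This is a rough Python sketch of the idea, not a translation of the T-SQL: the function name `rank_by_matched_terms` is hypothetical, and lowercasing both sides is an assumption that imitates a case-insensitive collation.

```python
def rank_by_matched_terms(rows, search_string):
    """Order (id, description) rows by how many search terms each description contains."""
    terms = [t.lower() for t in search_string.split(" ") if t]
    scored = []
    for row_id, description in rows:
        text = description.lower()
        # Substring test, comparable to CHARINDEX(term, description) > 0.
        match_count = sum(1 for term in terms if term in text)
        if match_count > 0:  # rows with no matching term are dropped, as with the INNER JOIN
            scored.append((match_count, row_id, description))
    # Most matched terms first, then description as a case-insensitive tie-breaker.
    scored.sort(key=lambda s: (-s[0], s[2].lower()))
    return [(row_id, description) for _, row_id, description in scored]


rows = [
    (1, "McDonalds fast food"),
    (2, "healthy food"),
    (3, "fast food restaurant"),
    (4, "Italian restaurant"),
    (5, "Spike's Junkyard Dogs"),
]

results = rank_by_matched_terms(rows, "fast food restaurant")
for row_id, description in results:
    print(row_id, description)
```

Row 5 matches none of the terms, so it is left out, just as it is in the result set above.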

MySQL fulltext search and sort by relevance + TIME

You could change to an aggregate score... something like this:


SELECT *, 
(
MATCH(title, content) AGAINST('search string')
-
(ABS(DATEDIFF(`timestampfield`, NOW())) / 365)
) AS score
FROM news_items
WHERE
MATCH(title, content) AGAINST('search string') > 4
ORDER BY score DESC LIMIT 4

In that there's one kinda funky addition, which you'd want to clean up:

- (ABS(DATEDIFF(`timestampfield`, NOW())) / 365)

This is the age component of the score, currently scaled so that one year of age costs 1 point.

To get that, we start by getting the number of days between the timestamp field and now (absolute value):

ABS(DATEDIFF(`timestampfield`, NOW()))

Then we scale...

I decided you probably didn't want to lose score based on the raw number of days, because something 30 days old would get -30, which seems too harsh. So I chose years. If you want one point per week instead, divide by 7 instead of 365, and so forth.

This scaling factor is how you control the balance between match score and age.

So it ends up being something like: <match score> - <yearsAgo>


If you do that:

  1. 5 (match score) - 0.1 (<1 year ago) = 4.9 (ok match, but recent)
  2. 5 (match score) - 0.01 (<1 year ago) = 4.99 (ok match, newest)
  3. 5 (match score) - 1 (1 year ago) = 4
  4. 6 (match score) - 2 (2 years ago) = 4
  5. 9 (match score) - 5 (5 years ago) = 4 (best match, but old)
  6. 7 (match score) - 10 (10 years ago) = -3
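The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration of the formula <match score> - <yearsAgo>: the match scores and dates are invented, `age_adjusted_score` and `days_per_point` are hypothetical names, and `today` is pinned to a fixed date so the numbers are reproducible.

```python
from datetime import date


def age_adjusted_score(match_score, created, today, days_per_point=365):
    """Subtract one point from the match score for every `days_per_point` days of age."""
    days_old = abs((today - created).days)  # same role as ABS(DATEDIFF(created, NOW()))
    return match_score - days_old / days_per_point


today = date(2024, 1, 1)  # fixed "now" for the example; a real query would use NOW()

# (label, made-up match score, creation date)
items = [
    ("ok match, recent", 5.0, date(2023, 12, 1)),      # 31 days old
    ("ok match, one year old", 5.0, date(2023, 1, 1)),  # 365 days old
    ("best match, but old", 9.0, date(2019, 1, 2)),     # 1825 days old
    ("good match, very old", 7.0, date(2014, 1, 3)),    # 3650 days old
]

# Highest adjusted score first, like ORDER BY score DESC.
ranked = sorted(
    items,
    key=lambda item: age_adjusted_score(item[1], item[2], today),
    reverse=True,
)
for label, match_score, created in ranked:
    print(label, round(age_adjusted_score(match_score, created, today), 3))
```

Changing `days_per_point` to 7 would charge one point per week, which penalizes age far more heavily than the yearly scale.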

NOTE: this assumes your timestamp field is a DATE or DATETIME column. If it stores a Unix timestamp instead, convert it first (for example with FROM_UNIXTIME()) or do the arithmetic on the raw seconds directly.

And here's a debugging version of the query:

SELECT
`created`,
MATCH(title, content) AGAINST('awesome') as match_score,
(ABS(DATEDIFF(`created`, NOW())) / 365) as years_ago,
(
MATCH(title, content) AGAINST('awesome')
-
(ABS(DATEDIFF(`created`, NOW())) / 365)
) AS score
FROM news_items
WHERE
MATCH(title, content) AGAINST('awesome') > 4
ORDER BY score DESC LIMIT 4

Database search engine - Sort by relevance according to specific relevance rules

There are the CONTAINSTABLE and FREETEXTTABLE functions; they return a RANK column that is a "relevance ranking". These functions plus some more complex ordering on non-text columns will probably do the job.

If you decide to implement FTS in your app, have a look at third-party solutions. Lucene (or Lucene.NET) is probably good to start with.

How to display relevant record first using FULLTEXT in SQL?

You have no Order By clause, which frees the database to return records in any order. Usually it is the order in which it encounters them as it processes the where clause, so it is easy to believe there is a "natural" order that the database uses. If you want a specific order you either need to add an Order By clause that will use some scoring method you create or you must order them in the program that receives the records.

(I would have to guess that the other records you are pulling also contain the search criteria, just farther into the text field.)

See here for an example of using Rank to order your records: "SQL Server Freetext match - how do I sort by relevance". The Rank column is generated by the text matching call.

How can I return the best matched row first in sort order from a set returned by querying a single search term against multiple columns in Postgres?

Use greatest() in the ORDER BY clause (similarity() comes from the pg_trgm extension):

ORDER BY greatest(similarity('12345', foo_text), similarity('12345', bar_text), similarity('12345', foobar_text)) DESC
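To see why taking the greatest per-column similarity puts the best-matching row first, here is a small Python sketch. It is an illustration only: `difflib.SequenceMatcher` is used as a stand-in and does not compute the trigram similarity that pg_trgm does, the sample rows are invented, and `best_column_similarity` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for pg_trgm's similarity(); SequenceMatcher is NOT trigram-based."""
    return SequenceMatcher(None, a, b).ratio()


def best_column_similarity(term, row, columns):
    """Return the highest similarity between the term and any of the given columns."""
    return max(similarity(term, row[col]) for col in columns)


COLUMNS = ("foo_text", "bar_text", "foobar_text")

# Invented sample rows using the column names from the answer above.
rows = [
    {"id": 1, "foo_text": "abcde", "bar_text": "xyz", "foobar_text": "hello"},
    {"id": 2, "foo_text": "99999", "bar_text": "12345", "foobar_text": "zzz"},
    {"id": 3, "foo_text": "12399", "bar_text": "none", "foobar_text": "qqq"},
]

# Best match in any column first, like ORDER BY greatest(...) DESC.
ordered = sorted(
    rows,
    key=lambda row: best_column_similarity("12345", row, COLUMNS),
    reverse=True,
)
ordered_ids = [row["id"] for row in ordered]
```

Row 2 comes first because one of its columns matches the term exactly, even though its other columns do not match at all.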

