How to Use T-SQL Full-Text Search to Get Results Like Google

How do you use T-SQL Full-Text Search to get results like Google?

I found another question on here that deals with this same topic. In fact, the post detailing the method is even titled "A Google-like Full Text Search". It uses an open-source library called Irony to parse a user-entered search string and turn it into an FTS-compatible query.

Here is the source code for the latest version of the Google-like Full-Text Search.

SQL full text search vs LIKE

Full-text search is likely to be quicker, since it benefits from an index of words that it uses to look up the records, whereas LIKE with a leading wildcard (as in '%word%') has to scan every row.

In some cases LIKE will be more accurate: LIKE "%The%" AND LIKE "%Matrix" will pick out "The Matrix" but not "Matrix Reloaded", whereas full-text search will ignore "The" and return both. That said, returning both would likely have been the better result.
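As a rough illustration of the difference, the sketch below assumes a hypothetical dbo.Movies table with a full-text index on its Title column that uses the system stoplist; neither comes from the original question. It uses FREETEXT, which drops stop words such as "The" before matching.

```sql
-- Hypothetical table: dbo.Movies(MovieId, Title), full-text indexed on Title.

-- LIKE: the leading wildcard rules out an index seek, so every row is examined.
-- Matches 'The Matrix' but not 'Matrix Reloaded'.
SELECT MovieId, Title
FROM dbo.Movies
WHERE Title LIKE '%The%' AND Title LIKE '%Matrix';

-- Full-text: terms are looked up in the full-text index.
-- 'The' is a stop word and is ignored, so both titles match on 'Matrix'.
SELECT MovieId, Title
FROM dbo.Movies
WHERE FREETEXT(Title, 'The Matrix');
```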

sql full text search CONTAINS() *sometimes* yield no result

The answer to your question is pretty straightforward: "get" is a system-defined stop word, so it is ignored during searches.

You can examine the words in the system-defined stop list like this:

select * from sys.fulltext_system_stopwords where language_id=1033

And you can turn off the stop list for your index like this:

ALTER FULLTEXT INDEX ON customer SET STOPLIST = OFF

Run the statement above, then search for "get" again and see whether any results are returned.

More info on stop words and stop lists can be found here: Configure and Manage Stopwords and Stoplists for Full-Text Search.
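If turning the stop list off entirely is too blunt, one alternative is a custom stoplist that drops only the word you need. This is a sketch, not part of the original answer: the stoplist name CustomerStoplist is made up, and it assumes "get" is present in the system stoplist for language 1033, as stated above.

```sql
-- Copy the system stoplist so that all other stop words stay in effect.
-- CustomerStoplist is a hypothetical name.
CREATE FULLTEXT STOPLIST CustomerStoplist FROM SYSTEM STOPLIST;

-- Remove only 'get' for US English (LCID 1033).
ALTER FULLTEXT STOPLIST CustomerStoplist DROP 'get' LANGUAGE 1033;

-- Point the index at the custom stoplist. Unless WITH NO POPULATION is added,
-- expect the index to be repopulated before the change shows up in searches.
ALTER FULLTEXT INDEX ON customer SET STOPLIST = CustomerStoplist;
```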

SQL Server Full Text Search around numbers and underscores

The full-text word breakers and stemmers changed between SQL Server 2008 and SQL Server 2012.

With a registry change, you can switch back to the legacy word breaker, which should work better in your situation.

See https://technet.microsoft.com/en-us/library/gg509108(v=sql.110).aspx for details.

If you need to support both the old and the new behaviour, you can revert US English to the old word breaker and keep UK English on the new one (or vice versa).

Using SQL 2016, I reverted UK English and kept US English the same:

exec sp_help_fulltext_system_components 'wordbreaker', 1033

exec sp_help_fulltext_system_components 'wordbreaker', 2057

Returns:
[Screenshot showing FTS components]

I created another table using UK English and populated it.

CREATE TABLE TestFullTextSearch2 (Id INT NOT NULL, AllText NVARCHAR(400))

CREATE UNIQUE INDEX test_tfts2 ON TestFullTextSearch2(Id)

CREATE FULLTEXT INDEX ON TestFullTextSearch2(AllText language 2057)
KEY INDEX test_tfts2 ON ftcat_tfts
WITH CHANGE_TRACKING AUTO, STOPLIST OFF

INSERT INTO TestFullTextSearch2
VALUES (1, ' 123_456 789 '), (2, ' 789 123_456 '),
(3, ' 123_456 ABC '), (4, ' ABC 123_456 ')

I'm getting the expected 4 results for all 3 queries.

[Screenshot: results of FTS queries]

Verify that your changes have taken effect.

exec sp_help_fulltext_system_components 'wordbreaker', 1033

exec sp_help_fulltext_system_components 'wordbreaker', 2057

select t.name, c.* from sys.tables t inner join sys.fulltext_index_columns c on t.object_id = c.object_id
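Another way to check, offered here as a sketch rather than part of the original answer, is to ask each word breaker how it tokenizes the test string. sys.dm_fts_parser takes the query string, the LCID, a stoplist id (NULL for none) and an accent-sensitivity flag; it requires sysadmin membership. The output depends on which word breaker is registered for each language, so compare the two result sets rather than expecting specific rows.

```sql
-- Tokens produced by the US English (1033) word breaker.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"123_456 789"', 1033, NULL, 0);

-- Tokens produced by the UK English (2057) word breaker.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"123_456 789"', 2057, NULL, 0);
```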

T-SQL stored procedure to return google style suggested search results

I'm going to suggest full-text search (Microsoft's or Lucene will work). The code below uses MSSQL FTS, as it's what I use in my app at the moment.

Install Full-Text Search if you haven't already. If you have, check that the service is running.
In Management Studio, run this to set up a catalog, add the Product table, and add the Color, Name and ProductNumber columns to the index.

USE [AdventureWorks]
GO
-- Create the catalog that will hold the full-text index.
CREATE FULLTEXT CATALOG [ProductsTest] WITH ACCENT_SENSITIVITY = OFF
AUTHORIZATION [dbo]
GO
-- Create the index on the Product table, keyed on its primary key.
CREATE FULLTEXT INDEX ON [Production].[Product] KEY INDEX [PK_Product_ProductID] ON ([ProductsTest]) WITH (CHANGE_TRACKING AUTO)
GO
-- Add the columns that should be searchable.
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Color])
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Name])
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([ProductNumber])
GO
-- Enable the index.
ALTER FULLTEXT INDEX ON [Production].[Product] ENABLE
GO

You can then run queries against all columns at once, e.g. Silver (chosen because it appears in both Color and Name):

Select * from production.product where
contains(*, '"Silver*"')

The trailing * makes this a prefix search, so it matches any word that starts with Silver, and you can use this to build up results as the user types. One thing to consider is that Google makes this work in real time: if you are searching a lot of data, you need to be able to return results without interrupting the user's typing. I think people generally type from the first letter of what they are looking for. I accept there will be spelling mistakes; you could run a spell checker after every space they type to handle that. Alternatively, store the searches that are run, look at the misspellings, and handle them with a mapping in code (or, in FTS, with a custom thesaurus).

Ranking is going to be a fun development issue for any business: are you returning the first textual match for Mountain Frame, or do you want to weight results by sales or price? If the user types more than one term, you can use FTS to produce a ranking based on the search string.
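A minimal sketch of how the as-you-type lookup could be wrapped in a stored procedure follows. The procedure name dbo.SuggestProducts, the @Typed parameter and the TOP (10) limit are assumptions made for illustration; the full-text index is the one created above.

```sql
USE AdventureWorks;
GO
-- Hypothetical procedure: returns up to 10 products matching what the user has typed so far.
CREATE PROCEDURE dbo.SuggestProducts
    @Typed NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;

    -- Remove double quotes so user input cannot break out of the quoted term.
    DECLARE @Clean NVARCHAR(100) = LTRIM(RTRIM(REPLACE(@Typed, N'"', N'')));

    -- A NULL or empty full-text condition raises an error, so return no rows instead.
    IF @Clean IS NULL OR @Clean = N'' RETURN;

    -- Build a prefix term such as "Silv*".
    DECLARE @Term NVARCHAR(110) = N'"' + @Clean + N'*"';

    SELECT TOP (10) ProductID, Name, Color, ProductNumber
    FROM Production.Product
    WHERE CONTAINS(*, @Term)
    ORDER BY Name;
END
GO

-- Example call while the user is part-way through typing "Silver".
EXEC dbo.SuggestProducts @Typed = N'Silv';
```

Passing the search condition as a variable keeps the user's text out of dynamic SQL; stripping the quotes only stops the input from escaping the quoted prefix term.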

select aa.rank, bb.* 
From containstable(production.product, *, '"Mountain" and "Silver*"') aa
inner join production.product bb
on aa.[key] = bb.productid
order by rank desc

This returns 30 rows, ranked according to the text the user entered, which determines the first-place record. In either case you will likely want to add a coded ranking to tweak the results to suit your business needs; ranking the highest-priced widget first might not be the way. That is why you should store what people searched for and clicked on, so you can analyse the results later.

There is a really nice language parser for .NET that translates a Google-style query string into FTS-compatible syntax, which gives a familiar feel to anyone running boolean searches on your site.

You may also want to add some wisdom-of-crowds features by auditing what users typed against what they ultimately went on to visit, and use success maps to adjust the final suggestions so they are actually relevant to the user.

As a final suggestion, if this is a commercial website you might want to look at Easyask, which is a scary-great natural language processor.
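One way such a coded ranking could look is sketched below: the full-text rank is blended with a click count. The table dbo.ProductSearchClicks and the weight of 25 are invented for illustration and would need tuning against your own data.

```sql
-- Hypothetical table of how often each product was clicked from search results:
--   dbo.ProductSearchClicks(ProductID INT PRIMARY KEY, Clicks INT)

SELECT TOP (10)
       p.ProductID,
       p.Name,
       ft.[RANK] AS TextRank,
       ISNULL(c.Clicks, 0) AS Clicks,
       -- Text relevance plus a damped popularity boost; 25 is an arbitrary weight.
       ft.[RANK] + 25 * LOG(1 + ISNULL(c.Clicks, 0)) AS BlendedScore
FROM CONTAINSTABLE(Production.Product, *, '"Mountain" AND "Silver*"') AS ft
INNER JOIN Production.Product AS p
        ON p.ProductID = ft.[KEY]
LEFT JOIN dbo.ProductSearchClicks AS c
       ON c.ProductID = p.ProductID
ORDER BY BlendedScore DESC;
```

The logarithm stops a handful of very popular products from swamping the text rank.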
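To give a feel for what such a translation involves, here is a hand-written mapping from a Google-style string to a CONTAINS condition. It is an illustration only, not the output of that parser, and the input string is made up.

```sql
-- Google-style input (illustrative): mountain OR road "silver" -frame
--   OR           -> OR
--   quoted text  -> exact term or phrase
--   -term        -> AND NOT term
SELECT ProductID, Name, Color
FROM Production.Product
WHERE CONTAINS(*, '("mountain" OR "road") AND "silver" AND NOT "frame"');
```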


