PostgreSQL: Full Text Search - How to search partial words?

Even using LIKE you will not be able to get 'squirrel' from 'squire%', because 'squirrel' does not actually start with 'squire' (the spellings diverge at the double 'r'). To get both Squire and Squirrel you could run the following query:

SELECT title FROM movies WHERE vectors @@ to_tsquery('squire|squirrel');

To differentiate between movies and TV shows you should add a column to your database. That said, there are many ways to skin this cat. You could use a sub-query to force Postgres to first find the titles matching 'squire' or 'squirrel' and then search that subset for titles that begin with a '"'. It is also possible to create indexes usable by LIKE '"%...' searches.

Without exploring other indexing possibilities you could also run these - mess around with them to find which is fastest:

SELECT title
FROM (
  SELECT *
  FROM movies
  WHERE vectors @@ to_tsquery('squire|squirrel')
) t
WHERE title ILIKE '"%';

or

SELECT title 
FROM movies
WHERE vectors @@ to_tsquery('squire|squirrel')
AND title ILIKE '"%';
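For completeness, here is a minimal sketch of how the vectors column used above could be set up, assuming a movies table with a text title column (the column and index names are illustrative, and the generated column requires Postgres 12 or later; older versions would need a trigger instead):

```sql
-- Keep a tsvector in sync with the title automatically (Postgres 12+).
ALTER TABLE movies ADD COLUMN vectors tsvector
  GENERATED ALWAYS AS (to_tsvector('english', title)) STORED;

-- A GIN index makes the @@ searches above fast.
CREATE INDEX movies_vectors_idx ON movies USING GIN (vectors);
```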

Postgresql full text search part of words

Sounds like you simply want wildcard matching.

  • One option, as previously mentioned, is trigrams. My (very) limited experience with it was that it was too slow on massive tables for my liking (in some cases slower than a LIKE). As I said, my experience with trigrams is limited, so I might just have been using it wrong.

  • A second option you could use is the wildspeed module: http://www.sai.msu.su/~megera/wiki/wildspeed
    (you'll have to build & install this yourself, though).

The second option will also handle suffix/middle matching, which may or may not be more than you're looking for.

There are a couple of caveats (like size of the index), so read through that page thoroughly.

Use Postgresql full text search to fuzzy match all search terms

Ideally if the user would type a small error like:
ofice bmt
The results should still appear.

This could be very hard to do on more than a best-effort basis. If someone enters "Canter", how should the system know whether they meant a shortening of "Canterbury", a misspelling of "cancer" or "cantor", or really a horse's gait? Perhaps you can build a dictionary of common typos for your specific field. Also, without the specific knowledge that time-zone abbreviations are expected and common, "bmt" just looks like a misspelling of, well, something.

This works fine; however, in some cases it doesn't, and I believe that's because it interprets the query as English and ignores stop words like of, off, in, etc.

Don't just believe, check and see!

select to_tsquery('english','OFF:*&BMT:*');
to_tsquery
------------
'bmt':*

Yes indeed, to_tsquery does omit stop words, even with the :* thingy.

One option is to use 'simple' rather than 'english' as your configuration:

select to_tsquery('simple','OFF:*&BMT:*');
to_tsquery
-------------------
'off':* & 'bmt':*

Another option is to write tsquery directly rather than processing through to_tsquery. Note that in this case, you have to lower-case it yourself:

select 'off:*&bmt:*'::tsquery;
tsquery
-------------------
'off':* & 'bmt':*

Also note that if you do this with 'office:*', you will never get a match in an 'english' configuration, because 'office' in the document gets stemmed to 'offic', while no stemming occurs when you write 'office:*'::tsquery. So you could use 'simple' rather than 'english' to avoid both stemming and stop words. Or you could test each word in the query individually to see if it gets stemmed before deciding to add :* to it.
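You can see that mismatch directly: the document side is stemmed, but a hand-written tsquery is not (a quick check against a stock 'english' configuration):

```sql
SELECT to_tsvector('english', 'office');
-- 'offic':1          (the lexeme is stemmed)

SELECT to_tsvector('english', 'office') @@ 'office:*'::tsquery;
-- f                  ('offic' does not start with 'office')

SELECT to_tsvector('english', 'office') @@ to_tsquery('english', 'office:*');
-- t                  (to_tsquery stems the word to 'offic':* first)
```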

Is there a way to avoid this but still have good performance and fuzzy search? I'm not keen on having to sync my PG with ElasticSearch for this.

What do you mean by fuzzy search? You don't seem to be using that now; you are just using prefix matching, and accidentally using stemming and stop words. How large is the table to be searched, and what kind of performance is acceptable?

If you did use ElasticSearch, how would you phrase your searches? If you explained how you would phrase the search in ES, maybe someone can help you do the same thing in PostgreSQL. We can't take it as a given that switching to ES will just magically do the right thing.

I could do it by building a list of AND statements in the WHERE clause
with LIKE '% ... %' but that would probably hurt performance and
doesn't support fuzzysearch.

Have you looked into pg_trgm? It can make those kinds of queries quite fast. Also, LIKE '%...%' is a lot fuzzier than what you are currently doing, so I don't understand how you would lose that. pg_trgm also provides the '<->' distance operator, which is fuzzier still and might be your best bet. It deals with typos fairly well when they are embedded in long strings, but in short strings they can really be a problem.
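A minimal pg_trgm sketch of that idea (the places table and its name column are invented here for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- '%' matches by trigram similarity (governed by pg_trgm.similarity_threshold),
-- so a typo like 'ofice' can still find 'office'.
SELECT name, name <-> 'ofice' AS dist
FROM places
WHERE name % 'ofice'
ORDER BY name <-> 'ofice'
LIMIT 10;

-- A GiST trigram index accelerates both the '%' filter
-- and the ORDER BY ... <-> nearest-neighbour search.
CREATE INDEX places_name_trgm_idx ON places USING GIST (name gist_trgm_ops);
```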

Fulltext search with partial strings on Postgresql

Trigram (pg_trgm) is a contrib module for Postgres that can help you achieve your goal. There is a complete example of its usage in the docs.

Beginning in PostgreSQL 9.1, trigram indexes support searches with the LIKE and ILIKE operators.

Beginning in 9.3, they also support index searches for regular-expression matches (the ~ and ~* operators).

But if you want to search for the provided partial words in any order, you should query for each word separately:

...
WHERE LOWER(text) LIKE '%lue%'
OR LOWER(text) LIKE '%ped%'
OR LOWER(text) LIKE '%zeb%'

postgresql tsvector partial text match

It looks like you want fast ILIKE-style wildcard matching. pg_trgm is the right tool for the job, and you can use POSIX regular-expression syntax to define your query.

WITH data(t) AS ( VALUES
('test123! TT7 89'::TEXT),
('test123, TT7 89'::TEXT),
('test@test123.domain TT7 89'::TEXT)
)
SELECT count(*) FROM data WHERE t ~* 'es' AND t ~* '\mtest123\M';

Result:

 count 
-------
3
(1 row)

Links for existing answers:

  • Postgresql full text search part of words
  • PostgreSQL: Full Text Search - How to search partial words?

Full Text Search Using Multiple Partial Words

So it was a weird syntax issue that didn't cause an error, but stopped the search from working.

I changed it to

SELECT * FROM Icd10Codes where CONTAINS(description, '"hyper*" NEAR "hea*"')

The key here is that I needed double quotes (") and not two single quotes. I had assumed the pairs were two single quotes, the first escaping the second, but they were actually double quotes. The above query returns the results exactly as expected.
