Postgresql Prefix Wildcard for Full Text

Postgresql prefix wildcard for full text

Full text search is good for finding words, not substrings.

For substring searches you'd better use like '%don%' with pg_trgm extension available from PostgreSQL 9.1 and using gin (column_name gin_trgm_ops) or using gist (column_name gist_trgm_ops) indexes. But your index would be very big (even several times bigger than your table) and write performance not very good.

There's a very good example of using pg_trgm for substring search on select * from depesz blog.

How to append prefix match to tsquery in PostgreSQL

You can make the last lexeme in a tsquery a prefix match by casting it to a string, appending ':*', then casting it back to a tsquery:

=> SELECT ((to_tsquery('foo <-> bar')::text || ':*')::tsquery);
tsquery
-------------------
'foo' <-> 'bar':*

For your usecase, you'll want to use <-> instead of & to require the words to be next to each other. Here's a demonstration of how they're different:

=> SELECT 'foo bar baz' @@ tsquery('foo & baz');
?column?
----------
t
(1 row)

=> SELECT 'foo bar baz' @@ tsquery('foo <-> baz');
?column?
----------
f
(1 row)

phraseto_tsquery makes it easy to have specify many words that have to be next to each other:

=> SELECT phraseto_tsquery('foo baz');
phraseto_tsquery
------------------
'foo' <-> 'baz'

Putting it all together:

=> SELECT (phraseto_tsquery('The fat ra')::text || ':*')::tsquery;
tsquery
------------------
'fat' <-> 'ra':*

Depending on your needs, a simpler way might be to build a tsquery directly with a string then a cast:

=> SELECT $$'fat' <-> 'ra':*$$::tsquery;
tsquery
------------------
'fat' <-> 'ra':*

Postgresql full text search part of words

Sounds like you simply want wildcard matching.

  • One option, as previously mentioned is trigrams. My (very) limited experience with it was that it was too slow on massive tables for my liking (some cases slower than a LIKE). As I said, my experience with trigrams is limited, so I might have just been using it wrong.

  • A second option you could use is the wildspeed module: http://www.sai.msu.su/~megera/wiki/wildspeed
    (you'll have to build & install this tho).

The 2nd option will work for suffix/middle matching as well. Which may or may not be more than you're looking for.

There are a couple of caveats (like size of the index), so read through that page thoroughly.

PostgreSQL: Full Text Search - How to search partial words?

Even using LIKE you will not be able to get 'squirrel' from squire% because 'squirrel' has two 'r's. To get Squire and Squirrel you could run the following query:

SELECT title FROM movies WHERE vectors @@ to_tsquery('squire|squirrel');

To differentiate between movies and tv shows you should add a column to your database. However, there are many ways to skin this cat. You could use a sub-query to force postgres to first find the movies matching 'squire' and 'squirrel' and then search that subset to find titles that begin with a '"'. It is possible to create indexes for use in LIKE '"%...' searches.

Without exploring other indexing possibilities you could also run these - mess around with them to find which is fastest:

SELECT title 
FROM (
SELECT *
FROM movies
WHERE vectors @@ to_tsquery('squire|squirrel')
) t
WHERE title ILIKE '"%';

or

SELECT title 
FROM movies
WHERE vectors @@ to_tsquery('squire|squirrel')
AND title ILIKE '"%';


Related Topics



Leave a reply



Submit