How to Use % Operator from the Extension Pg_Trgm

How to use % operator from the extension pg_trgm?

Most probably the problem is with the search_path setting. Run:

SHOW search_path;

Is the schema where you installed pg_trgm included? If not, include it.

Or, if you have the necessary privileges, you can change the schema of an extension with:

ALTER EXTENSION pg_trgm SET SCHEMA public;  -- or the schema you want

Alternatively, you can schema-qualify functions - and even operators using the OPERATOR() construct:

SELECT * FROM rssdata WHERE extensions.similarity(description, 'Brazil') > .8;
SELECT * FROM rssdata WHERE description OPERATOR(extensions.%) 'Brazil';

Removes dependency on the search_path. But it's tedious.

How to use pg_trgm operators(e.g. %) in django framework?

It looks like your problem is actually related to Python's string substitution. To construct the final query string, Django does something like:

self.sql % self.params

This uses the old style %-formatting, which wants to interpret your %> as a string substitution mechanism, similar to %s, but that's not a valid combination. To create a % character in the formatted string, you just have to use %% in your input string, like so:

In [1]: MyModel.objects.raw('SELECT * FROM myapp_mymodel WHERE myfield %%> %s', ['test'])
Out[1]: <RawQuerySet: SELECT * FROM myapp_mymodel WHERE myfield %> test>

...

By the way, Django has documentation on basic usage of pg_trgm without having to resort to raw query strings. Just make sure you have a migration that activates the extension first:

In addition to the trigram_similar lookup, you can use a couple of other expressions.

To use them, you need to activate the pg_trgm extension on PostgreSQL.
You can install it using the TrigramExtension migration operation.

Error while running fuzzy search using pg_trgm extension

Either you didn't create the extension pg_trgm in that database, or you created it in a schema that is not on the search_path active for the query.

You can also schema-qualify an operator:

WHERE col OPERATOR(schema_name.%) 'searchstring'

Quick Search with autocorrect (GIN INDEX and PG_TRGM extension)

First off, your table definition creates two unique indices on (email_address). Don't. Drop the UNIQUE constraint, keep the PK:

CREATE TABLE email (
email_address text PRIMARY KEY
, person_id uuid NOT NULL -- bigint?
);

(Also not sure why you would need uuid for person_id. There aren't nearly enough people in the world to justify more than a bigint.)

Next, since you want to ...

LIMIT the returned results to only the highly similar email addresses,

I suggest a nearest neighbor search. Create a GiST index for this purpose instead of the GIN:

CREATE INDEX email_address_trigram_gist_idx ON email USING gist (email_address gist_trgm_ops);

And use a query like this:

SELECT *, similarity('tesd100@gmail.com', email_address)
FROM email
WHERE email_address % 'tesd100@gmail.com'
ORDER BY email_address <-> 'tesd100@gmail.com' -- note the use of the operator <->
LIMIT 10;

Quoting the manual:

This can be implemented quite efficiently by GiST indexes, but not by
GIN indexes. It will usually beat the first formulation when only a
small number of the closest matches is wanted.

While working with a small LIMIT, there is probably no need to set pg_trgm.similarity_threshold very high because this query gives you the best matches first.

Related:

  • Search in 300 million addresses with pg_trgm
  • Finding similar strings with PostgreSQL quickly
  • PostgreSQL GIN index slower than GIST for pg_trgm?

How to overload the equality operator for primitive types (int, real...)?

Laurenz pointed out the immediate problem search_path. There is more.

Assuming Postgres 14, this would work:

CREATE OR REPLACE FUNCTION public.new_equality (real, real)  -- be explicit about the schema!
RETURNS bool
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE COST 10
BEGIN ATOMIC
SELECT abs($1 - $2) < 0.2;
END;

CREATE OR REPLACE FUNCTION public.new_inequality (real, real) -- be explicit about the schema!
RETURNS bool
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE COST 10
BEGIN ATOMIC
SELECT abs($1 - $2) >= 0.2;
END;

CREATE OPERATOR public.= ( -- be explicit about the schema!
LEFTARG = real
, RIGHTARG = real
, FUNCTION = public.new_equality
, COMMUTATOR = OPERATOR(public.=) -- be explicit about the schema!
, NEGATOR = OPERATOR(public.!=) -- must also exist
);

CREATE OPERATOR public.!= (
LEFTARG = real
, RIGHTARG = real
, FUNCTION = public.new_inequality
, COMMUTATOR = OPERATOR(public.!=)
, NEGATOR = OPERATOR(public.=)
);

Use the OPERATOR() construct to call it:

SELECT id, real '0.1' OPERATOR(public.=) real '0.2';
SELECT id, real '0.1' OPERATOR(public.!=) real '0.2';
SELECT id, real '0.1' OPERATOR(public.<>) real '0.2';

db<>fiddle here - with more examples

Be aware of the higher operator precedence, possibly forcing parentheses where the plain operator wouldn't!

You must also define the NEGATOR you mention in the declaration. Using the built-in != would be contradicting nonsense. Create a matching operator, which you must refer to with schema-qualified syntax. The manual:

To give a schema-qualified operator name in com_op or the other optional arguments, use the OPERATOR() syntax [...]

Related:

  • How to use % operator from the extension pg_trgm?
  • GIN index on smallint[] column not used or error "operator is not unique"

Note that <> is an automatic alias of !=, and <> is the default inequality operator in SQL.

An unqualified = will be the standard equality operator (OPERATOR(pg_catalog.=)) while you don't mess with the search_path to demote pg_catalog - which you shouldn't! Demoting pg_catalog opens the door to all kinds of serious problems, since system objects are now hidden behind one or more other schemas. Don't do that unless you know exactly what you are doing. About the search_path:

  • How does the search_path influence identifier resolution and the "current schema"

This assumes at least Postgres 14. About BEGIN ATOMIC:

  • How to implement `BEGIN ATOMIC` in PostgreSQL

Using the keyword FUNCTION instead of the misleading PROCEDURE, which is still valid for backward compatibility. See:

  • Trigger uses a procedure or a function?

Like a_horse_with_no_name suggested, it may be more convenient to use an operator symbol that is different from existing ones to avoid conflicts. Would still have standard (= higher) operator precedence than default comparison operators, and that cannot be changed easily.

Cannot run migrations when postgres extension added in rails 5 application

When enabling extensions its a good idea to create a separate migration as it makes it much easier to troubleshoot:

class EnableTrigramIndexLocationExtension < ActiveRecord::Migration[5.1]
def change
enable_extension "pg_trgm"
end
end

Loading an extension requires the same privileges that would be
required to create its component objects. For most extensions this
means superuser or database owner privileges are needed. The user who
runs CREATE EXTENSION becomes the owner of the extension for purposes
of later privilege checks, as well as the owner of any objects created
by the extension's script.

The simplest way to solve this is by using ALTER ROLE myapp SUPERUSER; which is not a very secure solution but works for development. For a production server you should instead use PostgreSQL Extension Whitelisting.

When generating migrations make sure to use either snakecase or camelcase to make the name readable rails g migration AddTrigramIndexLocationToUsers or rails g migration add_trigram_index_location_to_users.

class AddTrigramIndexLocationToUsers < ActiveRecord::Migration[5.1]
def up
execute %{
CREATE INDEX index_users_trigram_on_location ON users USING gin (location gin_trgm_ops);
}
end

def down
remove_index :users, :index_users_trigram_on_location
end
end


Related Topics



Leave a reply



Submit