How to use % operator from the extension pg_trgm?
Most probably the problem is with the search_path
setting. Run:
SHOW search_path;
Is the schema where you installed pg_trgm
included? If not, include it.
Or, if you have the necessary privileges, you can change the schema of an extension with:
ALTER EXTENSION pg_trgm SET SCHEMA public; -- or the schema you want
Alternatively, you can schema-qualify functions - and even operators using the OPERATOR()
construct:
SELECT * FROM rssdata WHERE extensions.similarity(description, 'Brazil') > .8;
SELECT * FROM rssdata WHERE description OPERATOR(extensions.%) 'Brazil';
Removes dependency on the search_path
. But it's tedious.
How to use pg_trgm operators(e.g. %) in django framework?
It looks like your problem is actually related to Python's string substitution. To construct the final query string, Django does something like:
self.sql % self.params
This uses the old style %-formatting, which wants to interpret your %>
as a string substitution mechanism, similar to %s
, but that's not a valid combination. To create a %
character in the formatted string, you just have to use %%
in your input string, like so:
In [1]: MyModel.objects.raw('SELECT * FROM myapp_mymodel WHERE myfield %%> %s', ['test'])
Out[1]: <RawQuerySet: SELECT * FROM myapp_mymodel WHERE myfield %> test>
...
By the way, Django has documentation on basic usage of pg_trgm
without having to resort to raw query strings. Just make sure you have a migration that activates the extension first:
In addition to the
trigram_similar
lookup, you can use a couple of other expressions.To use them, you need to activate the
pg_trgm
extension on PostgreSQL.
You can install it using theTrigramExtension
migration operation.
Error while running fuzzy search using pg_trgm extension
Either you didn't create the extension pg_trgm
in that database, or you created it in a schema that is not on the search_path
active for the query.
You can also schema-qualify an operator:
WHERE col OPERATOR(schema_name.%) 'searchstring'
Quick Search with autocorrect (GIN INDEX and PG_TRGM extension)
First off, your table definition creates two unique indices on (email_address)
. Don't. Drop the UNIQUE
constraint, keep the PK:
CREATE TABLE email (
email_address text PRIMARY KEY
, person_id uuid NOT NULL -- bigint?
);
(Also not sure why you would need uuid
for person_id
. There aren't nearly enough people in the world to justify more than a bigint
.)
Next, since you want to ...
LIMIT the returned results to only the highly similar email addresses,
I suggest a nearest neighbor search. Create a GiST index for this purpose instead of the GIN:
CREATE INDEX email_address_trigram_gist_idx ON email USING gist (email_address gist_trgm_ops);
And use a query like this:
SELECT *, similarity('tesd100@gmail.com', email_address)
FROM email
WHERE email_address % 'tesd100@gmail.com'
ORDER BY email_address <-> 'tesd100@gmail.com' -- note the use of the operator <->
LIMIT 10;
Quoting the manual:
This can be implemented quite efficiently by GiST indexes, but not by
GIN indexes. It will usually beat the first formulation when only a
small number of the closest matches is wanted.
While working with a small LIMIT
, there is probably no need to set pg_trgm.similarity_threshold
very high because this query gives you the best matches first.
Related:
- Search in 300 million addresses with pg_trgm
- Finding similar strings with PostgreSQL quickly
- PostgreSQL GIN index slower than GIST for pg_trgm?
How to overload the equality operator for primitive types (int, real...)?
Laurenz pointed out the immediate problem search_path
. There is more.
Assuming Postgres 14, this would work:
CREATE OR REPLACE FUNCTION public.new_equality (real, real) -- be explicit about the schema!
RETURNS bool
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE COST 10
BEGIN ATOMIC
SELECT abs($1 - $2) < 0.2;
END;
CREATE OR REPLACE FUNCTION public.new_inequality (real, real) -- be explicit about the schema!
RETURNS bool
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE COST 10
BEGIN ATOMIC
SELECT abs($1 - $2) >= 0.2;
END;
CREATE OPERATOR public.= ( -- be explicit about the schema!
LEFTARG = real
, RIGHTARG = real
, FUNCTION = public.new_equality
, COMMUTATOR = OPERATOR(public.=) -- be explicit about the schema!
, NEGATOR = OPERATOR(public.!=) -- must also exist
);
CREATE OPERATOR public.!= (
LEFTARG = real
, RIGHTARG = real
, FUNCTION = public.new_inequality
, COMMUTATOR = OPERATOR(public.!=)
, NEGATOR = OPERATOR(public.=)
);
Use the OPERATOR()
construct to call it:
SELECT id, real '0.1' OPERATOR(public.=) real '0.2';
SELECT id, real '0.1' OPERATOR(public.!=) real '0.2';
SELECT id, real '0.1' OPERATOR(public.<>) real '0.2';
db<>fiddle here - with more examples
Be aware of the higher operator precedence, possibly forcing parentheses where the plain operator wouldn't!
You must also define the NEGATOR
you mention in the declaration. Using the built-in !=
would be contradicting nonsense. Create a matching operator, which you must refer to with schema-qualified syntax. The manual:
To give a schema-qualified operator name in
com_op
or the other optional arguments, use theOPERATOR()
syntax [...]
Related:
- How to use % operator from the extension pg_trgm?
- GIN index on smallint[] column not used or error "operator is not unique"
Note that <>
is an automatic alias of !=
, and <>
is the default inequality operator in SQL.
An unqualified =
will be the standard equality operator (OPERATOR(pg_catalog.=)
) while you don't mess with the search_path
to demote pg_catalog
- which you shouldn't! Demoting pg_catalog
opens the door to all kinds of serious problems, since system objects are now hidden behind one or more other schemas. Don't do that unless you know exactly what you are doing. About the search_path
:
- How does the search_path influence identifier resolution and the "current schema"
This assumes at least Postgres 14. About BEGIN ATOMIC
:
- How to implement `BEGIN ATOMIC` in PostgreSQL
Using the keyword FUNCTION
instead of the misleading PROCEDURE
, which is still valid for backward compatibility. See:
- Trigger uses a procedure or a function?
Like a_horse_with_no_name suggested, it may be more convenient to use an operator symbol that is different from existing ones to avoid conflicts. Would still have standard (= higher) operator precedence than default comparison operators, and that cannot be changed easily.
Cannot run migrations when postgres extension added in rails 5 application
When enabling extensions its a good idea to create a separate migration as it makes it much easier to troubleshoot:
class EnableTrigramIndexLocationExtension < ActiveRecord::Migration[5.1]
def change
enable_extension "pg_trgm"
end
end
Loading an extension requires the same privileges that would be
required to create its component objects. For most extensions this
means superuser or database owner privileges are needed. The user who
runs CREATE EXTENSION becomes the owner of the extension for purposes
of later privilege checks, as well as the owner of any objects created
by the extension's script.
The simplest way to solve this is by using ALTER ROLE myapp SUPERUSER;
which is not a very secure solution but works for development. For a production server you should instead use PostgreSQL Extension Whitelisting.
When generating migrations make sure to use either snakecase or camelcase to make the name readable rails g migration AddTrigramIndexLocationToUsers
or rails g migration add_trigram_index_location_to_users
.
class AddTrigramIndexLocationToUsers < ActiveRecord::Migration[5.1]
def up
execute %{
CREATE INDEX index_users_trigram_on_location ON users USING gin (location gin_trgm_ops);
}
end
def down
remove_index :users, :index_users_trigram_on_location
end
end
Related Topics
Oracle 12C - Select String After Last Occurrence of a Character
SQL - Safely Downcast Bigint to Int
SQL Varchar(Max) VS Varchar(Fix)
Parse a Date from Unformatted Text in SQL
SQL Server Query Xml Attribute for an Element Value
Doctrine Query - Ignoring Spaces
Performance and Readability of Regexp_Substr VS Instr and Substr
SQL Primary Key - Is It Necessary
Simple Update Statement So That All Rows Are Assigned a Different Value
Why My Table Doesnt Support Foreign Keys
SQL Identity with Leading Padded Zeros
What Happens with Duplicates When Inserting Multiple Rows
Finding Continuous Ranges in a Set of Numbers
How to Prevent SQL Injection in Wordpress
Get Every Hour for a Time Range
Ms Access: How to Count Distinct Value Using Access Query