Search For "Whole Word Match" in MySQL

Search for whole word match in MySQL

You can use REGEXP and the [[:<:]] and [[:>:]] word-boundary markers:

SELECT *
FROM table
WHERE keywords REGEXP '[[:<:]]rid[[:>:]]'

Update for 2020: (actually 2018+)

MySQL updated its RegExp-Engine in version 8.0.4, so you will now need to use the "standard" word boundary marker \b:

SELECT *
FROM table
WHERE keywords REGEXP '\\brid\\b'

Also be aware that you need to escape the backslash by putting a second backslash.

Search for “whole word match” in MySQL

Instead of using REGEXP, you could also use the LIKE pattern matching operator.

A sample query could be:

SELECT * FROM `test` WHERE `text` LIKE '%Invitation.%';

Edit

Otherwise, if LIKE doesn't match your requirements, you can of course use REGEXP.

For a REGEXP (MySQL 5.7) expression, you'll want to use (mentioned by Wiktor):

SELECT * FROM `test` WHERE `text` REGEXP '[[:<:]]Invitation[.]';

For a REGEXP (MySQL 8.0) expression, you'll want to use:

SELECT * FROM `test` WHERE `text` REGEXP '\\bInvitation\\.';

The [[:<:]] & [[:>:]], and \b operators offer similar functionality for their boundaries. MySQL 5.7 is a little bit more explicit, as you can see per the documentation here at the bottom of the page. With MySQL 8.0, it supports the International Components for Unicode (ICU), as opposed to 5.7 that uses Henry Spencer's implementation for regular expressions.

From the MySQL 8.0 docs:

MySQL implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. (Prior to MySQL 8.0.4, MySQL used Henry Spencer's implementation of regular expressions, which operates in byte-wise fashion and is not multibyte safe.

If you do a search on this documentation page for \b, you'll see some clarification between the difference of ICU vs. Spencer regular expression handling:

The Spencer library supports word-beginning and word-end boundary markers ([[:<:]] and [[:>:]] notation). ICU does not. For ICU, you can use \b to match word boundaries; double the backslash because MySQL interprets it as the escape character within strings.

Bit of a learning experience for me too, thanks Wiktor!

MySQL Whole Word Match – Multiple words

You are getting different results because the two regexes are not identical.

(^| ) means : either the beginning of the string or a space (( |$) has the same meaning at end of string).

[[:<:]] and [[:>:]] are word boundaries : conceptually this refers to characters that separate words, and usually regex engines interpret it as something like : anything but a digit, a letter or an underscore.

So basically the first pattern is more restrictive than the second (space, beginning and end of string are word boundaries, but there are others).

If you have more than one keyword to search for, you would need to repeat the regex matches, like :

WHERE 
".$source." RLIKE '[[:<:]]".$keyword1."[[:>:]]'
OR ".$source." RLIKE '[[:<:]]".$keyword2."[[:>:]]'

Or create a new regex by combining the keywords :

WHERE 
".$source." RLIKE '[[:<:]](".$keyword1.")|(".$keyword2.")[[:>:]]'

NB : for search requirement, you should consider using MySQL Full Text Search, which are primarily built for the purpose of searching for full words (there are pre-requisites, though).

Whole word matching with + characters in MySQL

In most flavors of SQL, [[:<:]]s9+[[:>:]] would be written as \bs9+\b. There are two problems here. First, + is a regex metacharacter, and has a special meaning. It should be escaped with two backslashes. Also, \bs9\+\b would only match if a word character follows the +, as + itself is a non word character. Here is a version which should behave as you expect:

SELECT name FROM tbl_name WHERE name REGEXP '(^|\\s)s9\\+(\\s|$)';

This will match s9+ when it occurs with either whitespace or the start/end of the input on both sides.

MySQL - How to search for exact word match in a column of sentence?

You could use REGEXP:

SELECT *
FROM products
WHERE Sentence REGEXP '[[:<:]]possible[[:>:]]'
-- ^^^ ^^^ word boundaries

This would correspond to matching possible surrounded by a word boundary. Note in the demo that punctuation counts as a word boundary.

Also note that MySQL's REGEXP is case insensitive, so the above should match possible or Possible.

Demo here:

Rextester

MySQL - Search exact word from string

Try using regular expressions:

SELECT 
*
FROM
`table`
WHERE
Description regexp '(^|[[:space:]])pen([[:space:]]|$)';

Demo

Or using word boundaries:

SELECT 
*
FROM
`table`
WHERE
Description regexp '[[:<:]]pen[[:>:]]';

MySQL get full word match (not characters) at any position from sentence by relevancy

The solution to this is can be REGEXP q+
It matches any string containing at least one q.

Although it will match seo company with:

affordable seo company
atlanta seo company
austin seo company
automotive seo company
best atlanta seo company
best local seo company
best los angeles seo company
best organic seo company
best real estate seo company
best seo company
best seo company for google
best seo company for lawyers
best seo company for small business
best seo company in india
best seo company in the usa
best seo company in the world
best seo company india
best seo company los angeles
best seo company miami
boise seo company
boston seo company
california seo company
charleston seo company
charlotte seo company
cheap seo company

It can't match best seo company for small business with any of the above terms except itself.

Try this:

SELECT q 
FROM q
WHERE q REGEXP '$q+'
ORDER BY CASE WHEN q = '$q' THEN 0
WHEN q LIKE '$q%' THEN 1
WHEN q LIKE '%$q%' THEN 2
WHEN q LIKE '%$q' THEN 3
ELSE 4
END, q ASC
LIMIT 10

how to search for specific whole words within a string , via SQL, compatible with both HIVE/IMPALA

You can add word boundary \\b to match only exact words:

rlike '(?i)\\bFECHADO\\b|\\bCIERRE\\b|\\bCLOSED\\b'

(?i) means case insensitive, no need to use UPPER.

And the last alternative in your regex pattern is REVISTO. NORMAL.

If dots in it should be literally dots, use \\.

Like this: REVISTO\\. NORMAL\\.

Dot in regexp means any character and should be shielded with two backslashes to match dot literally.

Above regex works in Hive. Unfortunately I have no Impala to test it



Related Topics



Leave a reply



Submit