Regex match entire words only
Use word boundaries:
/\b($word)\b/i
Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:
/(?:\W|^)(\Q$word\E)(?:\W|$)/i
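For comparison, here is a minimal Python sketch of the same idea. It assumes re.escape as a stand-in for Perl's \Q...\E, and the helper name find_whole_word is made up for illustration:

```python
import re

def find_whole_word(word, text):
    """Return the first whole-word occurrence of word in text, or None.

    Hypothetical helper, not part of any library.
    """
    # re.escape makes metacharacters such as "." literal, like \Q...\E in Perl
    pattern = r"(?:\W|^)(" + re.escape(word) + r")(?:\W|$)"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None

sentence = "Agents of S.P.E.C.T.R.E. arrived"
print(find_whole_word("S.P.E.C.T.R.E.", sentence))  # S.P.E.C.T.R.E.
print(find_whole_word("cat", "concatenate"))        # None

# With \b on both sides the same word is not found: the final "." and the
# following space are both non-word characters, so there is no boundary there
print(re.search(r"\b" + re.escape("S.P.E.C.T.R.E.") + r"\b", sentence))  # None
```

The \W on either side consumes one neighbouring character, which is why the word itself sits in a capture group.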
regex match whole word and punctuation with it using re.search()
There are two issues here.
- In regex, . is special. It means "match any one character" (except a newline, by default). However, you are trying to use it to match a literal period. (It will indeed match that, but it will also match everything else.) Instead, to match a period, you need to escape it as \. in the pattern. And to match either a period or a hyphen, you can use a character class, like [-.]
- You are using \b at the end of your pattern to match the word boundary, but \b is defined as the boundary between a word character and a non-word character, and periods and spaces are both non-word characters. This means that Python won't find a match. Instead, you could use a lookahead assertion, which checks the character that follows without consuming it.
Now, to match a whole word - any word - you can do something like \w+, which matches one or more word characters.
Also, it is quite possible that there won't be a match anyway, so you should check whether a match occurred using an if statement or a try statement. Putting it all together:
import re

txt = "The indian in. Spain."
pattern = r"\w+[-.]"
# The lookahead requires a non-word character after the match without consuming it
x = re.search(r"\b" + pattern + r"(?=\W)", txt)
if x:
    print(x.start(), x.end())
Edit
There is one problem with the lookahead assertion above - it won't match at the end of the string. This means that if your text is "The rain in Spain." then it won't match "Spain.", as there is no non-word character following the final period.
To fix this, you can use a negative lookahead assertion, which succeeds when the text that follows does not match the given pattern, and likewise consumes nothing.
x = re.search(r"\b" + pattern + r"(?!\w)", txt)
This will match when the character after the word is anything other than a word character, including the end of the string.
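A quick way to see the difference between the two assertions is to run both against the text from the edit. This is only a sketch; the variable names positive and negative are chosen here for illustration:

```python
import re

pattern = r"\w+[-.]"
# Positive lookahead: a non-word character must follow the match
positive = re.compile(r"\b" + pattern + r"(?=\W)")
# Negative lookahead: a word character must NOT follow (end of string is fine)
negative = re.compile(r"\b" + pattern + r"(?!\w)")

txt = "The rain in Spain."
print(positive.search(txt))  # None, nothing follows the final period

m = negative.search(txt)
if m:
    print(m.group(), m.start(), m.end())  # Spain. 12 18
```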
Regex: find from a vast list of words, only whole words
You can use word anchors to match the start and end of words. (Assuming you are using something that supports PCRE.)
/\b(word1|word2|word3...)\b/
The \b bit matches at a "word boundary". From Perl's regular expression man page (man perlre):
A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a "\W".
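As a rough illustration in Python (whose re module treats \b in the same way for this purpose), the alternation can be built from a list instead of typed by hand. The helper name compile_word_list and the sample words are assumptions made for this sketch:

```python
import re

def compile_word_list(words):
    """Build one regex that matches any of the given words as whole words.

    Hypothetical helper for illustration only.
    """
    # Escape each word so that characters like "." or "+" are taken literally
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternation + r")\b")

regex = compile_word_list(["cat", "dog", "catalog"])
found = regex.findall("The cat read the dog catalog; concatenate nothing.")
print(found)  # ['cat', 'dog', 'catalog']
```

Note that "cat" inside "concatenate" is skipped, because there is no word boundary on either side of it. For a really vast list it may be worth measuring this against splitting the text into words and checking each one against a set.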
Regex.Match whole words
You should add the word delimiter to your regex:
\b(shoes|shirt|pants)\b
In code:
Regex.Match(content, @"\b(shoes|shirt|pants)\b");
How to search for specific whole words within a string, via SQL, compatible with both HIVE/IMPALA
You can add word boundary \\b to match only exact words:
rlike '(?i)\\bFECHADO\\b|\\bCIERRE\\b|\\bCLOSED\\b'
(?i) means case insensitive, so there is no need to use UPPER.
And the last alternative in your regex pattern is REVISTO. NORMAL. If the dots in it should be literal dots, use \\.
Like this: REVISTO\\. NORMAL\\.
A dot in a regexp means any character and should be escaped with two backslashes to match a dot literally.
The above regex works in Hive. Unfortunately I have no Impala to test it.
How to match a whole word with a regular expression?
Try
re.search(r'\bis\b', your_string)
From the docs:
\b
Matches the empty string, but only at the beginning or end of a word.
Note that the re module uses a naive definition of "word" as a "sequence of alphanumeric or underscore characters", where "alphanumeric" depends on locale or unicode options.
Also note that without the raw string prefix, \b is seen as "backspace" instead of regex word boundary.
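The raw-string point is easy to check for yourself. A small sketch, with a sample string made up for the purpose:

```python
import re

text = "this is it"

# Raw string: the regex engine receives \b and treats it as a word boundary
raw = re.search(r"\bis\b", text)
print(raw.span())  # (5, 7), the standalone "is", not the one inside "this"

# Normal string: Python turns \b into a backspace character (0x08) first,
# so the pattern looks for backspace + "is" + backspace and finds nothing
not_raw = re.search("\bis\b", text)
print(not_raw)  # None

print("\b" == "\x08")  # True
```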
How to make regex match only whole words and not break the words down?
You could use {2,6} and make sure to use word boundaries \b so that there are not 2 matches, one for ABSTRA and the other for CT:
\b[A-Z]{2,6}(?:-[0-9]+)?\b
In python:
regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
If the hyphen in the part -*[0-9]* is not optional (that is, digits must be preceded by a hyphen), you could turn that part into an optional group (?:-[0-9]+)?
If there should not be anything on the left or right, you could use:
(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)
Note that -* will match 0 or more hyphens and -? matches an optional one.
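To compare the two patterns side by side, here is a small Python sketch. The sample text is invented for illustration and is not from the question:

```python
import re

text = "ABSTRACT AB-12 XYZ (CD-7)"

# Word boundaries: punctuation such as "(" or ")" may touch the match
bounded = re.compile(r"\b[A-Z]{2,6}(?:-[0-9]+)?\b")
# Whitespace lookarounds: only whitespace or the string edges may touch it
spaced = re.compile(r"(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)")

print(bounded.findall(text))  # ['AB-12', 'XYZ', 'CD-7']
print(spaced.findall(text))   # ['AB-12', 'XYZ']
```

In both cases ABSTRACT is skipped as a whole, since it is longer than 6 letters and no shorter piece of it ends at a boundary.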