Regex match entire words only
Use word boundaries:
/\b($word)\b/i
Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:
/(?:\W|^)(\Q$word\E)(?:\W|$)/i
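For readers working in Python rather than Perl, the two patterns above can be sketched roughly as follows. This is an assumed translation, not part of the original answer: `re.escape` stands in for `\Q...\E`, and the helper name `find_word` and the sample text are made up for illustration.

```python
import re

def find_word(word, text, strict_delimiters=False):
    """Return the matched word, or None if it does not occur as a whole word."""
    escaped = re.escape(word)  # rough Python counterpart of Perl's \Q...\E
    if strict_delimiters:
        # Require a non-word character or a string edge on both sides.
        pattern = r"(?:\W|^)(" + escaped + r")(?:\W|$)"
    else:
        # Plain word boundaries on both sides.
        pattern = r"\b(" + escaped + r")\b"
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(1) if m else None

text = "Agents of S.P.E.C.T.R.E. are here"
plain = find_word("S.P.E.C.T.R.E.", text)
strict = find_word("S.P.E.C.T.R.E.", text, strict_delimiters=True)
simple = find_word("agents", text)
```

With this sample, `plain` is `None` because the trailing period and the space after it are both non-word characters, so there is no `\b` between them, while `strict` returns the acronym.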
Regular expression to match a word or its prefix
Square brackets define a character class, so you're actually trying to match any one of: s, |, s (again), e, a, s (again), o and n.
Use parentheses instead for grouping:
(s|season)
or non-capturing group:
(?:s|season)
Note: A non-capturing group tells the engine that it doesn't need to store the matched text, while a capturing group does store it. For small patterns, either works. For heavy-duty work, first check whether you need the captured text; if you don't, prefer the non-capturing group so the engine doesn't spend memory storing something you will never use.
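The difference between the character class and the group can be checked with a short Python sketch. The pattern `[s|season]` is assumed to be what the question used, and the variable names are only for illustration.

```python
import re

char_class = re.compile(r"[s|season]")  # one character from the set s, |, e, a, o, n
group = re.compile(r"(?:s|season)")     # either "s" or "season"
capturing = re.compile(r"(s|season)")   # same alternatives, but the match is stored

class_on_season = char_class.fullmatch("season")  # a class matches one character only
class_on_pipe = char_class.fullmatch("|")         # "|" is a literal inside a class
group_on_season = group.fullmatch("season")
group_on_s = group.fullmatch("s")

# .groups reports how many capturing groups a compiled pattern has.
stored_groups = (group.groups, capturing.groups)
```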
How to match a whole word with a regular expression?
Try
re.search(r'\bis\b', your_string)
From the docs:
\b
Matches the empty string, but only at the beginning or end of a word.
Note that the re module uses a naive definition of "word" as a "sequence of alphanumeric or underscore characters", where "alphanumeric" depends on locale or unicode options.
Also note that without the raw string prefix, \b is seen as "backspace" instead of regex word boundary.
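A small sketch of the raw-string point, using a made-up sample string:

```python
import re

text = "this is it"

# With the r prefix, \b reaches the regex engine as a word boundary.
raw_match = re.search(r"\bis\b", text)

# Without it, Python turns "\b" into the backspace character (\x08)
# before the regex engine ever sees the pattern.
plain_match = re.search("\bis\b", text)

backspace_is_one_char = len("\b") == 1 and len(r"\b") == 2
```

Here `raw_match` finds the standalone "is" (not the one inside "this"), and `plain_match` is `None` because the text contains no backspace characters.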
How to match a whole word or sentence after a specific character with regexp
Here are 2 options depending on whether you want to include the colon in the pattern that you are capturing.
- with the colon
^:\w*
- with a lookbehind for the colon
(?<=^:)\w*
This will match a word after the colon.
You may instead want any number of any character, .*
or any combination of word characters and spaces, [\w\s]*
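The options above can be compared in Python. This is only a sketch: the question does not say which regex flavor is in use, and the sample line is invented.

```python
import re

line = ":hello world"

# The colon is part of the match.
with_colon = re.search(r"^:\w*", line).group()

# The lookbehind checks for the leading colon but leaves it out of the match.
without_colon = re.search(r"(?<=^:)\w*", line).group()

# Word characters and whitespace together capture the whole phrase.
with_spaces = re.search(r"(?<=^:)[\w\s]*", line).group()
```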
regex match whole word and punctuation with it using re.search()
There are two issues here.
- In regex, . is special. It means "match any single character". However, you are trying to use it to match a regular period. (It will indeed match that, but it will also match everything else.) To match a literal period, escape it as \. and to match either a period or a hyphen, use a character class such as [-.]
- You are using \b at the end of your pattern to match the word boundary, but \b is defined as the boundary between a word character and a non-word character, and periods and spaces are both non-word characters. This means that Python won't find a match. Instead, you could use a lookahead assertion, which checks which character comes next without consuming it.
Now, to match a whole word - any word - you can do something like \w+, which matches one or more word characters.
Also, it is quite possible that there won't be a match anyway, so you should check whether a match occurred using an if statement or a try statement. Putting it all together:
import re

txt = "The indian in. Spain."
pattern = r"\w+[-.]"
x = re.search(r"\b" + pattern + r"(?=\W)", txt)
if x:
    print(x.start(), x.end())
Edit
There is one problem with the lookahead assertion above: it won't match at the end of the string. This means that if your text is "The rain in Spain." then it won't match "Spain.", as there is no non-word character following the final period.
To fix this, you can use a negative lookahead assertion, which matches when the following text does not include the pattern, and also does not consume the string.
x = re.search(r"\b" + pattern + r"(?!\w)", txt)
This will match when the character after the word is anything other than a word character, including the end of the string.
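To see the two lookaheads side by side, here is a minimal sketch that runs both variants against the two sample sentences from this answer. The variable names are chosen for illustration.

```python
import re

pattern = r"\w+[-.]"
positive = re.compile(r"\b" + pattern + r"(?=\W)")  # a non-word character must follow
negative = re.compile(r"\b" + pattern + r"(?!\w)")  # a word character must NOT follow

mid = "The indian in. Spain."  # first punctuated word is in the middle
end = "The rain in Spain."     # only punctuated word is at the very end

positive_mid = positive.search(mid)
positive_end = positive.search(end)  # fails: nothing follows the final period
negative_mid = negative.search(mid)
negative_end = negative.search(end)  # succeeds: end of string is "not a word character"
```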
Regex match whole word including whitespace and fullstop
\b(Cat|Dog|Fish)\b
Use \b, the word boundary.
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
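As a quick illustration in Python (the question's own language is not stated, and the test sentence is invented), the boundaries keep the alternation from matching inside longer words:

```python
import re

sentence = "Cat Catalog Dogfish Fish"

# With boundaries: only standalone words match.
whole_words = re.findall(r"\b(Cat|Dog|Fish)\b", sentence)

# Without boundaries: prefixes of longer words match too.
anywhere = re.findall(r"(Cat|Dog|Fish)", sentence)
```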
Regex.Match whole words
You should add the word boundary \b to your regex:
\b(shoes|shirt|pants)\b
In code:
Regex.Match(content, @"\b(shoes|shirt|pants)\b");
how to search for specific whole words within a string , via SQL, compatible with both HIVE/IMPALA
You can add the word boundary \\b to match only exact words:
rlike '(?i)\\bFECHADO\\b|\\bCIERRE\\b|\\bCLOSED\\b'
(?i) means case insensitive, so there is no need to use UPPER.
The last alternative in your regex pattern is REVISTO. NORMAL.
If the dots in it should be literal dots, use \\. like this: REVISTO\\. NORMAL\\.
A dot in a regexp means any character and must be escaped with two backslashes to match a dot literally.
The above regex works in Hive. Unfortunately I have no Impala to test it on.
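The behavior of the pattern can be explored outside the database with Python's re module. This is only a stand-in: Hive uses Java regular expressions, and Python is assumed to behave the same way for these simple constructs. In a Python raw string a single backslash is enough, whereas the Hive string literal above needs two. The sample strings are invented.

```python
import re

# Same alternatives as the rlike pattern, written with single backslashes.
status = re.compile(r"(?i)\bFECHADO\b|\bCIERRE\b|\bCLOSED\b")

hit = status.search("ticket closed today")  # lowercase still matches because of (?i)
miss = status.search("undisclosed")         # "closed" inside a longer word is rejected

# An unescaped dot matches any character; an escaped dot matches only a period.
loose = re.search(r"REVISTO. NORMAL.", "REVISTOX NORMALY")
strict_wrong = re.search(r"REVISTO\. NORMAL\.", "REVISTOX NORMALY")
strict_right = re.search(r"REVISTO\. NORMAL\.", "REVISTO. NORMAL.")
```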