Regex match entire words only
Use word boundaries:
/\b($word)\b/i
Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:
/(?:\W|^)(\Q$word\E)(?:\W|$)/i
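For readers working in Python rather than Perl, the second pattern can be sketched roughly as follows. This is a minimal sketch, not part of the original answer: the helper name find_whole_word is hypothetical, and re.escape stands in for \Q...\E.

```python
import re

def find_whole_word(word, text):
    """Return the first whole-word, case-insensitive match of `word`, or None."""
    # re.escape plays the role of \Q...\E: every character in `word` is literal.
    pattern = r"(?:\W|^)(" + re.escape(word) + r")(?:\W|$)"
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(1) if m else None

print(find_whole_word("S.P.E.C.T.R.E.", "He works for s.p.e.c.t.r.e. now"))
print(find_whole_word("cat", "concatenate"))
```

The capturing group keeps the surrounding non-word characters out of the returned value.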
regex match whole word and punctuation with it using re.search()
There are two issues here.
- In regex, . is special. It means "match any one character". However, you are trying to use it to match a literal period. (It will indeed match that, but it will also match everything else.) Instead, to match a period, you need to use the escaped pattern \. And to change that to match either a period or a hyphen, you can use a character class, like [-.]
- You are using \b at the end of your pattern to match the word boundary, but \b is defined as the boundary between a word character and a non-word character, and periods and spaces are both non-word characters. There is therefore no boundary between the period and the space that follows it, and Python won't find a match. Instead, you could use a lookahead assertion, which will match whatever character you want, but won't consume the string.
Now, to match a whole word - any word - you can do something like \w+, which matches one or more word characters.
Also, it is quite possible that there won't be a match anyway, so you should check whether a match occurred using an if statement or a try statement. Putting it all together:
import re

txt = "The indian in. Spain."
pattern = r"\w+[-.]"
# \b anchors the start of the word; the lookahead requires a non-word character after it
x = re.search(r"\b" + pattern + r"(?=\W)", txt)
if x:
    print(x.start(), x.end())
Edit
There is one problem with the lookahead assertion above - it won't match at the end of the string. This means that if your text is "The rain in Spain." then it won't match "Spain.", as there is no non-word character following the final period.
To fix this, you can use a negative lookahead assertion, which succeeds when the following text does not match the pattern, and also does not consume the string.
x = re.search(r"\b" + pattern + r"(?!\w)", txt)
This will match when the character after the word is anything other than a word character, including the end of the string.
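To see the difference between the two assertions side by side, here is a small sketch that runs both on the end-of-string case described above. The variable names positive and negative are only illustrative.

```python
import re

pattern = r"\w+[-.]"
positive = re.compile(r"\b" + pattern + r"(?=\W)")  # needs a non-word char after the match
negative = re.compile(r"\b" + pattern + r"(?!\w)")  # only forbids a word char after the match

txt = "The rain in Spain."
first = positive.search(txt)
second = negative.search(txt)

print(first)                          # no match: nothing follows the final period
print(second.group(), second.span())
```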
How to make regex match only whole words and not break the words down?
You could use {2,6} and make sure to use word boundaries \b so that there are not 2 matches, one for ABSTRA and the other for CT:
\b[A-Z]{2,6}(?:-[0-9]+)?\b
In python:
regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
If the hyphen in this part -*[0-9]* is not optional, you could turn it into an optional group (?:-[0-9]+)?
If there should not be anything on the left or right, you could use:
(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)
Note that -* will match 0 or more hyphens and -? matches an optional one.
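As a rough illustration of how the two patterns differ, the sketch below runs both against made-up sample strings (the inputs are assumptions, not taken from the question).

```python
import re

# Word-boundary version: 2 to 6 capitals, optionally followed by -digits.
loose = re.compile(r"\b[A-Z]{2,6}(?:-[0-9]+)?\b")
# Whitespace-boundary version: nothing but whitespace (or string edges) on either side.
strict = re.compile(r"(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)")

loose_hits = loose.findall("AB-12 ABSTRACT XYZ")
strict_hits = strict.findall("AB-12 (XYZ) QRS")
loose_on_parens = loose.findall("AB-12 (XYZ) QRS")

print(loose_hits)       # ABSTRACT is skipped entirely, not split into ABSTRA + CT
print(strict_hits)      # (XYZ) is rejected because of the parentheses
print(loose_on_parens)  # \b accepts XYZ inside the parentheses
```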
Regex.Match whole words
You should add the word delimiter to your regex:
\b(shoes|shirt|pants)\b
In code:
Regex.Match(content, @"\b(shoes|shirt|pants)\b");
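The same alternation behaves the same way in other engines. A minimal Python sketch, with an invented sample string, might look like this:

```python
import re

content = "Red shoes, a T-shirt and underpants"
# \b on both sides keeps "pants" inside "underpants" from matching.
found = re.findall(r"\b(shoes|shirt|pants)\b", content)
print(found)
```

Note that "shirt" in "T-shirt" still matches, because the hyphen is a non-word character and so creates a word boundary.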
Java Regex : match whole word with word boundary
It appears you only want to match "words" enclosed with whitespace (or at the start/end of strings).
Use
String pattern = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
The (?<!\S) negative lookbehind will fail all matches that are immediately preceded with a char other than a whitespace and (?!\S) is a negative lookahead that will fail all matches that are immediately followed with a char other than whitespace. Pattern.quote() is necessary to escape special chars that need to be treated as literal chars in the regex pattern.
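The same construction carries over to Python, where re.escape takes the place of Pattern.quote(). The following is only a sketch; the helper name whitespace_delimited and the sample text are assumptions.

```python
import re

def whitespace_delimited(word):
    """Compile a pattern matching `word` only when bounded by whitespace or string edges."""
    # re.escape is the Python counterpart of Java's Pattern.quote().
    return re.compile(r"(?<!\S)" + re.escape(word) + r"(?!\S)")

rx = whitespace_delimited("c++")
hits = rx.findall("c++ is not c++11, but c++ rocks")
print(hits)
```

Without the escaping step, the + signs in "c++" would be read as quantifiers and the pattern would fail to compile.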
How to match a whole word or sentence after a specific character with regexp
Here are 2 options depending on whether you want to include the colon in the pattern that you are capturing.
- with the colon
^:\w*
- with a lookbehind for the colon
(?<=^:)\w*
This will match a word after the colon.
You may want any number of any character, .*, or any combination of word characters and spaces, [\w\s]*
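A short Python sketch of the variants above, using an invented input line, shows what each one captures:

```python
import re

line = ":hello world!"

with_colon = re.search(r"^:\w*", line).group()          # colon is part of the match
word_only = re.search(r"(?<=^:)\w*", line).group()       # colon is checked but not captured
sentence = re.search(r"(?<=^:)[\w\s]*", line).group()    # word characters and spaces

print(with_colon)
print(word_only)
print(sentence)
```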
Matching whole words that start or end with special characters
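The \b word boundary construct is ambiguous: its meaning depends on the adjacent characters, so a \b placed before \? requires a word char to the left of the question mark. You need to use unambiguous constructs that will make sure there are non-word chars or start/end of string to the left/right of the word matched.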
You may use
/(?:^|\W)\?FOO\?(?!\w)/g
Here, (?:^|\W) is a non-capturing group that matches either the start of a string or any non-word char (a char other than an ASCII letter, digit and _). (?!\w) is a negative lookahead that fails the match if, immediately to the right of the current location, there is a word char.
Or, with ECMAScript 2018 compatible JS environments,
/(?<!\w)\?FOO\?(?!\w)/g
The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location.
In code, you may use it directly with String#match to extract all occurrences, like s.match(/(?<!\w)\?FOO\?(?!\w)/g).
The first expression needs a capturing group around the word you need to extract:
var strs = ["?FOO is cool", "I love ?FOO", "FOO is cool", "FOO?is cool", "aaFOO?is cool"];
var rx = /(?:^|\W)(\?FOO)(?!\w)/g;
for (var s of strs) {
  var res = [], m;
  while (m=rx.exec(s)) {
    res.push(m[1]);
  }
  console.log(s, "=>", res);
}
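Python's re module supports lookbehind, so the lookbehind form can be tried there without a capturing group. The sketch below reuses the test strings from the JavaScript snippet and, like that snippet, looks for ?FOO; the results dictionary is only there to make the output easy to inspect.

```python
import re

strs = ["?FOO is cool", "I love ?FOO", "FOO is cool", "FOO?is cool", "aaFOO?is cool"]
# No word char may sit directly before the ? or directly after FOO.
rx = re.compile(r"(?<!\w)\?FOO(?!\w)")

results = {s: rx.findall(s) for s in strs}
for s, res in results.items():
    print(s, "=>", res)
```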
RegEx Jquery. Match whole word or fragment and extract whole word?
Here's a possible solution for your problem.
const text = 'This is your sample message that is really messy.';
const pattern = /(\w*mes\w*)/gi;
const results = [...text.matchAll(pattern)];
console.log(results)
Here \w matches any alphanumeric symbol or an underscore, and * matches that symbol 0 or more times (that is, you allow an unknown number of symbols). Then mes is the substring that you want to find.
The g flag indicates that the regular expression should be tested against all possible matches in a string, and the i flag indicates that the matcher should ignore casing.
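For comparison, roughly the same extraction in Python could look like the sketch below. re.findall already returns every match, so it covers the role of the g flag, and re.IGNORECASE corresponds to i.

```python
import re

text = "This is your sample message that is really messy."
# \w* on both sides expands the fragment "mes" to the whole word containing it.
words = re.findall(r"\w*mes\w*", text, flags=re.IGNORECASE)
print(words)
```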
how to search for specific whole words within a string , via SQL, compatible with both HIVE/IMPALA
You can add the word boundary \\b to match only exact words:
rlike '(?i)\\bFECHADO\\b|\\bCIERRE\\b|\\bCLOSED\\b'
(?i) means case insensitive, so there is no need to use UPPER.
And the last alternative in your regex pattern is REVISTO. NORMAL.
If the dots in it should be literal dots, use \\.
Like this: REVISTO\\. NORMAL\\.
A dot in a regexp means any character and must be escaped with two backslashes to match a dot literally.
The above regex works in Hive. Unfortunately I have no Impala to test it.
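If neither engine is at hand, the pattern logic can be sanity-checked in Python. This is only an approximation: Python's re is not the regex engine Hive or Impala use, and the sketch assumes that \b, (?i) and \. behave the same way there. The function name is_closed and the sample statuses are made up.

```python
import re

# In a Hive string literal "\\b" denotes the regex token \b;
# in a Python raw string a single backslash is enough.
closed_rx = re.compile(r"(?i)\bFECHADO\b|\bCIERRE\b|\bCLOSED\b|\bREVISTO\. NORMAL\.")

def is_closed(status):
    """Return True if `status` contains one of the whole-word markers."""
    return closed_rx.search(status) is not None

print(is_closed("Ticket closed yesterday"))
print(is_closed("enclosed envelope"))
print(is_closed("Revisto. Normal."))
```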