Regex.Match Whole Words

Regex match entire words only

Use word boundaries:

/\b($word)\b/i

Or if you're searching for "S.P.E.C.T.R.E." like in Sinan Ünür's example:

/(?:\W|^)(\Q$word\E)(?:\W|$)/i
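For comparison, here is a minimal Python sketch of the two Perl patterns above, assuming Python's re module. re.escape() stands in for Perl's \Q...\E (it is applied in both helpers, which the first Perl pattern does not do), re.IGNORECASE stands in for the /i flag, and the helper names find_word_boundary and find_delimited are hypothetical.

```python
import re

def find_word_boundary(word, text):
    # \b on both sides; re.escape() makes the dots literal, like Perl's \Q...\E
    return re.search(r"\b(" + re.escape(word) + r")\b", text, re.IGNORECASE)

def find_delimited(word, text):
    # A non-word char or start/end of string on each side, like the second Perl pattern
    return re.search(r"(?:\W|^)(" + re.escape(word) + r")(?:\W|$)", text, re.IGNORECASE)

text = "Bond fought S.P.E.C.T.R.E. agents."

print(find_word_boundary("bond", text).group(1))            # Bond
print(find_word_boundary("S.P.E.C.T.R.E.", text))           # None
print(find_delimited("S.P.E.C.T.R.E.", text).group(1))      # S.P.E.C.T.R.E.
```

The \b version fails for S.P.E.C.T.R.E. because the trailing period and the following space are both non-word characters, so there is no word boundary between them.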

regex match whole word and punctuation with it using re.search()

There are two issues here.

  1. In a regex, . is special: it means "match any single character" (except a newline, by default). You are trying to use it to match a literal period. It will match one, but it will also match any other character. To match only a period, escape it as \. and to match either a period or a hyphen, use a character class such as [-.].
  2. You are using \b at the end of your pattern to match a word boundary, but \b only matches between a word character and a non-word character (or between a word character and the start/end of the string). Periods and spaces are both non-word characters, so there is no boundary between the period and the space after it, and Python won't find a match. Instead, you could use a lookahead assertion, which checks that the next character is whatever you want without consuming it.

Now, to match a whole word - any word - you can do something like \w+, which matches one or more word characters.

Also, it is quite possible that there won't be a match at all. re.search returns None in that case, so check the result (with an if statement, for example) before using it. Putting it all together:

import re

txt = "The indian in. Spain."
pattern = r"\w+[-.]"  # one or more word chars followed by a hyphen or a period
x = re.search(r"\b" + pattern + r"(?=\W)", txt)
if x:
    print(x.start(), x.end())

Edit

There is one problem with the lookahead assertion above: it won't match at the end of the string. If your text is "The rain in Spain.", it won't match "Spain.", because there is no non-word character following the final period.

To fix this, you can use a negative lookahead assertion, which succeeds when the text immediately following does not match the given pattern and, likewise, does not consume any characters.

x = re.search(r"\b" + pattern + r"(?!\w)", txt)

This will match when the character after the word is anything other than a word character, including the end of the string.
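A short Python check of both lookaheads on the example text "The rain in Spain." (a sketch; the variable names are only for illustration):

```python
import re

pattern = r"\w+[-.]"
txt = "The rain in Spain."

# Positive lookahead: needs an actual non-word character after the period
with_lookahead = re.search(r"\b" + pattern + r"(?=\W)", txt)
# Negative lookahead: only forbids a following word character, so end of string is fine
with_negative = re.search(r"\b" + pattern + r"(?!\w)", txt)

print(with_lookahead)          # None
print(with_negative.group())   # Spain.
print(with_negative.span())    # (12, 18)
```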

How to make regex match only whole words and not break the words down?

You could use {2,6} and make sure to use word boundaries \b, so that a longer run of capitals such as ABSTRACT does not produce two matches, one for ABSTRA and the other for CT:

\b[A-Z]{2,6}(?:-[0-9]+)?\b


In Python:

regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"

If the hyphen in the part -*[0-9]* of your pattern is not really optional (digits only ever appear after a hyphen), you could turn that part into the optional group (?:-[0-9]+)?, as in the pattern above.

If there should be no non-whitespace characters directly to the left or right of the match, you could use:

(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)

Note that -* will match 0 or more hyphens and -? matches an optional one.
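A Python sketch, with made-up sample strings, of what the word boundaries and the whitespace lookarounds change:

```python
import re

text = "ABSTRACT AB-12 CDE XY-7"
no_boundary = r"[A-Z]{2,6}(?:-[0-9]+)?"
with_boundary = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"

print(re.findall(no_boundary, text))    # ['ABSTRA', 'CT', 'AB-12', 'CDE', 'XY-7']
print(re.findall(with_boundary, text))  # ['AB-12', 'CDE', 'XY-7']

# \b accepts punctuation next to the word; the whitespace lookarounds do not
text2 = "AB-12 (CD) EF"
whitespace_only = r"(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)"

print(re.findall(with_boundary, text2))    # ['AB-12', 'CD', 'EF']
print(re.findall(whitespace_only, text2))  # ['AB-12', 'EF']
```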


Regex.Match whole words

You should add word boundaries (\b) around the alternation in your regex:

\b(shoes|shirt|pants)\b

In code:

Regex.Match(content, @"\b(shoes|shirt|pants)\b");
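The parentheses matter as much as the \b. Alternation has the lowest precedence, so without the group the first \b only applies to shoes and the last one only to pants. A small sketch using Python's re rather than C# (the sample string is made up; .NET treats these constructs the same way):

```python
import re

text = "shoeshine shirts pants"

grouped = r"\b(shoes|shirt|pants)\b"
ungrouped = r"\bshoes|shirt|pants\b"  # parsed as (\bshoes)|(shirt)|(pants\b)

print(re.findall(grouped, text))    # ['pants']
print(re.findall(ungrouped, text))  # ['shoes', 'shirt', 'pants']
```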

Java Regex : match whole word with word boundary

It appears you only want to match "words" enclosed with whitespace (or at the start/end of strings).

Use

String pattern = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";

The (?<!\S) negative lookbehind fails any match that is immediately preceded by a character other than whitespace, and the (?!\S) negative lookahead fails any match that is immediately followed by a character other than whitespace. Pattern.quote() is necessary to escape special characters so that they are treated as literal characters in the regex pattern.
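The same idea in Python, as a sketch: re.escape() plays the role of Pattern.quote(), and whole_word_pattern is a hypothetical helper name. It also shows how a \b-based pattern goes wrong when the word ends in a non-word character.

```python
import re

def whole_word_pattern(word):
    # Whitespace (or string start/end) must surround the literal word
    return r"(?<!\S)" + re.escape(word) + r"(?!\S)"

text = "I like c++ and c++11"

starts = [m.start() for m in re.finditer(whole_word_pattern("c++"), text)]
print(starts)  # [7]

# With \b, there is no boundary between '+' and ' ', but there is one between '+' and '1',
# so the search lands on the c++ inside c++11 instead
bad = re.search(r"\b" + re.escape("c++") + r"\b", text)
print(bad.start())  # 15
```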

How to match a whole word or sentence after a specific character with regexp

Here are two options, depending on whether you want the colon to be part of the match.

  • with the colon
    ^:\w*
  • with a lookbehind for the colon
    (?<=^:)\w*

    This will match a word after the colon.

    Instead of \w*, you may want any number of any characters, .*, or any combination of word characters and whitespace, [\w\s]*.
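A quick Python check of both options and the alternatives mentioned above, using made-up input strings. Python's re accepts (?<=^:) because the lookbehind has a fixed width of one character:

```python
import re

text = ":hello world"

with_colon = re.search(r"^:\w*", text).group()
word_only = re.search(r"(?<=^:)\w*", text).group()
rest_any = re.search(r"(?<=^:).*", text).group()
rest_words = re.search(r"(?<=^:)[\w\s]*", text).group()

print(with_colon)   # :hello
print(word_only)    # hello
print(rest_any)     # hello world
print(rest_words)   # hello world

# The ^ anchors the colon to the start of the string
print(re.search(r"(?<=^:)\w*", "hello: world"))  # None
```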

Matching whole words that start or end with special characters

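The \b word boundary is context-dependent: it only matches between a word char and a non-word char (or the start/end of the string), so what it requires depends on the character next to it. Before a non-word char like ?, \b actually requires a word char on the left, which is the opposite of what you want. You need to use unambiguous constructs that make sure there is a non-word char or the start/end of the string to the left/right of the matched word.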

You may use

/(?:^|\W)\?FOO\?(?!\w)/g

Here, (?:^|\W) is a non-capturing group that matches either the start of a string or any non-word char (a char other than an ASCII letter, digit or _). (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of the current location.

Or, in JS environments compatible with ECMAScript 2018 (which added lookbehind assertions):

/(?<!\w)\?FOO\?(?!\w)/g


The (?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current location.

In code, you may use it directly with String#match to extract all occurrences, like s.match(/(?<!\w)\?FOO\?(?!\w)/g).

The first expression also consumes the non-word char before the word, so it needs a capturing group around the word you want to extract:

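var strs = ["?FOO is cool", "I love ?FOO", "FOO is cool", "FOO?is cool", "aaFOO?is cool"];
// Group 1 holds the word; the (?:^|\W) prefix is matched but not captured
var rx = /(?:^|\W)(\?FOO)(?!\w)/g;
for (var s of strs) {
  var res = [], m;
  // With the g flag, exec() resumes at rx.lastIndex, which resets to 0 once it returns null
  while (m = rx.exec(s)) {
    res.push(m[1]);
  }
  console.log(s, "=>", res);
}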
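A Python sketch of both expressions on the same test strings, for comparison (Python's re supports lookbehind, so the second form works there too). The last two lines show the \b problem described above.

```python
import re

strs = ["?FOO is cool", "I love ?FOO", "FOO is cool", "FOO?is cool", "aaFOO?is cool"]

# Consumes the preceding non-word char, so findall returns only group 1
consuming = re.compile(r"(?:^|\W)(\?FOO)(?!\w)")
# Lookbehind version: nothing extra is consumed
lookbehind = re.compile(r"(?<!\w)\?FOO(?!\w)")

results = {s: (consuming.findall(s), lookbehind.findall(s)) for s in strs}
for s, (a, b) in results.items():
    print(s, "=>", a, b)

# \b before '?' needs a word char on its left, so it fails where it should match...
print(re.findall(r"\b\?FOO\b", "?FOO is cool"))  # []
# ...and matches where it should not
print(re.findall(r"\b\?FOO\b", "x?FOO"))         # ['?FOO']
```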

RegEx Jquery. Match whole word or fragment and extract whole word?

Here's a possible solution for your problem.

const text = 'This is your sample message that is really messy.';
const pattern = /(\w*mes\w*)/gi;
const results = [...text.matchAll(pattern)];
console.log(results)

Here \w matches any ASCII letter, digit or underscore, and * matches it zero or more times, so \w*mes\w* extends the fragment mes you are looking for to the whole word that contains it.
The g flag makes the regular expression find all matches in the string (matchAll throws a TypeError without it), and the i flag makes the match case-insensitive.
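As a sketch of the same approach in Python: re.findall always returns all non-overlapping matches, so no g flag is needed, and re.IGNORECASE mirrors the i flag. Note that in Python 3, \w in a str pattern also matches non-ASCII letters, unlike JavaScript's \w.

```python
import re

text = "This is your sample message that is really messy."

# \w* on both sides grows the fragment "mes" to the full word around it
words = re.findall(r"\w*mes\w*", text, re.IGNORECASE)
print(words)  # ['message', 'messy']
```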

how to search for specific whole words within a string, via SQL, compatible with both HIVE/IMPALA

You can add the word boundary \\b to match only whole words (the backslash is doubled because the pattern sits inside a string literal):

rlike '(?i)\\bFECHADO\\b|\\bCIERRE\\b|\\bCLOSED\\b'

(?i) makes the match case-insensitive, so there is no need to use UPPER.

Also, the last alternative in your regex pattern is REVISTO. NORMAL.

If the dots in it should match literal dots, escape them with \\.

Like this: REVISTO\\. NORMAL\\.

In a regex, a dot means any character, so to match a literal dot it must be escaped; inside the query string that takes two backslashes.

The above regex works in Hive. Unfortunately, I have no Impala instance to test it on.
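A sketch that checks the pattern logic in Python rather than Hive, under the assumption that Python's re treats \b, (?i) and \. the same way as the Java regex engine behind Hive's RLIKE for these patterns. In a Python raw string a single backslash is enough; has_match is a hypothetical helper that mirrors RLIKE's "any substring matches" rule with re.search. The sample strings are made up.

```python
import re

# Same alternatives as the Hive pattern, with the dots escaped
closed_words = r"(?i)\bFECHADO\b|\bCIERRE\b|\bCLOSED\b|REVISTO\. NORMAL\."
# Unescaped dots match any character
unescaped_dot = r"(?i)REVISTO. NORMAL."

def has_match(pattern, s):
    # True if any substring of s matches pattern
    return re.search(pattern, s) is not None

print(has_match(closed_words, "Ticket closed today"))  # True
print(has_match(closed_words, "unclosed ticket"))      # False
print(has_match(closed_words, "Revisto. Normal."))     # True
print(has_match(closed_words, "REVISTOX NORMALY"))     # False
print(has_match(unescaped_dot, "REVISTOX NORMALY"))    # True
```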



Related Topics



Leave a reply



Submit