Regex Check If Specific Multiple Words Present in a Sentence

Regex check if specific multiple words present in a sentence

You can't use alternation here, because a match for any one of the alternatives would satisfy the regex. Instead, use multiple lookaheads anchored at the start of the string, one per required word:

import re

sentence1 = "hello i am from New York city"
sentence2 = "hello i am from New York"
# Each lookahead must succeed from position 0, so all three words are required
regex = re.compile(r"^(?=.*hello)(?=.*from)(?=.*city)")
print(regex.match(sentence1))
print(regex.match(sentence2))

Output:

<_sre.SRE_Match object; span=(0, 0), match=''>
None
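
The same idea extends to an arbitrary list of words. Here is a minimal sketch, assuming a hypothetical helper called contains_all (my own name, not part of the re module) that builds one lookahead per word and escapes each word with re.escape. Using re.DOTALL so that .* can cross newlines is also an assumption about what you want:

```python
import re

def contains_all(sentence, words):
    """Return True if every word in `words` occurs somewhere in `sentence`."""
    # One lookahead per word; re.escape guards against regex metacharacters.
    pattern = "^" + "".join(f"(?=.*{re.escape(w)})" for w in words)
    # re.DOTALL lets .* cross newlines (an assumption about the desired behavior).
    return re.match(pattern, sentence, flags=re.DOTALL) is not None

print(contains_all("hello i am from New York city", ["hello", "from", "city"]))  # True
print(contains_all("hello i am from New York", ["hello", "from", "city"]))       # False
```

Note that this checks substrings, so "city" would also be found inside "cityscape"; wrap each word in \b if you need whole words.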

Multiple words in any order using regex

Use a capturing group if you want to extract the matches: (test)|(long)
Then depending on the language in use you can refer to the matched group using $1 and $2, for example.
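
In Python, for instance, the groups are read from the match object rather than through $1 and $2. A small sketch (the sample string is made up for illustration):

```python
import re

pattern = re.compile(r"(test)|(long)")

for m in pattern.finditer("a long test"):
    # Exactly one of the two groups participates in each match; the other is None.
    print(m.group(1), m.group(2))
```

Checking which group is not None tells you which of the words was matched at that position.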

How to check whether a pattern contain multiple words present in a string in java

You could try something like:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = " java is a programing language .java is robust and powerful and it is also platform independent. ";

// Allow any run of whitespace between the words of each phrase
String subS1 = "programing language";
subS1 = subS1.replace(" ", "\\s+");
Pattern p1 = Pattern.compile(subS1);
Matcher match1 = p1.matcher(string); // the matcher is created from the pattern, not from the string

String subS2 = "platform independent";
subS2 = subS2.replace(" ", "\\s+");
Pattern p2 = Pattern.compile(subS2);
Matcher match2 = p2.matcher(string);

String subS3 = "robust and powerful";
subS3 = subS3.replace(" ", "\\s+");
Pattern p3 = Pattern.compile(subS3);
Matcher match3 = p3.matcher(string);

if (match1.find() && match2.find() && match3.find()) {
    // Whatever you like
}

You should replace every space in each substring with '\s+', so the pattern also finds "programing [loads of whitespace] language".
Then compile the pattern you want to find and create a matcher for it over the full string. Repeat for each substring.
Lastly, test whether every matcher found something.
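
For comparison, the same steps can be sketched in Python. This is only an illustration of the approach, with phrase_pattern being a hypothetical helper name of my own; it escapes each word and joins the words with \s+:

```python
import re

text = " java is a programing language .java is robust and powerful and it is also platform independent. "
phrases = ["programing language", "platform independent", "robust and powerful"]

def phrase_pattern(phrase):
    # Escape each word, then allow any run of whitespace between words.
    return re.compile(r"\s+".join(re.escape(word) for word in phrase.split()))

# True only if every phrase is found somewhere in the text.
all_found = all(phrase_pattern(p).search(text) for p in phrases)
print(all_found)  # True
```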

Some Notes

  • Didn't test it, but it should give an idea of how I would do it.
  • Also, this should answer your very specific question. This method should not be used when you have a large number of substrings to check...
  • Please post some code you were already working with, because now I feel like I'm doing your homework...

Regex to match string containing two names in any order

You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info:

Lookahead and lookbehind, collectively called “lookaround”, are
zero-length assertions...lookaround actually matches characters, but
then gives up the match, returning only the result: match or no match.
That is why they are called “assertions”. They do not consume
characters in the string, but only assert whether a match is possible
or not.

It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.

So here is an expression using two consecutive positive lookaheads to assert that the phrase contains jack and james in either order:

^(?=.*\bjack\b)(?=.*\bjames\b).*$


The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:

  1. ^ asserts the start of the string to be matched.
  2. (?=.*\bjack\b) is the first positive lookahead saying that what follows must match .*\bjack\b.
  3. .* means any character (except a line break, by default) zero or more times.
  4. \b matches at a word boundary: the position between a word character and a non-word character, or at the start or end of the string next to a word character.
  5. jack is literally those four characters in a row (the same for james in the next positive lookahead).
  6. $ asserts the end of the string to be matched.

So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be an expression that starts with zero or more of any characters followed by a word boundary and then jack and another word boundary," and the second look ahead says "what follows must be an expression that starts with zero or more of any characters followed by a word boundary and then james and another word boundary." After the two lookaheads is .* which simply matches any characters zero or more times and $ which matches the end of the expression.

"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happens to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

I think you get the idea, but just to be absolutely clear, here is with jack and james reversed, i.e. "start with anything then james or jack then end with anything"; it satisfies the first lookahead because there are a number of characters then the word james, and it satisfies the second lookahead because there are a number of characters (which just so happens to include james, but that is not necessary to satisfy the second lookahead) then the word jack. As before, neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

This approach has the advantage that you can easily specify multiple conditions.

^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
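
To see the word boundaries at work, here is a quick Python check of the two-name pattern. The test sentences are my own examples, not from the question:

```python
import re

pattern = re.compile(r"^(?=.*\bjack\b)(?=.*\bjames\b).*$")

print(bool(pattern.match("jack met james")))     # True
print(bool(pattern.match("james met jack")))     # True: order does not matter
print(bool(pattern.match("jackson met james")))  # False: \b rejects "jackson"
print(bool(pattern.match("only jack here")))     # False: james is missing
```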

Check if multiple words exist in a sentence

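Change $pattern = '/[$ch]/';

to

$pattern = '/('.$ch.')/'; or $pattern = '['.$ch.']';

In single quotes $ch is not interpolated, and inside the slashes [...] would be a character class anyway. In the second form the square brackets serve as the regex delimiters, not as a character class.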

<?php 
$sentence = "This is a red apple";
$words = array('apple','red');
$ch = implode("|",$words);
echo $pattern = '/('.$ch.')/';

// The alternation matches as soon as either word is present
if(preg_match($pattern, $sentence))
{
    echo ' Do something if the sentence contains red or apple';
}else
{
    echo 'nothing happens';
}
?>

Check if both words match

<?php 
$sentence = "This is a red apple";
$words = array('red','apple');
$ch = implode("|",$words);
echo $pattern = '['.$ch.']'; // square brackets are the delimiters here

// preg_match_all returns the number of matches, so a repeated word is counted twice
if(preg_match_all($pattern, $sentence,$matches) == 2)
{
    echo ' Do something if the sentence contains red & apple';
}else
{
    echo 'nothing happens';
}

?>

You can also check how many words matched using:

echo count($matches[0]);

$matches[0] is an array containing the words that matched.
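
One caveat with counting matches is that a repeated word ("red red") also produces two matches. A sketch of a stricter check in Python, comparing the set of distinct matches against the word list; contains_every_word is a hypothetical name, and the \b word boundaries are my own addition:

```python
import re

def contains_every_word(sentence, words):
    # Alternation finds every occurrence of any listed word.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    found = set(re.findall(pattern, sentence))
    # All words are present only if the distinct matches cover the whole list.
    return found == set(words)

print(contains_every_word("This is a red apple", ["red", "apple"]))    # True
print(contains_every_word("This is a red red car", ["red", "apple"]))  # False
```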

How do I use regex to match on a string that does not contain one of multiple specific words?

You need to use \b around the words so that a string is rejected ONLY if one of them is present as a whole word. Try using this,

^(?:(?!\b(sample|test)\b).)*$

Also, it is a good idea to make a group non-capturing unless you intend to use its value.


Edit:

To make it case-insensitive, enable the i flag by placing i just after the closing / of the regex. JS demo:

var arr = ['this is a test case','this is a testing area','this is a Test area']
arr.forEach(s => console.log(s + " --> " + /^(?:(?!\b(sample|test)\b).)*$/i.test(s)))
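
For reference, a Python sketch of the same check, assuming re.IGNORECASE as the counterpart of the i flag and with the inner group made non-capturing:

```python
import re

# (?!...) is tested before every character, so no position may start a whole-word hit.
pattern = re.compile(r"^(?:(?!\b(?:sample|test)\b).)*$", re.IGNORECASE)

for s in ["this is a test case", "this is a testing area", "this is a Test area"]:
    print(s, "-->", bool(pattern.match(s)))
```

Only "this is a testing area" passes, since "testing" is not the whole word "test".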

match multiple words in any order with regex

You could use Positive Lookahead to achieve this.

The lookahead approach is nice for matching strings that contain these substrings regardless of order.

if (preg_match('/(?=.*two)(?=.*four)(?=.*ten)/', $sentence)) {
    echo $sentence." matched\n";
}


Checking the presence of multiple words in a variable using JavaScript

Here's a way to solve this. Simply loop through the list of words to check, build the regex as you go and check to see if there is a match. The RegExp constructor lets you build a pattern from a string.

var str = "My best food is beans and plantain. Yam is also good but I prefer yam porrage"
var words = [
"food",
"beans",
"plantain",
"potato"
]

for (let word of words) {
let regex = new RegExp(`(^|\\W)${word}($|\\W)`)

if (str.match(regex)) {
console.log(`The matched word is ${word}`);
} else {
console.log('Word not found');
}
}

Python regular expression match multiple words anywhere

You've got a few problems there.

First, matches are case-sensitive unless you use the IGNORECASE/I flag to ignore case. So, 'AND' doesn't match 'and'.

Also, unless you use the VERBOSE/X flag, those spaces are part of the pattern. So, you're checking for 'AND ', not 'AND'. If you wanted that, you probably wanted spaces on each side, not just those sides (otherwise, 'band leader' is going to match…), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' isn't going to match).

Finally, if you think you need .* before and after your pattern and $ and ^ around it, there's a good chance you wanted to use search, findall, or finditer, rather than match.

So:

>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
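
The difference between match and search is easy to see on a shorter string. A small sketch (the sample sentence is shortened from the one above):

```python
import re

s = "These are oranges and apples"
word = re.compile(r"\band\b", re.I)

print(word.match(s))                          # None: match only tries at position 0
print(bool(word.search(s)))                   # True: search scans the whole string
print([m.start() for m in word.finditer(s)])  # [18]
```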


Check for multiple words in a string and show found/not found words

First, you need to define what a word is in this context. For example, is "pancake" one word or two? If you search for "pan" should it return true if "pancake" is found? Also, is "A.I." a word? Is "2020" a word? What about "fast-track"? Should the search be case-sensitive? Should it include partial finds? If you just want a simple, needle/haystack search like strpos then the solution is trivial.

$searchWords = ["yo","hi"];
$sentence = "yo you your";
$wordsFound = [];

foreach ($searchWords as $word) {
if (stripos($sentence, $word) !== false) {
$wordsFound[$word] = true;
}
}

echo "Words found: ", implode(",", array_keys($wordsFound)); // Words found: yo

To find the words that weren't found in $searchWords you'd just do $wordsNotFound = array_diff($searchWords, array_keys($wordsFound)).
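
If you do need whole-word matching rather than a plain substring search, a Python sketch along these lines separates found from missing words. split_found_missing and its whole_words parameter are hypothetical names, and treating \b as the definition of a word is an assumption:

```python
import re

def split_found_missing(sentence, words, whole_words=True):
    found, missing = [], []
    for word in words:
        core = re.escape(word)
        # \b...\b requires a whole word; without it any substring hit counts.
        pattern = rf"\b{core}\b" if whole_words else core
        if re.search(pattern, sentence, re.IGNORECASE):
            found.append(word)
        else:
            missing.append(word)
    return found, missing

print(split_found_missing("yo you your", ["yo", "hi"]))            # (['yo'], ['hi'])
print(split_found_missing("pancake", ["pan"]))                     # ([], ['pan'])
print(split_found_missing("pancake", ["pan"], whole_words=False))  # (['pan'], [])
```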


