Regex to Match 2 Different Words in a String

Regular expression to find two strings anywhere in input

/^.*?\bcat\b.*?\bmat\b.*?$/m

Using the m modifier (which ensures the beginning/end metacharacters match on line breaks rather than at the very beginning and end of the string):

  • ^ matches the line beginning
  • .*? matches anything on the line before...
  • \b matches the first occurrence of a word boundary (as @codaddict discussed)
  • then the string cat and another word boundary; note that underscores are treated as "word" characters, so _cat_ would not match*;
  • .*?: any characters before...
  • boundary, mat, boundary
  • .*?: any remaining characters before...
  • $: the end of the line.

It's important to use \b to ensure the specified words aren't part of longer words. Non-greedy wildcards (.*?) are preferable to greedy ones (.*): on a string like "There is a cat on top of the mat which is under the cat.", a greedy .* first runs to the end of the line and tries the last occurrence of "cat", then has to backtrack to the first one before it can find "mat". The line still matches either way, but the non-greedy version reaches the first "cat" directly.
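As a rough illustration, here is the same pattern in Python, assuming the built-in re module, where the m modifier is spelled re.MULTILINE. The sample text and the variable names are made up for this sketch.

```python
import re

# Same pattern as above; re.MULTILINE plays the role of the /m modifier.
pattern = re.compile(r"^.*?\bcat\b.*?\bmat\b.*?$", re.MULTILINE)

# Hypothetical sample input, one candidate per line.
text = "\n".join([
    "There is a cat on top of the mat which is under the cat.",
    "The mat is under the cat.",      # words in the wrong order
    "Concatenate the mat strings.",   # "cat" only inside a longer word
])

# Each match is a whole line that has "cat" followed later by "mat".
matched_lines = [m.group(0) for m in pattern.finditer(text)]
print(matched_lines)
```

Only the first line should come back, since the second has the words in the wrong order and the third contains "cat" only as part of "Concatenate".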

* If you want to be able to match _cat_, you can use:

/^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$/m

which matches either underscores or word boundaries around the specified words. (?:) indicates a non-capturing group, which can help with performance or avoid conflicted captures.
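A small Python sketch of the underscore-tolerant variant, assuming the built-in re module; the test strings are hypothetical.

```python
import re

# Underscore-tolerant variant from above, compiled with re.MULTILINE for /m.
pattern = re.compile(
    r"^.*?(?:\b|_)cat(?:\b|_).*?(?:\b|_)mat(?:\b|_).*?$",
    re.MULTILINE,
)

underscored = bool(pattern.search("the _cat_ sat on the _mat_"))  # underscores around both words
plain = bool(pattern.search("the cat sat on the mat"))            # ordinary word boundaries
embedded = bool(pattern.search("concat the format"))              # both only inside longer words

print(underscored, plain, embedded)  # True True False
```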

Edit: A question was raised in the comments about whether the solution would work for phrases rather than just words. The answer is, absolutely yes. The following would match "A line which includes both the first phrase and the second phrase":

/^.*?(?:\b|_)first phrase here(?:\b|_).*?(?:\b|_)second phrase here(?:\b|_).*?$/m

Edit 2: If order doesn't matter you can use:

/^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$/m

And if performance is really an issue here, it's possible lookaround (if your regex engine supports it) might (but probably won't) perform better than the above, but I'll leave both the arguably more complex lookaround version and performance testing as an exercise to the questioner/reader.
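For reference, a minimal Python sketch of the either-order alternation, assuming the built-in re module. The helper name has_both and the sample lines are hypothetical.

```python
import re

# Either-order pattern from above, with "first" and "second" as the two words.
either_order = re.compile(
    r"^.*?(?:\b|_)(first(?:\b|_).*?(?:\b|_)second|second(?:\b|_).*?(?:\b|_)first)(?:\b|_).*?$",
    re.MULTILINE,
)

def has_both(line: str) -> bool:
    """Return True when the line contains both words, in either order."""
    return either_order.search(line) is not None

print(has_both("first comes before second"))    # True
print(has_both("second comes before first"))    # True
print(has_both("only the first word is here"))  # False
```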

Edited per @Alan Moore's comment. I didn't have a chance to test it, but I'll take your word for it.

Regex match two words in a string

This will match exactly the words "Mac" and "ExchangeWebServices" with anything else between them:

\bMac\b.*\bExchangeWebServices\b

Regex 101 Example: https://regex101.com/r/sK2qG1/4
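The same check can be tried locally. This is a short Python sketch assuming the built-in re module; the input strings are invented for illustration.

```python
import re

pattern = re.compile(r"\bMac\b.*\bExchangeWebServices\b")

in_order = bool(pattern.search("Mac OS X ExchangeWebServices client"))
part_of_word = bool(pattern.search("MacBook ExchangeWebServices client"))  # "Mac" is part of "MacBook"
wrong_order = bool(pattern.search("ExchangeWebServices on Mac"))           # "Mac" comes second

print(in_order, part_of_word, wrong_order)  # True False False
```

Note that this pattern is order-sensitive: "Mac" has to appear before "ExchangeWebServices".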

Regex to match string containing two names in any order

You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info:

Lookahead and lookbehind, collectively called “lookaround”, are
zero-length assertions...lookaround actually matches characters, but
then gives up the match, returning only the result: match or no match.
That is why they are called “assertions”. They do not consume
characters in the string, but only assert whether a match is possible
or not.

It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.

So here is an expression using two consecutive positive lookaheads to assert that the phrase matches jack and james in either order:

^(?=.*\bjack\b)(?=.*\bjames\b).*$

Test it.

The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:

  1. ^ asserts the start of the expression to be matched.
  2. (?=.*\bjack\b) is the first positive lookahead saying that what follows must match .*\bjack\b.
  3. .* means any character zero or more times.
  4. \b means any word boundary, i.e. a position between a word character and a non-word character (or the start or end of the expression next to a word character).
  5. jack is literally those four characters in a row (the same for james in the next positive lookahead).
  6. $ asserts the end of the expression to be matched.

So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be an expression that starts with zero or more of any characters followed by a word boundary and then jack and another word boundary," and the second lookahead says "what follows must be an expression that starts with zero or more of any characters followed by a word boundary and then james and another word boundary." After the two lookaheads is .* which simply matches any characters zero or more times and $ which matches the end of the expression.

"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happens to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

I think you get the idea, but just to be absolutely clear, here is with jack and james reversed, i.e. "start with anything then james or jack then end with anything"; it satisfies the first lookahead because there are a number of characters then the word james, and it satisfies the second lookahead because there are a number of characters (which just so happens to include james, but that is not necessary to satisfy the second lookahead) then the word jack. As before, neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

This approach has the advantage that you can easily specify multiple conditions.

^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$
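Since every condition is just one more lookahead, the pattern can be generated from a list of words. Below is a minimal Python sketch of that idea, assuming the built-in re module; the helper name all_words_pattern and the test strings are hypothetical.

```python
import re

def all_words_pattern(words):
    """Build one positive lookahead per word, anchored at the start of the string."""
    lookaheads = "".join(rf"(?=.*\b{re.escape(w)}\b)" for w in words)
    return re.compile(rf"^{lookaheads}.*$")

names = all_words_pattern(["jack", "james"])
print(names.pattern)                                    # ^(?=.*\bjack\b)(?=.*\bjames\b).*$
print(bool(names.match("james met jack yesterday")))    # True, order does not matter
print(bool(names.match("jack met jameson yesterday")))  # False, "jameson" is not the word "james"

four = all_words_pattern(["jack", "james", "jason", "jules"])
print(bool(four.match("jules, jason, james and jack")))  # True
```

re.escape is used so that a word containing regex metacharacters is treated literally.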

Match a string between two or more words regardless of order

You may use a backreference plus a subroutine call (this needs an engine that supports subroutines, such as PCRE):

\b(longword1|longword2)\b.*?\b(?!\1\b)(?1)\b

Expanding it for three alternatives:

\b(longword1|longword2|longword3)\b.*?\b(?!\1\b)((?1))\b.*?\b(?!(?:\1|\2)\b)(?1)\b

See the regex demo and this regex demo, too. So, the list of words will be in Group 1, and you will only need to add backreferences before the subsequent subroutines.

Details

  • \b(longword1|longword2)\b - a whole word, either longword1 or longword2
  • .*? - any 0 or more chars other than line break chars, as few as possible
  • \b - a word boundary
  • (?!\1\b) - the text that follows cannot be the same text as matched in Group 1 followed by a word boundary
  • (?1) - a subroutine that matches the same pattern as in Group 1
  • \b - a word boundary
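Python's built-in re module has no subroutine calls, so (?1) is not available there. As a rough equivalent (a swapped-in technique, not the pattern above verbatim), the sketch below repeats the alternation in a non-capturing group and keeps the backreference in the negative lookahead. The sample strings are made up.

```python
import re

words = ["longword1", "longword2"]
alternation = "|".join(map(re.escape, words))

# The second copy of the alternation stands in for the (?1) subroutine call.
pattern = re.compile(rf"\b({alternation})\b.*?\b(?!\1\b)(?:{alternation})\b")

both = bool(pattern.search("longword1 then longword2"))
reversed_order = bool(pattern.search("longword2 then longword1"))
same_twice = bool(pattern.search("longword1 then longword1"))

print(both, reversed_order, same_twice)  # True True False
```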

Regex match two strings with given number of words in between strings

You can use something like

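import re

text = 'I want apples and oranges'
k = 2  # maximum number of words allowed between the two strings
# Raw f-string: backslashes stay literal, {{ and }} produce literal braces
pattern = rf"apples(?:\s+\w+){{0,{k}}}\s+oranges"
m = re.search(pattern, text)
if m:
    print(m.group())

# => apples and oranges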

Here, I used \w+ to match a word. If the word is a non-whitespace chunk, you need to use

pattern = rf"apples(?:\s+\S+){{0,{k}}}\s+oranges"

See this Python demo.

If you need to add word boundaries, you need to study the "Word boundary with words starting or ending with special characters gives unexpected results" and "Match a whole word in a string using dynamic regex" posts. For the current example, fr"\bapples(?:\s+\w+){{0,{k}}}\s+oranges\b" will work.

The pattern will look like apples(?:\s+\w+){0,k}\s+oranges and match

  • apples - the literal string apples
  • (?:\s+\w+){0,k} - zero to k repetitions of one or more whitespaces and one or more word chars
  • \s+ - one or more whitespaces
  • oranges - the literal string oranges.
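To see how k changes the result, the pattern can be wrapped in a small helper. This is only a sketch: the function name within_k_words and the sample sentence are hypothetical, and it adds the word boundaries and re.escape calls as an assumption about what a reusable version would want.

```python
import re

def within_k_words(text, first, second, k):
    """Return the matched text if at most k words separate first and second, else None."""
    pattern = rf"\b{re.escape(first)}(?:\s+\w+){{0,{k}}}\s+{re.escape(second)}\b"
    m = re.search(pattern, text)
    return m.group() if m else None

text = "I want apples and some oranges"
print(within_k_words(text, "apples", "oranges", 1))  # None: two words sit in between
print(within_k_words(text, "apples", "oranges", 2))  # apples and some oranges
```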

Regex to match 2 different words in a string

You can use /pops?/ if you want to match partially.

const obj = {time_pop: 'fhfvla', icon: 'dsfval', home_pops: 'valffg', title: 'sdfsdfs', pop: 'sfsdfsd', rattle: 'sdfdsf', pops: 'sfsdfsdf'}
// No g flag: test() on a global regex advances lastIndex between calls
const only = Object.entries(obj).filter(([k]) => /pops?/.test(k))

console.log(only)
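For comparison, here is a Python sketch of the same filter that contrasts a partial match with a whole-key match. It assumes the built-in re module; the whole-key variant with re.fullmatch is an addition for illustration, not part of the snippet above.

```python
import re

obj = {"time_pop": "fhfvla", "icon": "dsfval", "home_pops": "valffg",
       "title": "sdfsdfs", "pop": "sfsdfsd", "rattle": "sdfdsf", "pops": "sfsdfsdf"}

# re.search finds the pattern anywhere in the key (partial match).
partial = [k for k in obj if re.search(r"pops?", k)]
# re.fullmatch requires the whole key to be "pop" or "pops".
exact = [k for k in obj if re.fullmatch(r"pops?", k)]

print(partial)  # ['time_pop', 'home_pops', 'pop', 'pops']
print(exact)    # ['pop', 'pops']
```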

Regex match multiple words that may be separated by another word giving a list of possible intermediate words

You may create a regex like

/\bmake(?:\s+(?:of|the|a))*\s+wish\b/gi

See the regex demo. Details

  • \b - a word boundary
  • make - a word
  • (?:\s+(?:of|the|a))* - 0 or more occurrences of

    • \s+ - 1+ whitespaces
    • (?:of|the|a) - either of, the or a (you might want to use an? to also match an)
  • \s+ - 1+ whitespaces
  • wish - a word wish
  • \b - a word boundary

In your code, you may use

let stopword: string[] = ["of", "the", "a"];
let to_match: string = "make wish";
let text: string = "make wish wish make a wish wish wish make the a wish make";
const regex = new RegExp(`\\b${to_match.split(/\s+/).join("(?:\\s+(?:" + stopword.join("|") + "))*\\s+")}\\b`, "gi");
console.log(text.match(regex));
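The same construction can be sketched in Python, assuming the built-in re module. The variable names mirror the snippet above; the re.escape calls are an added precaution in case a word contains regex metacharacters.

```python
import re

stopwords = ["of", "the", "a"]
to_match = "make wish"
text = "make wish wish make a wish wish wish make the a wish make"

# Between two words of to_match: any number of stopwords, then whitespace.
filler = r"(?:\s+(?:" + "|".join(map(re.escape, stopwords)) + r"))*\s+"
pattern = re.compile(
    r"\b" + filler.join(map(re.escape, to_match.split())) + r"\b",
    re.IGNORECASE,
)

matches = pattern.findall(text)
print(matches)  # ['make wish', 'make a wish', 'make the a wish']
```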

PHP - Regex to match words with more than two letters

The pattern [^\w{2,}]*([\s]+([^\w{2,}])*|$) does not work because it matches only spaces, so splitting on it returns an array with all the words. Inside a character class, {2,} is not a quantifier: [^\w{2,}] is a negated class that excludes word characters and the literal characters {, 2, comma and }, so it still matches whitespace chars, just like \s.

If you must use split, you also have to match the single word characters so that they are not part of the result. You can match either a single word character surrounded by optional horizontal whitespace characters (to remove those as well), or 1+ horizontal whitespace characters.

\h*\b\w\b\h*|\h+

Regex demo

For example

$input_string = "I have a cake inside my fridge";
$string_array = preg_split("/\h*\b\w\b\h*|\h+/", $input_string, -1, PREG_SPLIT_NO_EMPTY);
print_r($string_array);

Output

Array
(
    [0] => have
    [1] => cake
    [2] => inside
    [3] => my
    [4] => fridge
)

If you want to match all strings that consist of at least 2 characters, you could also use \S{2,} with preg_match_all.
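Both approaches can be sketched in Python for comparison, assuming the built-in re module. Python's re does not provide \h, so [ \t] is used here as an approximation of horizontal whitespace; that substitution is an assumption of this sketch.

```python
import re

input_string = "I have a cake inside my fridge"

# Split approach: [ \t] stands in for PCRE's \h.
parts = re.split(r"[ \t]*\b\w\b[ \t]*|[ \t]+", input_string)
words = [p for p in parts if p]  # drop empty pieces, like PREG_SPLIT_NO_EMPTY
print(words)     # ['have', 'cake', 'inside', 'my', 'fridge']

# Match approach: collect every run of 2+ non-whitespace characters.
matched = re.findall(r"\S{2,}", input_string)
print(matched)   # ['have', 'cake', 'inside', 'my', 'fridge']
```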


