Match String That Doesn't Contain a Specific Word

Regular expression to match a line that doesn't contain a word

The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:

^((?!hede).)*$

Non-capturing variant:

^(?:(?!hede).)*$

The regex above will match any string, or line without a line break, not containing the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but still, it is possible.
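As a quick check, here is a minimal Python sketch of the non-capturing variant. The sample strings are made up for illustration and are not part of the original answer:

```python
import re

# Accepts only strings (without line breaks) that do not contain "hede".
NO_HEDE = re.compile(r'^(?:(?!hede).)*$')

# Illustrative inputs (assumed, not from the original answer).
samples = ["hoho", "hihi", "haha", "hede", "ABhedeCD", ""]

accepted = [s for s in samples if NO_HEDE.match(s)]
print(accepted)  # ['hoho', 'hihi', 'haha', '']
```

Note that the empty string is accepted too, since the group may repeat zero times.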

And if you need to match line break chars as well, use the DOT-ALL modifier (the trailing s in the following pattern):

/^((?!hede).)*$/s

or use it inline:

/(?s)^((?!hede).)*$/

(where the /.../ are the regex delimiters, i.e., not part of the pattern)

If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S]:

/^((?!hede)[\s\S])*$/
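The three multi-line variants can be compared in Python, where the DOT-ALL modifier is spelled `re.DOTALL` (or `(?s)` inline). This is only a sketch; the two sample texts are assumptions chosen for illustration:

```python
import re

ok_text = "first line\nsecond line"   # no "hede" anywhere
bad_text = "first line\nhede here"    # "hede" on the second line

plain = re.compile(r'^((?!hede).)*$')               # dot stops at the line break
dotall = re.compile(r'^((?!hede).)*$', re.DOTALL)   # trailing /s equivalent
inline = re.compile(r'(?s)^((?!hede).)*$')          # inline modifier
charclass = re.compile(r'^((?!hede)[\s\S])*$')      # no modifier needed

print(plain.match(ok_text) is None)        # True: cannot get past "\n"
print(dotall.match(ok_text) is not None)   # True
print(inline.match(ok_text) is not None)   # True
print(charclass.match(ok_text) is not None)  # True
print(dotall.match(bad_text) is None)      # True: "hede" found on line 2
```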

Explanation

A string is just a list of n characters. Before, and after each character, there's an empty string. So a list of n characters will have n+1 empty strings. Consider the string "ABhedeCD":

    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘

index    0      1      2      3      4      5      6      7

where the e's are the empty strings. The regex (?!hede). looks ahead to see if there's no substring "hede" to be seen, and if that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width-assertions because they don't consume any characters. They only assert/validate something.

So, in my example, every empty string is first validated to see if there's no "hede" up ahead, before a character is consumed by the . (dot). The regex (?!hede). will do that only once, so it is wrapped in a group, and repeated zero or more times: ((?!hede).)*. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed: ^((?!hede).)*$

As you can see, the input "ABhedeCD" will fail because on e3, the regex (?!hede) fails (there is "hede" up ahead!).
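To see this position by position, the lookahead can be tested on its own at every empty string of "ABhedeCD". This is a rough Python sketch; mapping string offset i to the label e(i+1) is my own convention to match the diagram:

```python
import re

s = "ABhedeCD"
lookahead = re.compile(r'(?!hede)')

failing = []
# Offset i in the string corresponds to the empty string e(i+1) in the diagram.
for i in range(len(s) + 1):
    passes = lookahead.match(s, i) is not None
    label = f"e{i + 1}"
    print(label, "passes" if passes else "fails")
    if not passes:
        failing.append(label)

print(failing)  # ['e3']
```

Only e3 fails, and that single failure is enough to stop the repeated group before the end of the input, so the anchored pattern cannot match.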

match string if does not contain a specific word

You need the following regex

/^(?!.*TOTO)(.*)$/s

Try it at Regex 101 to get an explanation as to what it does
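For a quick local check, the same pattern can be tried in Python, with `re.DOTALL` standing in for the trailing /s. The input strings are assumptions for illustration. Unlike the tempered-dot form, this version runs the lookahead once, from the start of the string:

```python
import re

# One lookahead at the start scans the whole input for "TOTO".
NO_TOTO = re.compile(r'^(?!.*TOTO)(.*)$', re.DOTALL)

clean = "line one\nline two"        # illustrative input without TOTO
dirty = "line one\nTOTO on line 2"  # illustrative input with TOTO

print(NO_TOTO.match(clean) is not None)  # True
print(NO_TOTO.match(dirty) is not None)  # False
```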

How to match a text in a string, that does not contain a specific word and does not contain a word with letters and digits?

Perhaps this will match your values using a word boundary and a negative lookahead:

\b(?!\w*abc)[^\W\d]+\b
  • \b Word boundary
  • (?!\w*abc) Assert what is on the right does not contain abc
  • [^\W\d]+ Negated character class, match 1+ times a word character except a digit
  • \b Word boundary

Regex demo
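A small Python sketch of the pattern above, using a made-up sample sentence (the words are assumptions, not the asker's data):

```python
import re

# Whole words made of letters/underscores only, and not containing "abc".
pattern = re.compile(r'\b(?!\w*abc)[^\W\d]+\b')

text = "hello xabcx foo1 bar abc_def world"  # illustrative input
words = pattern.findall(text)
print(words)  # ['hello', 'bar', 'world']
```

Here "xabcx" and "abc_def" are rejected by the lookahead, and "foo1" is rejected because the digit prevents the final word boundary from matching. Note that [^\W\d] still allows an underscore.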

How to match all strings found unless it contains a specific word?


TL;DR Full Regex

http.{5,10}(?:media.tumblr)(?:(?!avatar).)+?(?:png|jpg|jpeg|gif|swf)

Why it fails

.+?(?!avatar).+?<anything else>

The first .+? matches one character (because it is lazily quantified).
If the string avatar comes next, the lookahead fails at that position, so the first .+? simply extends by one character and also matches the a of avatar; the lookahead then succeeds.
The second .+? matches everything else until anything else can be matched.

A solution

Replace the part with

(?:(?!avatar).)+?<anything else>

Why it works

(?!avatar). matches a single character that is not the start of a string avatar.
The part (?:(?!avatar).)+? (lazily) matches all characters that fulfill this property. And if none of the characters is the starting character of avatar, then the string cannot be contained.
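The difference can be reproduced with a simplified pair of patterns in Python. The START prefix, the \.png suffix and the two sample strings are stand-ins I chose for illustration, not the original URLs:

```python
import re

# Simplified stand-ins for the two approaches discussed above.
broken = re.compile(r'START.+?(?!avatar).+?\.png')
fixed = re.compile(r'START(?:(?!avatar).)+?\.png')

with_avatar = "START/avatar_1.png"  # should be rejected
without = "START/photo_1.png"       # should be accepted

print(broken.search(with_avatar) is not None)  # True: lookahead is sidestepped
print(fixed.search(with_avatar) is not None)   # False
print(fixed.search(without) is not None)       # True
```

In the broken pattern the lookahead is checked at only one position, which the lazy quantifier is free to move past. The fixed pattern checks it before every consumed character.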

Regex: Match word not containing

Your ^((?!Drive).)*$ did not work at all because you tested against a multiline input.

You should use the /m modifier to see what the regex matches. It just matches lines that do not contain Drive, but that tempered greedy token does not check if EFI is inside the string.

Actually, the $ anchor is redundant here since .* matches any zero or more characters other than line break characters. You may simply remove it from your pattern.

(NOTE: In .NET, you will need to use [^\r\n]* instead of .* since . in a .NET pattern matches any char except a newline (LF), so it still matches other line break chars such as a carriage return (CR).)

Use something like

^(?!.*Drive).*EFI.*

Or, if you need to only fail the match if a Drive is present as a whole word:

^(?!.*\bDrive\b).*EFI.*

Or, if there are more words you want to signal the failure with:

^(?!.*(?:Drive|SomethingElse)).*EFI.*
^(?!.*\b(?:Drive|SomethingElse)\b).*EFI.*

See regex demo

Here,

  • ^ - matches start of string
  • (?!.*Drive) - makes sure there is no "Drive" in the string (so, Drives are NOT allowed)
  • (?!.*\bDrive\b) - makes sure there is no "Drive" as a whole word in the string (so, Drives are allowed)
  • .* - any 0+ chars other than line break chars, as many as possible
  • EFI - an EFI substring
  • .* - any 0+ chars other than line break chars, as many as possible.

If your string has newlines, either use a /s dotall modifier or replace . with [\s\S].
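Here is a rough Python sketch of the two main variants run line by line with `re.MULTILINE` (the /m modifier). The sample lines are invented for illustration:

```python
import re

# Illustrative multi-line input (assumed, not from the question).
text = "\n".join([
    "EFI System Partition",
    "EFI Drive C",
    "Data Drive",
    "Boot EFI Drives",
    "Plain text",
])

no_drive = re.compile(r'^(?!.*Drive).*EFI.*', re.MULTILINE)
no_drive_word = re.compile(r'^(?!.*\bDrive\b).*EFI.*', re.MULTILINE)

any_drive = no_drive.findall(text)
whole_word = no_drive_word.findall(text)

print(any_drive)   # ['EFI System Partition']
print(whole_word)  # ['EFI System Partition', 'Boot EFI Drives']
```

The whole-word variant lets "Boot EFI Drives" through because "Drives" is not the word "Drive".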

Regex: How to find substring that does NOT contain a certain word

Using a tempered dot, we can try:

import re

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
# Capture the text between START and FINISH only if it never contains "poison".
matches = re.findall(r'START((?:(?!poison).)*?)FINISH', string)
print(matches)

This prints:

['candy', 'sugar']

For an explanation of how the regex pattern works, we can have a closer look at:

(?:(?!poison).)*?

This uses a tempered dot trick. It will match, one character at a time, so long as what follows is not poison.

Regex to match a string which does not contain a specific word next to the match string


I want regex which does not contain not(in first string), I want to match only 2nd string.

That means you should check that the This is... pattern is not followed by a newline sequence + spaces* + not as a whole word, with backtracking disabled. We can disable backtracking using an atomic group in .NET:

(?>This\s+is(?:\s+\d+)+ *)(?![\r\n]+\p{Zs}*not\b)

See the regex demo

Part 1 of the regex, This\s+is(?:\s+\d+)+ *, matches This is followed by one or more sequences of one or more whitespaces and one or more digits, and then by zero or more spaces. The (?>...) prevents backtracking inside this part of the pattern. The lookahead (?![\r\n]+\p{Zs}*not\b) fails the match if the previously matched text is followed by line breaks, optional spaces, and the whole word not (where \b stands for a word boundary).

How to match a range of string that doesn't contain an specific word using only regular expression?

This seems to work:





var regex = /\[(?!dog)([a-z]+) (?!dog)([a-z]+)\]/gi;

var string = "[cat dog] [dog cow] [cow cat] [cat tiger] [tiger lion] [monkey dog]";

console.log(string.match(regex));
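For comparison, a Python port of the same snippet might look like the sketch below. It uses `finditer` because `findall` would return the two capture groups instead of the whole match:

```python
import re

pattern = re.compile(r'\[(?!dog)([a-z]+) (?!dog)([a-z]+)\]', re.IGNORECASE)

string = "[cat dog] [dog cow] [cow cat] [cat tiger] [tiger lion] [monkey dog]"

# Collect the full bracketed pairs where neither word starts with "dog".
pairs = [m.group(0) for m in pattern.finditer(string)]
print(pairs)  # ['[cow cat]', '[cat tiger]', '[tiger lion]']
```

Keep in mind that (?!dog) rejects any word that merely starts with dog, not only the exact word.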

Regular expression to match strings that do NOT contain all specified elements

Nice question. It looks like you are looking for some AND logic. I am sure someone can come up with something better, but I thought of two ways:

^(?=(?!.*\btwo\b)|(?!.*\bthree\b)).*$

See the online demo

Or:

^(?=.*\btwo\b)(?=.*\bthree\b)(*SKIP)(*F)|^.*$

See the online demo

In both cases we are using positive lookahead to mimic the AND logic to prevent both words being present in a text irrespective of their position in the full string. If just one of those words is present, the string will pass.
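The first pattern can be tried in Python as sketched below. The second pattern relies on the (*SKIP)(*F) verbs, which Python's built-in `re` module does not support, so it is left out. The sample strings are assumptions for illustration:

```python
import re

# Passes unless BOTH whole words "two" and "three" are present.
not_both = re.compile(r'^(?=(?!.*\btwo\b)|(?!.*\bthree\b)).*$')

samples = ["one two three", "three then two", "one two", "three four", "one four"]

passing = [s for s in samples if not_both.match(s)]
print(passing)  # ['one two', 'three four', 'one four']
```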

How to match a line not containing a word

This should work:

/^((?!PART).)*$/

Edit (by request): How this works

The (?!...) syntax is a negative lookahead, which I've always found tough to explain. Basically, it means "whatever follows this point must not match the regular expression /PART/." The site I've linked explains this far better than I can, but I'll try to break this down:

^         # Start matching from the beginning of the string.
(?!PART)  # This position must not be followed by the string "PART".
.         # Matches any character except line breaks (it will include those in single-line mode).
$         # Match all the way until the end of the string.

The ((?!xxx).)* idiom is probably hardest to understand. As we saw, (?!PART) looks at the string ahead and says that whatever comes next can't match the subpattern /PART/. So what we're doing with ((?!xxx).)* is going through the string letter by letter and applying the rule to all of them. Each character can be anything, but if you take that character and the next few characters after it, you'd better not get the word PART.

The ^ and $ anchors are there to demand that the rule be applied to the entire string, from beginning to end. Without those anchors, any piece of the string that didn't begin with PART would be a match. Even PART itself would have matches in it, because (for example) the letter A isn't followed by the exact string PART.

Since we do have ^ and $, if PART were anywhere in the string, one of the positions would fail the (?!PART) check (the character there would match (?=PART). instead), and the overall match would fail. Hope that's clear enough to be helpful.
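The effect of the anchors can be illustrated with a short Python sketch. I use a non-capturing group here so the matched pieces are easy to list, and the sample strings are made up:

```python
import re

anchored = re.compile(r'^(?:(?!PART).)*$')
unanchored = re.compile(r'(?:(?!PART).)*')

def pieces(text):
    """Return the non-empty substrings the unanchored pattern matches."""
    return [m.group(0) for m in unanchored.finditer(text) if m.group(0)]

print(anchored.search("some PART here"))  # None
print(pieces("some PART here"))           # ['some ', 'ART here']
print(pieces("PART"))                     # ['ART']
```

Without the anchors the pattern happily matches fragments around PART, including the ART inside PART itself.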


