Regex for Matching Something If It Is Not Preceded by Something Else

You want to use negative lookbehind like this:

\w*(?<!foo)bar

Where (?<!x) means "only if x does not appear immediately before this point".

See Regular Expressions - Lookaround for more information.

Edit: added the \w* to capture the characters before (e.g. "beach").
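
As a rough illustration, the pattern can be tried in Python against a few sample strings. The helper name match_bar and the sample inputs are made up for this sketch and are not part of the original answer.

```python
import re

# Pattern from the answer: optional word characters, then "bar",
# as long as "bar" is not immediately preceded by "foo".
PATTERN = re.compile(r"\w*(?<!foo)bar")

def match_bar(text):
    """Return the matched substring, or None when nothing matches (hypothetical helper)."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(match_bar("bazbar"))  # bazbar
print(match_bar("bar"))     # bar
print(match_bar("foobar"))  # None
```

Python's re module only accepts fixed-width lookbehind, which is fine here because foo has a fixed length.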

Match string not preceded by another with a regular expression

Replace all B's not preceded by an A with AB.

Find: (?<!A)B
Replace: AB
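
The same find/replace could be sketched in Python with re.sub. The function name add_missing_a and the sample string are assumptions for illustration only.

```python
import re

def add_missing_a(text):
    """Replace every B that is not directly preceded by A with AB (hypothetical helper)."""
    # The lookbehind is checked against the original text, so a B that
    # follows another B still counts as "not preceded by A".
    return re.sub(r"(?<!A)B", "AB", text)

print(add_missing_a("B AB BB"))  # AB AB ABAB
```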

Find 'word' not followed by a certain character

The (?!@) negative look-ahead will make word match only if @ does not appear immediately after word:

word(?!@)

If you need to fail a match when the word is followed by a character/string anywhere to the right, you may use any of the three patterns below:

word(?!.*@)       # Note this only detects @ on the same line as word
(?s)word(?!.*@) # (except Ruby, where you need (?m)): This will check for @ anywhere...
word(?![\s\S]*@) # ... after word even if it is on the next line(s)

See demo

This regex matches word substring and (?!@) makes sure there is no @ right after it, and if it is there, the word is not returned as a match (i.e. the match fails).
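
A small Python sketch of how these variants behave. The sample strings and variable names are invented for the example; the patterns are the ones listed above.

```python
import re

same_line = "word@ word word!"
# Only the occurrences without an @ right after them are matched.
immediate = re.findall(r"word(?!@)", same_line)
print(immediate)  # ['word', 'word']

two_lines = "word\n@"
# Without DOTALL, . stops at the newline, so the @ on the next line goes unnoticed.
dot_only = bool(re.search(r"word(?!.*@)", two_lines))
# With (?s) or [\s\S], the check reaches the following lines as well.
dotall = bool(re.search(r"(?s)word(?!.*@)", two_lines))
any_char = bool(re.search(r"word(?![\s\S]*@)", two_lines))
print(dot_only, dotall, any_char)  # True False False
```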

From Regular-expressions.info:

Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u). The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point.

And on Character classes page:

It is important to remember that a negated character class still must match a character. q[^u] does not mean: "a q not followed by a u". It means: "a q followed by a character that is not a u". It does not match the q in the string Iraq. It does match the q and the space after the q in Iraq is a country. Indeed: the space becomes part of the overall match, because it is the "character that is not a u" that is matched by the negated character class in the above regexp. If you want the regex to match the q, and only the q, in both strings, you need to use negative lookahead: q(?!u).
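
The difference described in the quote can be reproduced with a short Python check. The helper name first_match is an assumption; the strings are the ones used in the quoted text.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# The negated class must consume a character, so it fails at the end of the string.
print(first_match(r"q[^u]", "Iraq"))               # None
# When it does match, the space becomes part of the match.
print(first_match(r"q[^u]", "Iraq is a country"))  # 'q '
# The lookahead consumes nothing, so only the q is returned in both cases.
print(first_match(r"q(?!u)", "Iraq"))              # 'q'
print(first_match(r"q(?!u)", "Iraq is a country")) # 'q'
```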

Match pattern not preceded by character

Use ([^^\w]|^)\w+
(see http://regexr.com/3e85b)

It basically injects a word boundary while excluding the ^ as well.

[^\w] = \W\b\w

Otherwise [^^] will match a '^T'

and \w+ will match est.

You can see it if you put capture groups around it.
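
One way to look at this in Python is to make the prefix group non-capturing and capture only the word. This regrouping and the sample string are assumptions made for the sketch; the original answer uses ([^^\w]|^)\w+ as written.

```python
import re

# Same alternation as above, but only the word itself is captured:
# the prefix is either a non-word character other than ^, or the start of the string.
pattern = re.compile(r"(?:[^^\w]|^)(\w+)")

words = pattern.findall("foo ^Test bar")
print(words)  # ['foo', 'bar']
```

Here Test is skipped because the only character in front of it is ^, which the prefix group refuses to match.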

Match if something is not preceded by something else

Unfortunately, there is no way to use a single pattern to match a string not preceded by some sequence in Lua. Note that you can't even rely on capturing the alternative you need, since TEST%d+|(%d+) will not work: Lua patterns do not support alternation.

You may remove all substrings that start with TEST + digits after it, and then extract digit chunks:

local s = "TEST2XX_R_00.01.211_TEST"
for x in string.gmatch(s:gsub("TEST%d+",""), "%d+") do
    print(x)
end

See the Lua demo

Here, s:gsub("TEST%d+","") will remove TEST<digits>+ and %d+ pattern used with string.gmatch will extract all digit chunks that remain.
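
For comparison, the same remove-then-extract idea could look like this in Python. This is only a sketch of the two-step approach, not a translation the original answer provides; the function name digit_chunks is hypothetical.

```python
import re

def digit_chunks(s):
    """Drop TEST<digits> substrings, then collect the remaining digit runs (hypothetical helper)."""
    cleaned = re.sub(r"TEST\d+", "", s)
    return re.findall(r"\d+", cleaned)

print(digit_chunks("TEST2XX_R_00.01.211_TEST"))  # ['00', '01', '211']
```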

Match pattern not preceded or followed by string

With your second attempt, that performs a logical AND, you are almost there. Just use | to separate the two possible scenarios:

(?<![A-Z]{2})(\d{9,})|(\d{9,})(?![A-Z]{2})
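
A quick Python sketch of how the alternation behaves. The helper name find_number and the sample inputs are assumptions, since the original test data is not shown here.

```python
import re

PATTERN = re.compile(r"(?<![A-Z]{2})(\d{9,})|(\d{9,})(?![A-Z]{2})")

def find_number(text):
    """Return the digits from whichever alternative matched, or None (hypothetical helper)."""
    m = PATTERN.search(text)
    if not m:
        return None
    # Exactly one of the two groups participates in a successful match.
    return m.group(1) or m.group(2)

print(find_number("123456789CD"))    # 123456789 (no two letters before)
print(find_number("AB123456789"))    # 123456789 (no two letters after)
print(find_number("AB123456789CD"))  # None (letters on both sides)
```

Longer digit runs can still match partially through backtracking, so it may be worth testing against your real data.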

Regex match characters when not preceded by a string

Doing it with only one regex will be tricky - as stated in comments, there are lots of edge cases.

I would do it in three steps:

  1. Replace spaces that should stay with some special character (re.sub)
  2. Split the text (re.split)
  3. Replace the special character with space

For example:

import re

zero_width_space = '\u200B'

s = 'I am from New York, N.Y. and I would like to say hello! How are you today? I am well. I owe you $6. 00 because you bought me a No. 3 burger. -Sgt. Smith'

s = re.sub(r'(?<=\.)\s+(?=[\da-z])|(?<=,)\s+|(?<=Sgt\.)\s+', zero_width_space, s)
s = re.split(r'(?<=[.?!])\s+', s)

from pprint import pprint
pprint([line.replace(zero_width_space, ' ') for line in s])

Prints:

['I am from New York, N.Y. and I would like to say hello!',
 'How are you today?',
 'I am well.',
 'I owe you $6. 00 because you bought me a No. 3 burger.',
 '-Sgt. Smith']

Regex until character but if not preceded by another character

You may use

\bLocalize\("([^"\\]*(?:\\.[^"\\]*)*)

See this regex demo.

Details:

  • \bLocalize - a whole word Localize
  • \(" - a (" substring
  • ([^"\\]*(?:\\.[^"\\]*)*) - Capturing group 1:

    • [^"\\]* - 0 or more chars other than " and \
    • (?:\\.[^"\\]*)* - 0 or more repetitions of an escaped char followed with 0 or more chars other than " and \

In Python, declare the pattern with

reg = r'\bLocalize\("([^"\\]*(?:\\.[^"\\]*)*)'

Demo:

import re
reg = r'\bLocalize\("([^"\\]*(?:\\.[^"\\]*)*)'
s = "Localize(\"/Windows/Actions/DeleteActionWarning=The action you are trying to \\\"delete\\\" is referenced in this document.\") + \" Want to Proceed ?\";"
m = re.search(reg, s)
if m:
    print(m.group(1))
    # => /Windows/Actions/DeleteActionWarning=The action you are trying to \"delete\" is referenced in this document.

Find string not preceded by other string

The current regex matches oo in foo because oo( is not preceded with "def ".

To stop the pattern from matching inside a word, you may use a word boundary, \b, and the fix might look like r"\b(?<!\bdef )([a-zA-Z0-9.]+?)\(".

Note that identifiers can be matched with [a-zA-Z_][a-zA-Z0-9_]*, so your pattern can be enhanced like

re.findall(r'\b(?<!\bdef\s)([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*)\(', s, re.A)

Note that re.A or re.ASCII will make \w match only ASCII letters, digits and _.

See the regex demo.

Details

  • \b - a word boundary
  • (?<!\bdef\s) - no def + space allowed immediately to the left of the current location
  • ([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*) - Capturing group 1 (its value will be the result of re.findall call):

    • [a-zA-Z_] - an ASCII letter or _
    • \w* - 0+ word chars
    • (?: - start of a non-capturing group matching a sequence of...

      • \. - a dot
      • [a-zA-Z_] - an ASCII letter or _
      • \w* - 0+ word chars
  • )* - ... zero or more times
  • \( - a ( char.
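
To see the enhanced pattern at work, here is a minimal Python run. The sample source string and the variable names are invented for the sketch.

```python
import re

# Sample input (assumed): one definition and two calls, one of them dotted.
s = "def foo(x):\n    return bar(x) + baz.qux(x)"

calls = re.findall(
    r"\b(?<!\bdef\s)([a-zA-Z_]\w*(?:\.[a-zA-Z_]\w*)*)\(",
    s,
    re.A,
)
# foo is skipped because "def " sits right before it; the \b keeps oo( from matching.
print(calls)  # ['bar', 'baz.qux']
```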

