What to Do Regular Expression Pattern Doesn't Match Anywhere in String

Regular expression to match a line that doesn't contain a word

The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:

^((?!hede).)*$

Non-capturing variant:

^(?:(?!hede).)*$

The regex above will match any string, or line without a line break, not containing the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but still, it is possible.
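To make the behavior concrete, here is a minimal sketch in Python's re module (my choice of language; the pattern itself is flavor-neutral). The helper name lacks_hede and the sample strings are just for illustration.

```python
import re

# Tempered pattern from the text: a character is consumed only if
# "hede" does not start at that position.
NO_HEDE = re.compile(r"^(?:(?!hede).)*$")

def lacks_hede(line: str) -> bool:
    """Return True if the line does not contain the substring 'hede'."""
    return NO_HEDE.match(line) is not None

# Illustrative inputs (not from the original question).
samples = ["hoho", "hihi", "haha", "hede", "ABhedeCD", ""]
results = {s: lacks_hede(s) for s in samples}

for s, ok in results.items():
    print(repr(s), "matches" if ok else "does not match")
```

Only "hede" and "ABhedeCD" are rejected; the empty string matches because the group may repeat zero times.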

And if you need to match line break chars as well, use the DOT-ALL modifier (the trailing s in the following pattern):

/^((?!hede).)*$/s

or use it inline:

/(?s)^((?!hede).)*$/

(where the /.../ are the regex delimiters, i.e., not part of the pattern)

If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S]:

/^((?!hede)[\s\S])*$/
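A rough comparison of the three variants in Python, where the DOT-ALL modifier is spelled re.DOTALL (or (?s) inline). The two multi-line strings are made-up test inputs.

```python
import re

# Made-up multi-line inputs for illustration.
clean = "first line\nsecond line"
dirty = "first line\nsecond hede line"

plain = re.compile(r"^((?!hede).)*$")                  # dot stops at line breaks
dotall_flag = re.compile(r"^((?!hede).)*$", re.DOTALL)
dotall_inline = re.compile(r"(?s)^((?!hede).)*$")
char_class = re.compile(r"^((?!hede)[\s\S])*$")        # no modifier needed

# Without DOT-ALL the dot cannot cross "\n", so even the clean text fails.
plain_on_clean = plain.match(clean) is not None

# All three DOT-ALL style variants agree with each other.
variants = [dotall_flag, dotall_inline, char_class]
on_clean = [v.match(clean) is not None for v in variants]
on_dirty = [v.match(dirty) is not None for v in variants]
```

Here on_clean is all True and on_dirty is all False, while plain_on_clean is False because the plain dot cannot consume the line break.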

Explanation

A string is just a list of n characters. Before, and after each character, there's an empty string. So a list of n characters will have n+1 empty strings. Consider the string "ABhedeCD":

    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘

index    0      1      2      3      4      5      6      7

where the e's are the empty strings. The regex (?!hede). looks ahead to see if there's no substring "hede" to be seen, and if that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width assertions because they don't consume any characters. They only assert/validate something.

So, in my example, every empty string is first validated to see if there's no "hede" up ahead, before a character is consumed by the . (dot). The regex (?!hede). will do that only once, so it is wrapped in a group, and repeated zero or more times: ((?!hede).)*. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed: ^((?!hede).)*$

As you can see, the input "ABhedeCD" will fail because on e3, the regex (?!hede) fails (there is "hede" up ahead!).
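The walk over the empty strings can be reproduced with a small sketch: it tests the bare lookahead (?!hede) at every position of "ABhedeCD". This assumes Python's compiled-pattern match(string, pos), which attempts a match starting exactly at pos.

```python
import re

s = "ABhedeCD"
lookahead = re.compile(r"(?!hede)")

# Positions 0..len(s) correspond to the empty strings e1..e9 in the diagram.
passes = [lookahead.match(s, i) is not None for i in range(len(s) + 1)]

# 0-based position of the first empty string where the assertion fails.
first_failure = passes.index(False)

for i, ok in enumerate(passes):
    print(f"e{i + 1}: lookahead {'succeeds' if ok else 'FAILS'}")
```

The only failure is at 0-based position 2, which is e3 in the diagram, right before the "h" of "hede".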

What to do Regular expression pattern doesn't match anywhere in string?

Contrary to all the answers here, for what you're trying to do regex is a perfectly valid solution. This is because you are NOT trying to match balanced tags; THAT would be impossible with regex! You are only matching what's in one tag, and that's perfectly regular.

Here's the problem, though. You can't do it with just one regex... you need to do one match to capture an <input> tag, then do further processing on that. Note that this will only work if none of the attribute values have a > character in them, so it's not perfect, but it should suffice for sane inputs.

Here's some Perl (pseudo)code to show you what I mean:

my $html = readLargeInputFile();    # placeholder: returns the whole document as one string

# m{...} is used as the delimiter so that the "/" in the pattern and in the
# comments below cannot end the regex early.
my @input_tags = $html =~ m{
    (
        <input                      # Starts with "<input"
        (?=[^>]*?type="hidden")     # Use lookahead to make sure that type="hidden"
        [^>]+                       # Grab the rest of the tag...
        />                          # ...except for the />, which is grabbed here
    )}xgm;

# Now each member of @input_tags is something like <input type="hidden" name="SaveRequired" value="False" />

foreach my $input_tag (@input_tags)
{
    my $hash_ref = {};
    # Now extract each of the fields one at a time.

    ($hash_ref->{"name"}) = $input_tag =~ /name="([^"]*)"/;
    ($hash_ref->{"value"}) = $input_tag =~ /value="([^"]*)"/;

    # Put $hash_ref in a list or something, or otherwise process it
}

The basic principle here is, don't try to do too much with one regular expression. As you noticed, regular expressions enforce a certain amount of order. So what you need to do instead is to first match the CONTEXT of what you're trying to extract, then do submatching on the data you want.
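For readers who don't use Perl, the same two-step idea might look roughly like this in Python. The HTML snippet and the names TAG, NAME, VALUE and fields are assumptions made for the sketch, and the same caveat applies: it breaks if an attribute value contains a > character.

```python
import re

# Made-up sample document; a real page would be read from a file.
html = '''
<form>
  <input type="hidden" name="SaveRequired" value="False" />
  <input type="text" name="visible" value="x" />
  <input name="Token" value="abc123" type="hidden" />
</form>
'''

# Step 1: match the CONTEXT, i.e. whole <input ... /> tags with type="hidden".
TAG = re.compile(r'<input(?=[^>]*?type="hidden")[^>]+/>')

# Step 2: submatch the individual attributes inside one tag at a time,
# so attribute order no longer matters.
NAME = re.compile(r'name="([^"]*)"')
VALUE = re.compile(r'value="([^"]*)"')

fields = []
for tag in TAG.findall(html):
    name = NAME.search(tag)
    value = VALUE.search(tag)
    fields.append({
        "name": name.group(1) if name else None,
        "value": value.group(1) if value else None,
    })

print(fields)
```

The visible text input is skipped, and the third tag is picked up even though its type attribute comes last.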

EDIT: However, I will agree that in general, using an HTML parser is probably easier and better and you really should consider redesigning your code or re-examining your objectives. :-) But I had to post this answer as a counter to the knee-jerk reaction that parsing any subset of HTML is impossible: HTML and XML are both irregular when you consider the entire specification, but the specification of a tag is decently regular, certainly within the power of PCRE.

Match pattern anywhere in string?

Remove the ^ and $ to search anywhere in the string.

In your case the * are probably not what you intended; E[0-9]{4}49 should suffice. This will find an E, followed by four digits, followed by a 4 and a 9, anywhere in the string.
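A quick sketch of the difference the anchors make, using Python's re.search. The sample sentence is invented, since the original input isn't shown here.

```python
import re

anchored = re.compile(r"^E[0-9]{4}49$")
unanchored = re.compile(r"E[0-9]{4}49")

text = "order E123449 shipped"  # invented sample input

# With ^ and $ the whole string has to be the code, so this finds nothing.
anchored_hit = anchored.search(text)

# Without anchors the code is found in the middle of the string.
unanchored_hit = unanchored.search(text)
found = unanchored_hit.group(0) if unanchored_hit else None

# The anchored pattern still works when the string is only the code.
exact_hit = anchored.search("E123449") is not None
```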

RegExp matching string not starting with my

You could use a lookahead assertion, as others have suggested. Or, if you want to stick to basic regular expression syntax:

^(.?$|[^m].+|m[^y].*)

This matches strings that are either zero or one character long (^.?$) and thus cannot be my; or strings of two or more characters whose first character is not an m (^[^m].+); or strings whose first character is an m that is not followed by a y (^m[^y].*).
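As a sanity check, the sketch below runs the basic-syntax pattern next to a lookahead version, ^(?!my).*, which is my guess at what the other answers suggested. The sample words are arbitrary.

```python
import re

BASIC = re.compile(r"^(.?$|[^m].+|m[^y].*)")
LOOKAHEAD = re.compile(r"^(?!my).*")  # assumed form of the lookahead alternative

# Arbitrary sample strings, including the short edge cases.
samples = ["", "m", "y", "my", "myself", "me", "mine", "yes", "amy"]

accepted_basic = [s for s in samples if BASIC.match(s)]
accepted_lookahead = [s for s in samples if LOOKAHEAD.match(s)]

print(accepted_basic)
```

Both lists leave out exactly "my" and "myself" for these samples.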

Match pattern not preceded or followed by string

With your second attempt, that performs a logical AND, you are almost there. Just use | to separate the two possible scenarios:

(?<![A-Z]{2})(\d{9,})|(\d{9,})(?![A-Z]{2})
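Here is a small sketch of the alternation in Python, which accepts this lookbehind because [A-Z]{2} has a fixed width. The helper find_number and the inputs are hypothetical, and the samples deliberately use runs of exactly nine digits.

```python
import re

PATTERN = re.compile(r"(?<![A-Z]{2})(\d{9,})|(\d{9,})(?![A-Z]{2})")

def find_number(text):
    """Return the first matching digit run, or None if there is none."""
    m = PATTERN.search(text)
    # Either group 1 or group 2 took part; group(0) covers both cases.
    return m.group(0) if m else None

# Hypothetical inputs.
both_sides = find_number("AB123456789CD")  # letters before AND after
only_before = find_number("AB123456789")   # letters before only
only_after = find_number("123456789CD")    # letters after only
neither = find_number("x 123456789 y")
```

Only the input with two capital letters on both sides is rejected; the other three return "123456789".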

Regex doesn't find a match with a pattern

The problem is:

[A-Ba-b0-9/-]+

What character ranges (x-y) basically do is get a set of all characters in between. In other words, a-b = all letters between a and b, aka only a and b. However,

d27956cca6b75db4d8dd502d0569dd246455131c

looks like a hexadecimal string, which also contains the letters c through f. Therefore, you should use

[A-Fa-f0-9-]+

instead.
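The difference between the two classes is easy to check with a sketch like this one, using Python's fullmatch so the whole string has to fit the class:

```python
import re

narrow = re.compile(r"[A-Ba-b0-9/-]+")  # the class from the question
hexish = re.compile(r"[A-Fa-f0-9-]+")   # the suggested replacement

sha = "d27956cca6b75db4d8dd502d0569dd246455131c"

narrow_ok = narrow.fullmatch(sha) is not None
hexish_ok = hexish.fullmatch(sha) is not None

# Which characters of the string fall outside the narrow class?
rejected = sorted({ch for ch in sha if not narrow.fullmatch(ch)})

print(narrow_ok, hexish_ok, rejected)
```

The narrow class trips over the letters c and d, which is why it never matches the full string.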

Regex for matching something if it is not preceded by something else

You want to use negative lookbehind like this:

\w*(?<!foo)bar

Where (?<!x) means "only if it doesn't have "x" before this point".

See Regular Expressions - Lookaround for more information.

Edit: added the \w* to capture the characters before (e.g. "beach").
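A minimal sketch of the lookbehind in Python; the helper name find_bar and the test words are only illustrative.

```python
import re

NOT_AFTER_FOO = re.compile(r"\w*(?<!foo)bar")

def find_bar(text):
    """Return the matched text, or None when 'bar' only occurs right after 'foo'."""
    m = NOT_AFTER_FOO.search(text)
    return m.group(0) if m else None

plain = find_bar("bar")
beach = find_bar("beachbar")  # \w* captures the characters before "bar"
blocked = find_bar("foobar")  # "bar" is directly preceded by "foo"
```

Here plain is "bar", beach is "beachbar", and blocked is None.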

Regex match exactly 1 anywhere in string

The (?=[@]){1}[a-zA-Z@]+$ pattern matches any substring that starts with @ and then has zero or more letters or @ up to the end of the string. Look at what it matches.

You need to use

^(?=[^@]*@[^@]*$)(?=[^.]*\.)[a-zA-Z@.]+$

Or, if there must be also one dot (and no more than one) in the string

^(?=[^@]*@[^@]*$)(?=[^.]*\.[^.]*$)[a-zA-Z@.]+$

See the regex demo #1 and the regex demo #2.

Details

  • ^ - start of string
  • (?=[^@]*@[^@]*$) - requires exactly one @ in the string - a positive lookahead that requires 0+ chars other than @, a @, and again zero or more chars other than @ till the end of string
  • (?=[^.]*\.) - requires at least one dot - a positive lookahead that requires 0+ chars other than . and then a .
  • (?=[^.]*\.[^.]*$) - requires exactly one dot in the string - a positive lookahead that requires 0+ chars other than ., a ., and again zero or more chars other than . till the end of string
  • [a-zA-Z@.]+ - one or more ASCII letters, @ or .
  • $ - end of string.
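The two patterns can be compared side by side with a short Python sketch. The address-like sample strings are invented for the test.

```python
import re

ONE_AT = re.compile(r"^(?=[^@]*@[^@]*$)(?=[^.]*\.)[a-zA-Z@.]+$")
ONE_AT_ONE_DOT = re.compile(r"^(?=[^@]*@[^@]*$)(?=[^.]*\.[^.]*$)[a-zA-Z@.]+$")

# Invented samples: (string, expected by ONE_AT, expected by ONE_AT_ONE_DOT)
cases = [
    ("user@example.com", True, True),
    ("first.last@example.com", True, False),  # two dots
    ("user@@example.com", False, False),      # two @ signs
    ("userexample.com", False, False),        # no @ at all
    ("user@example", False, False),           # no dot
]

actual = [
    (s, ONE_AT.match(s) is not None, ONE_AT_ONE_DOT.match(s) is not None)
    for s, _, _ in cases
]
```

For these samples, actual equals cases: the first pattern tolerates extra dots, the second does not.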

Trying to match string A if string B is found anywhere before it

Using two lookarounds, you can assert that what is on the left is an opening square bracket with (?<=\[)

Then match an exclamation mark followed by one or more chars other than [ and ] using the negated character class in ![^[\]]+ and assert that what is on the right is a closing square bracket using (?=])

Note that lookbehind is not supported in older JavaScript engines.

(?<=\[)![^[\]]+(?=])

In the replacement use the matched substring $&
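In Python the same idea could be sketched as follows. Two things differ from the pattern above and are my own adjustments: the [ and ] are escaped everywhere for readability (the set of matched characters is the same), and the whole-match reference is written \g<0> instead of $&. The input string and the angle-bracket replacement are made up.

```python
import re

# Same pattern as above, with the brackets escaped for readability.
BANG_IN_BRACKETS = re.compile(r"(?<=\[)![^\[\]]+(?=\])")

text = "keep [!alpha] and [beta] but not !gamma"  # made-up input

# Only the "!..." that sits directly inside square brackets is found.
found = BANG_IN_BRACKETS.findall(text)

# \g<0> refers to the whole match, like $& in a JavaScript replacement.
marked = BANG_IN_BRACKETS.sub(r"<\g<0>>", text)

print(found)
print(marked)
```

The brackets themselves are not part of the match, so the replacement leaves them in place.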
