How to Validate Regex

How can I validate regex?

// This is valid, both opening ( and closing )
var_dump(preg_match('~Valid(Regular)Expression~', '') === false);
// This is invalid, no opening ( for the closing )
var_dump(preg_match('~InvalidRegular)Expression~', '') === false);

As the user pozs said, also consider putting @ in front of preg_match() (@preg_match()) in a testing environment to prevent warnings or notices.

To validate a RegExp, just run it against an empty string (there is no need to know the data you want to test against up front). If preg_match() returns an explicit false (=== false), the pattern is broken. Otherwise it is valid, though it need not match anything.

So there's no need to write your own RegExp validator. It's wasted time...
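
The same compile-and-check idea carries over to other engines. As a minimal sketch in JavaScript (isValidRegex is a hypothetical helper name, not part of the answer above), the RegExp constructor throws a SyntaxError for a malformed pattern, so a try/catch plays the role of the === false check:

```javascript
// Hypothetical helper: reports whether `pattern` compiles as a JavaScript regex.
// This checks JavaScript syntax, not PCRE syntax; the two flavors differ.
function isValidRegex(pattern) {
  try {
    new RegExp(pattern);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) {
      return false;
    }
    throw err; // anything else is unexpected, so let it surface
  }
}

console.log(isValidRegex('Valid(Regular)Expression'));  // true
console.log(isValidRegex('InvalidRegular)Expression')); // false
```

As with preg_match(), no test data is needed: compiling the pattern is enough to find out whether it is well-formed.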

Is there a regular expression to detect a valid regular expression?

/
^ # start of string
( # first group start
(?:
(?:[^?+*{}()[\]\\|]+ # literals and ^, $
| \\. # escaped characters
| \[ (?: \^?\\. | \^[^\\] | [^\\^] ) # character classes
(?: [^\]\\]+ | \\. )* \]
| \( (?:\?[:=!]|\?<[=!]|\?>)? (?1)?? \) # parenthesis, with recursive content
| \(\? (?:R|[+-]?\d+) \) # recursive matching
)
(?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
| \| # alternative
)* # repeat content
) # end first group
$ # end of string
/

This is a recursive regex, and is not supported by many regex engines. PCRE based ones should support it.

Without whitespace and comments:

/^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/

.NET does not support recursion directly. (The (?1) and (?R) constructs.) The recursion would have to be converted to counting balanced groups:

^ # start of string
(?:
(?: [^?+*{}()[\]\\|]+ # literals and ^, $
| \\. # escaped characters
| \[ (?: \^?\\. | \^[^\\] | [^\\^] ) # character classes
(?: [^\]\\]+ | \\. )* \]
| \( (?:\?[:=!]
| \?<[=!]
| \?>
| \?<[^\W\d]\w*>
| \?'[^\W\d]\w*'
)? # opening of group
(?<N>) # increment counter
| \) # closing of group
(?<-N>) # decrement counter
)
(?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
| \| # alternative
)* # repeat content
$ # end of string
(?(N)(?!)) # fail if counter is non-zero.

Compacted:

^(?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>|\?<[^\W\d]\w*>|\?'[^\W\d]\w*')?(?<N>)|\)(?<-N>))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*$(?(N)(?!))
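
The counter in the .NET version can be pictured without any regex engine at all. The following JavaScript sketch (hasBalancedGroups is a hypothetical name) only tracks the nesting depth that (?<N>) and (?<-N>) maintain. It skips escaped characters and character classes in a simplified way and does not check quantifiers or anything else the full pattern covers:

```javascript
// Hypothetical helper: checks only that unescaped parentheses balance,
// mirroring the (?<N>) / (?<-N>) counter. It is not a full regex validator.
function hasBalancedGroups(pattern) {
  let depth = 0;       // plays the role of the N stack
  let inClass = false; // true while inside [...]
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') {
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
      continue; // parentheses inside a class are literals
    }
    if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      depth++; // "increment counter"
    } else if (ch === ')') {
      if (depth === 0) return false; // closing with nothing open
      depth--; // "decrement counter"
    }
  }
  return depth === 0; // "fail if counter is non-zero"
}

console.log(hasBalancedGroups('Valid(Regular)Expression'));  // true
console.log(hasBalancedGroups('InvalidRegular)Expression')); // false
console.log(hasBalancedGroups('((a)'));                      // false
```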

From the comments:

Will this validate substitutions and translations?

It will validate just the regex part of substitutions and translations. s/<this part>/.../

It is not theoretically possible to match all valid regex grammars with a regex.

It is possible if the regex engine supports recursion, such as PCRE, but that can't really be called regular expressions any more.

Indeed, a "recursive regular expression" is not a regular expression. But this is an often-accepted extension to regex engines... Ironically, this extended regex doesn't match extended regexes.

"In theory, theory and practice are the same. In practice, they're not." Almost everyone who knows regular expressions knows that regular expressions do not support recursion. But PCRE and most other implementations support much more than basic regular expressions.

Using this in a shell script with the grep command gives me an error: grep: Invalid content of {}. I am making a script that greps a code base to find all the files that contain regular expressions.

This pattern exploits an extension called recursive regular expressions. This is not supported by the POSIX flavor of regex. You could try with the -P switch, to enable the PCRE regex flavor.

Regex itself "is not a regular language and hence cannot be parsed by regular expression..."

This is true for classical regular expressions. Some modern implementations allow recursion, which makes it into a Context Free language, although it is somewhat verbose for this task.

I see where you're matching []()/\. and other special regex characters. Where are you allowing non-special characters? It seems like this will match ^(?:[\.]+)$, but not ^abcdefg$. That's a valid regex.

[^?+*{}()[\]\\|] will match any single character, not part of any of the other constructs. This includes both literal (a - z), and certain special characters (^, $, .).
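
To see that branch in isolation, here is a small JavaScript check. It is only a sketch: literalRun is a hypothetical name, and the class is anchored on its own rather than used inside the full validator:

```javascript
// The "literals" alternative of the validator pattern, anchored by itself.
const literalRun = /^[^?+*{}()[\]\\|]+$/;

console.log(literalRun.test('^abcdefg$')); // true: ^, letters and $ are all allowed
console.log(literalRun.test('a.b'));       // true: . is not in the excluded set
console.log(literalRun.test('a+b'));       // false: + is left to the quantifier branch
```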

How to validate RegExp with an if else statement in react?

Instead of comparing the string with the regex object for equality, use the test method, which returns a boolean indicating whether the passed string matches the pattern.

textChange = () => {
  if (/[^a-zA-Z\s]/.test(this.state.text)) {
    alert("No symbols allowed")
  }
}
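
Outside of a component, the difference between the two approaches can be shown in a few lines. This is a sketch only; symbolPattern is a name chosen here for illustration:

```javascript
const symbolPattern = /[^a-zA-Z\s]/;

// A strict comparison between a string and a RegExp object is always false.
console.log('abc!' === symbolPattern);      // false
// test() actually runs the pattern against the string.
console.log(symbolPattern.test('abc!'));    // true: "!" is neither a letter nor whitespace
console.log(symbolPattern.test('abc def')); // false: only letters and a space
```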

Regex to validate custom format

You want

xx:xx:xx, or, if it is followed by a -, then the next character must be a 0 or 1 and the match must end there (word boundary).

So you don't want any of these

0a:0b:0c-123
0a:0b:0cd
10a:0b:0c

either.

Then you want a negative lookahead: if you match the first part, it must not be followed by a - and it should end there (word boundary). If it is followed by a -, then the next character must be a 0 or 1, followed by a word boundary:

/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i

To prevent any digit in front, a word boundary is added to the front as well.

Example: https://regexr.com/4rg42

The following almost worked:

/\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i

but if 3a:2b:11 sits at the very end of the input, [^-] still needs one more character to consume, finds none, and the match fails.

Example: https://regexr.com/4rg4q
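
Both patterns can also be checked locally. The sketch below runs them in JavaScript against the inputs discussed above; the variable names strict and almost are chosen here for illustration:

```javascript
// The lookahead version and the "almost worked" version, copied from above.
const strict = /\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}(?!-)\b|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i;
const almost = /\b([0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}\b[^-]|\b[0-9a-f]{2}[:][0-9a-f]{2}[:][0-9a-f]{2}-[01]\b)/i;

console.log(strict.test('0a:0b:0c'));     // true
console.log(strict.test('0a:0b:0c-1'));   // true
console.log(strict.test('0a:0b:0c-123')); // false
console.log(strict.test('0a:0b:0cd'));    // false
console.log(strict.test('10a:0b:0c'));    // false

// At the end of the input, [^-] has nothing left to consume.
console.log(almost.test('3a:2b:11'));     // false
console.log(strict.test('3a:2b:11'));     // true
```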

Regular expression to validate a name

You can use

^[A-Z](?=.{1,29}$)[A-Za-z]*(?:\h+[A-Z][A-Za-z]*)*$

The pattern matches:

  • ^ Start of string
  • [A-Z] Match an uppercase char A-Z
  • (?=.{1,29}$) Assert 1-29 chars to the right till the end of the string
  • [A-Za-z]* Match zero or more chars A-Za-z
  • (?:\h+[A-Z][A-Za-z]*)* Optionally repeat 1+ horizontal whitespace chars followed by again an uppercase char A-Z and optional chars A-Za-z
  • $ End of string

Regex demo

In Java with the doubled backslashes

String regex = "^[A-Z](?=.{1,29}$)[A-Za-z]*(?:\\h+[A-Z][A-Za-z]*)*$";
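
For a quick check outside Java, the pattern can be tried in JavaScript. Note the assumption: JavaScript has no \h, so this sketch substitutes [ \t] (space or tab), which is only an approximation of "horizontal whitespace". namePattern is a hypothetical name:

```javascript
// [ \t] stands in for \h here; this is an approximation, not an exact equivalent.
const namePattern = /^[A-Z](?=.{1,29}$)[A-Za-z]*(?:[ \t]+[A-Z][A-Za-z]*)*$/;

console.log(namePattern.test('John Smith'));           // true
console.log(namePattern.test('John smith'));           // false: second word is not capitalized
console.log(namePattern.test('john'));                 // false: must start with A-Z
console.log(namePattern.test('J'));                    // false: the lookahead needs 1-29 more chars
console.log(namePattern.test('A' + 'a'.repeat(30)));   // false: 31 chars is too long
```

Because the lookahead sits after the first letter, the accepted total length is 2 to 30 characters.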

Regex to validate hyphen (-) at start and end of the string

To allow any character but only disallow hyphen at start and end:

^(?!-).*[^-]$
  • ^ start of string
  • (?!-) negative lookahead: assert that the next character is not a hyphen
  • .* match any amount of any character
  • [^-] match one character, that is not a hyphen
  • $ at the end

See demo at regex101
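
A few sample inputs make the behaviour concrete. This is a minimal JavaScript sketch; noEdgeHyphen is a name chosen here for illustration:

```javascript
const noEdgeHyphen = /^(?!-).*[^-]$/;

console.log(noEdgeHyphen.test('abc'));  // true
console.log(noEdgeHyphen.test('a-b'));  // true: hyphens in the middle are fine
console.log(noEdgeHyphen.test('-abc')); // false: leading hyphen
console.log(noEdgeHyphen.test('abc-')); // false: trailing hyphen
console.log(noEdgeHyphen.test(''));     // false: [^-] needs at least one character
```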

How can I validate field input against a regex when pasting?

An all-in-one solution subscribes to the input event, which fires for typing and pasting alike. In addition to sanitizing the value, the handler has to restore the input field's exact (or most expected) cursor/caret position in order not to break the user experience ...

function getSanitizedValue(value) {
  // Strip every vowel, case-insensitively.
  return value.replace(/[aeiou]/ig, '');
}

function handleInput(evt) {
  evt.preventDefault();

  const elmNode = evt.currentTarget;

  const currentValue = elmNode.value;
  const sanitizedValue = getSanitizedValue(currentValue);

  if (currentValue !== sanitizedValue) {
    // diff is negative here: the number of characters that were removed.
    const diff = sanitizedValue.length - currentValue.length;
    // Read the caret position before assigning value, which moves it.
    const { selectionStart, selectionEnd } = elmNode;

    elmNode.value = sanitizedValue;

    elmNode.selectionStart =
      (selectionStart + diff > 0) ? selectionStart + diff : selectionStart;
    elmNode.selectionEnd =
      (selectionEnd + diff > 0) ? selectionEnd + diff : selectionEnd;
  }
}

// Pass the document object itself, not the string 'document'.
$(document)
  .ready(() => {

    $("#input").on("input", handleInput);
  });
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input type='text' id='input'/>
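
The caret arithmetic is easier to reason about without the DOM. One possible alternative, sketched here under the assumption that sanitizing only removes characters and never inserts any, is to sanitize the text in front of the caret separately and use its length as the new caret position. sanitizeWithCaret is a hypothetical helper, not part of the snippet above, and it swaps the length-difference calculation for a prefix-based one:

```javascript
function getSanitizedValue(value) {
  return value.replace(/[aeiou]/ig, '');
}

// Hypothetical helper: returns the sanitized value plus the caret position
// that keeps the caret in front of the same surviving characters.
function sanitizeWithCaret(value, caret) {
  const sanitizedPrefix = getSanitizedValue(value.slice(0, caret));
  return {
    value: getSanitizedValue(value),
    caret: sanitizedPrefix.length,
  };
}

console.log(sanitizeWithCaret('hello', 5)); // { value: 'hll', caret: 3 }
console.log(sanitizeWithCaret('xaby', 2));  // { value: 'xby', caret: 1 }
```

In a handler, the returned caret would be assigned to selectionStart and selectionEnd after the value has been updated.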

