PHP Regex for Human Names

PHP Regex for human names

I would really say: don't try to validate names. One day or another, your code will meet a name that it thinks is "wrong"... And how do you think someone would react when an application tells them "your name is not valid"?

Depending on what you really want to achieve, you might consider using some kind of blacklist or filter to exclude the "not-names" you have thought of. It may let some "bad names" pass, but at least it shouldn't prevent anyone with a real name from accessing your application.

Here are a few examples of rules that come to mind:

  • no digits
  • no special characters, like "~{()}@^$%?;:/*§£ø and probably some others
  • no more than 3 spaces?
  • none of "admin", "support", "moderator", "test", and a few other obvious non-names that people tend to use when they don't want to type in their real name...

    • (but if they don't want to give you their name, they still won't, even if you forbid them from typing random letters: they can just use a real name that is not theirs)

Yes, this is not perfect, and yes, it will let some non-names pass... But it's probably much better for your application than telling someone "your name is wrong" (yes, I insist ^^ ).
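
As a rough illustration of the rules listed above, here is a minimal sketch of such a filter. The function name `name_filter_objections` and the exact character list are assumptions made for this example, not something from the question. The ø from the list is deliberately left out of the pattern, since it occurs in real names, and the apostrophe and hyphen are never rejected.

```php
<?php
// Hypothetical helper: returns the reasons why the input looks like a
// "not-name". An empty array means that no rule objected.
function name_filter_objections(string $name): array
{
    $objections = [];
    $name = trim($name);

    if ($name === '') {
        $objections[] = 'empty';
    }
    if (preg_match('/[0-9]/', $name)) {
        $objections[] = 'contains a digit';
    }
    // Special characters from the list above (section sign and pound sign are
    // written as \x{...} escapes); ø is omitted on purpose.
    if (preg_match('/["~{}()@^$%?;:\/*\x{00A7}\x{00A3}]/u', $name)) {
        $objections[] = 'contains a special character';
    }
    if (substr_count($name, ' ') > 3) {
        $objections[] = 'more than 3 spaces';
    }
    // A few obvious non-names; extend this list as needed.
    $reserved = ['admin', 'support', 'moderator', 'test'];
    if (in_array(strtolower($name), $reserved, true)) {
        $objections[] = 'reserved word';
    }

    return $objections;
}

var_dump(name_filter_objections("Jean-Luc O'Neil")); // empty array
var_dump(name_filter_objections('ma{rtin'));         // one objection
```

Returning the list of objections, rather than a plain boolean, makes it easier to log which rule fired when a real name turns out to be rejected.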


And, to answer a comment you left under another answer:

I could just forbid the most common characters for SQL injection and XSS attacks,

About SQL injection: you must escape your data before sending it to the database. If you always escape it (you should!), you don't have to care about what users may or may not input: since it is always escaped, there is no risk for you.

Same for XSS: as long as you always escape your data when outputting it (you should!), there is no risk of injection ;-)
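
For the output side, a minimal sketch of what "escape when outputting" can look like in an HTML context. The wrapper name `html_escape_name` is made up for this example; `htmlspecialchars` is the standard PHP function. The database side (the driver's own escaping, or bound parameters) is not shown here.

```php
<?php
// Hypothetical wrapper: escape a stored name for use in HTML text or in a
// quoted attribute. ENT_QUOTES also converts single quotes.
function html_escape_name(string $name): string
{
    return htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
}

echo html_escape_name("O'Brien <script>"), "\n";
// prints: O&#039;Brien &lt;script&gt;
```

The name is stored untouched, apostrophe included, and is only transformed at the moment it is written into the page.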


EDIT: if you use that regex as it stands, it will not work well.

The following code:

$rexSafety = "/^[^<,\"@/{}()*$%?=>:|;#]*$/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}

will get you at least a warning:

Warning: preg_match() [function.preg-match]: Unknown modifier '{'

You must escape at least some of those special characters. Here, the unescaped / inside the character class ends the pattern early, so the { that follows is read as a modifier. I'll let you dig into the PCRE Patterns documentation for more information (there is a lot to know about PCRE / regex, and I won't be able to explain it all).

If you actually want to check that none of those characters appears in a given piece of data, you might end up with something like this:

$rexSafety = "/[\^<,\"@\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}

(This is a quick and dirty proposal, which needs to be refined!)

This one says "ok" (well, I definitely hope my own name is OK!).
And the same example with a special character, like this:

$rexSafety = "/[\^<,\"@\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'ma{rtin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}

will say "bad name".

But please note that I have not fully tested this, and it probably needs more work! Do not use it on your site unless you have tested it very carefully!


Also note that a single quote can be useful when attempting an SQL injection... but it is also a character that is legal in some names. So just excluding some characters might not be enough ;-)

Regex for people names PHP

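PHP's preg_ functions need the pattern wrapped in delimiters (here /). To validate the whole string rather than just part of it, the pattern should also start with ^ and end with $.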

Try this:

preg_match("/^[a-zA-Z-'\s]+$/", $value);
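
To see why the ^ and $ anchors matter, here is a small sketch that compares the pattern above with the same character class left unanchored. The function names are invented for this example.

```php
<?php
// The pattern from the answer: the whole string must consist of letters,
// hyphens, apostrophes and whitespace.
function is_simple_name(string $value): bool
{
    return preg_match("/^[a-zA-Z-'\s]+$/", $value) === 1;
}

// Same character class without anchors: a single allowed character anywhere
// in the string is enough for a match.
function contains_name_chars(string $value): bool
{
    return preg_match("/[a-zA-Z-'\s]+/", $value) === 1;
}

var_dump(is_simple_name("O'Malley"));   // bool(true)
var_dump(is_simple_name('R2-D2'));      // bool(false): digits are not allowed
var_dump(contains_name_chars('R2-D2')); // bool(true): the "R" alone matches
```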

Regex for names

  • Hyphenated Names (Worthington-Smythe)

Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).

^[A-Z][-a-zA-Z]+$

  • Names with Apostrophes (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.

  • Names with Spaces (Van der Humpton). Whether capitals in the middle should or should not be required is way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.

  • Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?

Anyway, as you can see, regexes have a habit of growing very quickly...
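
A quick way to watch the patterns grow is to run the sample names from the headings through each step. This is only a sketch: the function name `matching_patterns` and the labels are made up here, while the three patterns are the hyphen step, the single-apostrophe step, and the naive spaces step from above.

```php
<?php
// Returns the labels of the patterns (copied from the steps above) that
// accept $name.
function matching_patterns(string $name): array
{
    $patterns = [
        'hyphen'     => "/^[A-Z][-a-zA-Z]+$/",
        'apostrophe' => "/^[A-Z]'?[-a-zA-Z]+$/",
        'spaces'     => "/^[A-Z]'?[- a-zA-Z]+$/",
    ];

    $matched = [];
    foreach ($patterns as $label => $pattern) {
        if (preg_match($pattern, $name) === 1) {
            $matched[] = $label;
        }
    }
    return $matched;
}

$samples = ['Worthington-Smythe', "D'Angelo", 'Van der Humpton', 'Ben & Jerry'];
foreach ($samples as $name) {
    echo $name, ': ', implode(', ', matching_patterns($name)) ?: 'no match', "\n";
}
```

Each later pattern accepts everything the earlier ones did plus one more case, and "Ben & Jerry" still falls through all of them.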

Regex to validate names

Is there an easy way to check for correct human names?

This has been discussed several times. I'm fairly certain that the only thing people can agree on is that, in order to exist, a name cannot be an empty string, thus:

^.+$

(Yes, I am aware that this is probably not what OP is looking for. I'm just summarizing earlier Q&As.)

How to Validate Human Names in CakePHP?

I agree with the other comments that validating a name is probably a bad idea.

For virtually everything you can think of to validate, there will be someone with a name that breaks your rule. If you're happy with the idea that you're going to be blocking real people from entering their names, then you can validate it as much as you like. But the more validation rules you put in, the more likely you are to find a real person who can't sign in.

Here's a link to a page which describes some of the obvious (and not so obvious) things which people try to validate, which can trip them up:

http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

If you want to allow anybody onto your site, then the best you can really hope for is to force a maximum field length to fit the space you've allocated in your database. Even then you're going to annoy someone.
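
A minimal sketch of that "non-empty and fits the column" check. The limit of 100 characters, the constant name and the function name are all assumptions for the example; use whatever your schema actually allows.

```php
<?php
// Hypothetical limit: assumes the database column holds at most 100 characters.
const NAME_MAX_CHARS = 100;

function name_fits_storage(string $name, int $max = NAME_MAX_CHARS): bool
{
    $name = trim($name);
    // /u makes . count Unicode code points instead of bytes, /s lets . match
    // newlines, and \A ... \z anchor to the very start and end of the string.
    return preg_match('/\A.{1,' . $max . '}\z/us', $name) === 1;
}

var_dump(name_fits_storage("Zo\u{00EB}"));           // bool(true)
var_dump(name_fits_storage('   '));                  // bool(false): empty after trim
var_dump(name_fits_storage(str_repeat('a', 101)));   // bool(false): too long
```

Counting code points rather than bytes keeps accented names from being penalized, although the right unit depends on how your database measures column length.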

Using a Regular Expression in PHP to validate a name

You can invalidate names containing a sequence of two or more of the characters hyphen and apostrophe by using a negative lookahead:

(?!.*['-]{2})

For example

$names = array('Mike Cannon-Brookes', "Bill O'Hara-Jones", "Jane O'-Reilly", "Mary Smythe-'Fawkes");
foreach ($names as $name) {
    $name_valid = preg_match("/^(?!.*['-]{2})[a-zA-Z][a-zA-Z'\s-]{1,20}$/", $name);
    echo "$name is " . (($name_valid) ? "valid" : "not valid") . "\n";
}

Output:

Mike Cannon-Brookes is valid
Bill O'Hara-Jones is valid
Jane O'-Reilly is not valid
Mary Smythe-'Fawkes is not valid


PHP regex to check an English name

If the rules you've stated are actually what you want, the following would work:

/^(?:[A-Za-z]+(?:\s+|$)){2,3}$/

However, there are quite a few real-life names that don't fit in these rules, like "O'Malley", "Jo-Jo", etc.

You can of course extend this regex fairly easily to allow other characters: just add them inside the brackets. For instance, to allow apostrophes and dashes, you could use [A-Za-z'-] (the - has to go at the start or end of the class, or be escaped, or it will be interpreted as a range).
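
Putting that extension into the pattern gives something like the sketch below. The function name is invented for the example, and the two-to-three-word rule is kept exactly as in the original pattern.

```php
<?php
// Two or three words made of letters, apostrophes and hyphens.
// The hyphen is placed last in the class so that it is read literally.
function is_two_or_three_word_name(string $name): bool
{
    return preg_match('/^(?:[A-Za-z\'-]+(?:\s+|$)){2,3}$/', $name) === 1;
}

var_dump(is_two_or_three_word_name("Jo-Jo O'Malley"));          // bool(true)
var_dump(is_two_or_three_word_name('John'));                    // bool(false): one word
var_dump(is_two_or_three_word_name('Mary Jane Watson Parker')); // bool(false): four words
```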

Regex for names with special characters (Unicode)

Try the following regular expression:

^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$

In PHP this translates to:

if (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0)
{
// valid
}

You should read it like this:

^   # start of subject
(?: # match this:
[ # match a:
\p{L} # Unicode letter, or
\p{Mn} # Unicode accents, or
\p{Pd} # Unicode hyphens, or
\' # single quote, or
\x{2019} # single quote (alternative)
]+ # one or more times
\s # any kind of space
[ #match a:
\p{L} # Unicode letter, or
\p{Mn} # Unicode accents, or
\p{Pd} # Unicode hyphens, or
\' # single quote, or
\x{2019} # single quote (alternative)
]+ # one or more times
\s? # any kind of space (0 or 1 times)
)+ # one or more times
$ # end of subject

I honestly don't know how to port this to JavaScript; I'm not even sure JavaScript supports Unicode properties. In PHP's PCRE, though, this seems to work flawlessly (tested on IDEOne.com):

$names = array
(
    'Alix',
    'André Svenson',
    'H4nn3 Andersen',
    'Hans',
    'John Elkjærd',
    'Kristoffer la Cour',
    'Marco d\'Almeida',
    'Martin Henriksen!',
);

foreach ($names as $name)
{
    echo sprintf('%s is %s' . "\n", $name, (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0) ? 'valid' : 'invalid');
}

I'm sorry I can't help you regarding the Javascript part but probably someone here will.


Validates:

  • John Elkjærd
  • André Svenson
  • Marco d'Almeida
  • Kristoffer la Cour

Invalidates:

  • Hans
  • H4nn3 Andersen
  • Martin Henriksen!

To replace invalid characters, though I'm not sure why you need this, you just need to change it slightly:

$name = preg_replace('~[^\p{L}\p{Mn}\p{Pd}\'\x{2019}\s]~u', '', $name);

Examples:

  • H4nn3 Andersen -> Hnn Andersen
  • Martin Henriksen! -> Martin Henriksen

Note that you always need to use the u modifier, so that both the pattern and the subject are treated as UTF-8.

regex to validate first name excluding @()&

If your intention was to allow special characters (other than those four) anywhere in the string, then your pattern is wrong.

I'll break down your pattern to walk you through what it does:

  • ^ - The match must begin at the start of a line (or entire string).
  • [^0-9\@\(\)\&] - Match any single character which is not a digit, an @, a parenthesis, or an ampersand. I'm pretty sure the backslashes here are superfluous, by the way. The ones before the @ and & characters almost certainly are, since those characters are never special inside regexes. The ones before the ( and ) might look necessary, since those characters delimit subpatterns, but they are not needed inside a character class.
  • [a-zA-Z\s]* - Match any lower or uppercase character between A and Z, or any whitespace character, like a space (this is what \s does). The * means you can match as many of these characters as there are in a row, or no characters if none of them exist in this position.
  • $ - The match must end at the end of the line (or entire string).

In short, you're only excluding those four special characters (and digits) from the first character of your string, but you're excluding all special characters from every position after the first.

If you want to allow any character, except those four, in any position in the string, then you should use this as your pattern:

/^[^0-9@&()]*$/
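
To make the difference concrete, here is a small sketch that runs the same inputs through the pattern from the question (as broken down above) and the corrected one. The function name `compare_patterns` is made up for this example.

```php
<?php
// Returns whether each of the two patterns accepts $input.
function compare_patterns(string $input): array
{
    $original  = '/^[^0-9\@\(\)\&][a-zA-Z\s]*$/'; // pattern from the question
    $corrected = '/^[^0-9@&()]*$/';               // pattern suggested above

    return [
        'original'  => preg_match($original, $input) === 1,
        'corrected' => preg_match($corrected, $input) === 1,
    ];
}

var_dump(compare_patterns('Anne-Marie')); // original: false, corrected: true
var_dump(compare_patterns('#Anne'));      // both true: "#" is only checked as first character
var_dump(compare_patterns('J@ne'));       // both false
```

Note that the corrected pattern also accepts an empty string, because of the *; change it to + if at least one character is required.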

With all of that said, I think you might be overcomplicating things a bit. It's sort of a matter of opinion, but I try to only use regular expressions when there is no other way to do something, since they can be a bit hard to read (this question is a good example of that).

What I would suggest is that you just use str_replace to remove the four characters you're disallowing, and check the resultant string against your original input:

if ($input === str_replace(array('@', '&', '(', ')'), '', $input)) {
    // process valid input
} else {
    // handle invalid input
}

The str_replace call takes your original string and removes every value in the search array, array('@', '&', '(', ')') (technically, it "replaces" each one with nothing). If the result is identical to the original input, then none of the invalid characters were present, and your input is valid.

Since you're using parentheses as items within the array, it might be more readable to separate the elements onto their own lines:

$chars_to_remove = array(
    '@',
    '&',
    '(',
    ')'
);
if ($input === str_replace($chars_to_remove, '', $input)) {
    // process valid input
} else {
    // handle invalid input
}
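
As a possible alternative to comparing the input with a cleaned copy, PHP's built-in strpbrk can report directly whether any character from a list occurs in a string. This is only a sketch, and the wrapper name `has_forbidden_char` is an assumption for the example.

```php
<?php
// strpbrk() returns the rest of the string starting at the first character
// found in the list, or false when none of them occurs.
function has_forbidden_char(string $input, string $forbidden = '@&()'): bool
{
    return strpbrk($input, $forbidden) !== false;
}

var_dump(has_forbidden_char('Ben & Jerry')); // bool(true)
var_dump(has_forbidden_char('Anne-Marie'));  // bool(false)
```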

