Regex For Names

Regex for names


  • Hyphenated Names (Worthington-Smythe)

Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).

^[A-Z][-a-zA-Z]+$
  • Names with Apostophies (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.

  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[- a-zA-Z]( [a-zA-Z])*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.

  • Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?

Anyway, as you can see, regexes have a habit of growing very quickly...

Regular Expression for First name

You may use

^(?=.{1,40}$)[a-zA-Z]+(?:[-'\s][a-zA-Z]+)*$

See the regex demo.

Details

  • ^ - a start of a string
  • (?=.{1,40}$) - there must be 1 to 40 chars other than line break chars in the string
  • [a-zA-Z]+ - 1 or more ASCII letters
  • (?: - starto of a non-capturing group repeated 0 or more times matching sequences of

    • [-'\s] - a -, ' or whitespace
    • [a-zA-Z]+ - 1+ ASCII letters
  • )* - end of the grouping
  • $ - end of string

Regex that flags full name + optional middle names with NO symbols

For English only names you can use this:

^[a-zA-Z]+(?:\s[a-zA-Z]+)+$

Explanation:

  • ^ - anchor at start of string
  • [a-zA-Z]+ - 1+ alpha chars
  • (?: - non-capturing group start
  • \s - a single space
  • [a-zA-Z]+ - 1+ alpha chars
  • )+ - non-capturing group end, repeated 1+ times
  • $ - anchor at end of string

If you want to support international names with marks (accents etc), such as Jörg Müller, or 橋本明美, but exclude special chars, use this:

/^[\p{L}\p{M}]+(?:\p{Zs}[\p{L}\p{M}]+)+$/u

Explanation:

  • ^ - anchor at start of string
  • [\p{L}\p{M}]+ - letters in all languages
    • \p{L} - denotes a letter char in any language
    • \p{M} - denotes a mark (accent etc)
  • (?: - non-capturing group start
  • \p{Zs} - denotes a single space char, such as regular space and Japanese space char  
  • [\p{L}\p{M}]+ - letters in all languages
  • )+ - non-capturing group end, repeated 1+ times
  • $ - anchor at end of string
  • add the u flag for Unicode support

See docs on Unicode regex: https://javascript.info/regexp-unicode

Regular expression for a name that can contain not more than one space character

The pattern ^(?![\s]+$)(?![\s]{2,})[a-zA-Z\s]{2,25}$ that you tried matches:

  • Assert that the string does not consist of 1 or more whitespace chars ^(?![\s]+$)
  • Asserts not 2 or more whitespace chars at the beginning (?![\s]{2,})
  • Match 2-25 chars being either a-zA-Z or a whitespace char [a-zA-Z\s]{2,25}$

There is no restriction to match a single space in the whole string

Note that \s could also match a newline or a tab.


What you could do is assert 2-25 characters in the string.

Then match 1+ chars a-zA-Z and optionally match a single space and 1+ chars a-zA-Z

^(?=.{2,25}$)[a-zA-Z]+(?: [a-zA-Z]+)?$

The pattern matches:

  • ^ Start of string
  • (?=.{2,25}$) Positive lookahead, assert 2-25 chars in the string
  • [a-zA-Z]+ Match 1+ chars A-Za-z
  • (?: [a-zA-Z]+)? Optionally match and again 1+ chars from the character class
  • $ End of string

See a regex demo.

Regex Names which have received a B

You can use

\b([^:\n]*):\s*B

See the regex demo. Details:

  • \b - a word boundary
  • ([^:\n]*) - Group 1: any zero or more chars other than : and line feed
  • : - a colon
  • \s* - zero or more whitespaces
  • B - a B char.

See the Python demo:

import re
# Make sure you use
# with open(fpath, 'r') as f: text = f.read()
# for this to work if you read the data from a file
text = """John SMith: A
Pedro Smith: B
Jonathan B: A
John B: B
Luis Diaz: A
Scarlet Diaz: B"""
print( re.findall(r'\b([^:\n]*):\s*B', text) )
# => ['Pedro Smith', 'John B', 'Scarlet Diaz']

C#: Regex for name with 2 and/or 3 words

Another idea to use a quantified group.

^(?:[A-Z]{2,15} ?\b){2,3}$
  • use with your RegexOptions.IgnoreCase
  • the \b stands for word boundary.

See demo at regex101

RegEx for matching first names followed by no space

You can simply bound your expression with an end $ char and that would suffice:

([A-Z][a-z]+)$

If you wish to add more boundary, you can also bound it with a start ^ char:

^([A-Z][a-z]+)$

You can also remove the capturing group, if you want, and it would still match.

[A-Z][a-z]+$

RegEx

You can modify/change your expressions in regex101.com.

Sample Image

RegEx Circuit

You can visualize your expressions in jex.im:

Sample Image

JavaScript Demo





const regex = /([A-Z][a-z]+)$/gm;
const str = `Kevin`;
const subst = `$1`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Find the first name that starts with any letter than S using regex

The first rule of writing regular expressions in a shell script (or at the terminal) is "enclose the regular expression in single quotes" so that the shell doesn't try to interpret the metacharacters in the regex. You might sometimes use double quotes instead of single quotes if you need to match single quotes but not double quotes or if you need to interpolate a variable, but aim to use single quotes. Also, avoid UUoC — Useless Use of cat.

Your question currently shows two regular expressions:

^[S\w][,]?[' ']?[A-RT-Z]?

cat People.txt | grep -E ^[S\w][,]?[' ']?[P\w+]?

If written as suggested, these would become:

grep -E -e '^[Sw],? ?[A-RT-Z]?' People.txt
grep -E -e '^[Sw],? ?[Pw+]?' People.txt

The shell removes the backslashes in your rendition. The + in the character class matches a plus sign. You don't need square brackets around the comma (though they do no major harm). I use the -e option for explicitness, and so I can add extra arguments after the regex (-w or -l or -n or …) when editing commands via history. (I also dislike having options recognized after non-option arguments; I often run with $POSIXLY_CORRECT set in my environment. That's a personal quirk.)

The first of the two commands looks for a line starting S or w, followed by an optional comma, an optional blank, and an optional upper-case letter other than S. The second is similar except that it looks for an optional P or w. None of this bears much relationship to the question.

You need an expression more like one of these:

grep -E -e '^[S][[:alpha:]]*, [^S]' People.txt
grep -E -e '^[S][a-zA-Z]*, [^S]' People.txt

These allow single-character names — just S — but you can use + instead of * to require one or more letters.

There are lots of refinements possible, depending on how much you want to work, but this does the primary job of finding 'first word on the line starts with S, and is followed by a comma, a blank, and the second word does not start with S'.

Given a file People.txt containing:

Randall, Steven
Rogers, Timothy
Schmidt, Paul
Sells, Simon
Smith, Peter
Stephens, Sheila
Titus, Persephone
Williams, Shirley
Someone
S

Your regular expressions produce the output:

Schmidt, Paul
Sells, Simon
Smith, Peter
Stephens, Sheila
Someone
S

My commands produce:

Schmidt, Paul
Smith, Peter


Related Topics



Leave a reply



Submit