Difference Between Regex [A-z] and [a-zA-Z]

Difference between regex [A-z] and [a-zA-Z]

[A-z] matches every ASCII character in the range from A to z, while [a-zA-Z] matches only the ASCII characters in the range from A to Z and in the range from a to z. At first glance these might seem equivalent. However, if you look at a table of ASCII characters, you'll see that the range A-z also includes the six characters that sit between Z and a. Specifically, they are [, \, ], ^, _, and `, which you almost certainly don't want to match.
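As a quick check of the claim above, here is a minimal Python sketch (Python is chosen only for illustration; the variable names are made up for this example). It lists the characters between Z and a and tests them against both character classes.

```python
import re

# The six ASCII characters that sit between 'Z' (code 90) and 'a' (code 97).
extras = [chr(code) for code in range(ord("Z") + 1, ord("a"))]

wide = re.compile(r"[A-z]")       # one contiguous range: 'A' (65) through 'z' (122)
strict = re.compile(r"[a-zA-Z]")  # two separate ranges: letters only

for ch in extras:
    # Print the character, then whether each class accepts it.
    print(repr(ch), bool(wide.fullmatch(ch)), bool(strict.fullmatch(ch)))
```

Every one of these characters is accepted by [A-z] and rejected by [a-zA-Z].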

difference between /[a-z]/gi and /^[A-Za-z]+$/

Your two examples are quite different.

The anchored example (/^[A-Za-z]+$/) matches only when the string is non-empty and every character in it is in the set [A-Za-z].

The unanchored example (/[a-z]/gi) matches as soon as the string contains at least one alphabetic character, whatever else surrounds it.

I suspect you want /^[a-z]+$/i:

/
^ # Matches the start of the string
[a-z]+ # Matches one or more lower case letters
$ # Matches the end of the string
/i # Case insensitive matching
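The difference between the two patterns can be sketched in Python as follows. This is an illustration only: the sample strings are invented, and the g flag is left out because it affects repeated matching rather than whether a match exists.

```python
import re

contains_letter = re.compile(r"[a-z]", re.IGNORECASE)  # like /[a-z]/i
only_letters = re.compile(r"^[a-z]+$", re.IGNORECASE)  # like /^[a-z]+$/i

samples = ["Hello", "abc123", "123", ""]

# For each sample: (contains at least one letter, consists only of letters)
results = {
    s: (bool(contains_letter.search(s)), bool(only_letters.search(s)))
    for s in samples
}

for s, flags in results.items():
    print(repr(s), flags)
```

Only "Hello" passes both checks; "abc123" contains a letter but is rejected by the anchored pattern.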

Regular expression ^[a-zA-Z] or [^a-zA-Z]

Yes, the first means "match all strings that start with a letter", the second means "match all strings that contain a non-letter". The caret ("^") is used in two different ways, one to signal the start of the text, one to negate a character match inside square brackets.
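To make the two roles of the caret concrete, here is a small Python sketch. The helper name classify and the test strings are hypothetical and exist only for this example.

```python
import re

starts_with_letter = re.compile(r"^[a-zA-Z]")  # caret outside brackets: start anchor
has_non_letter = re.compile(r"[^a-zA-Z]")      # caret inside brackets: negated class

def classify(text):
    """Return (starts with a letter, contains a non-letter) for text."""
    return (
        bool(starts_with_letter.search(text)),
        bool(has_non_letter.search(text)),
    )

for text in ["abc", "abc1", "1abc", ""]:
    print(repr(text), classify(text))
```

Note that a string such as "abc1" satisfies both patterns, so the two are not opposites of each other.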

Regex: Does \w mean [a-zA-Z] or [a-zA-Z0-9_], as most tutorials mention \w matches the word characters?

\w means [a-zA-Z_0-9], according to the Java summary of regular-expression constructs found here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

\d  A digit: [0-9]
\w A word character: [a-zA-Z_0-9]

So (\w|\d|_) is equivalent to ([a-zA-Z_0-9]|[0-9]|_), where the extra underscore and the \d are both redundant, since each is already included in \w.

(\w|\d|_) is equivalent to (\w)
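The equivalence can be checked mechanically. The sketch below uses Python rather than Java, so one assumption matters: Python's \w is Unicode-aware by default, and the re.ASCII flag is used here to get the [a-zA-Z_0-9] definition quoted above. The helper name matched is invented for this example.

```python
import re

ascii_chars = [chr(code) for code in range(128)]

verbose = re.compile(r"(\w|\d|_)", re.ASCII)  # the redundant form
short = re.compile(r"(\w)", re.ASCII)         # the simplified form
explicit = re.compile(r"[a-zA-Z_0-9]")        # the class spelled out

def matched(pattern):
    """Return the set of single ASCII characters the pattern accepts."""
    return {ch for ch in ascii_chars if pattern.fullmatch(ch)}

print(matched(verbose) == matched(short) == matched(explicit))
print(len(matched(short)))  # 26 + 26 letters, 10 digits, 1 underscore
```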

Is [-a-z] the same as [a-zA-Z] in regular expression?

No. [-a-z] matches the hyphen character (-) or any lowercase letter from a to z; it does not match uppercase letters. In the expression from the question, the hyphen then appears again later by itself, which is redundant.
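A short Python sketch of the difference, using three sample characters picked for illustration:

```python
import re

hyphen_or_lower = re.compile(r"[-a-z]")  # a leading hyphen is a literal '-'
letters = re.compile(r"[a-zA-Z]")        # upper- and lowercase letters only

# For each character: (accepted by [-a-z], accepted by [a-zA-Z])
checks = {
    ch: (bool(hyphen_or_lower.fullmatch(ch)), bool(letters.fullmatch(ch)))
    for ch in ["-", "q", "Q"]
}

for ch, flags in checks.items():
    print(repr(ch), flags)
```

The hyphen is accepted only by [-a-z], and the uppercase Q only by [a-zA-Z].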

Difference between [:alpha:] class and [a-zA-Z]; Is [:alpha:] OS independent?

  1. [:alpha:] stands for "alphabetic characters", as opposed to [:digit:], which covers digits. It includes every letter character in your character encoding, whereas [a-zA-Z] captures any character between 'a' and 'z' or between 'A' and 'Z'. As @Charles Duffy noted, the locale's ordering of characters can differ, so other characters can fall inside those ranges. In a standard English UTF-8 locale, however, [a-zA-Z] includes only the standard English letters (26 letters * 2 for lower and upper case = 52) and thus does not include letters from other languages, e.g., é, ö, ï, etc.

  2. [:alpha:] will match all alphabetic characters.

  3. Yes, since [:alpha:] matches all alphabetic characters, it will work the same across different languages, operating systems or locations.

To give more context, the regex engine implemented in R (used by grepl, regexpr, gregexpr, sub and gsub, among others) follows the POSIX 1003.2 standard. This means matching is based on:

the bit pattern used for encoding the character, not on the graphic
representation of the character.

Below is an example using sentences in several languages, run in a session where Sys.getlocale(category = "LC_ALL") reports "en_GB.UTF-8":

fr_chr <- "Voix ambiguë d’un cœur qui au zéphyr préfère les jattes de kiwi."
ge_chr <- "Fix, Schwyz! quäkt Jürgen blöd vom Paß."
gr_chr <- "Ταχίστη αλώπηξ βαφής ψημένη γη, δρασκελίζει υπέρ νωθρού κυνός."
en_chr <- "Shaw, those twelve beige hooks are joined if I patch a young, gooey mouth."
cn_chr <- "敏捷的棕色狐狸跨过懒狗"

gsub("[[:alpha:]]","",fr_chr)
[1] " ’ ."
gsub("[[:alpha:]]","",ge_chr)
[1] ", ! ."
gsub("[[:alpha:]]","",gr_chr)
[1] " , ."
gsub("[[:alpha:]]","",en_chr)
[1] ", , ."
gsub("[[:alpha:]]","",cn_chr)
[1] ""

gsub("[A-Za-z]","",fr_chr)
[1] " ë ’ œ é éè ."
gsub("[A-Za-z]","",ge_chr)
[1] ", ! ä ü ö ß."
gsub("[A-Za-z]","",gr_chr)
[1] "Ταχίστη αλώπηξ βαφής ψημένη γη, δρασκελίζει υπέρ νωθρού κυνός."
gsub("[A-Za-z]","",en_chr)
[1] ", , ."
gsub("[A-Za-z]","",cn_chr)
[1] "敏捷的棕色狐狸跨过懒狗"
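For readers outside R, the same contrast can be approximated in Python. This is a sketch under a stated assumption: Python's re module has no [[:alpha:]] class, so [^\W\d_] (a word character that is neither a digit nor an underscore) is swapped in as a stand-in for "any letter". It is not the POSIX class itself and may differ from it at the edges. The sample words are short stand-ins, not the sentences above.

```python
import re

# Assumption: [^\W\d_] approximates [[:alpha:]] for Unicode strings in Python.
unicode_letters = re.compile(r"[^\W\d_]")
ascii_letters = re.compile(r"[A-Za-z]")

french = "z\u00e9phyr"      # "zéphyr", written with a precomposed é
greek = "\u03b3\u03b7"      # "γη"

# The Unicode-aware class strips every letter; [A-Za-z] leaves non-ASCII ones.
print(repr(unicode_letters.sub("", french)), repr(ascii_letters.sub("", french)))
print(repr(unicode_letters.sub("", greek)), repr(ascii_letters.sub("", greek)))
```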

RegEx for matching A-Z, a-z, 0-9, _ and .

^[A-Za-z0-9_.]+$

From beginning until the end of the string, match one or more of these characters.

Edit:

Note that ^ and $ can also match at the beginning and the end of a line. When multiline mode is enabled, this means the pattern can match a single line even though the complete string does not match.

Use \A for the beginning of the string, and \z for the end.

See for example: http://msdn.microsoft.com/en-us/library/h5181w5w(v=vs.110).aspx
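The effect of multiline mode on the anchors can be sketched in Python. One caveat: Python's re module writes the end-of-string anchor as \Z, where the flavor described above uses \z. The sample text is invented for this example.

```python
import re

text = "good_line\nbad line!"  # first line is valid, second is not

per_line = re.compile(r"^[A-Za-z0-9_.]+$", re.MULTILINE)      # ^ and $ match at line boundaries
whole_string = re.compile(r"\A[A-Za-z0-9_.]+\Z", re.MULTILINE)  # \A and \Z ignore the flag

line_match = per_line.search(text)
string_match = whole_string.search(text)

print(line_match.group(0) if line_match else None)  # the first line alone matches
print(string_match)                                  # the whole string does not
```

With \A and \Z the check applies to the complete string regardless of the multiline flag, which is usually what input validation needs.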
