Difference between regex [A-z] and [a-zA-Z]
[A-z] matches every ASCII character in the range from A to z, while [a-zA-Z] matches ASCII characters in the range from A to Z and in the range from a to z. At first glance these might seem equivalent. However, if you look at a table of ASCII characters, you'll see that A-z also includes the six characters that sit between Z and a: [, \, ], ^, _, and ` (which you clearly don't want).
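A quick way to check this is to test each character between Z and a against both classes. The sketch below uses Python's re module purely for illustration (the question itself is not Python-specific), and the variable names are made up for the example.

```python
import re

# The ASCII characters that sit between 'Z' (90) and 'a' (97).
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))

wide = re.compile(r"[A-z]")       # one range spanning code points 65..122
strict = re.compile(r"[a-zA-Z]")  # two ranges: 65..90 and 97..122

# Characters from the gap that each class accepts.
extra = [ch for ch in between if wide.fullmatch(ch)]
leaked = [ch for ch in between if strict.fullmatch(ch)]

print(extra)   # the six punctuation characters accepted by [A-z]
print(leaked)  # nothing from the gap is accepted by [a-zA-Z]
```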
Difference between /[a-z]/gi and /^[A-Za-z]+$/
Your two examples are quite different.
The first example (/^[A-Za-z]+$/) matches only when the string is non-empty and all of its characters are in the set [A-Za-z].
The second example (/[a-z]/gi) matches as soon as at least one character is alphabetic.
I suspect you want /^[a-z]+$/i:
/
^ # Matches the start of the string
[a-z]+ # Matches one or more lower case letters
$ # Matches the end of the string
/i # Case insensitive matching
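The difference can be seen by running both patterns over a few sample strings. This is a minimal sketch in Python rather than JavaScript, so re.IGNORECASE stands in for the i flag and the g flag has no direct counterpart; the sample strings are invented for the example.

```python
import re

any_letter = re.compile(r"[a-z]", re.IGNORECASE)      # roughly /[a-z]/i
all_letters = re.compile(r"^[a-z]+$", re.IGNORECASE)  # roughly /^[a-z]+$/i

samples = ["Hello", "abc123", "1234", ""]

# For each sample: (contains at least one letter, consists only of letters)
results = {
    s: (bool(any_letter.search(s)), bool(all_letters.search(s)))
    for s in samples
}

for s, flags in results.items():
    print(repr(s), flags)
```

Only "Hello" passes both checks; "abc123" contains a letter but is not made up of letters alone.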
Regular expression ^[a-zA-Z] or [^a-zA-Z]
Yes, the first means "match all strings that start with a letter", the second means "match all strings that contain a non-letter". The caret ("^") is used in two different ways: outside square brackets it anchors the match at the start of the text, and as the first character inside square brackets it negates the character class.
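The two meanings of the caret can be compared side by side. The following is an illustrative Python sketch; the helper name classify is hypothetical.

```python
import re

starts_with_letter = re.compile(r"^[a-zA-Z]")  # caret as start-of-text anchor
has_non_letter = re.compile(r"[^a-zA-Z]")      # caret as class negation


def classify(s):
    """Return (starts with a letter, contains a non-letter) for s."""
    return (
        bool(starts_with_letter.search(s)),
        bool(has_non_letter.search(s)),
    )


for s in ["abc", "abc1", "1abc", ""]:
    print(repr(s), classify(s))
```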
Regex: Does \w mean [a-zA-Z] or [a-zA-Z0-9_], given that most tutorials say \w matches the word characters?
\w means [a-zA-Z_0-9]. According to the Java summary of regular-expression constructs at https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html:
\d A digit: [0-9]
\w A word character: [a-zA-Z_0-9]
So (\w|\d|_) is equivalent to ([a-zA-Z_0-9]|[0-9]|_), where the extra underscore and the \d are both redundant, since they are already included in \w. In other words, (\w|\d|_) is equivalent to (\w).
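The equivalence can be verified by brute force over the ASCII range. The sketch below is in Python, not Java, and rests on one assumption worth stating: Python's \w and \d are Unicode-aware by default, so the re.ASCII flag is used to approximate the ASCII-only definitions quoted above. The helper matched is hypothetical.

```python
import re

# re.ASCII restricts \w and \d to ASCII, approximating the definitions
# quoted from the Java documentation (an assumption of this sketch).
verbose = re.compile(r"(\w|\d|_)", re.ASCII)
simple = re.compile(r"(\w)", re.ASCII)
explicit = re.compile(r"([a-zA-Z_0-9])")

ascii_chars = [chr(c) for c in range(128)]


def matched(pattern):
    """Return the set of single ASCII characters the pattern accepts."""
    return {ch for ch in ascii_chars if pattern.fullmatch(ch)}


print(matched(verbose) == matched(simple) == matched(explicit))
print(len(matched(simple)))  # 26 + 26 letters, 10 digits, 1 underscore
```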
Is [-a-z] the same as [a-zA-Z] in regular expression?
[-a-z] would match the hyphen character (-) and any letter between a and z, so it is not the same as [a-zA-Z]. But then they've got the hyphen again by itself later in the expression, which is redundant.
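A small check makes the difference concrete: a leading hyphen inside the brackets is a literal character, not part of a range. This Python sketch is only illustrative, and the names are invented for the example.

```python
import re

with_hyphen = re.compile(r"[-a-z]")  # literal '-' plus lowercase a..z
letters = re.compile(r"[a-zA-Z]")    # lowercase and uppercase letters only

# For each probe character: (accepted by [-a-z], accepted by [a-zA-Z])
table = {
    ch: (bool(with_hyphen.fullmatch(ch)), bool(letters.fullmatch(ch)))
    for ch in ["-", "q", "Q"]
}

for ch, flags in table.items():
    print(repr(ch), flags)
```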
Difference between [:alpha:] class and [a-zA-Z]; Is [:alpha:] OS independent?
[:alpha:] stands for "alphabetic characters", as opposed to [:digit:]. This includes literally every letter character in your character encoding. [a-zA-Z], by contrast, captures any character between the symbols 'a' and 'z', as well as between 'A' and 'Z'. As @Charles Duffy noted, the locale order of these can differ, so other characters can be contained. In standard English UTF-8, however, this will only include the standard English letters (26 letters * 2 for lower and upper case = 52), and thus will not include letters from other languages, e.g., é, ö, ï, etc. [:alpha:] will match all alphabetic characters.
Yes, since [:alpha:] matches all alphabetic characters, it will work the same across different languages, operating systems or locations.
To give more context, the regex function implemented in R (used by grepl, regexpr, gregexpr, sub or gsub, among others) follows the POSIX 1003.2 standard. This means matching is based on:
the bit pattern used for encoding the character, not on the graphic representation of the character.
Below is an example with characters from different languages, run with Sys.getlocale(category = "LC_ALL") returning "en_GB.UTF-8":
fr_chr <- "Voix ambiguë d’un cœur qui au zéphyr préfère les jattes de kiwi."
ge_chr <- "Fix, Schwyz! quäkt Jürgen blöd vom Paß."
gr_chr <- "Ταχίστη αλώπηξ βαφής ψημένη γη, δρασκελίζει υπέρ νωθρού κυνός."
en_chr <- "Shaw, those twelve beige hooks are joined if I patch a young, gooey mouth."
cn_chr <- "敏捷的棕色狐狸跨过懒狗"
gsub("[[:alpha:]]","",fr_chr)
[1] " ’ ."
gsub("[[:alpha:]]","",ge_chr)
[1] ", ! ."
gsub("[[:alpha:]]","",gr_chr)
[1] " , ."
gsub("[[:alpha:]]","",en_chr)
[1] ", , ."
gsub("[[:alpha:]]","",cn_chr)
[1] ""
gsub("[A-Za-z]","",fr_chr)
[1] " ë ’ œ é éè ."
gsub("[A-Za-z]","",ge_chr)
[1] ", ! ä ü ö ß."
gsub("[A-Za-z]","",gr_chr)
[1] "Ταχίστη αλώπηξ βαφής ψημένη γη, δρασκελίζει υπέρ νωθρού κυνός."
gsub("[A-Za-z]","",en_chr)
[1] ", , ."
gsub("[A-Za-z]","",cn_chr)
[1] "敏捷的棕色狐狸跨过懒狗"
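The same contrast can be reproduced outside R, with one caveat: Python's re module does not support POSIX classes such as [[:alpha:]]. The sketch below therefore substitutes [^\W\d_] (word characters minus digits and underscore) as a rough stand-in for "any letter". That substitution is an assumption of this example, not an exact equivalent of the R behaviour shown above.

```python
import re

# Rough stand-in for [[:alpha:]]: any word character that is neither a
# digit nor an underscore. Python's re has no POSIX character classes.
unicode_letters = re.compile(r"[^\W\d_]")
ascii_letters = re.compile(r"[A-Za-z]")

ge_chr = "Jürgen blöd"
gr_chr = "γη"

ge_unicode = unicode_letters.sub("", ge_chr)  # every letter removed
ge_ascii = ascii_letters.sub("", ge_chr)      # ü and ö survive
gr_unicode = unicode_letters.sub("", gr_chr)  # Greek letters removed
gr_ascii = ascii_letters.sub("", gr_chr)      # Greek letters untouched

print(repr(ge_unicode), repr(ge_ascii))
print(repr(gr_unicode), repr(gr_ascii))
```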
RegEx for matching A-Z, a-z, 0-9, _ and .
^[A-Za-z0-9_.]+$
From beginning until the end of the string, match one or more of these characters.
Edit:
Note that when multiline mode is enabled, ^ and $ match the beginning and the end of each line. This can mean that one line matches, but not the complete string. Use \A for the beginning of the string, and \z for the end.
See for example: http://msdn.microsoft.com/en-us/library/h5181w5w(v=vs.110).aspx
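The effect of multiline mode can be sketched as follows. The example is in Python rather than .NET, so two details are assumptions of the sketch: re.MULTILINE plays the role of the multiline option, and the end-of-string anchor is written \Z, which is how Python's re module spells it. The sample text is invented.

```python
import re

text = "valid_name.1\nnot valid!"

# With MULTILINE, ^ and $ anchor at line boundaries.
line_anchored = re.compile(r"^[A-Za-z0-9_.]+$", re.MULTILINE)
# \A and \Z anchor at the boundaries of the whole string.
string_anchored = re.compile(r"\A[A-Za-z0-9_.]+\Z")

line_match = line_anchored.search(text)      # the first line matches
whole_match = string_anchored.search(text)   # the whole string does not
clean_match = string_anchored.search("valid_name.1")

print(line_match.group(0) if line_match else None)
print(whole_match)
print(clean_match.group(0) if clean_match else None)
```

For validating an entire input, anchoring on the string rather than the line avoids accepting text that merely contains one valid line.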