In regex, what does [\w*] mean?
Quick answer: ^[\w*]$ will match a string consisting of a single character, where that character is alphanumeric (a letter or a digit), an underscore (_), or an asterisk (*).
Details:
- The "\w" means "any word character", which usually means alphanumeric (letters and digits, regardless of case) plus underscore (_).
- The "^" anchors to the beginning of a string and the "$" anchors to the end of a string, which means that, in this case, the match must start at the beginning of the string and end at the end of the string.
- The [] means a character class, which means "match any one character contained in the character class".
It is also worth mentioning that the normal quoting and escaping rules for strings make regular expressions awkward to write, because every backslash would need to be escaped with an additional backslash. Python therefore has raw string notation, in which backslashes are passed through to the regex engine unchanged. That is what the "r" prefix at the beginning of a pattern string, as in r"[\w*]", is for.
Note: Normally an asterisk (*) means "0 or more of the previous thing", but in the example above it does not have that meaning. Because the asterisk is inside the character class, it loses its "special-ness" and is matched literally.
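As a minimal sketch of the quick answer above, the following Python snippet tries the pattern against a few strings. The helper name is_single_word_char_or_star is hypothetical and chosen only for illustration.

```python
import re

# Raw string so the backslash in \w reaches the regex engine unchanged.
pattern = re.compile(r"^[\w*]$")

def is_single_word_char_or_star(text):
    """Return True if the pattern ^[\\w*]$ matches text."""
    return pattern.match(text) is not None

print(is_single_word_char_or_star("a"))   # True: a letter
print(is_single_word_char_or_star("_"))   # True: underscore is a word character
print(is_single_word_char_or_star("*"))   # True: literal asterisk inside the class
print(is_single_word_char_or_star("ab"))  # False: two characters
print(is_single_word_char_or_star("-"))   # False: not in the class
```

One caveat in Python: "$" also matches just before a trailing newline, so "a\n" passes this check. re.fullmatch(r"[\w*]", text) is stricter if that matters.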
For more information on regular expressions in Python, the two official references are the re module documentation and the Regular Expression HOWTO.
What does this pattern (?<=\w)\W+(?=\w) mean in a Python regular expression?
Here's a breakdown of the elements:
- \w means an alphanumeric character (or an underscore)
- \W+ is the opposite of \w; with the + it means one or more non-alphanumeric characters
- ?<= is called a "lookbehind assertion"
- ?= is a "lookahead assertion"
So this re.sub statement means "if there are one or more non-alphanumeric characters with an alphanumeric character before and after, replace the non-alphanumeric character(s) with a space".
And by the way, the third argument to re.sub must be a string (or bytes-like object); it can't be a list.
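The original re.sub call is not shown here, so the following is only a sketch under assumptions: the sample text and the list of lines are made up, and the replacement is a single space as described above. It also shows one way to handle a list, by applying re.sub to each item.

```python
import re

# Lookbehind (?<=\w) and lookahead (?=\w) check the neighbours without
# consuming them, so only the run of non-word characters is replaced.
pattern = re.compile(r"(?<=\w)\W+(?=\w)")

text = "hello,   world!! foo--bar."
result = pattern.sub(" ", text)
print(result)  # hello world foo bar.

# re.sub needs a string, not a list, so process list items one by one.
lines = ["a--b", "c, d"]
cleaned = [pattern.sub(" ", line) for line in lines]
print(cleaned)  # ['a b', 'c d']
```

The trailing period survives because no word character follows it, so the lookahead fails there.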
Regex: Does \w mean [a-zA-Z] or [a-zA-Z0-9_], given that most tutorials say \w matches the word characters?
It means [a-zA-Z_0-9]. According to the Java summary of regular expression constructs found here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html,
\d A digit: [0-9]
\w A word character: [a-zA-Z_0-9]
So (\w|\d|_) is equivalent to ([a-zA-Z_0-9]|[0-9]|_), where the extra underscore as well as \d is redundant, since both are included as part of \w.
(\w|\d|_) is equivalent to (\w)
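The answer above is about Java, but the redundancy can be checked quickly in Python as a sketch. The re.ASCII flag is an assumption added here so that \w is limited to [a-zA-Z0-9_], as in the Java table quoted above; the sample string is made up.

```python
import re

# re.ASCII restricts \w and \d to ASCII, approximating the Java definitions above.
combined = re.compile(r"(\w|\d|_)", re.ASCII)
simple = re.compile(r"(\w)", re.ASCII)

sample = "aZ09_ -!\u00e9"
combined_hits = combined.findall(sample)
simple_hits = simple.findall(sample)

print(combined_hits)                 # ['a', 'Z', '0', '9', '_']
print(combined_hits == simple_hits)  # True
```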
How to interpret this regular expression /[\W_]/g
- / ... /g: It's a global regex, so it will operate on every match in the string rather than only the first.
- [ ... ]: This creates a character set. It matches any single character within the listed set of characters.
- \W_: \W matches the inverse of "word characters", and _ adds the underscore, which \w counts as a word character. Together they match any character that is not in [a-zA-Z0-9].
Then you have a few one-off replacements for comma and period. Honestly, if that's the complete code, /[\W_,.]/g, omitting the two other replaces, would work just as well. (The comma and period are already non-word characters, so \W matches them too.)
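The original JavaScript code is not shown, so this is only a rough Python equivalent under assumptions: the replacement is taken to be the empty string, and the sample text is made up. Python's re.sub replaces every occurrence by default, which plays the role of the /g flag.

```python
import re

def strip_non_alphanumeric(text):
    """Remove every character matched by [\\W_], leaving letters and digits."""
    # re.ASCII keeps \W aligned with the ASCII-only \W of a JavaScript regex
    # without the u flag.
    return re.sub(r"[\W_]", "", text, flags=re.ASCII)

cleaned = strip_non_alphanumeric("A man, a plan_a canal. Panama!")
print(cleaned)  # AmanaplanacanalPanama
```

The comma and the period disappear without being listed, because \W already covers them.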
What is the meaning of [\w\-] regular expression in PHP
Regex 101 explains the pieces as follows:
- \w matches any word character [a-zA-Z0-9_]
- \- matches the character - literally
So [\w\-] matches any single word character or a hyphen.
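The question is about PHP, but the class behaves the same way in Python for this purpose, so here is a small sketch. The + quantifier and the sample string are additions for illustration, and re.ASCII is assumed so that \w means [a-zA-Z0-9_] as in the explanation above.

```python
import re

# One or more word characters or hyphens in a row.
token = re.compile(r"[\w\-]+", re.ASCII)

tokens = token.findall("my-post_title 2024!")
print(tokens)  # ['my-post_title', '2024']

# A hyphen placed last in the class is literal even without the backslash.
same_tokens = re.findall(r"[\w-]+", "my-post_title 2024!", flags=re.ASCII)
print(tokens == same_tokens)  # True
```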
Matching email addresses (simple, not future-proof):
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b
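The pattern above lists only uppercase letters, so it presumably expects case-insensitive matching. A sketch in Python, assuming re.IGNORECASE is the intended mode; the sample addresses are made up.

```python
import re

email = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)

text = "Contact john.doe@example.com or SALES@EXAMPLE.ORG today."
found = email.findall(text)
print(found)  # ['john.doe@example.com', 'SALES@EXAMPLE.ORG']

# Without the flag, lowercase addresses do not match at all.
strict = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b")
print(strict.search("john.doe@example.com"))  # None

# "Not future proof": top-level domains longer than 6 letters are rejected.
print(email.search("user@example.photography"))  # None
```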
Difference between \w and \b regular expression meta characters
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
Simply put: \b allows you to perform a "whole words only" search using a regular expression in the form of \bword\b. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters".
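A short Python sketch of the "whole words only" search; the word "cat" and the sample sentence are made up for illustration.

```python
import re

sentence = "cat concat cat's category"

# \bcat\b matches "cat" only where both ends sit on a word boundary.
whole_word = re.sub(r"\bcat\b", "dog", sentence)
print(whole_word)  # dog concat dog's category

# Without \b, every occurrence of the letters is replaced.
anywhere = re.sub(r"cat", "dog", sentence)
print(anywhere)  # dog condog dog's dogegory
```

The "cat" in "cat's" still matches because the apostrophe is a non-word character, which creates a boundary after the "t".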
In all flavors, the characters [a-zA-Z0-9_] are word characters. These are also matched by the short-hand character class \w. Flavors showing "ascii" for word boundaries in the flavor comparison recognize only these as word characters.
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.
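To see \B in action, this sketch replaces "cat" only where it is buried inside a longer word on both sides; the sample text is made up.

```python
import re

sentence = "cat concat locate"

# \Bcat\B requires a word character directly before and directly after "cat".
inner_only = re.sub(r"\Bcat\B", "CAT", sentence)
print(inner_only)  # cat concat loCATe
```

The standalone "cat" and the "cat" at the end of "concat" are left alone, because each has a word boundary on at least one side.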
\W is short for [^\w], the negated version of \w.
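A quick Python check that \W and [^\w] select the same characters. The sample string is made up. Note that in Python 3, \w on str patterns is Unicode-aware by default, which is one reason the text above says "usually"; re.ASCII narrows it to [A-Za-z0-9_].

```python
import re

sample = "snake_case, caf\u00e9 & 42!"

negated_shorthand = re.findall(r"\W", sample)
negated_class = re.findall(r"[^\w]", sample)
print(negated_shorthand == negated_class)  # True
print(negated_shorthand)  # [',', ' ', ' ', '&', ' ', '!']

# With re.ASCII, the accented letter no longer counts as a word character.
ascii_non_word = re.findall(r"\W", sample, flags=re.ASCII)
print("\u00e9" in ascii_non_word)  # True
```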