Matching a space in regex
If you're looking for a space, that would be " " (one space).
If you're looking for one or more, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus).
If you're looking for common spacing, use "[ X]" or "[ X][ X]*" or "[ X]+" where X is the physical tab character (and each is preceded by a single space in all those examples).
The "*" forms will work in every regex engine I've ever seen (some of which don't even have the one-or-more "+" character, ugh).
If you know you'll be using one of the more modern regex engines, "\s" and its variations are the way to go. In addition, I believe word boundaries (\b) also match at the start and end of lines, which is important when you're looking for words that may appear without preceding or following spaces.
From your edit, it appears you want to remove all invalid characters. The start of this is (note the space inside the regex):
$newtag = preg_replace ("/[^a-zA-Z0-9 ]/", "", $tag); # note the space before the ]
If you also want trickery to ensure there's only one space between each word and none at the start or end, that's a little more complicated (and probably another question) but the basic idea would be:
$newtag = preg_replace ("/ +/", " ", $newtag); # convert all multispaces to space
$newtag = preg_replace ("/^ /", "", $newtag);  # remove space from start
$newtag = preg_replace ("/ $/", "", $newtag);  # and end
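For comparison, here is a minimal Python sketch of the same cleanup, chained so that each step works on the previous step's output. It assumes the same allowed set (ASCII letters, digits and space); the function name clean_tag is hypothetical.

```python
import re


def clean_tag(tag: str) -> str:
    """Hypothetical helper: keep letters, digits and spaces, then normalize spacing."""
    # Drop every character that is not an ASCII letter, digit or space.
    tag = re.sub(r"[^a-zA-Z0-9 ]", "", tag)
    # Collapse runs of spaces into a single space.
    tag = re.sub(r" +", " ", tag)
    # Runs are already collapsed, so at most one space is left at each end.
    tag = re.sub(r"^ ", "", tag)
    tag = re.sub(r" $", "", tag)
    return tag


print(clean_tag("  Hello,   World!! "))  # prints: Hello World
```

Because the spaces are collapsed first, the two anchored substitutions only ever need to remove a single space.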
Regular expression to allow spaces between words
tl;dr
Just add a space in your character class.
^[a-zA-Z0-9_ ]*$
Now, if you want to be strict...
The above isn't exactly correct. Because * means zero or more, it would match all of the following cases that one would not usually mean to match:
- An empty string, "".
- A string consisting entirely of spaces, "   ".
- A string that leads and/or trails with spaces, " Hello World ".
- A string that contains multiple spaces in between words, "Hello   World".
Originally I didn't think such details were worth going into, as OP was asking such a basic question that it seemed strictness wasn't a concern. Now that the question has gained some popularity, however, I want to say...
...use @stema's answer.
Which, in my flavor (without using \w), translates to:
^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$
(Please upvote @stema regardless.)
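A quick way to see the difference is to run both patterns against the cases listed above. This is a minimal Python sketch; the names LAX and STRICT are just illustrative labels.

```python
import re

# The "tl;dr" pattern and the stricter one.
LAX = re.compile(r"^[a-zA-Z0-9_ ]*$")
STRICT = re.compile(r"^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$")

cases = ["", "   ", " Hello World ", "Hello   World", "Hello World"]

# Print each case with whether each pattern accepts it.
for s in cases:
    print(repr(s), bool(LAX.match(s)), bool(STRICT.match(s)))
```

LAX accepts all five strings, while STRICT accepts only "Hello World".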
Some things to note about this (and @stema's) answer:
If you want to allow multiple spaces between words (say, if you'd like to allow accidental double-spaces, or if you're working with copy-pasted text from a PDF), then add a + after the space: ^\w+( +\w+)*$
If you want to allow tabs and newlines (whitespace characters), then replace the space with \s+: ^\w+(\s+\w+)*$
Here I suggest the + by default because, for example, Windows linebreaks consist of two whitespace characters in sequence, \r\n, so you'll need the + to catch both.
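Here is a small Python check of the + advice. Note that in Python 3, \w in a str pattern matches Unicode word characters, so it is broader than [a-zA-Z0-9_]; the pattern names are illustrative.

```python
import re

MULTI_SPACE = re.compile(r"^\w+( +\w+)*$")  # one or more spaces between words
ANY_WS = re.compile(r"^\w+(\s+\w+)*$")      # any run of whitespace between words
SINGLE_WS = re.compile(r"^\w+(\s\w+)*$")    # exactly one whitespace char between words

print(bool(MULTI_SPACE.match("Hello  World")))   # True
print(bool(MULTI_SPACE.match("Hello\tWorld")))   # False: a tab is not a space
print(bool(ANY_WS.match("Hello\r\nWorld")))      # True: \s+ consumes both \r and \n
print(bool(SINGLE_WS.match("Hello\r\nWorld")))   # False: a single \s only covers \r
```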
Still not working?
Check what dialect of regular expressions you're using.* In languages like Java you'll have to escape your backslashes, e.g. \\w and \\s. In older or more basic languages and utilities, like POSIX sed, \w and \s aren't defined, so write them out with character classes, e.g. [a-zA-Z0-9_] and [[:space:]], respectively.
* I know this question is tagged vb.net, but based on 25,000+ views, I'm guessing it's not only those folks who are coming across this question. Currently it's the first hit on Google for the search phrase "regular expression space word".
Python regex match space only
No need for special groups. Just create a regex with a space character. The space character does not have any special meaning; it just means "match a space".
import re
RE = re.compile(' +')
So for your case
a='rasd\nsa sd'
print(re.search(' +', a))
would give (on Python 3.7 and later; older versions print <_sre.SRE_Match object; ...> instead)
<re.Match object; span=(7, 8), match=' '>
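To see why a literal space is the right tool for "space only", compare it with \s on the same string. A minimal sketch reusing the variable a from above:

```python
import re

a = 'rasd\nsa sd'

# A literal space matches only the space character.
print(re.findall(' ', a))    # [' ']
# \s matches any whitespace, including the newline.
print(re.findall(r'\s', a))  # ['\n', ' ']
```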
How can I match spaces with a regexp in Bash?
Replace:
regexp="templateUrl:[\s]*'"
With:
regexp="templateUrl:[[:space:]]*'"
According to man bash, the =~ operator supports "extended regular expressions" as defined in man 3 regex. man 3 regex says it supports the POSIX standard and refers the reader to man 7 regex. The POSIX standard supports [:space:] as the character class for whitespace.
The GNU bash manual documents the supported character classes as follows:
Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard: alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
The only mention of \s that I found in the GNU bash documentation was for an unrelated use in prompts, such as PS1, not in regular expressions.
The Meaning of *
[[:space:]] will match exactly one white space character. [[:space:]]* will match zero or more white space characters.
The Difference Between space and blank
POSIX regular expressions offer two classes of whitespace, [[:space:]] and [[:blank:]]:
- [[:blank:]] means space and tab. This makes it similar to: [ \t].
- [[:space:]], in addition to space and tab, includes newline (line feed), carriage return, form feed, and vertical tab. This makes it similar to: [ \t\n\r\f\v].
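Python's re module has no POSIX bracket classes like [[:blank:]] or [[:space:]], so this minimal sketch uses the ASCII equivalents given above to show the difference between the two.

```python
import re

# ASCII approximations of the POSIX classes (Python's re has no [[:class:]] syntax).
BLANK = r"[ \t]"
SPACE = r"[ \t\n\r\f\v]"

text = "a b\tc\nd"
print(re.findall(BLANK, text))  # [' ', '\t']
print(re.findall(SPACE, text))  # [' ', '\t', '\n']
```

Only the "space" version picks up the newline, which is the practical difference when scanning multi-line input.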
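A key advantage of using character classes is that they are locale-aware, so in a multibyte (e.g. UTF-8) locale they may also match non-ASCII whitespace, depending on your system's regex library.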
Regex to match white spaces before certain characters with some exceptions
You can use
([ ](?:[,.!?;]|:(?!\/)))
Or,
(?! :\/)( [,.!?;:])
Details:
- [ ] - a space (note the brackets are only useful here if you have free-spacing mode on, and even then not in Java)
- (?:[,.!?;]|:(?!\/)) - either of
  - [,.!?;] - one of the chars in the set
  - | - or
  - :(?!\/) - a colon not immediately followed by a slash.
Regex #2 details:
- (?! :\/) - a negative lookahead that fails the match if there is a space, then a colon and then a / immediately to the right of the current location
- ( [,.!?;:]) - Group 1: a space and then a char from the set.
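Python's re supports the lookaheads used in both patterns, so they work unchanged there. A minimal sketch, assuming the goal is to drop the space before the punctuation; the sample sentence is made up.

```python
import re

PATTERN_1 = re.compile(r"([ ](?:[,.!?;]|:(?!\/)))")
PATTERN_2 = re.compile(r"(?! :\/)( [,.!?;:])")

text = "Hi , there ! See ://x but : ok ?"

# Both patterns find the same space+punctuation pairs; the " :" before "//" is skipped.
print(PATTERN_1.findall(text))  # [' ,', ' !', ' :', ' ?']
print(PATTERN_2.findall(text))  # [' ,', ' !', ' :', ' ?']

# Drop the space but keep the punctuation (group 1 minus its leading space).
print(PATTERN_2.sub(lambda m: m.group(1)[1:], text))  # Hi, there! See ://x but: ok?
```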
Regular expression matching and remove spaces
The pattern (?<=Address(\s))(.*(?=\s)) that you tried asserts Address followed by a single whitespace char to the left, and then matches the rest of the line while asserting a whitespace char to the right.
For the example data, the match will therefore end right before the last whitespace char in the string, and it will also contain any additional whitespace chars that are present right after Address.
One option to match the bold parts in the question is to use a capture group.
\bAddress\s+([^,]+,\s*\S+)
The pattern matches:
- \bAddress\s+ Match Address followed by 1+ whitespace chars
- ( Capture group 1
  - [^,]+, Match 1+ occurrences of any char except , and then match ,
  - \s*\S+ Match optional whitespace chars followed by 1+ non-whitespace chars
- ) Close group 1
Note that \s and [^,] can also match a newline.
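To see both behaviors described above, here is a Python sketch on a made-up sample line (the question's real data isn't reproduced here). Python's re accepts the fixed-width lookbehind in the pattern that was tried, but not variable-length lookbehinds, so only the capture-group version is shown for the fix.

```python
import re

# Made-up sample line; three spaces follow "Address".
sample = "Name John Address   12 Main Street, Springfield USA"

# The pattern from the question: lookbehind for "Address" plus one whitespace char.
tried = re.search(r"(?<=Address(\s))(.*(?=\s))", sample).group(0)
print(repr(tried))  # '  12 Main Street, Springfield'

# The capture-group pattern: the wanted text is in group 1.
fixed = re.search(r"\bAddress\s+([^,]+,\s*\S+)", sample).group(1)
print(repr(fixed))  # '12 Main Street, Springfield'
```

The first result keeps the two extra spaces after Address and stops before the last space, exactly as explained above; the second contains only the wanted text.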
A variant with a positive lookbehind to get a match only (this needs an engine that supports variable-length lookbehind, such as .NET):
(?<=\bAddress\s+)[^,\s][^,]+,\s*\S+