Matching a Space in Regex

If you're looking for a space, that would be " " (one space).

If you're looking for one or more, it's "  *" (that's two spaces and an asterisk) or " +" (one space and a plus).

If you're looking for common spacing, use "[ X]" or "[ X][ X]*" or "[ X]+" where X is the physical tab character (and each is preceded by a single space in all those examples).

These will work in every regex engine I've ever seen (some of which don't even have the one-or-more "+" character, ugh).

If you know you'll be using one of the more modern regex engines, "\s" and its variations are the way to go. In addition, I believe word boundaries ("\b") also match at the start and end of a line when a word character sits there, which matters when you're looking for words that may appear without preceding or following spaces.
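As a rough illustration of the difference between these options, here is a small Python sketch. The sample string is made up, and Python's re module is only one of the "modern" engines meant above; other engines may differ in what "\s" covers.

```python
import re

# Made-up sample: two spaces, a tab and a newline between the words.
text = "foo  bar\tbaz\nqux"

# A literal space matches only the space character itself.
only_spaces = re.findall(" +", text)

# A space-or-tab class also picks up the tab.
spaces_and_tabs = re.findall("[ \t]+", text)

# \s additionally matches the newline (and other whitespace).
any_whitespace = re.findall(r"\s+", text)

# \b matches at the very start and end of the string, so the words
# are found even though no space precedes "foo" or follows "qux".
first_word = re.findall(r"\bfoo\b", text)
last_word = re.findall(r"\bqux\b", text)

print(only_spaces, spaces_and_tabs, any_whitespace, first_word, last_word)
```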

For PHP specifically, this page may help.

From your edit, it appears you want to remove all non-valid characters. The start of this is (note the space inside the regex):

$newtag = preg_replace ("/[^a-zA-Z0-9 ]/", "", $tag);
#                                    ^ space here

If you also want trickery to ensure there's only one space between each word and none at the start or end, that's a little more complicated (and probably another question) but the basic idea would be:

$newtag = preg_replace ("/ +/", " ", $tag);   # convert all multispaces to space
$newtag = preg_replace ("/^ /", "", $newtag); # remove space from start
$newtag = preg_replace ("/ $/", "", $newtag); # and end
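For readers who are not using PHP, the same clean-up can be sketched in Python. This is only an illustration of the idea above: clean_tag is a hypothetical helper name and the sample input is invented.

```python
import re

def clean_tag(tag: str) -> str:
    """Hypothetical helper: keep letters, digits and spaces, then tidy the spacing."""
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", tag)  # drop every character that is not allowed
    cleaned = re.sub(r" +", " ", cleaned)         # collapse runs of spaces into one space
    return cleaned.strip(" ")                     # remove spaces from the start and end

result = clean_tag("  hello,   world! 42 ")
print(repr(result))
```

Each step feeds its result into the next one, which is the part that is easy to get wrong when the calls are written out one per line.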

Regular expression to allow spaces between words


tl;dr

Just add a space in your character class.

^[a-zA-Z0-9_ ]*$

 


Now, if you want to be strict...

The above isn't exactly correct. Because * means zero or more, it also matches all of the following cases, which you usually don't want to match:

  • An empty string, "".
  • A string comprised entirely of spaces, "      ".
  • A string that leads and / or trails with spaces, "   Hello World  ".
  • A string that contains multiple spaces in between words, "Hello   World".

Originally I didn't think such details were worth going into, as OP was asking such a basic question that it seemed strictness wasn't a concern. Now that the question's gained some popularity however, I want to say...

...use @stema's answer.

Which, in my flavor (without using \w) translates to:

^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$

(Please upvote @stema regardless.)
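To see the difference between the two patterns, here is a minimal Python sketch that runs both against the problem cases listed above. The names LOOSE and STRICT are just labels chosen for this example.

```python
import re

LOOSE = re.compile(r"^[a-zA-Z0-9_ ]*$")                   # the tl;dr pattern
STRICT = re.compile(r"^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$")  # the stricter pattern

samples = ["", "      ", "   Hello World  ", "Hello   World", "Hello World"]

loose_hits = [s for s in samples if LOOSE.match(s)]
strict_hits = [s for s in samples if STRICT.match(s)]

for s in samples:
    print(repr(s), bool(LOOSE.match(s)), bool(STRICT.match(s)))
```

The loose pattern accepts every sample, while the strict one accepts only the last.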

Some things to note about this (and @stema's) answer:

  • If you want to allow multiple spaces between words (say, if you'd like to allow accidental double-spaces, or if you're working with copy-pasted text from a PDF), then add a + after the space:

    ^\w+( +\w+)*$
  • If you want to allow tabs and newlines (whitespace characters), then replace the space with a \s+:

    ^\w+(\s+\w+)*$

    Here I suggest the + by default because, for example, Windows linebreaks consist of two whitespace characters in sequence, \r\n, so you'll need the + to catch both.
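The point about Windows line breaks can be checked with a short Python sketch. The variable names and the sample text are invented for the example.

```python
import re

one_whitespace = re.compile(r"^\w+(\s\w+)*$")     # exactly one whitespace char between words
many_whitespace = re.compile(r"^\w+(\s+\w+)*$")   # one or more whitespace chars between words
many_spaces = re.compile(r"^\w+( +\w+)*$")        # one or more plain spaces between words

windows_text = "Hello\r\nWorld"

crlf_with_single = bool(one_whitespace.match(windows_text))
crlf_with_plus = bool(many_whitespace.match(windows_text))
double_space = bool(many_spaces.match("Hello  World"))

print(crlf_with_single, crlf_with_plus, double_space)
```

Without the +, the \r is consumed and the \n that follows is left with nothing to match it, so the whole pattern fails.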

Still not working?

Check what dialect of regular expressions you're using.* In languages like Java, where the pattern is written inside a string literal, you'll have to escape your backslashes, i.e. \\w and \\s. In older or more basic languages and utilities, like POSIX sed, \w and \s aren't defined (GNU sed adds them as extensions), so write them out with character classes, e.g. [a-zA-Z0-9_] and [ \f\n\r\t\v] (or the POSIX class [[:space:]]), respectively.
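Both points can be tried out in Python, which is used here only as a convenient test bed. The re.ASCII flag is an assumption made so that \w and \s line up with the ASCII-only classes spelled out by hand.

```python
import re

# Doubling the backslashes in an ordinary string literal gives the same
# pattern text as a raw string; this mirrors what Java string literals require.
escaped = "^\\w+(\\s+\\w+)*$"
raw = r"^\w+(\s+\w+)*$"
same_pattern = escaped == raw

shorthand = re.compile(raw, re.ASCII)
spelled_out = re.compile(r"^[a-zA-Z0-9_]+([ \f\n\r\t\v]+[a-zA-Z0-9_]+)*$")

samples = ["Hello World", "Hello\tWorld", "Hello  ", "a_b 42", ""]
agreement = [bool(shorthand.match(s)) == bool(spelled_out.match(s)) for s in samples]

print(same_pattern, agreement)
```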

 


* I know this question is tagged vb.net, but based on 25,000+ views, I'm guessing it's not only those folks who are coming across this question. Currently it's the first hit on google for the search phrase, regular expression space word.

Python regex match space only

No need for special groups. Just create a regex with a space character. The space character does not have any special meaning, it just means "match a space".

import re

RE = re.compile(' +')

So for your case

a = 'rasd\nsa sd'
print(RE.search(a))

would give (on Python 3.7 or later; older Python 3 releases print the class as _sre.SRE_Match)

<re.Match object; span=(7, 8), match=' '>

How can I match spaces with a regexp in Bash?

Replace:

regexp="templateUrl:[\s]*'"

With:

regexp="templateUrl:[[:space:]]*'"

According to man bash, the =~ operator supports "extended regular expressions" as defined in man 3 regex. man 3 regex says it supports the POSIX standard and refers the reader to man 7 regex. The POSIX standard supports [:space:] as the character class for whitespace.

The GNU bash manual documents the supported character classes as follows:

Within ‘[’ and ‘]’, character classes can be specified using the
syntax [:class:], where class is one of the following classes defined
in the POSIX standard:

alnum alpha ascii blank cntrl digit graph lower print
punct space upper word xdigit

The only mention of \s that I found in the GNU bash documentation was for an unrelated use in prompts, such as PS1, not in regular expressions.

The Meaning of *

[[:space:]] will match exactly one white space character. [[:space:]]* will match zero or more white space characters.

The Difference Between space and blank

POSIX regular expressions offer two classes of whitespace: [[:space:]] and [[:blank:]]:

  • [[:blank:]] means space and tab. This makes it similar to: [ \t].

  • [[:space:]], in addition to space and tab, includes newline, carriage return, form feed, and vertical tab. This makes it similar to: [ \t\n\r\f\v].

A key advantage of using the POSIX character classes is that they follow the current locale, so they stay safe when the text is not plain ASCII.
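Python's re module does not offer the POSIX [[:class:]] syntax, so the sketch below uses the explicit sets given above as stand-ins to show how the two classes behave differently. The names BLANK and SPACE and the sample line are assumptions made for this example.

```python
import re

BLANK = r"[ \t]"            # stand-in for [[:blank:]]
SPACE = r"[ \t\n\r\f\v]"    # stand-in for [[:space:]]

# Made-up input: a tab, a newline and a space between the colon and the quote.
line = "templateUrl:\t\n 'app.html'"

blank_match = bool(re.search("templateUrl:" + BLANK + "*'", line))
space_match = bool(re.search("templateUrl:" + SPACE + "*'", line))

print(blank_match, space_match)
```

The blank-only version stops at the newline and fails, while the space version reaches the quote.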

Regex to match white spaces before certain characters with some exceptions

You can use

([ ](?:[,.!?;]|:(?!\/)))

Or,

(?! :\/)( [,.!?;:])

See the regex demo #1 and regex demo #2.

Details:

  • [ ] - a space (note the brackets are only useful here if free-spacing mode is on, and even then not in Java)
  • (?:[,.!?;]|:(?!\/)) - either of
    • [,.!?;] - one of the chars in the set
    • | - or
    • :(?!\/) - a colon not immediately followed with a slash.

Regex #2 details

  • (?! :\/) - a negative lookahead that fails the match if there is a space, then a colon and then a / immediately to the right of the current location
  • ( [,.!?;:]) - Group 1: a space and then a char from the set.
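Both patterns also work unchanged in Python's re module. The sketch below runs them against an invented sentence and then, as one possible use, drops the matched space while keeping the punctuation; that last step is an assumption about what the matches are for.

```python
import re

# Both patterns are copied verbatim from the answer above.
PATTERN_1 = re.compile(r"([ ](?:[,.!?;]|:(?!\/)))")
PATTERN_2 = re.compile(r"(?! :\/)( [,.!?;:])")

# Made-up sample text; the space before "://" must be left alone.
text = "Wait , what ? See http ://x.org : ok ."

hits_1 = PATTERN_1.findall(text)
hits_2 = PATTERN_2.findall(text)

# Replace each match with group 1 minus its leading space.
tidied = PATTERN_1.sub(lambda m: m.group(1)[1:], text)

print(hits_1)
print(hits_2)
print(tidied)
```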

Regular expression matching and remove spaces

The pattern (?<=Address(\s))(.*(?=\s)) that you tried asserts Address followed by a single whitespace char to the left, and then matches the rest of the line asserting a whitespace char to the right.

For the example data, that will match right before the last whitespace char in the string, and the match will also contain all the whitespace chars that are present right after Address.


One option to match the bold parts in the question is to use a capture group.

\bAddress\s+([^,]+,\s*\S+)

The pattern matches:

  • \bAddress\s+ Match Address followed by 1+ whitespace chars
  • ( Capture group 1
    • [^,]+, Match 1+ occurrences of any char except , and then match ,
    • \s*\S+ Match optional whitespace chars followed by 1+ non-whitespace chars
  • ) Close group 1

.NET regex demo (Click on the Table tab to see the value for group 1)

Note that \s and [^,] can also match a newline.

A variant with a positive lookbehind to get a match only:

(?<=\bAddress\s+)[^,\s][^,]+,\s*\S+

.NET Regex demo
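Outside .NET, the capture-group version is the more portable of the two. Python's re module, for instance, rejects the lookbehind variant because it requires lookbehinds to have a fixed width. A minimal sketch with the capture group, using invented sample data:

```python
import re

PATTERN = re.compile(r"\bAddress\s+([^,]+,\s*\S+)")

# Made-up input; the original question's data is not reproduced here.
text = "Name  John\nAddress     12 Main Street, Springfield USA"

match = PATTERN.search(text)
address = match.group(1) if match else None

print(repr(address))
```

The whitespace after Address is consumed outside the group, so the captured value starts at the first non-whitespace character.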


