Regular Expressions: How to Express \w Without Underscore

To match \w without the underscore, use the following character class (in Perl):

[^\W_]

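This works because \W is the same as [^\w]: the class matches any character that is neither a non-word character nor an underscore.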
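
The same class also works in Python's stock re module. Here is a minimal sketch; the sample string and the name ALNUM are made up for illustration. In Python 3, \w on str patterns is Unicode-aware by default, so this also accepts non-ASCII letters.

```python
import re

# [^\W_] : any character that is neither a non-word character
# nor an underscore, i.e. \w minus "_".
ALNUM = re.compile(r"[^\W_]+")

sample = "foo_bar baz-42 _x_"
tokens = ALNUM.findall(sample)
print(tokens)
# ['foo', 'bar', 'baz', '42', 'x']
```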

regex: \w EXCEPT underscore (add to class and then exclude from class)

I have two options.

  1. [^\W_]

    This is very effective and does exactly what you want. It's also straightforward.

  2. With the regex module, use set difference: [[\w]--[_]]. Note that this needs the "V1" flag set, so you need either

    r = regex.compile(r"(?V1)[\w--_]")

    or

    r = regex.compile(r"[\w--_]", flags=regex.V1)

    This looks better (readability) IMO if you're familiar with Matthew Barnett's regex module, which is more powerful than Python's stock re.

How to change my regex to reject underscores

Your regex is:

^[\\w\\-\\ \\#\\.\\/]{0,70}$

It uses \w, which is equivalent to [a-zA-Z0-9_], so it also allows underscores.

You can change your character class to this:

^[-#. a-zA-Z0-9\\/]{0,70}$

Note that space, dot, # and / don't need to be escaped inside [...], and - doesn't need escaping either if it is placed at the first or last position.
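
As a rough check, the rewritten class can be tried in Python. This is only a sketch: the original pattern looks like it comes from a string literal in another language (hence the doubled backslashes), so here it is rewritten as a Python raw string, and the helper name is_valid and the sample inputs are invented for the example.

```python
import re

# The rewritten class as a Python raw string. "/" has no special
# meaning in Python's re, so it is left unescaped here.
NO_UNDERSCORE = re.compile(r"^[-#. a-zA-Z0-9/]{0,70}$")

def is_valid(text: str) -> bool:
    """Return True if text uses only the allowed characters (max 70)."""
    return NO_UNDERSCORE.match(text) is not None

print(is_valid("Flat #4-B 12/3 Main St."))  # True
print(is_valid("main_street"))              # False: underscore rejected
```

Because the quantifier is {0,70}, the empty string is also accepted.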

I want to capture an alphanumeric group without underscore

You can try this:

[^a-zA-Z0-9()\\s+]

The output will be reverse(abc)

regex match words without two underscores next to each other

You could use

\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)

Then use the first capture group; see the demo on regex101.com.


In Python this could be

import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']

nwords = [match.group(1)
          for word in words
          for match in [rx.search(word)]
          if match and match.group(1) is not None]

print(nwords)
# ['ad', '12354', 'test']

Or within a string:

import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"

nwords = list(filter(None, rx.findall(string)))
print(nwords)
# ['ad', '12354', 'test']


Note that you can do all of this without a regular expression (probably faster and with fewer headaches):

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']

nwords = [word
          for word in words
          if "__" not in word and not (word.startswith('_') or word.endswith('_'))]
print(nwords)
# ['ad', '12354', 'test']

Matching any character except an underscore using Regex

If I understand what you're asking for (matching strings of characters, except for strings that contain an underscore), this can be done with a negative lookahead.

The reason is that regular expressions normally operate one character at a time. So if I want to know if I should match a character, but only if there is not an underscore later, I need to use lookahead.

^((?!_)[A-Za-z0-9])+$

(?!...) is the negative lookahead operator.
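
To see what the lookahead contributes, the sketch below compares three patterns in Python. Only the first comes from the answer above; the other two and all variable names are added for illustration. Since [A-Za-z0-9] already excludes the underscore, the first two patterns should agree on every input; the lookahead only does real work when the class is \w.

```python
import re

# Pattern from the answer: lookahead in front of an explicit class.
with_lookahead = re.compile(r"^((?!_)[A-Za-z0-9])+$")
# Same class without the lookahead; [A-Za-z0-9] already excludes "_".
plain_class = re.compile(r"^[A-Za-z0-9]+$")
# Here the lookahead matters, because \w alone would accept "_".
word_minus_underscore = re.compile(r"^((?!_)\w)+$")

patterns = (with_lookahead, plain_class, word_minus_underscore)
samples = ["abc123", "abc_123", "_", ""]
for s in samples:
    verdicts = [bool(p.match(s)) for p in patterns]
    print(repr(s), verdicts)
# 'abc123' [True, True, True]
# 'abc_123' [False, False, False]
# '_' [False, False, False]
# '' [False, False, False]
```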

EDIT:

So you want there to be at most one underscore in the portion before the @ sign, and no underscore in the portion after?

^[A-Za-z0-9]+_?[A-Za-z0-9]+@[A-Za-z0-9]+\.(com|ca|org|net)$
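
A quick way to try this pattern is a small Python check. The sample addresses and the names EMAIL and results are made up for illustration.

```python
import re

EMAIL = re.compile(
    r"^[A-Za-z0-9]+_?[A-Za-z0-9]+@[A-Za-z0-9]+\.(com|ca|org|net)$"
)

samples = [
    "john_doe@example.com",   # one underscore before @
    "johndoe@example.org",    # no underscore
    "john__doe@example.com",  # two underscores before @
    "john_doe@ex_ample.com",  # underscore after @
    "_john@example.com",      # leading underscore
    "a@example.ca",           # single character before @
]
results = {s: EMAIL.match(s) is not None for s in samples}
for address, ok in results.items():
    print(address, ok)
```

Only the first two addresses match. The last one fails because the pattern needs at least one alphanumeric character on each side of the optional underscore, so the part before the @ must be at least two characters long.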
