Ignoring a Character Along with Word Boundary in Regex

Ignoring a character along with word boundary in regex

All that escaping in the Regexp.new is looking quite ugly. You could greatly simplify that by using a Regexp literal:

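word = 'below'
text = "I said, 'look out below'"

# Interpolate the word between two word boundaries; /i ignores case
reg = /\b#{word}\b/i
# \0 in the replacement refers to the whole match
text.gsub!(reg, '<b>\0</b>')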

Also, you could call the mutating form gsub! directly, unless that string is aliased somewhere else in your code that you are not showing us. Lastly, if you use a single-quoted string literal inside your gsub call, you don't need to escape the backslash.

Regular Expression Word Boundary and Special Characters

\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:

add +

...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
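A small sketch of that behaviour, written in Python purely for illustration (the question's own regex flavour may differ, and the `(?<!\w)` alternative is an assumption about what was intended, not part of the original regex):

```python
import re

text = "add +"

# \b only matches where exactly one side is a word character,
# so it cannot sit between the space and the "+".
no_match = re.search(r"\b\+", text)

# Positions where \b does match in "add +": before "a" and after the second "d".
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# Assumed alternative: only require that no word character precedes the "+".
plus_match = re.search(r"(?<!\w)\+", text)

print(no_match, boundaries, plus_match.start())
```

Here the `\b\+` search finds nothing, while the lookbehind version finds the `+` at the end of the string.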

RegEx for word boundary but still match if is preceded or followed by special chars

You are looking for

(?<!\w)(word1|word2)(?!\w)

The (?<!\w) and (?!\w) lookarounds are unambiguous leading ((?<!\w)) and trailing ((?!\w)) word boundaries.

The meaning of \b depends on context: \bw matches the w in *w because \b before a word character requires a non-word character (or the start of the string) on its left, whereas \b\* requires a word character before the * because * is a non-word character.

In languages that do not support lookbehinds, (?<!\w) should be replaced with (^|\W), and the extra character that group consumes then has to be handled in the code.
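To make the difference concrete, here is a minimal Python sketch. The word list, the sample sentence and the `build_pattern` helper are made up for illustration; only the lookaround shape comes from the answer above.

```python
import re

def build_pattern(words):
    # Escape each word so characters such as "+" or "." are taken literally,
    # and try longer words first so a shorter one cannot shadow them.
    alternation = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    return re.compile(rf"(?<!\w)({alternation})(?!\w)")

text = "I like c++ and .net, but gopher and c++x are not go."

pattern = build_pattern(["c++", ".net", "go"])
found = pattern.findall(text)

# For comparison: \b around "c++" only matches the occurrence inside "c++x",
# because \b after "+" requires a word character to follow.
naive_starts = [m.start() for m in re.finditer(r"\bc\+\+\b", text)]

print(found, naive_starts)
```

With the lookarounds, the standalone c++, .net and go are found and the occurrences inside gopher and c++x are skipped; the \b version does the opposite for c++.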

Regex : I need to extract all words except the string `ignore`

I'd suggest swapping the logic: match the word you'd like to ignore and replace those matches, which leaves the expected output in the string. For example:

(?<=\S)-ignore\b|\bignore-(?=\S)

See an online demo. Replacing each matched substring with nothing gives the expected result.

  • (?<=\S) - Positive lookbehind to assert position is preceded by a non-whitespace character.
  • -ignore\b - Match '-ignore' followed by a word-boundary.
  • | - Or:
  • \bignore- - Match a word-boundary followed by 'ignore-'.
  • (?=\S) - Positive lookahead to assert position is followed by a non-whitespace character.

Note: if your string can also be just 'ignore' with nothing else, you could add another alternative to the alternation to cover that case too.
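The replacement can be sketched in Python as follows. The sample strings are invented for illustration, since the original input is not shown here.

```python
import re

# Same pattern as above: "-ignore" after a non-space, or "ignore-" before one.
IGNORE = re.compile(r"(?<=\S)-ignore\b|\bignore-(?=\S)")

# Hypothetical inputs, not taken from the question.
samples = ["foo-ignore-bar", "ignore-foo", "foo-ignore", "ignored-foo"]

# Replace every match with an empty string.
cleaned = [IGNORE.sub("", s) for s in samples]

print(cleaned)
```

Note that ignored-foo is left alone, because the pattern needs a dash right after ignore or a word boundary right after -ignore.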

How to make word boundary \b not match on dashes

\b matches at any transition between a word character ([a-zA-Z0-9_]) and a non-word character, so it treats a dash as a boundary just as it does a space. Surround the word with negative lookarounds to ensure there is no non-whitespace character immediately before or after it:

re.compile(r'(?<!\S)word(?!\S)')
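A quick comparison of the two approaches, as a sketch on a made-up sample string:

```python
import re

text = "word word-break pre-word"

# \b sees the dash as a boundary, so all three occurrences match.
with_b = [m.start() for m in re.finditer(r"\bword\b", text)]

# Whitespace-based lookarounds only accept the standalone occurrence.
strict = [m.start() for m in re.finditer(r"(?<!\S)word(?!\S)", text)]

print(with_b, strict)
```

Only the first word, which has whitespace or the string edge on both sides, survives the stricter pattern.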

Exclude some characters from word boundary in javascript regular expressions

There are several questions in your current question.

Is it possible to make \b not treat - as word boundary?

See tchrist's answer about word boundaries, in its Exploring Boundaries section. That is how \b works, and there is no way to redefine its behavior.

2020-12-22 should match date and not days but it matches both.

To match days and avoid matching parts of dates with the days regex, you would need a lookbehind and a lookahead - /\b(?<!-)\d{2}\b(?!-)/ - but JavaScript engines that predate ES2018 do not support lookbehind. In that case, all you can do is use a consuming pattern instead that matches the start of the string or any character other than a hyphen - (?:^|[^-]) - and put a capturing group around \d{2} so the digits can be read separately. Note that, depending on what you are doing, you might also need a capturing group in the lookbehind workaround pattern.

If you plan to extract, use

var days = /(?:^|[^-])\b(\d{2})\b(?!-)/g;
var s = "25 and 45 on 2017-04-14 and 2017-04-15.";
var res = [], m;
while ((m = days.exec(s)) !== null) {
  // Group 1 holds the two digits without the consumed leading character
  res.push(m[1]);
}
console.log(res); // ["25", "45"]
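For comparison, in an engine that does support lookbehind the first pattern can be used as is. The sketch below uses Python's re module only to show that both variants agree on the sample string; it is not part of the JavaScript solution.

```python
import re

s = "25 and 45 on 2017-04-14 and 2017-04-15."

# Lookbehind version: the two digits must not touch a hyphen on either side.
days_lookbehind = re.findall(r"\b(?<!-)\d{2}\b(?!-)", s)

# Workaround version: consume the preceding character, read the digits from group 1.
days_workaround = [
    m.group(1) for m in re.finditer(r"(?:^|[^-])\b(\d{2})\b(?!-)", s)
]

print(days_lookbehind, days_workaround)
```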

preg_match_all not ignoring characters after pattern

The .* is greedy, so it also consumes the first opening parenthesis (the one after the function name), because a later parenthesis (the one belonging to the array) is still available to match the \( of your pattern.

You should try a more restrictive condition on the function name, such as alphanumeric characters only or anything but a parenthesis. For example, replace func_.* with func_[^(]*, which stops at the first opening parenthesis.
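The effect is the same in most regex flavours, so here is a sketch in Python rather than PHP. The input string and the exact patterns are assumptions, since the original pattern is not shown in full.

```python
import re

# Hypothetical input: a function call whose argument contains another "(".
code = "func_foo(array(1, 2))"

# Greedy .* runs to the last "(" that still lets \( match.
greedy = re.search(r"func_(.*)\(", code).group(1)

# [^(]* cannot cross a parenthesis, so it stops at the first one.
restricted = re.search(r"func_([^(]*)\(", code).group(1)

print(greedy, restricted)
```

The greedy group ends up holding foo(array, while the restricted one holds just foo.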


