utf-8 word boundary regex in javascript
The word boundary assertion \b matches only at a position where a word character is not preceded or not followed by another word character (so .\b. is equivalent to \W\w or \w\W). In JavaScript, \w is defined as [A-Za-z0-9_], so it does not match Greek characters, and \b cannot be used for this case.
What you could do instead is to use this:
"αβ αβγ γαβ αβ αβ".replace(/(^|\s)αβ(?=\s|$)/g, "$1AB")
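To make the difference concrete, here is a minimal sketch that runs both approaches on the same input. The helper name replaceWholeWord is hypothetical and not part of the original answer; it assumes that words are separated by whitespace and that the search word contains no regex metacharacters.

```javascript
// Hypothetical helper (an illustration, not from the original answer):
// replaces `word` only when it is delimited by whitespace or the string edges.
// Assumes `word` contains no regex metacharacters.
function replaceWholeWord(text, word, replacement) {
  const pattern = new RegExp("(^|\\s)" + word + "(?=\\s|$)", "g");
  // A function replacer keeps the leading whitespace and avoids "$" surprises.
  return text.replace(pattern, (match, lead) => lead + replacement);
}

const input = "αβ αβγ γαβ αβ αβ";

// \b finds no boundary next to Greek letters, because they are not in [A-Za-z0-9_].
const withWordBoundary = input.replace(/\bαβ\b/g, "AB");

// The whitespace-delimited pattern replaces only the standalone occurrences.
const withWhitespace = replaceWholeWord(input, "αβ", "AB");

console.log(withWordBoundary); // "αβ αβγ γαβ αβ αβ" (unchanged)
console.log(withWhitespace);   // "AB αβγ γαβ AB AB"
```

The lookahead (?=\s|$) does not consume the trailing space, which is what lets two adjacent occurrences both be replaced.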
RegExp word boundary with special characters (.) javascript
You can check for the word boundary first (as you were doing), but the tricky part is at the end, where you can't use the word boundary because of the trailing dot (.). However, you can check for a whitespace character or the end of the string there instead:
/\b(u\.s\.a\.)(?:\s|$)/gi
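A short sketch of how this pattern behaves in practice; the sample sentence and the variable names are made up for illustration.

```javascript
// The pattern from the answer: \b at the start, whitespace or end of string at the end.
const pattern = /\b(u\.s\.a\.)(?:\s|$)/gi;

// Hypothetical sample text.
const sample = "Made in the U.S.A. and shipped from u.s.a.";

// Capture group 1 holds the abbreviation without the trailing whitespace.
const found = Array.from(sample.matchAll(pattern), (m) => m[1]);
console.log(found); // ["U.S.A.", "u.s.a."]

// For comparison: a trailing \b fails here, because "." and " " are both non-word characters.
const trailingBoundary = /\bu\.s\.a\.\b/i.test("the U.S.A. is large");
console.log(trailingBoundary); // false
```

Note that (?:\s|$) consumes the whitespace after the match; if that matters, a lookahead such as (?=\s|$) is one possible alternative.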
Javascript regex with word boundary includes word with special characters
\b only works for ASCII word characters. To handle non-ASCII word boundaries you have to use Unicode property escapes, for example:
const nodes = [{
  textContent: "Ford is the best"
}, {
  textContent: "Fordørgen is the best"
}];
const variable = 'Ford';
// The lookarounds reject a match that touches another alphabetic character ("u" flag required for \p{...}).
const regex = new RegExp('(?<!\\p{Alpha})' + variable + '(?!\\p{Alpha})', 'u');
const matches = nodes.filter(function(node) {
  return regex.test(node.textContent);
});
console.log(matches);
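The same idea can be wrapped in a small helper that also escapes the search word. This is only a sketch: the function name unicodeWordRegex is hypothetical, and it treats only alphabetic characters as word characters, as the answer does (digits and underscores are not considered).

```javascript
// Hypothetical helper: builds a regex that matches `word` only when it is
// not adjacent to another alphabetic character (Unicode-aware).
function unicodeWordRegex(word, flags = "u") {
  // Escape regex syntax characters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("(?<!\\p{Alpha})" + escaped + "(?!\\p{Alpha})", flags);
}

const fordRegex = unicodeWordRegex("Ford");

console.log(fordRegex.test("Ford is the best"));      // true
console.log(fordRegex.test("Fordørgen is the best")); // false

// For comparison, plain \b sees "ø" as a non-word character and matches anyway.
console.log(/\bFord\b/.test("Fordørgen is the best")); // true
```

Lookbehind assertions and Unicode property escapes both require an ES2018-capable engine.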
Regular Expression Word Boundary and Special Characters
\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
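The boundary positions described above can be checked directly. The sketch below lists every index where \b matches in the string; the final whitespace-based pattern is one possible alternative and is an assumption, not something taken from the answer.

```javascript
const text = "add +";

// Collect the index of every zero-width \b match.
const boundaries = Array.from(text.matchAll(/\b/g), (m) => m.index);
console.log(boundaries); // [0, 3]: before "a" and after the second "d"

// No boundary exists between the space and "+", so this fails.
console.log(/\b\+/.test(text)); // false

// Without the boundary the "+" is found.
console.log(/\+/.test(text)); // true

// Assumed alternative: require whitespace or a string edge on both sides of "+".
const standalonePlus = /(?:^|\s)\+(?=\s|$)/;
console.log(standalonePlus.test(text)); // true
```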
Javascript Regex Word Boundary with optional non-word character
You need to account for 3 things here:
- The main point is that a \b word boundary is a context-dependent construct, and if your input is not always alphanumeric-only, you need unambiguous word boundaries.
- You need to double the backslashes when you build the pattern with the RegExp constructor notation.
- As you pass a variable to a regex, you need to make sure all special chars are properly escaped.
Use
let userStr = 'why hello there, or should I say #hello there?';
let keyword = '#hello';
let re_pattern = `(?:^|\\W)(${keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&')})(?!\\w)`;
let res = [], m;
// To find a single (first) match
console.log((m = new RegExp(re_pattern).exec(userStr)) ? m[1] : "");
// To find multiple matches:
let rx = new RegExp(re_pattern, "g");
while (m = rx.exec(userStr)) {
  res.push(m[1]);
}
console.log(res);
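For reuse, the three points can be folded into two small functions. This is a sketch only: the names escapeRegExp and findKeyword are hypothetical, and the comparison with \b is added to show why the unambiguous boundaries are needed.

```javascript
// Hypothetical helper: escapes regex metacharacters so a string matches literally.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Hypothetical helper: returns every occurrence of `keyword` that is preceded by
// the start of the string or a non-word character and not followed by a word character.
function findKeyword(text, keyword) {
  const rx = new RegExp("(?:^|\\W)(" + escapeRegExp(keyword) + ")(?!\\w)", "g");
  return Array.from(text.matchAll(rx), (m) => m[1]);
}

const sampleStr = "why hello there, or should I say #hello there?";

console.log(findKeyword(sampleStr, "#hello")); // ["#hello"]
console.log(findKeyword(sampleStr, "hello"));  // ["hello", "hello"]

// With \b the "#hello" keyword is never found: the space and "#" are both non-word characters.
console.log(new RegExp("\\b" + escapeRegExp("#hello") + "\\b").test(sampleStr)); // false
```

Because (?:^|\W) consumes the preceding character, two keywords separated by a single character would not both be found; a lookbehind such as (?<!\w) is one way around that in engines that support it.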