utf-8 word boundary regex in javascript
The word boundary assertion matches only where a word character is not preceded or followed by another word character (so .\b. matches either \W\w or \w\W). And \w is defined as [A-Za-z0-9_], so it doesn't match Greek characters. Thus you cannot use \b for this case.
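The limitation described above can be checked directly. This is a minimal sketch with made-up sample strings; the variable names are illustrative only.

```javascript
const greek = "αβ αβγ";

// \w covers only [A-Za-z0-9_], so a Greek letter counts as a non-word character.
const isWordChar = /\w/.test("α");

// With no \w character anywhere in the string, \b cannot match at any position.
const greekHasBoundary = /\b/.test(greek);

// ASCII text behaves as expected: only the standalone "ab" is replaced.
const asciiReplaced = "ab abc".replace(/\bab\b/g, "AB");

// The same pattern with Greek letters finds nothing and leaves the string as it was.
const greekReplaced = greek.replace(/\bαβ\b/g, "AB");

console.log(isWordChar, greekHasBoundary, asciiReplaced, greekReplaced);
```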
What you could do instead is to use this:
"αβ αβγ γαβ αβ αβ".replace(/(^|\s)αβ(?=\s|$)/g, "$1AB")
Javascript RegEx UTF-8
Again: I don't know if this is the answer you are looking for. This will also recapitalize the first letter of each word in the name. So if I write "My name is Salvador Dalí", the reply is: "Hello, Salvador Dalí! Nice to meet you!"
var myInput = document.getElementById("myInput");

// Build the reply from whatever was typed into #myInput.
function myFunction() {
  var text;
  // Lowercase first so the "my name is " prefix matches regardless of case.
  var answer = myInput.value.toLowerCase().replace("my name is ", "");
  switch (answer) {
    case "":
      text = "Please type something.";
      break;
    default:
      text = "Hello, " + CapitalizeName(answer) + "! Nice to meet you!";
  }
  // textContent avoids interpreting the typed name as HTML.
  document.getElementById("reply").textContent = text;
}

// Uppercase the first letter of every space-separated word.
function CapitalizeName(name) {
  return name
    .split(" ")
    .map(w => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}
<p>What is your name?</p>
<input id="myInput" type="text">
<button onclick="myFunction()">Go</button>
<p id="reply"></p>
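The same logic can be tried without a page around it. This is only a sketch: greetByName and capitalizeName are hypothetical names, and it assumes the input may start with the phrase "my name is" in any letter case.

```javascript
// Uppercase the first letter of every whitespace-separated word.
function capitalizeName(name) {
  return name
    .split(/\s+/)
    .filter(Boolean) // drop empty pieces left by leading or trailing spaces
    .map(w => w.charAt(0).toUpperCase() + w.slice(1))
    .join(" ");
}

// Strip an optional "my name is" prefix and build the greeting.
function greetByName(input) {
  const name = input.trim().toLowerCase().replace(/^my name is(\s+|$)/, "");
  if (name === "") return "Please type something.";
  return "Hello, " + capitalizeName(name) + "! Nice to meet you!";
}

console.log(greetByName("My name is salvador dalí"));
```

Note that toUpperCase works on non-ASCII letters too, which is why no regex word boundary is needed for this step.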
Javascript - regex - word boundary (\b) issue
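Older JavaScript engines have no lookbehind (it arrived with ES2018), and word boundaries work only with members of the \w character class, so the portable approach is to use groups (capturing groups if you want to make a replacement). The pattern is written here with the inline multiline modifier (?m); in a JavaScript regex literal, use the m flag instead:
(?m)(^|[^a-zA-ZΆΈ-ώἀ-ῼ\n])([a-zA-ZΆΈ-ώἀ-ῼ]{2})(?![a-zA-ZΆΈ-ώἀ-ῼ])
Example that removes two-letter words (note that JavaScript replacement strings refer to groups as $1, not \1):
txt = txt.replace(/(^|[^a-zA-ZΆΈ-ώἀ-ῼ\n])([a-zA-ZΆΈ-ώἀ-ῼ]{2})(?![a-zA-ZΆΈ-ώἀ-ῼ])/gm, '$1');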
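To see how the capturing-group approach behaves, here is a small sketch wrapped in a function. The names LETTERS and removeTwoLetterWords and the sample sentence are assumptions made for illustration.

```javascript
// Latin letters plus the Greek ranges used in the answer.
const LETTERS = "a-zA-ZΆΈ-ώἀ-ῼ";

// Group 1 keeps the character before the word (or the line start),
// group 2 is the two-letter word, and the lookahead rejects longer words.
const twoLetterWord = new RegExp(
  "(^|[^" + LETTERS + "\\n])([" + LETTERS + "]{2})(?![" + LETTERS + "])",
  "gm"
);

function removeTwoLetterWords(text) {
  // "$1" puts back the leading character that group 1 consumed.
  return text.replace(twoLetterWord, "$1");
}

console.log(JSON.stringify(removeTwoLetterWords("το σπίτι με θέα")));
```

The surrounding spaces are left in place, so the result may contain leading or doubled whitespace that needs a separate cleanup step.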
match hebrew character at word boundary via regex in javascript?
I can't read Hebrew... does this regex do what you want?
/(\S*[\u05D0]+\S*)/g
Your first regex, /(\u05D0+)/g, matches only runs of the character you are interested in, wherever they occur.
Your second regex, /(\u05D0)\b/g, requires a word boundary right after the character. Because \b is defined in terms of the ASCII-only \w class, that boundary exists only when an ASCII letter, digit, or underscore follows, so the regex won't match the character at the beginning, middle, or end of a Hebrew word.
EDIT:
Look at this answer:
utf-8 word boundary regex in javascript
Using the info from that answer, I came up with this regex. Is this correct?
/([\u05D0])(?=\s|$)/g
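One way to check that regex is to run it on a short sample and compare it with the \b version. This is a sketch under the assumption that words are separated by whitespace; the sample text and variable names are made up for the test.

```javascript
// Sample: "בא אב א" (alef at the end of a word, at the start of a word, and alone).
const text = "\u05D1\u05D0 \u05D0\u05D1 \u05D0";

// Alef followed by whitespace or the end of the string.
const finalAlef = /([\u05D0])(?=\s|$)/g;

// Indexes of every word-final alef found by the lookahead version.
const positions = [...text.matchAll(finalAlef)].map(m => m.index);

// The \b version finds nothing, because no ASCII word character follows any alef.
const withAsciiBoundary = text.match(/(\u05D0)\b/g);

console.log(positions, withAsciiBoundary);
```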
Regex wordwrap with UTF8 characters in JS
The problem is that JavaScript recognizes word boundaries only before/after ASCII letters (and numbers/underscore). Just drop the \b anchors and it should work.
result = subject.replace(/[a-zA-Z0-9ßÄÖÜäöüÑñÉéÈèÁáÀàÂâŶĈĉĜĝŷÊêÔôÛûŴŵ-]+/g, "<span>$&</span>");
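The difference the \b anchors make can be seen on a word that starts with a non-ASCII letter. A minimal sketch follows; WORD_CHARS, wrapWords, and wrapWordsWithBoundary are hypothetical names, and the sample words are chosen only for illustration.

```javascript
// Same character list as in the answer; the trailing "-" is a literal hyphen.
const WORD_CHARS = "a-zA-Z0-9ßÄÖÜäöüÑñÉéÈèÁáÀàÂâŶĈĉĜĝŷÊêÔôÛûŴŵ-";

// Without anchors: every run of listed characters is wrapped.
function wrapWords(subject) {
  const pattern = new RegExp("[" + WORD_CHARS + "]+", "g");
  return subject.replace(pattern, "<span>$&</span>");
}

// With \b anchors: a match can only start or end next to an ASCII word character.
function wrapWordsWithBoundary(subject) {
  const pattern = new RegExp("\\b[" + WORD_CHARS + "]+\\b", "g");
  return subject.replace(pattern, "<span>$&</span>");
}

console.log(wrapWords("Über Größe"));
console.log(wrapWordsWithBoundary("Über"));
```

In the anchored version the leading "Ü" falls outside the span, because there is no boundary between the start of the string and a non-ASCII letter.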