Concrete JavaScript regular expression for accented characters (diacritics)
The easiest way to accept all accents is this:
[A-zÀ-ú] // accepts lowercase and uppercase letters, accented ones up to ú
[A-zÀ-ÿ] // as above, plus û ü ý þ ÿ (also matches [ \ ] ^ _ ` × ÷)
[A-Za-zÀ-ÿ] // as above, but not including [ \ ] ^ _ `
[A-Za-zÀ-ÖØ-öø-ÿ] // as above, but not including × ÷ either
See Unicode Character Table for characters listed in numeric order.
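To see how the four classes differ in practice, a small check like the following may help. It is only a sketch: the names `classes` and `accepts` are made up for illustration, and each class is anchored so that it must match exactly one character.

```javascript
// Hypothetical lookup table: each class from above, anchored to a single character.
const classes = {
  'A-zÀ-ú': /^[A-zÀ-ú]$/,
  'A-zÀ-ÿ': /^[A-zÀ-ÿ]$/,
  'A-Za-zÀ-ÿ': /^[A-Za-zÀ-ÿ]$/,
  'A-Za-zÀ-ÖØ-öø-ÿ': /^[A-Za-zÀ-ÖØ-öø-ÿ]$/,
};

// Returns the names of the classes that accept the given character.
function accepts(ch) {
  return Object.keys(classes).filter(name => classes[name].test(ch));
}

console.log(accepts('é')); // all four classes
console.log(accepts('ÿ')); // every class except the first
console.log(accepts('[')); // only the two classes that use A-z
console.log(accepts('×')); // every class except the last
```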
What's a good regex to include accented characters in a simple way?
Accented Characters: DIY Character Range Subtraction
If your regex engine allows it (and many will), this will work:
(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$
Explanation
- (?i) sets case-insensitive mode
- The ^ anchor asserts that we are at the beginning of the string
- (?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character...
- The lookahead (?![×Þß÷þø]) asserts that the char is not one of those in the brackets
- [-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and chars in a wide accented range, from which we need to subtract
- The + matches that one or more times
- The $ anchor asserts that we are at the end of the string
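JavaScript does not accept the inline (?i) modifier at the start of a pattern, so there the same idea is usually written with the `i` flag instead. A minimal sketch, where `namePattern` and `isValidName` are names assumed for illustration:

```javascript
// Same pattern as above, with case-insensitive mode moved into the `i` flag.
const namePattern = /^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$/i;

// Hypothetical helper name, used only for this example.
function isValidName(input) {
  return namePattern.test(input);
}

console.log(isValidName("José"));          // true
console.log(isValidName("O'Brien-Núñez")); // true
console.log(isValidName("a×b"));           // false: × is subtracted by the lookahead
console.log(isValidName(""));              // false: + requires at least one character
```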
Make an accent-insensitive RegExp in JavaScript
There is no RegExp parameter that you can pass to alter the way accents are treated. What you would need to do is build up a matrix of characters that should substitute each other and then construct a RegExp pattern from these substitute characters.
const e = ['È', 'É', 'Ê', 'Ë', 'è', 'é', 'ê', 'ë'];
const a = ['à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ'];
const substitutions = { e, E: e, a, A: a };
const str = 'Cesar';
// Replace each letter that has accented variants with a character class of those variants.
const pattern = Array.from(str).map(c => substitutions[c] ? `[${c}${substitutions[c].join('')}]` : c).join('');
console.log(pattern);
const match = new RegExp(pattern, 'gi').exec('césar');
console.log(match);
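The same idea can be wrapped in a reusable function. The sketch below is an illustration rather than a complete solution: `accentInsensitive` and `escapeRegExp` are hypothetical names, and the substitution table only covers the `a` and `e` variants listed above. Characters without substitutes are escaped so that input such as `.` is matched literally.

```javascript
const e = ['È', 'É', 'Ê', 'Ë', 'è', 'é', 'ê', 'ë'];
const a = ['à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ'];
const substitutions = { e, E: e, a, A: a };

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive RegExp in which each known letter
// also matches its accented variants.
function accentInsensitive(term) {
  const pattern = Array.from(term)
    .map(c => substitutions[c] ? `[${c}${substitutions[c].join('')}]` : escapeRegExp(c))
    .join('');
  return new RegExp(pattern, 'i');
}

console.log(accentInsensitive('Cesar').test('césar')); // true
console.log(accentInsensitive('Cesar').test('Cäsar')); // false: ä is not a variant of e
console.log(accentInsensitive('a.b').test('axb'));     // false: the dot is matched literally
```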
Regex for diacritics
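As Casimir et Hippolyte stated in comments, JavaScript did not support the \p{L} Unicode character class when this was written. Since ECMAScript 2018 it is available, but only when the regular expression uses the u flag.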
You can create your own character class:
[a-zA-Z0-9À-ž]
If you want to allow those characters but replace characters outside those ranges, negate the character classes:
[^a-zA-Z0-9À-ž]
Or as pointed out in comments:
[A-zÀ-ÖØ-öø-įĴ-őŔ-žǍ-ǰǴ-ǵǸ-țȞ-ȟȤ-ȳɃɆ-ɏḀ-ẞƀ-ƓƗ-ƚƝ-ơƤ-ƥƫ-ưƲ-ƶẠ-ỿ]
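In engines that implement Unicode property escapes (ECMAScript 2018 and later), the `u` flag makes `\p{L}` available, which avoids hand-maintained ranges like the ones above. A minimal sketch, with `wordPattern` and `isWord` as hypothetical names:

```javascript
// \p{L} matches any letter, \p{M} any combining mark, \p{N} any number.
// Property escapes only work when the `u` flag is set.
const wordPattern = /^[\p{L}\p{M}\p{N}]+$/u;

function isWord(input) {
  return wordPattern.test(input);
}

console.log(isWord('Žluťoučký')); // true
console.log(isWord('e\u0301'));   // true: "e" followed by a combining acute accent
console.log(isWord('naïve!'));    // false: "!" is not a letter, mark, or number
```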
Regex matching whitespace and accented characters
You can match by unicode range (for unicode values, take a look at this table). Try something like this:
[a-zA-Z\u00C0-\u017F\s]+
Explanation:
- a-zA-Z matches that range of lower and uppercase characters.
- \u00C0-\u017F matches a chunk of accented characters.
- \s matches whitespace.
let nameToCheck = "Lómöwen Thrél";
let checkValue = /^[a-zA-Z\u00C0-\u017F\s]+$/.test(nameToCheck);
document.write(checkValue ? "valid name" : "invalid name");
How to replace all accented characters with English equivalents
function Convert(string){
  return string.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}
console.log(Convert("Ë À Ì Â Í Ã Î Ä Ï Ç Ò È Ó É Ô Ê Õ Ö ê Ù ë Ú î Û ï Ü ô Ý õ â "))
Output:
"E A I A I A I A I C O E O E O E O O e U e U i U i U o Y o a "
Make exception when replacing diacritics with regular characters
The only simple way I can see is not optimized, but it does the job properly:
const text = "Çééé éÇé àç" // test this string
.replace(/\u00e7/g, '__minC__') // save wanted chars position
.replace(/\u00c7/g, '__majC__')
.normalize('NFD') // normalize to prepare diacritic edit
.replace(/\p{Diacritic}/gu, '') // replace all diacritics
.replace(/__minC__/g, 'ç') // restore wanted chars
.replace(/__majC__/g, 'Ç')
console.log(text)
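An alternative that avoids placeholder strings is to normalize first and let a replacer callback decide what to keep. This is a sketch under the assumption that only `ç` and `Ç` are exempt; the name `stripDiacriticsExcept` is hypothetical.

```javascript
function stripDiacriticsExcept(text) {
  return text
    .normalize('NFD')
    // Group 1 captures "c" or "C" followed by a combining cedilla (U+0327);
    // any other diacritic matches the second alternative and is removed.
    .replace(/([cC]\u0327)|\p{Diacritic}/gu, (match, keep) => keep || '')
    // Recompose the kept "c" + cedilla pairs into single characters.
    .normalize('NFC');
}

console.log(stripDiacriticsExcept('Çééé éÇé àç')); // "Çeee eÇe aç"
```

As with the version above, `\p{Diacritic}` also covers some standalone characters such as `^` and the backtick, so input that may contain them needs extra care.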