Remove accents/diacritics in a string in JavaScript
With ES2015/ES6 String.prototype.normalize(),
const str = "Crème Brulée"
str.normalize("NFD").replace(/[\u0300-\u036f]/g, "")
> "Creme Brulee"
Note: use NFKD
if you want things like \uFB01
(fi
) normalized (to fi
).
Two things are happening here:
normalize()
ing toNFD
Unicode normal form decomposes combined graphemes into the combination of simple ones. Theè
ofCrème
ends up expressed ase
+̀
.- Using a regex character class to match the U+0300 → U+036F range, it is now trivial to globally get rid of the diacritics, which the Unicode standard conveniently groups as the Combining Diacritical Marks Unicode block.
As of 2021, one can also use Unicode property escapes:
str.normalize("NFD").replace(/\p{Diacritic}/gu, "")
See comment for performance testing.
Alternatively, if you just want sorting
Intl.Collator has sufficient support ~95% right now, a polyfill is also available here but I haven't tested it.
const c = new Intl.Collator();
["creme brulee", "crème brulée", "crame brulai", "crome brouillé",
"creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare)
["crame brulai", "creme brulay", "creme bruléa", "creme brulee",
"crème brulée", "creme brulfé", "crome brouillé"]
["creme brulee", "crème brulée", "crame brulai", "crome brouillé"].sort((a,b) => a>b)
["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]
Remove accents/diacritics in a string in Angular Probleme with IE 11
You got the error in IE 11 because IE doesn't support normalize
.
You need to use polyfill to support normalize
in IE 11. You can use module unorm to provide support for normalize
in IE 11.
- Run
npm install unorm
. - Use
import 'unorm';
at the file where you write thenormalize
function.
how to replace all accented characters with English equivalents
function Convert(string){
return string.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}
console.log(Convert("Ë À Ì Â Í Ã Î Ä Ï Ç Ò È Ó É Ô Ê Õ Ö ê Ù ë Ú î Û ï Ü ô Ý õ â "))
Output:
"E A I A I A I A I C O E O E O E O O e U e U i U i U o Y o a "
Remove accents/diacritics in a string in JavaScript
With ES2015/ES6 String.prototype.normalize(),
const str = "Crème Brulée"
str.normalize("NFD").replace(/[\u0300-\u036f]/g, "")
> "Creme Brulee"
Note: use NFKD
if you want things like \uFB01
(fi
) normalized (to fi
).
Two things are happening here:
normalize()
ing toNFD
Unicode normal form decomposes combined graphemes into the combination of simple ones. Theè
ofCrème
ends up expressed ase
+̀
.- Using a regex character class to match the U+0300 → U+036F range, it is now trivial to globally get rid of the diacritics, which the Unicode standard conveniently groups as the Combining Diacritical Marks Unicode block.
As of 2021, one can also use Unicode property escapes:
str.normalize("NFD").replace(/\p{Diacritic}/gu, "")
See comment for performance testing.
Alternatively, if you just want sorting
Intl.Collator has sufficient support ~95% right now, a polyfill is also available here but I haven't tested it.
const c = new Intl.Collator();
["creme brulee", "crème brulée", "crame brulai", "crome brouillé",
"creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare)
["crame brulai", "creme brulay", "creme bruléa", "creme brulee",
"crème brulée", "creme brulfé", "crome brouillé"]
["creme brulee", "crème brulée", "crame brulai", "crome brouillé"].sort((a,b) => a>b)
["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]
Efficiently replace all accented characters in a string?
I can't speak to what you are trying to do specifically with the function itself, but if you don't like the regex being built every time, here are two solutions and some caveats about each.
Here is one way to do this:
function makeSortString(s) {
if(!makeSortString.translate_re) makeSortString.translate_re = /[öäüÖÄÜ]/g;
var translate = {
"ä": "a", "ö": "o", "ü": "u",
"Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
};
return ( s.replace(makeSortString.translate_re, function(match) {
return translate[match];
}) );
}
This will obviously make the regex a property of the function itself. The only thing you may not like about this (or you may, I guess it depends) is that the regex can now be modified outside of the function's body. So, someone could do this to modify the interally-used regex:
makeSortString.translate_re = /[a-z]/g;
So, there is that option.
One way to get a closure, and thus prevent someone from modifying the regex, would be to define this as an anonymous function assignment like this:
var makeSortString = (function() {
var translate_re = /[öäüÖÄÜ]/g;
return function(s) {
var translate = {
"ä": "a", "ö": "o", "ü": "u",
"Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
};
return ( s.replace(translate_re, function(match) {
return translate[match];
}) );
}
})();
Hopefully this is useful to you.
UPDATE: It's early and I don't know why I didn't see the obvious before, but it might also be useful to put you translate
object in a closure as well:
var makeSortString = (function() {
var translate_re = /[öäüÖÄÜ]/g;
var translate = {
"ä": "a", "ö": "o", "ü": "u",
"Ä": "A", "Ö": "O", "Ü": "U" // probably more to come
};
return function(s) {
return ( s.replace(translate_re, function(match) {
return translate[match];
}) );
}
})();
BigQuery UDF to remove accents/diacritics in a string
Consider below approach
select originalText,
regexp_replace(normalize(originalText, NFD), r"\pM", '') output
if applied to sample data in your question - output is
You can easily wrap it with SQL UDF if you wish
Replacing diacritics in Javascript
If you want to do it entirely on the client side, I think your only option is with some kind of lookup table. Here's a starting point, written by a chap called Olavi Ivask on his blog...
function replaceDiacritics(s)
{
var s;
var diacritics =[
/[\300-\306]/g, /[\340-\346]/g, // A, a
/[\310-\313]/g, /[\350-\353]/g, // E, e
/[\314-\317]/g, /[\354-\357]/g, // I, i
/[\322-\330]/g, /[\362-\370]/g, // O, o
/[\331-\334]/g, /[\371-\374]/g, // U, u
/[\321]/g, /[\361]/g, // N, n
/[\307]/g, /[\347]/g, // C, c
];
var chars = ['A','a','E','e','I','i','O','o','U','u','N','n','C','c'];
for (var i = 0; i < diacritics.length; i++)
{
s = s.replace(diacritics[i],chars[i]);
}
document.write(s);
}
You can see this is simply an array of regexes for known diacritic chars, mapping them back onto a "plain" character.
How do I remove diacritics (accents) from a string in .NET?
I've not used this method, but Michael Kaplan describes a method for doing so in his blog post (with a confusing title) that talks about stripping diacritics: Stripping is an interesting job (aka
On the meaning of meaningless, aka All
Mn characters are non-spacing, but
some are more non-spacing than
others)
static string RemoveDiacritics(string text)
{
var normalizedString = text.Normalize(NormalizationForm.FormD);
var stringBuilder = new StringBuilder(capacity: normalizedString.Length);
for (int i = 0; i < normalizedString.Length; i++)
{
char c = normalizedString[i];
var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
if (unicodeCategory != UnicodeCategory.NonSpacingMark)
{
stringBuilder.Append(c);
}
}
return stringBuilder
.ToString()
.Normalize(NormalizationForm.FormC);
}
Note that this is a followup to his earlier post: Stripping diacritics....
The approach uses String.Normalize to split the input string into constituent glyphs (basically separating the "base" characters from the diacritics) and then scans the result and retains only the base characters. It's just a little complicated, but really you're looking at a complicated problem.
Of course, if you're limiting yourself to French, you could probably get away with the simple table-based approach in How to remove accents and tilde in a C++ std::string, as recommended by @David Dibben.
Related Topics
How to Check If Element Is Visible After Scrolling
Why Is Settimeout(Fn, 0) Sometimes Useful
How to Convert an Image into Base64 String Using JavaScript
When Is JavaScript'S Eval() Not Evil
How to Select Text Nodes With Jquery
Looping Through Array and Removing Items, Without Breaking For Loop
Deleting Array Elements in JavaScript - Delete VS Splice
How to Test For an Empty JavaScript Object
How to Get the Caret Column (Not Pixels) Position in a Textarea, in Characters, from the Start
How to Get All Properties Values of a JavaScript Object (Without Knowing the Keys)
Pass Correct "This" Context to Settimeout Callback
How to Efficiently Count the Number of Keys/Properties of an Object in JavaScript
How to Write Denormalized Data in Firebase
Jquery Selector Regular Expressions
What Do Parentheses Surrounding an Object/Function/Class Declaration Mean
Browser Detection in JavaScript
How to Find Event Listeners on a Dom Node in JavaScript or in Debugging