How do I convert special UTF-8 chars to their iso-8859-1 equivalent using javascript?
Actually, everything is typically stored as Unicode of some kind internally, but let's not go into that. I'm assuming you're seeing strings like "åäö" come out garbled because ISO-8859-1 is being used as the character encoding somewhere. There's a trick you can do to convert those characters. The escape and unescape functions used for encoding and decoding query strings are defined for ISO characters, whereas the newer encodeURIComponent and decodeURIComponent, which do the same thing, are defined for UTF-8 characters.
escape encodes extended ISO-8859-1 characters (Unicode code points U+0080-U+00FF) as %xx (two-digit hex), whereas it encodes code points U+0100 and above as %uxxxx (%u followed by four-digit hex). For example, escape("å") == "%E5" and escape("あ") == "%u3042".
encodeURIComponent percent-encodes extended characters as a UTF-8 byte sequence. For example, encodeURIComponent("å") == "%C3%A5" and encodeURIComponent("あ") == "%E3%81%82".
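The difference between the two function families is easy to check in a console. A small sketch, with the sample characters written as \u escapes so the encoding of the script file itself cannot interfere (escape is a legacy function, but it is still available in browsers and Node.js):

```javascript
// "\u00E5" is å (inside the ISO-8859-1 range), "\u3042" is あ (outside it).
const samples = ["\u00E5", "\u3042"];

const results = samples.map(function (ch) {
  return {
    legacy: escape(ch),           // code-point based: %xx or %uxxxx
    utf8: encodeURIComponent(ch)  // UTF-8 based: one %xx per byte
  };
});

console.log(results);
```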
So you can do:
fixedstring = decodeURIComponent(escape(utfstring));
For example, an incorrectly decoded character "å" shows up as "Ã¥". The command does escape("Ã¥") == "%C3%A5", which is the two incorrect ISO characters encoded as single bytes. Then decodeURIComponent("%C3%A5") == "å", where the two percent-encoded bytes are interpreted as a UTF-8 sequence.
If you need to do the reverse for some reason, that works too:
utfstring = unescape(encodeURIComponent(originalstring));
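Wrapped into two small functions, the round trip might look like the sketch below. The function names are hypothetical, chosen only for this illustration, and the sample is written with \u escapes so it does not depend on how the script file is saved:

```javascript
// Hypothetical helper names, not part of any library.

// Reinterpret a string of ISO-8859-1 "byte characters" as UTF-8 text.
function latin1BytesToText(byteString) {
  return decodeURIComponent(escape(byteString));
}

// Reverse direction: expand text into one character per UTF-8 byte.
function textToLatin1Bytes(text) {
  return unescape(encodeURIComponent(text));
}

const original = "\u00E5";                    // å
const garbled = textToLatin1Bytes(original);  // "\u00C3\u00A5", displayed as Ã¥
const repaired = latin1BytesToText(garbled);  // back to å

console.log(garbled.length, repaired === original); // 2 true
```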
Is there a way to differentiate between bad UTF-8 strings and ISO strings? Turns out there is. The decodeURIComponent function used above throws a URIError if it is given a malformed encoded sequence. We can use this to detect, with high probability, whether our string is UTF-8 or ISO.
var fixedstring;
try {
    // If the string is UTF-8, this will work and not throw an error.
    fixedstring = decodeURIComponent(escape(badstring));
} catch (e) {
    // If it isn't, an error will be thrown, and we can assume that we have an ISO string.
    fixedstring = badstring;
}
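Because escape and unescape are legacy functions, the same detect-and-repair idea can also be sketched with TextDecoder, which current browsers and Node.js provide. This is a swapped-in technique, not the method used above, and repairUtf8 is a hypothetical name. It assumes every character of the input stands for one byte (code 0 to 255); anything else is returned unchanged.

```javascript
// Hypothetical helper: repairs a string whose UTF-8 bytes were read as ISO-8859-1.
function repairUtf8(input) {
  const bytes = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const code = input.charCodeAt(i);
    if (code > 0xff) {
      // Not a string of single-byte characters, so there is nothing to repair.
      return input;
    }
    bytes[i] = code;
  }
  try {
    // fatal: true makes decode() throw on malformed UTF-8 instead of
    // silently inserting U+FFFD replacement characters.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Malformed as UTF-8: assume the string was genuine ISO-8859-1 text.
    return input;
  }
}

console.log(repairUtf8("\u00C3\u00A5") === "\u00E5"); // true: Ã¥ is repaired to å
console.log(repairUtf8("\u00E5") === "\u00E5");       // true: a lone å is left alone
```

As with the try/catch version, this is a heuristic: an ISO-8859-1 string that happens to form a valid UTF-8 byte sequence will be "repaired" even though it was fine.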
How do I transcode a Javascript string to ISO-8859-1?
It is my understanding that Javascript uses UTF-8 for its strings
No. JavaScript strings are sequences of UTF-16 code units, whatever encoding the page uses.
Each page has its charset encoding defined in a meta tag, just inside the head element
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
or
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>
Besides that, each page file must actually be saved in that same charset encoding. Otherwise, it will not work as expected.
It is also a good idea to declare the charset encoding on the server side.
Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
PHP
header("Content-Type: text/html; charset=UTF-8");
C#
I do not know how to...
It can also be a good idea to declare the charset on each script file that contains sensitive characters (á, é, í, ó, ú and so on).
<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>
...
So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem
No, no.
The target server may decode strings in something other than ISO-8859-1, or ignore what the page declares. For instance, Tomcat handles requests as ISO-8859-1 no matter how you set up your page. So, on the server side, you may have to set up the request encoding to match how you set up your page.
Java
request.setCharacterEncoding("UTF-8")
PHP
// I do not know how to...
If you really want to change the target charset encoding, try the following.
Internet Explorer
formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
Other browsers
formElement.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
Alternatively, provide a function that returns the Unicode escape sequence for each character. This works regardless of the target charset encoding. For instance, the Unicode escape for á is \u00E1.
alert("á without its Unicode Character Set numerical representation");
function convertToUnicodeCharacterSet(value) {
    if (value == "á")
        return "\u00E1";
}
alert("á Numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));
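The function above only handles á. A more general version, sketched here under the assumption that every non-ASCII UTF-16 code unit should be turned into the text of a \uXXXX escape, could look like this (toUnicodeEscapes is a hypothetical name):

```javascript
// Hypothetical generalization: rewrite every non-ASCII character
// as the six-character text of its \uXXXX escape.
function toUnicodeEscapes(value) {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i); // one UTF-16 code unit
    if (code < 0x80) {
      out += value.charAt(i);         // plain ASCII passes through
    } else {
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

console.log(toUnicodeEscapes("\u00E1 b")); // prints \u00E1 b
```

The result contains only ASCII characters, so it survives any of the charset encodings discussed here and can be pasted into a script as a string literal.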
You can use this link as guideline (See JavaScript escapes)
Added to the original answer: how I implement this with jQuery
var dataArray = $(formElement).serializeArray();
var pairs = [];
for (var i = 0; i < dataArray.length; i++) {
    // Percent-encode both name and value as UTF-8 and join them with "=".
    pairs.push(encodeURIComponent(dataArray[i]["name"]) + "=" + encodeURIComponent(dataArray[i]["value"]));
}
var queryString = pairs.join("&");
$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});
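The query-string step does not depend on jQuery. As a minimal sketch, assuming the input is an array of { name, value } objects like the one serializeArray returns (buildQueryString is a hypothetical name):

```javascript
// Hypothetical helper: builds an application/x-www-form-urlencoded body
// from [{ name: ..., value: ... }, ...] using UTF-8 percent-encoding.
function buildQueryString(fields) {
  return fields
    .map(function (field) {
      return encodeURIComponent(field.name) + "=" + encodeURIComponent(field.value);
    })
    .join("&");
}

const body = buildQueryString([
  { name: "city", value: "M\u00E1laga" }, // á is sent as %C3%A1
  { name: "q", value: "a b" }             // the space is sent as %20
]);

console.log(body); // city=M%C3%A1laga&q=a%20b
```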
It works fine without any headache.
Regards,
How do I convert between ISO-8859-1 and UTF-8 in Java?
In general, you can't do this losslessly. UTF-8 is capable of encoding any Unicode code point; ISO-8859-1 can handle only a tiny fraction of them. So transcoding from ISO-8859-1 to UTF-8 is no problem, but going backwards from UTF-8 to ISO-8859-1 replaces every unsupported character: String.getBytes substitutes "?" for anything the target charset cannot encode.
To transcode text:
byte[] latin1 = ...
byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");
or
byte[] utf8 = ...
byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");
You can exercise more control by using the lower-level Charset APIs. For example, you can raise an exception when an un-encodable character is found, or use a different character for replacement text.
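The same choice between replacing and reporting can be illustrated in JavaScript, the language of the other answers on this page. This is a hand-written sketch, not a Java or browser API: encodeLatin1 and its onUnmappable parameter are hypothetical, and it works on UTF-16 code units, so a character outside the Basic Multilingual Plane produces two replacement bytes.

```javascript
// Hypothetical helper: encode a string as ISO-8859-1 bytes.
// onUnmappable = "replace" writes "?" (0x3F) for unsupported characters,
// onUnmappable = "report" throws instead.
function encodeLatin1(text, onUnmappable = "replace") {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code <= 0xff) {
      bytes[i] = code; // ISO-8859-1 maps directly onto U+0000-U+00FF
    } else if (onUnmappable === "report") {
      throw new RangeError("Cannot encode U+" + code.toString(16).toUpperCase() + " at index " + i);
    } else {
      bytes[i] = 0x3f; // "?"
    }
  }
  return bytes;
}

console.log(Array.from(encodeLatin1("\u00E5\u3042"))); // [ 229, 63 ]
```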