Enable UTF-8 Encoding for JavaScript

How to set charset="utf-8" in the JavaScript file itself

I found another way: instead of declaring charset="UTF-8" on the script tag like this:

<script type="text/javascript" charset="UTF-8" src="xyz.js"></script>

I can declare the charset for the web page itself using a meta tag, i.e. append <meta charset="UTF-8"> to the DOM dynamically, and end up with something like:

<head>
...
<meta charset="UTF-8">
...
</head>

Adding UTF-8 support to JS/PHP script

Ø§Ù„Ù…Ø±Ø§ÙƒØ² is Mojibake, or possibly "double encoding", for المراكز. Please run SELECT col, HEX(col) ... to see which of these it looks like:

Mojibake: D8A7D984D985D8B1D8A7D983D8B2

double encoding: C398C2A7C399E2809EC399E280A6C398C2B1C398C2A7C399C692C398C2B2

If Mojibake:

  • The bytes to be stored need to be UTF-8-encoded. Fix this.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
  • HTML should start with <meta charset=UTF-8>.
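To see why the Mojibake case has correct bytes but wrong-looking text, here is a minimal Java sketch. The class and method names are illustrative, and it assumes MySQL's latin1 behaves like Windows-1252 (MySQL documents latin1 as cp1252). The correct UTF-8 bytes of المراكز hex-dump as the Mojibake line above, and decoding those same bytes as Windows-1252 yields Ø§Ù„Ù…Ø±Ø§ÙƒØ².

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Stand-in for MySQL's latin1, which is essentially Windows-1252 (assumption of this sketch).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Uppercase hex string, in the same style as MySQL's HEX() output.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Correct UTF-8 bytes, wrongly decoded as a single-byte charset: this is Mojibake.
    static String mojibake(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    public static void main(String[] args) {
        // "المراكز" written with escapes so the source file's own encoding does not matter.
        String text = "\u0627\u0644\u0645\u0631\u0627\u0643\u0632";
        System.out.println("HEX(col): " + hex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("Shown as: " + mojibake(text));
    }
}
```

The bytes in the column are fine in this case; only the interpretation (connection charset or page charset) is wrong, which is why the checklist above targets declarations rather than the data.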

If double-encoding: This is caused by converting from latin1 (or whatever) to utf8, then treating those bytes as if they were latin1 and repeating the conversion.
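The double-encoding path can be reproduced the same way. In this hedged Java sketch (class and method names are hypothetical, and Windows-1252 again stands in for MySQL's latin1), UTF-8 bytes are misread as latin1 and converted to UTF-8 a second time, which produces the double-encoding hex listed above. Running the steps backwards once (decode as UTF-8, re-encode as latin1, decode as UTF-8) restores the original, assuming no bytes were lost or replaced along the way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Stand-in for MySQL's latin1 (assumption of this sketch).
    static final Charset CP1252 = Charset.forName("windows-1252");

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // A latin1 -> utf8 conversion applied to bytes that were already UTF-8.
    static byte[] doubleEncode(String text) {
        byte[] once = text.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, CP1252);        // treat UTF-8 bytes as latin1
        return misread.getBytes(StandardCharsets.UTF_8);  // and convert them to UTF-8 again
    }

    // Undo one round: decode as UTF-8, re-encode as latin1, then decode as UTF-8.
    static String undoOneRound(byte[] doubleEncoded) {
        String misread = new String(doubleEncoded, StandardCharsets.UTF_8);
        return new String(misread.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u0627\u0644\u0645\u0631\u0627\u0643\u0632"; // المراكز
        byte[] twice = doubleEncode(text);
        System.out.println("HEX(col): " + hex(twice));
        System.out.println("Repaired equals original: " + undoOneRound(twice).equals(text));
    }
}
```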

More discussion:

Trouble with UTF-8 characters; what I see is not what I stored

Do not use the mysql_* interface in PHP; switch to the mysqli_* or PDO interfaces. mysql_* was deprecated in PHP 5.5 and removed in PHP 7.0.

Does <meta charset="utf-8"> mean that JavaScript uses UTF-8 encoding instead of UTF-16?

Charset in meta

The <meta charset="utf-8"> tag tells HTML (less sloppily: the HTML parser) that the page is encoded in UTF-8.

JS does not have a built-in facility to switch between different encodings of strings: a string is always a sequence of UTF-16 code units.
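Java strings use the same model as JS strings (a sequence of UTF-16 code units), so a small Java sketch can show what "always UTF-16" means in practice. The class name is illustrative. length() counts UTF-16 code units, a character outside the Basic Multilingual Plane takes two of them, and a UTF-8 representation exists only after an explicit conversion to bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Units {
    // UTF-16 code units: what String.length() (and JS's .length) reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Size after an explicit conversion to UTF-8; the string itself is not changed.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A, é, €, and U+1F600 (stored as the surrogate pair D83D DE00)
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d code unit(s), %d code point(s), %d UTF-8 byte(s)%n",
                    s.codePointAt(0), codeUnits(s), codePoints(s), utf8Bytes(s));
        }
    }
}
```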

Asymptotic bounds

I do not think there is an O(n) penalty for encoding conversions. Whenever this kind of encoding change is due, there already is an O(n) operation: reading/writing the data stream. Any fixed number of operations on each octet keeps the total at O(n). An encoding change requires only local knowledge, i.e. a look-ahead window of fixed length, and can thus be incorporated into the stream read/write code with an O(1) penalty per octet.
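The streaming argument can be illustrated with a minimal Java sketch (the class and method names are hypothetical). The standard InputStreamReader/OutputStreamWriter pair converts through fixed-size buffers and carries any partial multi-byte sequence or surrogate pair over to the next call, so the work per octet is bounded and the total stays linear in the input length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamTranscoder {
    // Copies text from `in` (decoded with `from`) to `out` (encoded with `to`).
    // Buffers have a fixed size, so extra memory is constant and time is linear.
    // The caller owns both streams and is responsible for closing them.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buffer = new char[4096];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        writer.flush(); // push out whatever the encoder still holds
    }

    public static void main(String[] args) throws IOException {
        String text = "Gr\u00FC\u00DFe, \u4E16\u754C"; // "Grüße, 世界"
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8, out, StandardCharsets.UTF_16LE);
        System.out.println(utf8.length + " UTF-8 bytes -> " + out.size() + " UTF-16LE bytes");
    }
}
```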

You could argue that the space penalty is O(n). However, if the string has to be stored in any standard encoding (i.e. without compression), moving from UTF-8 to UTF-16 at most doubles its size, thus staying within the O(n) bound.
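The factor of 2 follows from the per-code-point sizes: UTF-8 uses 1, 2, 3 or 4 bytes where UTF-16 uses 2, 2, 2 or 4, so only pure ASCII text reaches the worst case. A small Java sketch (illustrative class name; UTF-16LE is used so no byte-order mark is counted) makes the arithmetic concrete.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // UTF-16 size divided by UTF-8 size for a non-empty string.
    static double utf16ToUtf8Ratio(String s) {
        int utf8 = s.getBytes(StandardCharsets.UTF_8).length;
        int utf16 = s.getBytes(StandardCharsets.UTF_16LE).length;
        return (double) utf16 / utf8;
    }

    public static void main(String[] args) {
        String[] samples = {
            "hello",                                      // ASCII: 1 byte vs 2 bytes per character
            "\u0627\u0644\u0645\u0631\u0627\u0643\u0632", // Arabic: 2 vs 2
            "\u4E16\u754C",                               // CJK: 3 vs 2
            "\uD83D\uDE00"                                // outside the BMP: 4 vs 4
        };
        for (String s : samples) {
            System.out.printf("%d UTF-8 bytes, %d UTF-16LE bytes, ratio %.2f%n",
                    s.getBytes(StandardCharsets.UTF_8).length,
                    s.getBytes(StandardCharsets.UTF_16LE).length,
                    utf16ToUtf8Ratio(s));
        }
    }
}
```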

Constant factors

Even if the concern is minimizing the constant factors hidden in O(n) notation, encoding changes have a modest impact, at least in the time domain. For the ASCII characters that make up most (Western) text, writing a UTF-16 stream as UTF-8 means skipping every second octet, and reading UTF-8 into UTF-16 means inserting null octets. That performance hit pales in comparison with the overhead and latency of interfacing with a socket or the file system.
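The "skip every second octet" shortcut can be sketched in Java as a fast path (the class and method names are hypothetical, and real encoders are more general). It takes the shortcut only when every UTF-16BE code unit has a zero high octet and a low octet below 0x80; any other input falls back to the standard converter.

```java
import java.nio.charset.StandardCharsets;

public class AsciiFastPath {
    // Converts UTF-16BE bytes to UTF-8. For pure ASCII input (every unit is 00 xx with xx < 0x80),
    // the UTF-8 result is just the low octet of each unit; otherwise use the general converter.
    static byte[] utf16beToUtf8(byte[] utf16be) {
        if (utf16be.length % 2 == 0) {
            byte[] out = new byte[utf16be.length / 2];
            boolean ascii = true;
            for (int i = 0; i < out.length && ascii; i++) {
                byte high = utf16be[2 * i];
                byte low = utf16be[2 * i + 1];
                ascii = high == 0 && (low & 0x80) == 0;
                out[i] = low; // drop the null octet
            }
            if (ascii) {
                return out;
            }
        }
        // Slow path: decode to a String and re-encode.
        return new String(utf16be, StandardCharsets.UTF_16BE).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = "Hello".getBytes(StandardCharsets.UTF_16BE); // 00 48 00 65 00 6C 00 6C 00 6F
        byte[] mixed = "Gr\u00FC\u00DFe".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(new String(utf16beToUtf8(ascii), StandardCharsets.UTF_8));
        System.out.println(new String(utf16beToUtf8(mixed), StandardCharsets.UTF_8));
    }
}
```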

Storage is different, of course, though storage is comparatively cheap today and the upper bound of 2 still holds. The move from 32-bit to 64-bit has a larger memory impact with respect to number representations and pointers.

How to set charset UTF-8 for jQuery i18n?

UTF-8 should be the default for any library nowadays, and jquery.i18n, while not bleeding edge, appears to be a serious project. To be precise, the JavaScript interpreter will automatically parse source code encoded as UTF-8 and convert strings to its internal encoding. I bet the library works with any encoding out of the box as long as it's properly declared (e.g. by sending a correct Content-Type header).
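What "properly declared" buys you can be sketched from the consumer's side: read the charset parameter of the Content-Type header and decode the bytes with it, falling back to a default when none is usable. This is a simplified Java illustration with hypothetical names, and the UTF-8 fallback is an assumption of the sketch; browsers apply a more detailed precedence for scripts (including byte-order-mark sniffing).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeclaredCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/javascript; charset=UTF-8", or `fallback` if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                String name = kv[1].trim().replace("\"", "");
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal charset name: ignore it and use the fallback.
                }
            }
        }
        return fallback;
    }

    // Decodes a response body according to its declared charset (UTF-8 if undeclared).
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "var s = \"Auspr\u00E4gung\";".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeBody(body, "text/javascript; charset=UTF-8"));      // matches the bytes
        System.out.println(decodeBody(body, "text/javascript; charset=ISO-8859-1")); // wrong declaration
    }
}
```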

The � in Auspr�gung is a typical symptom of single-byte-encoded text being misinterpreted as UTF-8. Your editor is probably configured to save files as ANSI (Windows-1252 or whatever), but the stack is configured to assume UTF-8.
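The symptom is easy to reproduce in a minimal Java sketch, assuming "ANSI" means Windows-1252 (the class and method names are illustrative). Saving ä in that code page gives the single byte 0xE4, which starts a three-byte sequence in UTF-8; the following "g" is not a continuation byte, so a UTF-8 decoder substitutes the replacement character U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AnsiAsUtf8 {
    // Typical Windows "ANSI" code page (assumption of this sketch).
    static final Charset ANSI = Charset.forName("windows-1252");

    // Simulates a file saved as ANSI but read by a stack that assumes UTF-8.
    static String misread(String text) {
        byte[] saved = text.getBytes(ANSI);               // ä becomes the single byte 0xE4
        return new String(saved, StandardCharsets.UTF_8); // invalid UTF-8 is replaced by U+FFFD
    }

    public static void main(String[] args) {
        String word = "Auspr\u00E4gung"; // Ausprägung
        String shown = misread(word);
        System.out.println(shown);
        System.out.println("contains U+FFFD: " + (shown.indexOf('\uFFFD') >= 0));
    }
}
```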

Utf-8 text in javascript not showing properly when served by Glassfish

The issue was solved by adding a servlet filter that sets the character encoding (UTF-8 by default) on every request and response:

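package utilities;

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

/**
 * Forces one character encoding (UTF-8 by default) on every request and response,
 * including static resources such as .js files served by the container's default servlet.
 */
public class CharsetFilter implements Filter
{
    public static final String DESIRED_ENCODING = "UTF-8";
    private String encoding;

    @Override
    public void init(FilterConfig config) throws ServletException
    {
        // Read the encoding from web.xml; fall back to UTF-8 if it is not configured.
        encoding = config.getInitParameter("appEncoding");
        if (encoding == null) encoding = DESIRED_ENCODING;
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException
    {
        // This is what fixes the encoding of static JavaScript files.
        if (!encoding.equals(response.getCharacterEncoding())) {
            response.setCharacterEncoding(encoding);
        }
        // Must run before any request parameter is read.
        if (!encoding.equals(request.getCharacterEncoding())) {
            request.setCharacterEncoding(encoding);
        }

        next.doFilter(request, response);
    }

    @Override
    public void destroy() {}
}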

and in web.xml:

<filter>
    <filter-name>CharsetFilter</filter-name>
    <filter-class>utilities.CharsetFilter</filter-class>
    <init-param>
        <param-name>appEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>CharsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>


Related Topics



Leave a reply



Submit