How to Correctly Decode Unicode Parameters Passed to a Servlet

How do I correctly decode unicode parameters passed to a servlet?

You are nearly there. encodeURIComponent correctly encodes text as UTF-8, which is what you should always use in a URL today.

The problem is that the submitted query string is getting mutilated on the way into your server-side script, because getParameter() decodes it as ISO-8859-1 instead of UTF-8. This stems from Ancient Times before the web settled on UTF-8 for URI/IRI, but it's rather pathetic that the Servlet spec hasn't been updated to match reality, or at least provide a reliable, supported option for it.

(There is request.setCharacterEncoding in Servlet 2.3, but it doesn't affect query string parsing, and if even a single parameter has already been read, possibly by some other framework element, it won't work at all.)

So you need to futz around with container-specific methods to get proper UTF-8, often involving stuff in server.xml. This totally sucks for distributing web apps that should work anywhere. For Tomcat, see https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding and also the question What's the difference between "URIEncoding" of Tomcat, Encoding Filter and request.setCharacterEncoding.
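To make the mis-decoding concrete, here is a minimal JDK-only sketch (no servlet API involved; the class name MojibakeDemo is just illustrative). The browser sends the UTF-8 bytes of "é", which are 0xC3 0xA9 (%C3%A9 in the URL); turning those bytes into a String with ISO-8859-1 yields two characters, "Ã©", instead of one.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates what happens when UTF-8 bytes from the request are turned
    // into a String using ISO-8859-1 instead of UTF-8.
    static String misDecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9";           // "café"
        String garbled = misDecode(original);    // "cafÃ©" (each UTF-8 byte became its own char)
        System.out.println(garbled.length());    // 5, not 4
    }
}
```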

How to process encoded unicode text in servlet?

I am trying to get it via getParameter() method.

getParameter and the handling of input encodings in the Servlet API are broken in general. You get ISO-8859-1 whether you want it or not (and you generally don't).

You can work around this and get UTF-8 for query string parameters by:

  1. Container-specific configuration options (eg Tomcat URIEncoding).

  2. Grabbing the raw request.getQueryString() and passing its pieces into URLDecoder.decode(..., "utf-8") manually instead of relying on getParameter. Only if you are taking this route do you need to worry about URLDecoder yourself.

  3. Fixing up the mis-decoding of the getParameter output by encoding the bad value back to the original bytes it came from (using ISO-8859-1) and then decoding it as UTF-8, eg new String(request.getParameter("param").getBytes("iso-8859-1"), "utf-8").
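Option 2 could look roughly like the following JDK-only sketch. QueryStringParser and parse() are hypothetical names; in a servlet you would pass it request.getQueryString() instead of calling getParameter(). It assumes the usual name=value pairs separated by &, and URLDecoder will throw IllegalArgumentException on a malformed % escape.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryStringParser {
    // Splits a raw query string (e.g. from request.getQueryString()) into
    // name/value pairs and decodes every name and value as UTF-8.
    // Repeated names collect all their values, in order.
    public static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            params.computeIfAbsent(decode(rawName), k -> new ArrayList<>())
                  .add(decode(rawValue));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            // URLDecoder also turns '+' into a space, as form encoding does.
            return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, parse("name=caf%C3%A9&tag=a&tag=b+c") maps "name" to ["café"] and "tag" to ["a", "b c"].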

See this question for background.

request.getParameter() does not display properly character encoding in java servlet

Most servers, including Apache Tomcat, are configured to decode parameters as ISO-8859-1 by default. You probably won't be able to change this unless you have a private dedicated server instance, so the usual technique is to encode/decode those parameters manually. Because you are using JavaScript, you have the built-in encodeURI() and encodeURIComponent() functions; see How to encode a URL in JavaScript. For a single parameter value, encodeURIComponent() is the right one, because encodeURI() leaves characters such as & and = unescaped. The code should change to

x_URL += "&name="+encodeURIComponent(document.getElementById("txt_name").value);

In Java, use URLDecoder to decode the parameter back:

String name = java.net.URLDecoder.decode(((String[]) request.getParameterMap().get("name"))[0], "UTF-8");

Note, if you are using Struts2 dispatcher result type then you don't need to decode parameters in the query string. Those parameters are parsed via UrlHelper.

However, I don't remember exactly at which point those parameters are automatically decoded in Struts2.

As a rule, parameters you pass in the URL should be URL-encoded. If you submit a form, there's no need to do it yourself, because the browser sends the form data as application/x-www-form-urlencoded; see section 17.13.4, Form content types, of the HTML 4.01 specification.
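To see what "URL-encoded" means for a parameter value, here is a small JDK-only sketch using java.net.URLEncoder and URLDecoder, which follow the application/x-www-form-urlencoded rules (the class name FormEncodingDemo is made up). Note that this format writes a space as '+', whereas JavaScript's encodeURIComponent() writes it as %20; URLDecoder accepts both.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormEncodingDemo {
    // Percent-encodes a value as UTF-8 in form-urlencoded style:
    // '&' becomes %26, each non-ASCII byte becomes %XX, a space becomes '+'.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverses encode(): '+' turns back into a space, %XX escapes into UTF-8 text.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String value = "Tom & Jerry caf\u00E9";   // "Tom & Jerry café"
        String encoded = encode(value);           // "Tom+%26+Jerry+caf%C3%A9"
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(value)); // true
    }
}
```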

HttpServletRequest UTF-8 Encoding

Paul's suggestion seems like the best course of action, but if you're going to work around it, you don't need URLEncoder or URLDecoder at all:

String item = request.getParameter("param"); 

byte[] bytes = item.getBytes(StandardCharsets.ISO_8859_1);
item = new String(bytes, StandardCharsets.UTF_8);

// Java 6:
// byte[] bytes = item.getBytes("ISO-8859-1");
// item = new String(bytes, "UTF-8");
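Why does this round trip work, and why is it only a workaround? A JDK-only sketch (Latin1Repair is an illustrative name): ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one char, so getBytes(ISO_8859_1) recovers the original bytes exactly. But once the container is configured correctly (for example with Tomcat's URIEncoding), the same code corrupts values that were already decoded properly.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Repair {
    // Turns a string that was wrongly decoded as ISO-8859-1 back into its
    // original bytes, then decodes those bytes as UTF-8.
    static String repair(String misDecoded) {
        byte[] original = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "caf\u00C3\u00A9";   // "cafÃ©", as getParameter() returned it
        System.out.println(repair(garbled).equals("caf\u00E9")); // true

        // Applied to input that is already correct, 'é' becomes the single
        // byte 0xE9, which is not valid UTF-8 on its own, so the decoder
        // substitutes the replacement character U+FFFD.
        String alreadyCorrect = "caf\u00E9";
        System.out.println(repair(alreadyCorrect).equals(alreadyCorrect)); // false
    }
}
```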

Update: Since this is getting a lot of votes, I want to stress BalusC's point that this definitely is not a solution; it is a workaround at best. People should not be doing this.

I don't know exactly what caused the original issue, but I suspect the URL was already UTF-8 encoded, and then was UTF-8 encoded again.

HTML passes parameters to Java servlet incorrectly

You can HTML-encode the value before sending the request and HTML-decode it when you receive it.

Java servlet sendRequest - getParameter encoding Problem

The HttpServletResponse#encodeRedirectURL() does not URL-encode the URL. It only appends the jsessionid attribute to the URL whenever there's a session and the client has cookies disabled. Admittedly, it's a confusing method name.

You need to encode the request parameters yourself with the help of URLEncoder#encode() while composing the URL.

String charset = "UTF-8";
String link = String.format("index.jsp?name=%s&title=%s",
    URLEncoder.encode(metadata.getName(), charset),
    URLEncoder.encode(metadata.getTitle(), charset));

response.sendRedirect(response.encodeRedirectURL(link));
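If you compose URLs with several parameters in more than one place, the same idea can be wrapped in a small helper. This is only a sketch; QueryStringBuilder and build() are hypothetical names, not part of the servlet API, and the helper assumes non-null names and values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryStringBuilder {
    // Builds "path?name1=value1&name2=value2", URL-encoding every name and
    // value as UTF-8 so that '&', '=' and non-ASCII letters survive.
    static String build(String path, Map<String, String> params) {
        if (params.isEmpty()) {
            return path;
        }
        StringJoiner query = new StringJoiner("&", path + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "Tom & Jerry");
        params.put("title", "caf\u00E9");
        // index.jsp?name=Tom+%26+Jerry&title=caf%C3%A9
        System.out.println(build("index.jsp", params));
    }
}
```

In the servlet you would then call response.sendRedirect(response.encodeRedirectURL(QueryStringBuilder.build("index.jsp", params))). A LinkedHashMap keeps the parameters in insertion order, which makes the resulting URL predictable.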

Then create a filter that is mapped on /* and basically does the following in its doFilter() method:

request.setCharacterEncoding("UTF-8");
chain.doFilter(request, response);

And add the following to the top of your JSP:

<%@ page pageEncoding="UTF-8" %>

Finally you'll be able to display them as follows:

<p>Name: ${param.name}</p>
<p>Title: ${param.title}</p>

See also:

  • Unicode - How to get characters right?

Passing request parameters as UTF-8 encoded strings

The pageEncoding only sets the response character encoding and the charset attribute of the HTTP Content-Type header. Basically, it tells the server to encode the characters produced by the JSP as UTF-8 before sending them to the client, and the header tells the client to decode them using UTF-8 and also to use UTF-8 when any forms in that same page are submitted back to the server. The contentType already defaults to text/html, so the following is sufficient:

<%@page pageEncoding="UTF-8"%>

The HTML meta tag is ignored when the page is served over HTTP. It's only used when the client has saved the page as an HTML file on the local disk and then opens it via a file:// URI in the browser.

In your particular case, the HTTP request body encoding has apparently not been set to UTF-8. It needs to be set with ServletRequest#setCharacterEncoding() in the servlet or a filter before the first call to request.getXxx() is made in any servlet or filter involved in the request.

request.setCharacterEncoding("UTF-8");
String login = request.getParameter("login");
String password = request.getParameter("password");
// ...
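To see what setCharacterEncoding() actually changes, here is a JDK-only sketch that decodes the same form-encoded value with two different charsets. It assumes, as a simplification, that the container decodes a form body much like java.net.URLDecoder does; BodyCharsetDemo and decodeWith() are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class BodyCharsetDemo {
    // Decodes one percent-encoded form value using the given charset name,
    // roughly the choice request.setCharacterEncoding() makes for POST bodies.
    static String decodeWith(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String body = "login=caf%C3%A9";   // what a UTF-8 page submits for "café"
        String value = body.substring(body.indexOf('=') + 1);
        // With UTF-8 the two bytes C3 A9 become one 'é'.
        System.out.println(decodeWith(value, "UTF-8").equals("caf\u00E9"));           // true
        // With ISO-8859-1 each byte becomes its own char: "cafÃ©".
        System.out.println(decodeWith(value, "ISO-8859-1").equals("caf\u00C3\u00A9")); // true
    }
}
```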

See also:

  • How to set request encoding in Tomcat?
  • Why does POST not honor charset, but an AJAX request does? tomcat 6
  • https://stackoverflow.com/questions/14177914/passing-turkish-char-from-form-to-java-class-with-struts2/
  • Unicode - How to get the characters right?

request.getQueryString() seems to need some encoding

I've run into this same problem before. I'm not sure which Java servlet container you're using, but at least in Tomcat 5.x (not sure about 6.x) the request.setCharacterEncoding() method doesn't really have an effect on GET parameters. By the time your servlet runs, GET parameters have already been decoded by Tomcat, so setCharacterEncoding won't do anything.

Two ways to get around this:

  1. Change the URIEncoding setting for your connector to UTF-8. See http://tomcat.apache.org/tomcat-5.5-doc/config/http.html.

  2. As BalusC suggests, decode the query string yourself and manually parse it into a parameter map (as opposed to using the ServletRequest APIs).

Hope this helps!


