UTF-8 text is garbled when form is posted as multipart/form-data
I had the same problem using Apache commons-fileupload.
I never found the cause, especially since UTF-8 is already set in the following places:
1. HTML meta tag
2. Form accept-charset attribute
3. Tomcat filter on every request that sets the "UTF-8" encoding
-> My solution was to explicitly convert the strings from ISO-8859-1 (or whatever the default encoding of your platform is) to UTF-8:
new String (s.getBytes ("iso-8859-1"), "UTF-8");
hope that helps
Edit: starting with Java 7 you can also use the following, which avoids the checked UnsupportedEncodingException:
new String (s.getBytes (StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
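A minimal sketch of why this conversion works, assuming the container decoded the UTF-8 bytes as ISO-8859-1. The class name MojibakeRepair and the method repair are made up for illustration. Because ISO-8859-1 maps every byte to exactly one character, the original bytes can be recovered and decoded again as UTF-8:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of commons-fileupload or the servlet API.
final class MojibakeRepair {

    // Undo a wrong ISO-8859-1 decoding of bytes that were really UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Thai greeting, written with escapes so the source file encoding does not matter.
        String original = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35";

        // Simulate the bug: UTF-8 bytes from the browser decoded as ISO-8859-1.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        // Compare the repaired string with the original.
        System.out.println(repair(garbled).equals(original));
    }
}
```

This only helps when the wrong decoding really was ISO-8859-1. If the string was already decoded correctly, or was decoded with a charset that cannot represent every byte, the conversion will damage it instead.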
Uploading a file with enctype=multipart/form-data gives a strange file name when the name is Thai
Have you tried calling
request.setCharacterEncoding("UTF-8");
in the servlet before anything else?
PHP 5.4 multipart/form-data UTF-8 encoding
I'm writing this to answer my own question, in the hope that it helps somebody else.
If you use PHP 5.4.x, changing mbstring.http_input from "auto" to "pass" may solve your problem.
Multipart/form-data and UTF-8 in a ASP Classic application
Your analysis of CStrU is correct. It assumes that the client is sending single-byte ANSI characters. It also assumes that the codepage used by the client and the locale the VBScript is running in are the same.
When using UTF-8, the assumptions made by CStrU will always be incorrect. There isn't, to my knowledge, a locale that has 65001 as its codepage (I think one or two use 65000, but that's different again).
Here is a replacement function that assumes the text is in UTF-8:
Private Function CStrU(ByRef pstrANSI)
    Dim llngLength ' Length of the input in bytes
    Dim llngIndex  ' Current byte position
    Dim bytVal
    Dim intChar

    ' Determine length
    llngLength = LenB(pstrANSI)

    ' Loop through each character
    llngIndex = 1
    Do While llngIndex <= llngLength
        bytVal = AscB(MidB(pstrANSI, llngIndex, 1))
        llngIndex = llngIndex + 1

        If bytVal < &h80 Then
            ' Single byte (ASCII)
            intChar = bytVal
        ElseIf bytVal < &hE0 Then
            ' Two-byte sequence
            intChar = (bytVal And &h1F) * &h40
            bytVal = AscB(MidB(pstrANSI, llngIndex, 1))
            llngIndex = llngIndex + 1
            intChar = intChar + (bytVal And &h3F)
        ElseIf bytVal < &hF0 Then
            ' Three-byte sequence
            intChar = (bytVal And &hF) * &h1000
            bytVal = AscB(MidB(pstrANSI, llngIndex, 1))
            llngIndex = llngIndex + 1
            intChar = intChar + (bytVal And &h3F) * &h40
            bytVal = AscB(MidB(pstrANSI, llngIndex, 1))
            llngIndex = llngIndex + 1
            intChar = intChar + (bytVal And &h3F)
        Else
            ' Longer sequences are not handled; emit a placeholder
            intChar = &hBF
        End If

        CStrU = CStrU & ChrW(intChar)
    Loop
End Function
Note that with CStrU corrected for UTF-8, the output of your example page now looks wrong. The advice to set the codepage of the file to 65001 is also a requirement: since you are setting the CharSet sent to the client to "UTF-8", you also need to tell ASP to use the UTF-8 code page when encoding text written with Response.Write.
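For readers more at home in Java, the bit arithmetic in the CStrU replacement above can be sketched as follows. This is a line-by-line transliteration with a made-up class name (Utf8BmpSketch), not a production decoder. It shares the original's limitations: no validation of continuation bytes, a placeholder for four-byte sequences, and an exception on truncated input. In real Java code, new String(bytes, StandardCharsets.UTF_8) does the job.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the decoding steps used by the VBScript CStrU above.
final class Utf8BmpSketch {

    static String decode(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i++] & 0xFF;          // treat the byte as unsigned
            int ch;
            if (b < 0x80) {
                ch = b;                          // one byte: plain ASCII
            } else if (b < 0xE0) {
                ch = (b & 0x1F) << 6;            // two bytes: 5 bits + 6 bits
                ch |= bytes[i++] & 0x3F;
            } else if (b < 0xF0) {
                ch = (b & 0x0F) << 12;           // three bytes: 4 + 6 + 6 bits
                ch |= (bytes[i++] & 0x3F) << 6;
                ch |= bytes[i++] & 0x3F;
            } else {
                ch = 0xBF;                       // same placeholder as the VBScript version
            }
            out.append((char) ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 'a' (1 byte), e-acute (2 bytes), euro sign (3 bytes)
        String text = "a\u00e9\u20ac";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Cross-check the hand-written decoder against the original string.
        System.out.println(decode(utf8).equals(text));
    }
}
```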