Utf-8 Text Is Garbled When Form Is Posted as Multipart/Form-Data

UTF-8 text is garbled when form is posted as multipart/form-data

I had the same problem using Apache commons-fileupload.
I did not find out what causes the problems especially because I have the UTF-8 encoding in the following places:
1. HTML meta tag
2. Form accept-charset attribute
3. Tomcat filter on every request that sets the "UTF-8" encoding

-> My solution was to especially convert Strings from ISO-8859-1 (or whatever is the default encoding of your platform) to UTF-8:

new String (s.getBytes ("iso-8859-1"), "UTF-8");

hope that helps

Edit: starting with Java 7 you can also use the following:

new String (s.getBytes (StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);

upload file with enctype=multipart/form-data got strange file name when the filename is Thai

Have you try

request.setCharacterEncoding("UTF-8");

in the servlet before anything else ?

PHP 5.4 multipart/form-data UTF-8 encoding

I'm writing this to answer my own question... I hope it will help somebody else...

if you use PHP 5.4.x, setting mbstring.http_input from "auto" to "pass" may solve your problem.

Multipart/form-data and UTF-8 in a ASP Classic application

Your analysis of CStrU is correct. It assumes that single byte ANSI characters are being sent by the client. It also assumes that the codepage being used by both client and locale that the VBScript is running in are the same.

When using UTF-8 the assumptions made by CStrU will always be incorrect. There isn't, to my knowledge, a locale that has 65001 as its codepage (I think there are one or two that use 65000 but thats different again).

Here is a replacement function that assumes text is in UTF-8:-

 Private Function CStrU(ByRef pstrANSI)

  Dim llngLength '' # Length of ANSI string
  Dim llngIndex '' # Current position
  Dim bytVal
  Dim intChar

  '' # determine length
  llngLength = LenB(pstrANSI)

  '' # Loop through each character
  llngIndex = 1
  Do While llngIndex <= llngLength

   bytVal = AscB(MidB(pstrANSI, llngIndex, 1))
   llngIndex = llngIndex + 1

   If bytVal < &h80 Then
    intChar = bytVal
   ElseIf bytVal < &hE0 Then

    intChar = (bytVal And &h1F) * &h40

    bytVal =  AscB(MidB(pstrANSI, llngIndex, 1))
    llngIndex = llngIndex + 1

    intChar = intChar + (bytVal And &h3f)

   ElseIf bytVal < &hF0 Then

    intChar = (bytVal And &hF) * &h1000

    bytVal =  AscB(MidB(pstrANSI, llngIndex, 1))
    llngIndex = llngIndex + 1

    intChar = intChar + (bytVal And &h3F) * &h40

    bytVal =  AscB(MidB(pstrANSI, llngIndex, 1))
    llngIndex = llngIndex + 1

    intChar = intChar + (bytVal And &h3F)

   Else
    intChar = &hBF
   End If

   CStrU = CStrU & ChrW(intChar)
  Loop

 End Function

Note that with CStrU being corrected for UTF-8 the output of your example page now looks wrong. The advice to set the Codepage of the file to 65001 is also a requirement. Since you are setting the CharSet sent to the client to "UTF-8" you need to also tell ASP to use the UTF-8 code page when encoding text written using Response.Write.

Utf-8 Text Is Garbled When Form Is Posted as Multipart/Form-Data