Convert UTF-8 String Classic ASP to SQL Database

Paul's answer isn't wrong but it is not the only part to consider:

You will need to go through each of these steps to make sure that you get consistent results.

IMPORTANT: These steps have to be performed on each and every page in your web application or you will have problems (emphasized by Paul's comment).

  1. Each page needs to be saved using UTF-8 encoding. Double-check this, as some IDEs default to Windows-1252 (often misnamed "ANSI").

  2. Each page needs the following line added as its very first line. To make this easier, I put it, along with some other values, in an include file that I can include in each page as I go.

    Include File - page_encoding.asp
    <%@Language="VBScript" CodePage = 65001 %>
    <%
    Response.CharSet = "UTF-8"
    Response.CodePage = 65001
    %>

    Usage at the top of an ASP page (I prefer to put the file in a config folder at the root of the web)

    <!-- #include virtual="/config/page_encoding.asp" -->

    Response.Charset = "UTF-8" is the equivalent of setting the ;charset in the HTTP content-type header.
    Response.CodePage = 65001 tells ASP to process all dynamic strings as UTF-8.

  3. Include files in the page will also have to be saved using UTF-8 encoding (double check these also).

Follow these steps and your page will work. Your problem at the moment is that some pages are being interpreted as Windows-1252 while others are being treated as UTF-8, so you end up with an encoding mismatch.
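
To see what that mismatch does at the byte level, here is a minimal sketch in Python (used only because it makes the bytes easy to inspect; it is not part of the ASP fix). The sample string is my own assumption, not taken from the question.

```python
# The same text, as an editor would save it in a UTF-8 encoded file.
saved_bytes = "café š".encode("utf-8")

# A page that is treated as UTF-8 reads the bytes back correctly.
read_as_utf8 = saved_bytes.decode("utf-8")

# A page that is treated as Windows-1252 turns every multi-byte
# character into two unrelated characters (mojibake).
read_as_1252 = saved_bytes.decode("cp1252")

print(read_as_utf8)
print(read_as_1252)
```

The bytes never change; only the code page used to interpret them does, which is why every page and include file has to agree.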

Classic ASP - How to convert a UTF-8 string to UTF-16?

I'm so sick of answering this question, but I feel compelled to, as you have made an assumption that many people make when it comes to encoding in ASP, PHP or whatever language you are using.

In web development, encoding is intrinsically linked to the source encoding you use to save the web page.

Just looking at the comments under the iconv reference made me laugh and feel sad at the same time, because so many people out there don't understand this topic.

Take for example your PHP snippet

iconv("utf-8","ucs-2be","Мухтарам Мизоч");

This will work provided the following are true:

  • The page author saved the file using UTF-8 encoding (Most modern editors have this option in some shape or form).
  • The client Internet Browser knows it should be handling the page as UTF-8 either via a meta tag in the HTML,

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

    or by specifying an HTTP Content-Type header.


In Classic ASP it is the same. You need to:

  • Make sure the page is saved as UTF-8 encoding, this includes any #include files that are dependencies.

  • Tell IIS that your pages are UTF-8 by specifying this pre-processing instruction at the very top of the page (must be the first line).

    <%@Language="VBScript" CodePage = 65001 %>
  • Tell the browser what encoding you are using

    <%
    'Tell server to send all strings back to the client as UTF-8
    'while also setting the charset in the HTTP Content Type header.
    Response.CodePage = 65001
    Response.ContentType = "text/html"
    Response.Charset = "UTF-8"
    %>
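
With those settings the response should carry a header along the lines of `Content-Type: text/html; charset=UTF-8`. As a rough illustration of how a client uses that header, here is a small Python sketch (Python standard library only, not ASP; the header value is written out by hand as an assumption rather than captured from IIS).

```python
from email.message import Message

# Header value assumed to match what ContentType + Charset produce.
header = Message()
header["Content-Type"] = "text/html; charset=UTF-8"

media_type = header.get_content_type()    # the part before the semicolon
charset = header.get_content_charset()    # the charset parameter, lower-cased

# The client decodes the response body with the announced charset.
body = "Мухтарам Мизоч".encode("utf-8")
text = body.decode(charset)

print(media_type, charset, text)
```

If the announced charset and the bytes actually sent disagree, the decode step is where the garbled text appears.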

UPDATE:

Neither UCS-2 (UTF-16 LE) nor UCS-2BE (UTF-16 BE) is supported by Classic ASP. Specifying either code page (1200 or 1201) will result in:

ASP 0203 - Invalid CodePage Value

After reading a bit about Kannel, it does appear that you can control the character set you send to the SMS gateway. I would recommend you try sending it as UTF-8.
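
For comparison, this is roughly what the two encodings of the sample text look like as bytes, sketched in Python rather than ASP. The percent-encoding at the end only shows how UTF-8 text would travel in a URL query string; which parameters the gateway expects is not covered here.

```python
from urllib.parse import quote

message = "Мухтарам Мизоч"

# What iconv("utf-8", "ucs-2be", ...) produces: two bytes per character,
# high byte first.
ucs2_be = message.encode("utf-16-be")

# The same text as UTF-8: two bytes per Cyrillic letter, one for the space.
utf8 = message.encode("utf-8")

# Both byte strings decode back to the same text.
assert ucs2_be.decode("utf-16-be") == utf8.decode("utf-8") == message

# UTF-8 bytes percent-encoded for use in a query string.
url_value = quote(message, encoding="utf-8")

print(ucs2_be.hex(" "))
print(utf8.hex(" "))
print(url_value)
```

Both forms carry the same characters, so sending UTF-8 loses nothing as long as the receiving side is told that is what it is getting.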

Links

  • Sending arabic SMS in kannel (This question is about sending Arabic SMS using Java to Kannel, but the information is relevant).

  • Unicode on Windows XP (Although aimed at Windows XP the codepage information is still relevant).

Classic ASP, MySQL or ODBC UTF8 encoding

You have a chance with Slovenian letters, according to this mapping and an excerpt from the Windows-1252 Wikipedia article:

According to the information on Microsoft's and the Unicode Consortium's websites,
positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API
MultiByteToWideChar maps these to the corresponding C1 control codes.

The euro character at position 80 was not present in earlier versions of this code page,
nor were the S, s, Z, and z with caron (háček).

Here are the things to do:

  1. Use UTF-8 (without BOM) encoded files in case they contain hard-coded text. (✔ already done)

  2. Specify UTF-8 for response charset with ASP on server-side or with meta tags on client-side. (✔ already done)

  3. Tell the MySQL Server your commands are in charset utf-8, and you expect utf-8 encoded result sets. Add an initial statement to the connection string : ...;stmt=SET NAMES 'utf8';...

  4. Set the Response.CodePage to 1252.
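
One plausible reading of why step 4 works: the UTF-8 bytes coming back from MySQL are treated as Windows-1252 characters and converted back to the very same bytes on output, so the browser still receives valid UTF-8. The Python sketch below imitates that round trip. The two helper functions are hypothetical; they mimic the MultiByteToWideChar behaviour quoted above (unused positions mapped to C1 control codes), because Python's own cp1252 codec rejects those five bytes.

```python
# Positions that Windows-1252 leaves unused (from the excerpt above).
UNUSED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def win1252_decode(data: bytes) -> str:
    # Hypothetical helper: unused bytes become the same-numbered C1 control.
    return "".join(
        chr(b) if b in UNUSED else bytes([b]).decode("cp1252") for b in data
    )

def win1252_encode(text: str) -> bytes:
    # Hypothetical helper: the reverse mapping of win1252_decode.
    return b"".join(
        bytes([ord(ch)]) if ord(ch) in UNUSED else ch.encode("cp1252")
        for ch in text
    )

# "č" is not in Windows-1252, and its UTF-8 form contains the unused byte 0x8D.
utf8_bytes = "č".encode("utf-8")
as_1252_text = win1252_decode(utf8_bytes)    # what ASP would hold internally
sent_bytes = win1252_encode(as_1252_text)    # what goes out with CodePage 1252

print(utf8_bytes, repr(as_1252_text), sent_bytes)
```

Under this model every byte value survives the trip unchanged, which is also why the string ASP holds in between is not the real text.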

I've tested the following script and it works like a charm.

DDL: http://sqlfiddle.com/#!8/c2c35/1

ASP:

<%@Language=VBScript%>
<%
Option Explicit

Response.CodePage = 1252
Response.LCID = 1060
Response.Charset = "utf-8"

Const adCmdText = 1, adVarChar = 200, adParamInput = 1, adLockOptimistic = 3

Dim Connection
Set Connection = Server.CreateObject("Adodb.Connection")
Connection.Open "Driver={MySQL ODBC 3.51 Driver};Server=localhost;Database=myDb;User=myUsr;Password=myPwd;stmt=SET NAMES 'utf8';"

If Request.Form("name").Count = 1 And Len(Request.Form("name")) Then 'add new
    Dim rsAdd
    Set rsAdd = Server.CreateObject("Adodb.Recordset")
    rsAdd.Open "names", Connection, ,adLockOptimistic
    rsAdd.AddNew
    rsAdd("name").Value = Left(Request.Form("name"), 255)
    rsAdd.Update
    rsAdd.Close
    Set rsAdd = Nothing
End If

Dim Command
Set Command = Server.CreateObject("Adodb.Command")
Command.CommandType = adCmdText
Command.CommandText = "Select name From `names` Order By id Desc"

If Request.QueryString("name").Count = 1 And Len(Request.QueryString("name")) Then
    Command.CommandText = "Select name From `names` Where name = ? Order By id Desc"
    Command.Parameters.Append Command.CreateParameter(, adVarChar, adParamInput, 255, Left(Request.QueryString("name"), 255))
End If

Set Command.ActiveConnection = Connection
With Command.Execute
    While Not .Eof
        Response.Write "<a href=""?name=" & .Fields("name").Value & """>" & .Fields("name").Value & "</a><br />"
        .MoveNext
    Wend
    .Close
End With

Set Command.ActiveConnection = Nothing
Set Command = Nothing

Connection.Close
%><hr />
<a href="?">SHOW ALL</a><hr />
<form method="post" action="<%=Request.ServerVariables("SCRIPT_NAME")%>">
Name : <input type="text" name="name" maxlength="255" /> <input type="submit" value="Add" />
</form>

As a last remark:

When you need to apply HTML encoding to strings fetched from the database, you shouldn't use Server.HTMLEncode anymore: Response.CodePage is 1252 on the server side, and since Server.HTMLEncode depends on the context code page, it will produce gibberish output.

So you'll need to write your own html encoder to handle the case.

Function MyOwnHTMLEncode(ByVal str)
    'Replace the ampersand first so the entities added below are not escaped twice
    str = Replace(str, "&", "&amp;")
    str = Replace(str, "<", "&lt;")
    str = Replace(str, ">", "&gt;")
    str = Replace(str, """", "&quot;")
    MyOwnHTMLEncode = str
End Function
'Response.Write MyOwnHTMLEncode(rs("myfield").value)
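
The same idea sketched in Python, only to make the behaviour easy to check outside ASP (the function name and sample input are my own). It touches the four markup characters and nothing else, so it does not depend on any code page.

```python
def my_own_html_encode(text: str) -> str:
    # "&" must be handled first, otherwise the "&" inside the entities
    # produced by the later replacements would be escaped again.
    for raw, entity in (
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
    ):
        text = text.replace(raw, entity)
    return text

encoded = my_own_html_encode('<b>"Š & ž"</b>')
print(encoded)
```

Non-ASCII letters pass through unchanged, which is exactly what you want when the bytes are already UTF-8.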

How do I convert UTF-8 data from Classic asp Form post to UCS-2 for inserting into SQL Server 2008 r2?

You have to tell SQL Server 2008 that you are sending in Unicode data by adding an N to the front of your insert value, like this:

strTest = "Служба мгновенных сообщений"
strSQL = "INSERT INTO tblTest (test) VALUES (N'"&strTest&"')"

The N tells SQL Server to treat the contents as Unicode, so the data is not corrupted.

See http://support.microsoft.com/kb/239530 for further info.
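
A rough way to picture what goes wrong without the N, sketched in Python rather than T-SQL. This only imitates the lossy step, and it assumes the database's default code page is Windows-1252; the real conversion happens inside SQL Server.

```python
value = "Служба мгновенных сообщений"

# Without N the literal is squeezed through the database code page
# (assumed here to be Windows-1252), and every character that the
# code page lacks is replaced with a question mark.
as_varchar = value.encode("cp1252", errors="replace")

# With N the literal stays Unicode (UTF-16), so nothing is lost.
as_nvarchar = value.encode("utf-16-le")

print(as_varchar.decode("cp1252"))
print(as_nvarchar.decode("utf-16-le"))
```

Once the question marks are stored there is no way to get the original text back, so the fix has to happen at insert time.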

Here is the test code, run on Classic ASP, IIS 7 and SQL Server 2008 R2:

CREATE TABLE [dbo].[tblTest](
    [test] [nvarchar](255) NULL,
    [id] [int] IDENTITY(1,1) NOT NULL
)

ASP Page

<%

Response.CodePage = 65001
Response.CharSet = "utf-8"

strTest = Request("Test")

Set cnn = Server.CreateObject("ADODB.Connection")
strConnectionString = Application("DBConnStr")
cnn.Open strConnectionString



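'Double any single quotes so the posted value cannot break out of the string literal
strSQL = "INSERT INTO tblTest (test) VALUES (N'" & Replace(strTest, "'", "''") & "')"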
Set rsData = cnn.Execute(strSQL)

%>
<html xmlns="http://www.w3.org/1999/xhtml" charset="utf-8">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>

</head>
<body>
<form action="test.asp" method="post" name="form1" >
<br/><br/><br/><center>
<table border="1">
<tr><td><b>Test SQL Write</b> </td></tr>
<tr><td><input type="text" name="Test" style="width: 142px" Value="<%=Server.HTMLEncode(strTest)%>" /></td></tr>
<tr><td><input type="Submit" value="Submit" name="Submit" /></td></tr></table> </center>
</form>


</body>
</html>

Problems with running UTF-8 encoded sql files in Classic ASP


FileSystemObject definitely does not handle UTF-8; it only reads and writes Unicode (UTF-16) and ANSI.

ADODB.Stream can handle a lot of character sets including utf-8 so you can use it instead.

Replace your code up to the first For with the following.

Dim arrSqlLines
With Server.CreateObject("Adodb.Stream")
    .Charset = "utf-8"
    .Open
    .LoadFromFile filePath
    If .EOS Then
        'an empty array if file is empty
        arrSqlLines = Array()
    Else
        'to obtain an array of lines like you desire
        'remove carriage returns (vbCr) if exist
        'and split the text by using linefeeds (vbLf) as delimiter
        arrSqlLines = Split(Replace(.ReadText, vbCr, ""), vbLf)
    End If
    .Close
End With
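
If you want to check the read-and-split logic outside ASP, the same steps can be sketched in Python. The function name, the temporary file and the two sample statements are my own assumptions, used only for illustration.

```python
import os
import tempfile

def read_sql_lines(path):
    # Read the whole file as UTF-8; newline="" keeps the original line endings.
    with open(path, "r", encoding="utf-8", newline="") as handle:
        text = handle.read()
    if text == "":
        return []  # an empty list if the file is empty
    # Remove carriage returns, then split on line feeds.
    return text.replace("\r", "").split("\n")

# Write a small UTF-8 sample file with Windows line endings.
sample = "INSERT INTO t VALUES (N'Служба');\r\nINSERT INTO t VALUES (N'š');"
with tempfile.NamedTemporaryFile("wb", suffix=".sql", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
sample_path = tmp.name

sql_lines = read_sql_lines(sample_path)
os.remove(sample_path)

print(sql_lines)
```

Each element is then one statement with its non-ASCII text intact, ready to be executed in a loop.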

