How to Change GB2312 Encoding to UTF-8

Convert GB2312 to UTF-8

You can try this online service that uses the Open Source iconv utility.

You can also install Charco, a command-line version of it on your machine.

For GB2312, you can use CP936 as the encoding. CP936 is Microsoft's GBK code page, a superset of GB2312, so it decodes GB2312 text correctly.
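If a scripted conversion is more convenient than an online service, the same idea can be sketched in Python. This is a minimal sketch, not a definitive tool: the function name `gb_to_utf8` and the file paths are hypothetical, and it assumes the whole file fits in memory.

```python
from pathlib import Path


def gb_to_utf8(src, dst, src_encoding="cp936"):
    """Re-encode a GB2312/GBK text file as UTF-8.

    Returns the number of characters converted. `src` and `dst` are
    hypothetical file paths chosen by the caller.
    """
    raw = Path(src).read_bytes()
    # decode() raises UnicodeDecodeError if the bytes are not valid CP936,
    # which is a useful signal that the source encoding was guessed wrong.
    text = raw.decode(src_encoding)
    # Work on bytes so that line endings are preserved exactly.
    Path(dst).write_bytes(text.encode("utf-8"))
    return len(text)


# Example call (paths are placeholders):
# gb_to_utf8("input_gb2312.txt", "output_utf8.txt")
```

Reading and writing bytes instead of text avoids any newline translation, so only the character encoding changes.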

If you are a .NET developer, you can make a small tool that does just that.

I've struggled with this as well and found that it was actually simple to solve from a programmatic point of view.

All you need is something like this (I tested it and it works):

In C#

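using System.IO;
using System.Text;

class Program {
    static void Main(string[] args) {
        string infile = args[0];
        string outfile = args[1];

        // Code page 936 is Microsoft's GBK, a superset of GB2312.
        // On .NET Core / .NET 5+, register CodePagesEncodingProvider first
        // (Encoding.RegisterProvider), otherwise code page 936 is not available.
        using (StreamReader sr = new StreamReader(infile, Encoding.GetEncoding(936)))
        using (StreamWriter sw = new StreamWriter(outfile, false, Encoding.UTF8)) {
            sw.Write(sr.ReadToEnd());
        } // the using statements close both files, so no explicit Close() is needed
    }
}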

In VB.NET

Imports System.IO
Imports System.Text

Module Program
    Sub Main(ByVal args() As String)
        Dim infile As String = args(0)
        Dim outfile As String = args(1)
        ' Using blocks close both files even if an exception is thrown.
        Using sr As New StreamReader(infile, Encoding.GetEncoding(936))
            Using sw As New StreamWriter(outfile, False, Encoding.UTF8)
                sw.Write(sr.ReadToEnd())
            End Using
        End Using
    End Sub
End Module

How to change GB-2312 encoding to UTF-8

I solved this issue by using the raw value of the encoding constant (0x0632, which is kCFStringEncodingGB_18030_2000, a superset of GB2312) instead of the Apple-defined constant:

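let enc = CFStringConvertEncodingToNSStringEncoding(0x0632)
// String(data:encoding:) returns nil if the bytes are not valid in that encoding,
// so unwrap safely instead of forcing with "!".
if let dogString = String(data: data, encoding: String.Encoding(rawValue: enc)) {
    print(dogString)
}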

Here is a better solution, thanks to Daij-Djan's suggestion:

let cfEnc = CFStringEncodings.GB_18030_2000
let enc = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(cfEnc.rawValue))
// The result is optional: nil means the data could not be decoded as GB 18030.
let dogString = String(data: data, encoding: String.Encoding(rawValue: enc))

How to convert UTF-8 interpreted GB2312 encoding to real UTF-8 encoding?

It seems that the individual bytes that make up the characters have been encoded as HTML numeric entities as if they were characters from ISO-8859-1 or some other 8-bit encoding. To undo the numeric entity encoding you can use mb_decode_numericentity:

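$str = "&#196;&#201;&#180;&#239;&#182;&#251;&#190;&#248;&#190;&#179;&#207;&#194;&#180;&#243;&#183;&#180;&#187;&#247;&#190;&#220;&#190;&#248;&#192;&#228;&#195;&#197;&#196;&#230;&#215;&#170;&#189;&#250;&#188;&#182;&#214;&#208;&#205;&#248;&#203;&#196;&#199;&#191;";

$str = mb_decode_numericentity($str, array(0, 255, 0, 255), "ISO-8859-1");

echo iconv("GB2312", "UTF-8", $str);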

This produces the expected output of 纳达尔绝境下大反击拒绝冷门逆转晋级中网四强.
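For comparison, the same two-step repair can be sketched in Python. This is an illustration under assumptions, not a port of the PHP functions: the helper name `decode_entity_bytes` is hypothetical, and it assumes every entity in the input is a decimal value between 0 and 255 that stands for one byte.

```python
import re


def decode_entity_bytes(s, encoding="gb2312"):
    """Undo 'bytes written as &#NNN; entities', then decode the bytes.

    Assumes every entity value is in 0-255; a larger value makes the
    latin-1 step raise UnicodeEncodeError.
    """
    # Step 1: turn each &#NNN; back into the single code point NNN.
    as_latin1 = re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), s)
    # Step 2: latin-1 maps code points 0-255 one-to-one onto byte values,
    # which recovers the original byte string.
    raw = as_latin1.encode("latin-1")
    # Step 3: interpret those bytes in their real encoding.
    return raw.decode(encoding)


print(decode_entity_bytes("&#196;&#201;&#180;&#239;"))  # prints 纳达
```

A regular expression is used here instead of a general HTML unescaper because HTML parsers remap entity values 128 to 159 to Windows-1252 characters, which would corrupt byte values in that range.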

Converting utf8 to gb2312 in java

What you are looking for is the encoding/decoding step when you read input and write output.

As @kalpesh said, internally it is all Unicode. If you want to read a stream in one encoding and then write it in a different one, you have to specify the encoding twice: once for converting bytes (in the input stream) to strings (in Java), and once for converting strings (in Java) back to bytes (in the output stream), like so:

// requires: import java.io.*;
InputStream is = new FileInputStream("utf8_encoded_text.txt");
OutputStream os = new FileOutputStream("gb2312_encoded.txt");

Reader r = new InputStreamReader(is, "UTF-8");
BufferedReader br = new BufferedReader(r);
Writer w = new OutputStreamWriter(os, "GB2312");
BufferedWriter bw = new BufferedWriter(w);

String s;
while ((s = br.readLine()) != null) {
    bw.write(s);
    bw.newLine(); // readLine() strips the line terminator, so write one back
}
br.close();
bw.close(); // flushes and closes the underlying FileOutputStream as well

Of course, you still have to do proper exception handling (for example, with try-with-resources) to make sure everything is properly closed.

Converting Simplified Chinese GB 2312 text characters into UTF-8

On Unix systems, your best option is the iconv library.

See iconv_open, iconv, iconv_close.

You have to know how the GB 2312 text is actually encoded, of course (for example EUC-CN or HZ).

If you are not on a Unix system, look for character conversion support in the OS. Doing character conversions by hand is very hard to get right.
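To illustrate why the byte-level encoding matters, here is a small Python sketch that decodes the same GB 2312 characters from both forms. The function name `gb2312_bytes_to_utf8` and the `variant` parameter are hypothetical; the codec names `gb2312` (EUC-CN) and `hz` are the ones Python's standard library provides.

```python
def gb2312_bytes_to_utf8(raw, variant="euc-cn"):
    """Decode GB 2312 bytes stored as EUC-CN or HZ and re-encode as UTF-8."""
    # Map the variant names used in this sketch to Python codec names.
    codecs_by_variant = {"euc-cn": "gb2312", "hz": "hz"}
    text = raw.decode(codecs_by_variant[variant])
    return text.encode("utf-8")


euc = "中文".encode("gb2312")  # 8-bit form: every byte has the high bit set
hz = "中文".encode("hz")       # 7-bit form: ASCII bytes with escape sequences

# Same characters, different bytes, identical UTF-8 once decoded correctly.
assert euc != hz
assert gb2312_bytes_to_utf8(euc, "euc-cn") == gb2312_bytes_to_utf8(hz, "hz")
```

Passing the wrong variant either raises a decoding error or silently yields ASCII noise, which is why the encoding has to be known before converting.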

Error with Chinese encoding with PHP

Legacy Simplified Chinese text is often encoded as GB2312.

Try converting it from GB2312 to UTF-8:

$str = iconv('gb2312', 'utf-8', $str);

Also make sure your file is saved as UTF-8 and that the response declares it:

Content-type: text/html; charset=utf-8


