JavaScript: Unicode String to Hex


Remember that a JavaScript code unit is 16 bits wide. Therefore the hex string form will be 4 digits per code unit.

Usage:

var str = "\u6f22\u5b57"; // "\u6f22\u5b57" === "漢字"
alert(str.hexEncode().hexDecode());

String to hex form:

String.prototype.hexEncode = function(){
    var hex, i;

    var result = "";
    for (i=0; i<this.length; i++) {
        // charCodeAt returns a 16-bit code unit; pad its hex form to 4 digits
        hex = this.charCodeAt(i).toString(16);
        result += ("000"+hex).slice(-4);
    }

    return result;
};

Back again:

String.prototype.hexDecode = function(){
    var j;
    // split into groups of 4 hex digits, one code unit per group
    var hexes = this.match(/.{1,4}/g) || [];
    var back = "";
    for(j = 0; j<hexes.length; j++) {
        back += String.fromCharCode(parseInt(hexes[j], 16));
    }

    return back;
};
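If you would rather not extend String.prototype, the same conversion can be written as plain functions. This is a minimal sketch; the names hexEncode16 and hexDecode16 are hypothetical, and padStart stands in for the ("000"+hex).slice(-4) trick:

```javascript
// Hypothetical standalone versions of hexEncode/hexDecode.
// Each UTF-16 code unit becomes exactly 4 hex digits.
function hexEncode16(str) {
  let result = "";
  for (let i = 0; i < str.length; i++) {
    // charCodeAt returns a code unit (0..0xFFFF); pad to 4 digits
    result += str.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return result;
}

function hexDecode16(hex) {
  if (hex.length % 4 !== 0) {
    throw new Error("Expected 4 hex digits per code unit");
  }
  let result = "";
  for (let i = 0; i < hex.length; i += 4) {
    // each 4-digit group is one code unit; surrogate halves rejoin on concatenation
    result += String.fromCharCode(parseInt(hex.slice(i, i + 4), 16));
  }
  return result;
}

console.log(hexEncode16("\u6f22\u5b57")); // "6f225b57"
console.log(hexDecode16("6f225b57") === "\u6f22\u5b57"); // true
```

Because both directions work on code units, characters outside the BMP (such as emoji) round-trip as two 4-digit groups.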

Encode String to HEX

I solved it by downloading utf8.js

https://github.com/mathiasbynens/utf8.js

then passing its output to the String2Hex function (not reproduced in this excerpt):

alert(String2Hex(utf8.encode('守护村子')));

It gives me the output I want:

e5ae88e68aa4e69d91e5ad90
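For comparison, modern browsers and Node.js ship TextEncoder, which produces the UTF-8 bytes directly, so the same output can be obtained without utf8.js. A minimal sketch; utf8Hex is a hypothetical name, not part of utf8.js:

```javascript
// Hypothetical helper: UTF-8 encode with the built-in TextEncoder,
// then write each byte as two lowercase hex digits.
function utf8Hex(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

console.log(utf8Hex("\u5b88\u62a4\u6751\u5b50")); // "e5ae88e68aa4e69d91e5ad90" (守护村子)
```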

Convert 16-bit Unicode hex string in Javascript

String.prototype.substr() is a deprecated legacy method (specified only in Annex B of the ECMAScript standard), so you should avoid it. The following example indexes the characters directly instead:

const str = "680065006C006C006F00", strLen = str.length;
let decoded = "";
for (let i = 0; i < strLen; i += 4) {
    // swap the two bytes of each pair, e.g. "6800" -> "0068"
    let c = parseInt(str[i+2] + str[i+3] + str[i] + str[i+1], 16);
    decoded += String.fromCodePoint(c);
}

That said, I would rather define some functions to handle it:

const decodeUTF16BytePair = (p) => String.fromCodePoint(parseInt(p[2]+p[3]+p[0]+p[1], 16));

const decodeUTF16ByteString = (str) => {
    let decoded = "", strLen = str.length;
    if (strLen % 4 != 0) {
        throw new Error("Unexpected byte string length");
    }
    for (let i = 0; i < strLen; i+=4) {
        decoded += decodeUTF16BytePair(str.slice(i, i+4));
    }
    return decoded;
}

Because String.fromCodePoint() coerces its argument to a number (and Number("0x0068") is 104), we could also pass a "0x"-prefixed string directly:

const decodeUTF16ByteString = (str) => {
    let decoded = "", strLen = str.length;
    if (strLen % 4 != 0) {
        throw new Error("Unexpected byte string length");
    }
    for (let i = 0; i < strLen; i+=4) {
        decoded += String.fromCodePoint("0x"+str[i+2]+str[i+3]+str[i]+str[i+1]);
    }
    return decoded;
}

Importantly, this approach swaps the two bytes of each pair, i.e. it treats the input as little-endian UTF-16 (UTF-16LE). If your data is big-endian (UTF-16BE), drop the swap and read the four digits in order.
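Where TextDecoder is available (browsers and Node.js), another option is to parse the hex into bytes and let the decoder handle byte order through the encoding label: "utf-16le" matches the swapped pairs above, and the Encoding Standard also defines "utf-16be" for big-endian data. A sketch under that assumption; hexToBytes and decodeUTF16LEHex are hypothetical names:

```javascript
// Hypothetical helper: turn a hex string into a Uint8Array, two digits per byte.
function hexToBytes(hex) {
  if (hex.length % 2 !== 0) {
    throw new Error("Hex string must have an even length");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Decode the bytes as little-endian UTF-16 (low byte first in each pair).
function decodeUTF16LEHex(hex) {
  return new TextDecoder("utf-16le").decode(hexToBytes(hex));
}

console.log(decodeUTF16LEHex("680065006C006C006F00")); // "hello"
```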

Live Example

const decodeUTF16ByteString = (str) => {
    let decoded = "", strLen = str.length;
    if (strLen % 4 != 0) {
        throw new Error("Unexpected byte string length");
    }
    for (let i = 0; i < strLen; i+=4) {
        decoded += String.fromCodePoint("0x"+str[i+2]+str[i+3]+str[i]+str[i+1]);
    }
    return decoded;
}

const input = "680065006C006C006F00";
const output = decodeUTF16ByteString(input);

console.log({ input, output });

JavaScript - Encode/Decode UTF8 to Hex and Hex to UTF8

Your utf8toHex uses encodeURIComponent, which leaves unreserved characters (ASCII letters, digits, and - _ . ! ~ * ' ( )) unencoded, so not everything comes out as hex.

So I've modified your utf8toHex to output hex for every byte.

Update:
I forgot that toString(16) does not zero-pad, so byte values below 16 (e.g. line feeds) would produce a single digit and break the output. I now prepend a 0 and slice the last two characters.

Update 2:
Use TextEncoder; it produces real UTF-8 bytes, which charCodeAt (returning UTF-16 code units) does not.

function hexToUtf8(s)
{
    return decodeURIComponent(
        s.replace(/\s+/g, '')              // remove spaces
         .replace(/[0-9a-f]{2}/gi, '%$&')  // add '%' before each pair of hex digits (either case)
    );
}

const utf8encoder = new TextEncoder();

function utf8ToHex(s)
{
    const rb = utf8encoder.encode(s);
    let r = '';
    for (const b of rb) {
        r += ('0' + b.toString(16)).slice(-2);
    }
    return r;
}

var hex = "d7a452656c6179204f4e214f706572617465642062792030353232";

var utf8 = hexToUtf8(hex);
var hex2 = utf8ToHex(utf8);

console.log("Hex: " + hex);
console.log("UTF8: " + utf8);
console.log("Hex2: " + hex2);
console.log("Is conversion OK: " + (hex == hex2));
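The decoding direction can use TextDecoder too, which avoids building a percent-encoded string for decodeURIComponent. A minimal sketch; hexToUtf8Decoder is a hypothetical name. Note one behavioral difference: by default TextDecoder replaces invalid UTF-8 sequences with U+FFFD instead of throwing a URIError.

```javascript
// Hypothetical alternative to hexToUtf8, using TextDecoder instead of decodeURIComponent.
function hexToUtf8Decoder(s) {
  const clean = s.replace(/\s+/g, ""); // remove whitespace, as hexToUtf8 does
  if (!/^([0-9a-f]{2})*$/i.test(clean)) {
    throw new Error("Expected pairs of hex digits");
  }
  const bytes = new Uint8Array(clean.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  }
  return new TextDecoder().decode(bytes); // UTF-8 is the default encoding
}

console.log(hexToUtf8Decoder("4f 4e")); // "ON"
```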

Converting unicode to hex

Note that in your console.log(code, codeHex); there is no separator between the two values code and codeHex, so you see what looks like one big value, 1040410 (actually the decimal 1040 followed by the hex 410).

So separate them explicitly:

console.log(code, ' ', codeHex);

And for nicer hex formatting, zero-pad to four digits:

console.log(code, ' ', '0x' + ('0000' + codeHex).slice(-4));

Snippet:

var code = "А".charCodeAt(0);
var codeHex = code.toString(16).toUpperCase();
document.write(code, ' ', '0x' + ('0000' + codeHex).slice(-4));
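With codePointAt and padStart, a small helper can print characters in the conventional U+XXXX notation, including characters outside the Basic Multilingual Plane, where charCodeAt would only return the first surrogate. A sketch; toUPlus is a hypothetical name:

```javascript
// Hypothetical helper: format the first character of a string as U+XXXX.
// codePointAt(0) returns the full code point, even for a surrogate pair.
function toUPlus(ch) {
  const cp = ch.codePointAt(0);
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

console.log(toUPlus("\u0410"));    // "U+0410" (Cyrillic capital А)
console.log(toUPlus("\u{1F600}")); // "U+1F600"
```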

How to convert decimal to hexadecimal in JavaScript

Convert a number to a hexadecimal string with:

hexString = yourNumber.toString(16);

And reverse the process with:

yourNumber = parseInt(hexString, 16);
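Two details are worth knowing, as this short example shows: toString(16) does not zero-pad, so use padStart when you need a fixed width, and parseInt with radix 16 also accepts a leading "0x".

```javascript
const n = 10;
const hex = n.toString(16);          // "a" (no padding)
const padded = hex.padStart(2, "0"); // "0a"

console.log(hex, padded);
console.log(parseInt("ff", 16));     // 255
console.log(parseInt("0xff", 16));   // 255, "0x" prefix is allowed with radix 16
console.log((-255).toString(16));    // "-ff"
```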

convert string or character to hex codes in javascript

After searching and testing, the only way I found was to convert character by character in a very long if-else-if chain. The result was also printed backward, so I had to reverse the string first; then numbers and English letters came out reversed, so I had to reverse them again. It was a headache, but it worked!
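A shorter route, sketched here under the assumption that the goal is one hex code per character: JavaScript strings store characters in logical order even when the text displays right-to-left, so each character can be converted directly with codePointAt, without a lookup chain or any reversing. charHexCodes is a hypothetical name:

```javascript
// Hypothetical helper: hex code of every character, in logical (storage) order.
// Array.from iterates a string by code point, so surrogate pairs stay together.
function charHexCodes(str) {
  return Array.from(str, (ch) => ch.codePointAt(0).toString(16));
}

console.log(charHexCodes("\u05E9\u05DC\u05D5\u05DD").join(" ")); // "5e9 5dc 5d5 5dd" (Hebrew שלום)
console.log(charHexCodes("hi\u{1F600}").join(" "));              // "68 69 1f600"
```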

Convert hex value to unicode character

Most emojis require two code units, including that one. fromCharCode works in code units (JavaScript's "characters" are UTF-16 code units, except that unpaired surrogates are tolerated), not code points (actual Unicode characters).

In modern environments, you'd use String.fromCodePoint or just a Unicode codepoint escape sequence (\u{XXXXX} rather than \uXXXX, which is for code units). There's also no need for parseInt:

console.log(String.fromCodePoint(0x1f600));
console.log("\u{1f600}");
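To see why fromCharCode with a single value falls short: U+1F600 is stored as the surrogate pair D83D DE00, and passing both code units to fromCharCode builds the same string. If the value arrives as a hex string (from user input, say), convert it to a number first, for example with parseInt:

```javascript
const viaCodePoint = String.fromCodePoint(0x1f600);
const viaSurrogates = String.fromCharCode(0xd83d, 0xde00);

console.log(viaCodePoint === viaSurrogates); // true
console.log(viaCodePoint.length);            // 2 (two UTF-16 code units)

// Hex given as text:
const fromText = String.fromCodePoint(parseInt("1f600", 16));
console.log(fromText === viaCodePoint);      // true
```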

