Converting Text to Unicode in JavaScript

Converting text to Unicode in JavaScript

JavaScript exposes strings as sequences of 16-bit code units, in the manner of UCS-2.

This means that supplementary Unicode symbols are exposed as two separate code units (the surrogate halves). For example, '𝌆'.length == 2, even though it’s only one Unicode character (U+1D306).

Because of this, if you want to get the Unicode code point for every character in a string, you’ll need to convert the string of 16-bit code units into an array of Unicode code points (where each surrogate pair forms a single code point). You could use Punycode.js’s utility functions for this:

punycode.ucs2.decode('abc'); // [97, 98, 99]
punycode.ucs2.decode('𝌆'); // [119558]
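
In engines that support ES2015, the same conversion can probably be done without a library, because string iteration and codePointAt() both work on whole code points. The following is only a minimal sketch; the helper name toCodePoints is made up for illustration and is not part of Punycode.js.

```javascript
// Hypothetical helper: list the Unicode code points of a string.
// Array.from iterates a string by code point, so a surrogate pair stays together.
function toCodePoints(text) {
  return Array.from(text, (symbol) => symbol.codePointAt(0));
}

console.log(toCodePoints('abc'));          // [97, 98, 99]
console.log(toCodePoints('\uD834\uDF06')); // [119558], i.e. U+1D306
console.log('\uD834\uDF06'.length);        // 2 (code units, not code points)
```

The surrogate pair is written with \u escapes here so the example does not depend on the file's encoding.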

How can I convert a string into a unicode character?

Use String.fromCharCode() like this: String.fromCharCode(parseInt(input, 16)). When you put a Unicode value in a string using \u, it is interpreted as a hexadecimal value, so you need to specify the base (16) when using parseInt.
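
Note that String.fromCharCode() works on 16-bit code units, so a value above FFFF loses its high bits. A sketch that also covers supplementary characters, assuming ES2015's String.fromCodePoint() is available (the helper name hexToSymbol is hypothetical):

```javascript
// Hypothetical helper: turn a hexadecimal code point such as "1D306" into a string.
function hexToSymbol(hex) {
  const codePoint = parseInt(hex, 16);
  if (Number.isNaN(codePoint) || codePoint > 0x10FFFF) {
    throw new RangeError('Not a valid hexadecimal code point: ' + hex);
  }
  // fromCodePoint emits a surrogate pair when the value is above 0xFFFF.
  return String.fromCodePoint(codePoint);
}

console.log(hexToSymbol('41'));   // "A"
console.log(hexToSymbol('3053')); // "こ"
console.log(hexToSymbol('1D306') === '\uD834\uDF06');   // true
console.log(String.fromCharCode(0x1D306) === '\uD306'); // true: the high bits are dropped
```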

How can I convert string to unicode?

• .split("") separates the string into an array of characters // ["こ", "の", "O" ...]

• Loop over the array with map() and replace each character with its char code, converted to a hex string.

let str = "このOTPを使用してQuikドライブにログインします。 このOTPを誰とも共有しないでください";
str = str.split("").map( char => addZeros( char.charCodeAt(0).toString(16) ) ).join("");
function addZeros(str){ return ("0000" + str).slice(-4) }

console.log( str );
// for comparison
console.log( "3053306e004f0054005030924f7f752830573066005100750069006b30c930e930a430d6306b30ed30b030a430f33057307e3059300200203053306e004f0054005030928ab030683082517167093057306a30443067304f306030553044" )
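
Going the other way is a matter of cutting the hex string back into 4-digit groups. This is a rough sketch of that reverse step, assuming the input was produced by the zero-padded encoding above; the helper name hexToText is made up for illustration.

```javascript
// Hypothetical helper: decode a string of 4-digit hex UTF-16 code units back to text.
function hexToText(hex) {
  // Cut the input into groups of exactly four hex digits.
  const units = hex.match(/[0-9a-fA-F]{4}/g) || [];
  // Each group is one code unit; adjacent surrogate halves rejoin when concatenated.
  return units.map((unit) => String.fromCharCode(parseInt(unit, 16))).join('');
}

console.log(hexToText('3053306e004f00540050')); // "このOTP"
```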

Converting unicode character to string format

Just found a way:
String.fromCharCode(parseInt(unicode, 16)) returns the right symbol representation. The unicode value here doesn't have the \u in front of it, just the hexadecimal number.

Converting unicode characters through javascript

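// escape() turns each byte-valued character into %XX; decodeURIComponent() then reads the UTF-8 sequence
const input = '\xe8\xac\x9b\xe5\x91\xa2D\xe3\x80\x82';
console.log(decodeURIComponent(escape(input)));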

This gives you exactly

講呢D。

UPDATE

If your string really contains literal \x sequences, convert them to % signs first:

input = '\\xe8\\xac\\x9b\\xe5\\x91\\xa2D\\xe3\\x80\\x82'

decodeURIComponent(input.replace(/\\x/g, '%'))
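
Since escape() is deprecated, a possible alternative is the standard TextDecoder API, which is available in modern browsers and as a global in recent Node.js versions. The sketch below assumes, as in the first snippet, that each character of the input stands for exactly one UTF-8 byte; the helper name decodeUtf8ByteString is hypothetical.

```javascript
// Hypothetical helper: interpret each character of a "binary string" as one
// UTF-8 byte and decode the byte sequence with TextDecoder.
function decodeUtf8ByteString(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(decodeUtf8ByteString('\xe8\xac\x9b\xe5\x91\xa2D\xe3\x80\x82')); // "講呢D。"
```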

How to convert unicode in JavaScript?

Those are Unicode character escape sequences in a JavaScript string. As far as JavaScript is concerned, they are the same character.

'\u003cb\u003eleft\u003c/b\u003e' == '<b>left</b>'; // true

So, you don’t need to do any conversion at all.
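
The situation is different if the backslashes are literal characters in the data (for example, text that was escaped twice), because then the string really does contain a backslash followed by "u003c". A sketch for that case only, with a hypothetical helper name unescapeUnicode:

```javascript
// Hypothetical helper: replace literal \uXXXX sequences with the code units they name.
function unescapeUnicode(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// String.raw keeps the backslashes as literal characters.
const raw = String.raw`\u003cb\u003eleft\u003c/b\u003e`;
console.log(raw === '<b>left</b>');                  // false
console.log(unescapeUnicode(raw) === '<b>left</b>'); // true
```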

JavaScript function to convert unicode pseudo-alphabet to regular characters?

Following the suggestion from this answer, this solution uses the unicode-12.1.0 NPM package:

const unicodeNames = require('unicode-12.1.0/Names');

const overrides = Object.freeze({
  'ん': 'h',
  '乇': 'E',
  'レ': 'l',
  '尺': 'r',
  // ...
});

const toRegularCharacters = xs => {
  if (typeof xs !== 'string') {
    throw new TypeError('xs must be a string');
  }

  return [ ...xs ].map(x => {
    const override = overrides[x];

    if (override) {
      return override;
    }

    const names = unicodeNames
      .get(x.codePointAt(0))
      .split(/\s+/);

    // console.log({
    //   x,
    //   names,
    // });

    const isCapital = names.some(x => x == 'CAPITAL');

    const isLetter = isCapital || names.some(x => x == 'SMALL');

    if (isLetter) {
      // e.g. "Ŧ" is named "LATIN CAPITAL LETTER T WITH STROKE"
      const c = names.some(x => x == 'WITH') ?
        names[names.length - 3] :
        names[names.length - 1];

      return isCapital ?
        c :
        c.toLowerCase();
    }

    return x;
  }).join('');
};

console.log(
toRegularCharacters('..')
);

console.log(
toRegularCharacters('-')
);

console.log(
toRegularCharacters('ん乇レレo, wo尺レd')
);

console.log(
toRegularCharacters('ŦɆSŦƗNǤ')
);

The Names data-table contains the required information, but not in the best form, so there is some hacky string manipulation to get the character out.

A map of overrides is used for cases such as '尺'.

A better solution would extract the idn_mapping property as mentioned by @Seth.
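
As a different, lighter-weight technique (not the idn_mapping extraction mentioned above), the built-in String.prototype.normalize() can fold many styled letters, such as fullwidth and mathematical alphanumeric forms, into plain ones. It does not cover look-alikes such as '尺' or stroked letters such as 'Ŧ', which have no decomposition. A minimal sketch, with a hypothetical helper name foldCompatibility:

```javascript
// Hypothetical helper: fold compatibility variants to their plain forms.
function foldCompatibility(text) {
  return text
    .normalize('NFKD')                 // split compatibility characters and accents
    .replace(/[\u0300-\u036f]/g, '');  // drop combining diacritical marks
}

console.log(foldCompatibility('\uFF28\uFF45\uFF4C\uFF4C\uFF4F')); // "Hello" (fullwidth letters)
console.log(foldCompatibility('\u{1D4D7}\u{1D4F2}'));             // "Hi" (mathematical bold script)
console.log(foldCompatibility('caf\u00E9'));                      // "cafe"
console.log(foldCompatibility('\u0166') === '\u0166');            // true: "Ŧ" is left unchanged
```

The two approaches could be combined: normalize first, then fall back to the overrides map and the Names lookup for whatever is left.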


