Converting Between Strings and ArrayBuffers

Update 2016: five years on, the specs now include new methods (see support below) for converting between strings and typed arrays with proper encoding.

TextEncoder

The TextEncoder represents:

The TextEncoder interface represents an encoder for a specific method,
that is a specific character encoding, like utf-8, iso-8859-2, koi8,
cp1261, gbk, ...
An encoder takes a stream of code points as input and
emits a stream of bytes.

Change note since the above was written: (ibid.)

Note: Firefox, Chrome and Opera used to have support for encoding
types other than utf-8 (such as utf-16, iso-8859-2, koi8, cp1261, and
gbk). As of Firefox 48 [...], Chrome 54 [...] and Opera 41, no
other encoding types are available other than utf-8, in order to match
the spec.*

*) See the updated specs (W3 and WHATWG).

After creating an instance of TextEncoder, pass a string to its encode() method; it returns the UTF-8 bytes as a Uint8Array:

if (!("TextEncoder" in window))
    alert("Sorry, this browser does not support TextEncoder...");

var enc = new TextEncoder(); // always utf-8
console.log(enc.encode("This is a string converted to a Uint8Array"));
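For the reverse direction, the same specification defines TextDecoder. The following is a minimal sketch of a round trip through an ArrayBuffer. The helper names stringToBuffer and bufferToString are made up for illustration, and the sketch assumes an environment where both classes are available as globals.

```javascript
// Hypothetical helper names; TextEncoder and TextDecoder are the standard classes.
function stringToBuffer(str) {
  // encode() returns a Uint8Array holding the UTF-8 bytes of str.
  const bytes = new TextEncoder().encode(str);
  // Copy out exactly the bytes covered by this view, in case the view
  // does not span its whole underlying buffer.
  return bytes.buffer.slice(bytes.byteOffset, bytes.byteOffset + bytes.byteLength);
}

function bufferToString(buffer) {
  // TextDecoder defaults to utf-8 and accepts an ArrayBuffer or a typed array.
  return new TextDecoder().decode(buffer);
}

const buf = stringToBuffer("π ≈ 3.14");
console.log(buf.byteLength);       // 11 (the string itself has 8 UTF-16 code units)
console.log(bufferToString(buf));  // π ≈ 3.14
```

The byte length differs from the string length because "π" takes two bytes and "≈" takes three in UTF-8.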

JavaScript: convert ArrayBuffer to string

Is this enough?

function stringToArrayBuffer(str){
    if(/[\u0080-\uffff]/.test(str)){
        throw new Error("this needs encoding, like UTF-8");
    }
    var arr = new Uint8Array(str.length);
    for(var i=str.length; i--; )
        arr[i] = str.charCodeAt(i);
    return arr.buffer;
}

function arrayBufferToString(buffer){
    var arr = new Uint8Array(buffer);
    var str = String.fromCharCode.apply(String, arr);
    if(/[\u0080-\uffff]/.test(str)){
        throw new Error("this string seems to contain (still encoded) multibytes");
    }
    return str;
}

Or do you need real UTF-8 encoding?

Edit: full UTF-8 support

Beware/Disclaimer: this code has not been tested against another implementation of a UTF-8 encoder or decoder. It may produce wrong results.

TEST IT YOURSELF, before you use it in production!

function stringToArrayBuffer(str){
    var byteArray;
    if(/[\u0080-\uffff]/.test(str)){
        //works on UTF-16 code units: surrogate pairs are not combined into 4-byte sequences
        var arr = new Array(str.length);
        for(var i=0, j=0, len=str.length; i<len; ++i){
            var cc = str.charCodeAt(i);
            if(cc < 128){
                //single byte
                arr[j++] = cc;
            }else{
                //UTF-8 multibyte
                if(cc < 2048){
                    arr[j++] = (cc >> 6) | 192;
                }else{
                    arr[j++] = (cc >> 12) | 224;
                    arr[j++] = ((cc >> 6) & 63) | 128;
                }
                arr[j++] = (cc & 63) | 128;
            }
        }
        byteArray = new Uint8Array(arr);
    }else{
        byteArray = new Uint8Array(str.length);
        for(var i = str.length; i--; )
            byteArray[i] = str.charCodeAt(i);
    }
    return byteArray.buffer;
}

function arrayBufferToString(buffer){
    var byteArray = new Uint8Array(buffer);
    var str = "", cc = 0, numBytes = 0;
    for(var i=0, len = byteArray.length; i<len; ++i){
        var v = byteArray[i];
        if(numBytes > 0){
            //continuation byte: 2 bits of header (10) + 6 bits of payload
            if((v&192) === 128){
                //append the payload of the continuation byte
                cc = (cc << 6) | (v & 63);
            }else{
                throw new Error("this is no continuation byte");
            }
        }else if(v < 128){
            //single-byte
            numBytes = 1;
            cc = v;
        }else if(v < 192){
            //a continuation byte cannot start a sequence
            throw new Error("invalid byte, this is a continuation byte");
        }else if(v < 224){
            //3 bits of header + 5 bits of payload
            numBytes = 2;
            cc = v & 31;
        }else if(v < 240){
            //4 bits of header + 4 bits of payload
            numBytes = 3;
            cc = v & 15;
        }else{
            //4-byte sequences encode code points above U+FFFF, which need a
            //surrogate pair in a JS string; they are not handled here.
            throw new Error("unsupported byte, code point out of 16-bit range");
        }

        if(--numBytes === 0){
            str += String.fromCharCode(cc);
        }
    }
    if(numBytes){
        throw new Error("the bytes don't sum up");
    }
    return str;
}
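The functions above work on UTF-16 code units, so characters outside the Basic Multilingual Plane (which JavaScript stores as surrogate pairs) are not covered. One way to follow the "test it yourself" advice is to compare a hand-written encoder with the built-in TextEncoder. Below is a rough sketch, not the answer's code: encodeUtf8 and sameBytes are hypothetical names, the encoder iterates by code point and adds the 4-byte case, and it assumes well-formed input (no unpaired surrogates) and a global TextEncoder.

```javascript
// Hypothetical sketch: UTF-8 encode by code point, including 4-byte sequences.
// Assumes the string is well-formed (no unpaired surrogates).
function encodeUtf8(str) {
  const out = [];
  for (const ch of str) {                 // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out.push(cp);                       // 1 byte
    } else if (cp < 0x800) {
      out.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));            // 2 bytes
    } else if (cp < 0x10000) {
      out.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
               0x80 | (cp & 0x3F));                              // 3 bytes
    } else {
      out.push(0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
               0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));   // 4 bytes
    }
  }
  return new Uint8Array(out);
}

// Element-wise comparison of two byte arrays.
function sameBytes(a, b) {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Compare against the built-in encoder for a few samples.
const samples = ["plain ASCII", "é", "€", "😀", "mixed: aé€😀"];
const reference = new TextEncoder();
for (const s of samples) {
  console.log(JSON.stringify(s), sameBytes(encodeUtf8(s), reference.encode(s)));
}
```

A mismatch on any sample would point at the branch responsible for that byte length. Strings with unpaired surrogates are deliberately left out, since TextEncoder replaces those with U+FFFD and this sketch does not.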

Conversion between UTF-8 ArrayBuffer and String

function stringToUint(string) {
    var string = btoa(unescape(encodeURIComponent(string))),
        charList = string.split(''),
        uintArray = [];
    for (var i = 0; i < charList.length; i++) {
        uintArray.push(charList[i].charCodeAt(0));
    }
    return new Uint8Array(uintArray);
}

function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(atob(encodedString)));
    return decodedString;
}

With some help from the internet, I wrote these little functions; they should solve your problem. Here is the working JSFiddle.

EDIT:

Since the source of the Uint8Array is external and you can't use atob, you just need to remove it (working fiddle):

function uintToString(uintArray) {
    var encodedString = String.fromCharCode.apply(null, uintArray),
        decodedString = decodeURIComponent(escape(encodedString));
    return decodedString;
}

Warning: escape and unescape are deprecated. See this.

Converting ArrayBuffer to String then back to ArrayBuffer using TextDecoder/TextEncoder returning a different result

TextDecoder and TextEncoder are designed to work with text.
To convert an arbitrary byte sequence into a string and back, it's best to treat each byte as a single character.

var arrayBuff = Memory.readByteArray(pointer,2000); //Get a 2,000 byte ArrayBuffer

console.log(arrayBuff.byteLength); //Always returns 2,000

//Decode and encode same data without making any changes
var decoded = String.fromCharCode(...new Uint8Array(arrayBuff));
var encoded = Uint8Array.from([...decoded].map(ch => ch.charCodeAt())).buffer;

console.log(encoded.byteLength);

The decoded string has exactly the same length as the input buffer and can easily be manipulated with regular expressions, string methods, etc. Beware, though, that Unicode characters that occupy two or more bytes in the buffer (e.g. "π" in UTF-8) won't be recognizable anymore: each one turns into a run of characters, one per byte, whose character codes are the individual byte values.
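To make the difference concrete, here is a small sketch that round-trips the same bytes both ways. The byte values and variable names are made up for illustration, and it assumes TextEncoder and TextDecoder are available as globals.

```javascript
// Arbitrary bytes that are not valid UTF-8 (0xff and 0xfe never occur in UTF-8).
const input = new Uint8Array([0x48, 0x69, 0xff, 0xfe]);

// Round trip through TextDecoder/TextEncoder: each invalid byte is decoded
// to U+FFFD, which encodes back as three bytes.
const viaText = new TextEncoder().encode(new TextDecoder().decode(input));

// Round trip treating each byte as one character code.
const asChars = String.fromCharCode(...input);
const viaChars = Uint8Array.from(asChars, ch => ch.charCodeAt(0));

console.log(input.length, viaText.length, viaChars.length); // 4 8 4

// The caveat from the text: the two UTF-8 bytes of "π" become two characters.
const pi = String.fromCharCode(...new TextEncoder().encode("π"));
console.log(pi.length, pi === "π"); // 2 false
```

Only the byte-per-character route returns the original four bytes; the text route grows the data because the replacement character takes more room than the bytes it replaced.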

How to get a file or blob from an object URL?

Modern solution:

let blob = await fetch(url).then(r => r.blob());

The url can be an object url or a normal url.


