String Length in Bytes in JavaScript

String length in bytes in JavaScript

Older JavaScript environments offer no native way to do this. (See Riccardo Galli's answer for a modern approach.)
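A minimal sketch of what such a modern approach can look like, assuming an environment that provides the standard TextEncoder API (current browsers and Node.js). The helper name utf8ByteLength is made up for illustration and does not come from the answers quoted here:

```javascript
// Hypothetical helper: number of bytes a string occupies once encoded as UTF-8.
// TextEncoder always encodes to UTF-8; a lone surrogate is encoded as U+FFFD (3 bytes).
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength('abc'));   // 3
console.log(utf8ByteLength('äáöü'));  // 8
```

Unlike str.length, which counts UTF-16 code units, this counts the encoded bytes directly.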


The approaches below are kept for historical reference, or for environments where the TextEncoder API is still unavailable.

If you know the character encoding, you can calculate it yourself though.

encodeURIComponent assumes UTF-8 as the character encoding, so if you need that encoding, you can do this:

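function lengthInUtf8Bytes(str) {
  // Matches only the 10.. bytes that are non-initial characters in a multi-byte sequence.
  // Note: encodeURIComponent throws a URIError if str contains a lone surrogate.
  var m = encodeURIComponent(str).match(/%[89ABab]/g);
  // str.length counts UTF-16 code units, so a surrogate pair (one character
  // outside the BMP) would be counted twice; subtract one per pair.
  var pairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
  return str.length - (pairs ? pairs.length : 0) + (m ? m.length : 0);
}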

This works because of the way UTF-8 encodes multi-byte sequences. The first byte of a sequence either has a high bit of zero (a single-byte sequence) or has a first hex digit of C, D, E, or F. The second and subsequent bytes are the ones whose first two bits are 10, so their first hex digit is 8, 9, A, or B. Those are the extra bytes you want to count in UTF-8.

The table in Wikipedia makes this clearer:

Bits  Last code point  Byte 1    Byte 2    Byte 3
7     U+007F           0xxxxxxx
11    U+07FF           110xxxxx  10xxxxxx
16    U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
...
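The bit patterns in the table can be checked directly. The following is a small sketch, assuming TextEncoder is available; utf8Bits is a hypothetical helper name used only for this illustration:

```javascript
// Hypothetical helper: returns the UTF-8 bytes of a string as 8-bit binary strings.
function utf8Bits(str) {
  const bytes = new TextEncoder().encode(str);
  // Array.from with a mapper yields a plain array of strings (not a typed array).
  return Array.from(bytes, b => b.toString(2).padStart(8, '0')).join(' ');
}

console.log(utf8Bits('A'));  // 01000001                    (U+0041, 1 byte)
console.log(utf8Bits('é'));  // 11000011 10101001           (U+00E9, 2 bytes)
console.log(utf8Bits('€'));  // 11100010 10000010 10101100  (U+20AC, 3 bytes)
```

Every byte after the first starts with 10, which is what the %[89AB] pattern matches in the percent-encoded form.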

If instead you need the length in the page's encoding, you can use this trick:

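function lengthInPageEncoding(s) {
  var a = document.createElement('A');
  a.href = '#' + s;
  var sEncoded = a.href;
  sEncoded = sEncoded.substring(sEncoded.indexOf('#') + 1);
  // Percent-escapes may use uppercase hex digits, so match case-insensitively.
  var m = sEncoded.match(/%[0-9a-f]{2}/gi);
  // Each %XX escape is 3 characters long but stands for 1 byte.
  return sEncoded.length - (m ? m.length * 2 : 0);
}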

How many bytes in a JavaScript string?

You can use a Blob to get the string size in bytes.

Examples:

console.info(
  // any single character outside the BMP, such as an emoji, takes 4 bytes
  new Blob(['😂']).size,                             // 4
  new Blob(['👍']).size,                             // 4
  new Blob(['😂👍']).size,                           // 8
  new Blob(['👍😂']).size,                           // 8
  new Blob(['I\'m a string']).size,                  // 12

  // from Premasagar correction of Lauri's answer for
  // strings containing lone characters in the surrogate pair range:
  // https://stackoverflow.com/a/39488643/6225838
  new Blob([String.fromCharCode(55555)]).size,       // 3
  new Blob([String.fromCharCode(55555, 57000)]).size // 4 (not 6)
);

How to get the string length in bytes in nodejs?

Here is an example:

const str = 'äáöü';

console.log(str + ": " + str.length + " characters, " +
  Buffer.byteLength(str, 'utf8') + " bytes");

// äáöü: 4 characters, 8 bytes

The signature is Buffer.byteLength(string[, encoding]); the encoding defaults to 'utf8'.
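To illustrate how the encoding argument changes the result, here is a small sketch for Node.js. The function name byteLengths is hypothetical; Buffer.byteLength and the encoding names are part of Node's Buffer API:

```javascript
// Hypothetical helper: byte length of the same string under several encodings.
function byteLengths(str) {
  return {
    utf8: Buffer.byteLength(str, 'utf8'),       // 1 to 4 bytes per character
    utf16le: Buffer.byteLength(str, 'utf16le'), // 2 bytes per UTF-16 code unit
    latin1: Buffer.byteLength(str, 'latin1'),   // 1 byte per UTF-16 code unit
  };
}

console.log(byteLengths('äáöü'));
// { utf8: 8, utf16le: 8, latin1: 4 }
```

The same four characters take 4 or 8 bytes depending on the encoding, which is why the encoding has to be known before a byte length means anything.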

How can I measure a size of javascript string?

The easiest way is to use Buffer's helper functions.

Buffer.byteLength(str)

Convert javascript string length to array of byte[]

As you are using Node, a Buffer is what you need; see the Buffer section of the Node.js documentation. For example:

// Make a string of length 295
const st = "-foo-".repeat(59);
// Create a byte array of length 2
const buf = Buffer.alloc(2);
// Write the string length as a 16-bit big-endian number into the byte array
buf.writeUInt16BE(st.length, 0);
console.log(buf);
// <Buffer 01 27> which is [1, 39]

Be aware that st.length is the string length in UTF-16 code units, not the byte length of the encoded string. The two may be the same, but that is not guaranteed.
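If the byte length is what should go into the two-byte header, one possible variant is sketched below. It assumes UTF-8 as the payload encoding, and lengthPrefixed is a hypothetical name chosen for this example:

```javascript
// Hypothetical helper: returns a Buffer holding a 16-bit big-endian byte count
// followed by the UTF-8 bytes of the string.
function lengthPrefixed(str) {
  const payload = Buffer.from(str, 'utf8');
  if (payload.length > 0xFFFF) {
    throw new RangeError('string is too long for a 16-bit length prefix');
  }
  const header = Buffer.alloc(2);
  header.writeUInt16BE(payload.length, 0); // byte length, not str.length
  return Buffer.concat([header, payload]);
}

console.log(lengthPrefixed('äáöü'));
// <Buffer 00 08 c3 a4 c3 a1 c3 b6 c3 bc>
```

Here the header holds 8 even though the string has 4 characters, because each of them takes 2 bytes in UTF-8.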

Why does a string of length 3 have 3 as its byte length?

The default encoding of the Node.js Buffer class is 'utf8'. ASCII characters take 1 byte each in UTF-8, so the result should be 3. Note: UTF-8 uses 1 to 4 bytes per character.
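A short sketch for Node.js that makes the per-character widths visible; bytesPerCharacter is a hypothetical helper name introduced for this example:

```javascript
// Hypothetical helper: pairs each character with its UTF-8 byte count.
function bytesPerCharacter(str) {
  // Array.from iterates by code point, so a surrogate pair stays in one piece.
  return Array.from(str, ch => [ch, Buffer.byteLength(ch, 'utf8')]);
}

console.log(bytesPerCharacter('aé€😀'));
// [ [ 'a', 1 ], [ 'é', 2 ], [ '€', 3 ], [ '😀', 4 ] ]
```

A string made only of ASCII characters therefore has the same length in characters and in bytes.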

How to calculate byte length containing UTF8 characters using javascript?

Counting UTF-8 bytes comes up quite a bit in JavaScript. A bit of looking around turns up a number of libraries (here's one example: https://github.com/mathiasbynens/utf8.js) that can help. I also found a thread (https://gist.github.com/mathiasbynens/1010324) full of solutions specifically for UTF-8 byte counts.

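Here is a compact function based on the smallest one in that thread:

function countUtf8Bytes(s) {
  var b = 0, i = 0, c;
  for (; i < s.length; i++) {
    c = s.charCodeAt(i);
    // A valid surrogate pair is one code point and takes 4 bytes in UTF-8.
    if (c >= 0xD800 && c <= 0xDBFF && (s.charCodeAt(i + 1) & 0xFC00) === 0xDC00) { b += 4; i++; }
    // Otherwise: 3 bytes above U+07FF, 2 bytes above U+007F, else 1 byte.
    else b += c >> 11 ? 3 : c >> 7 ? 2 : 1;
  }
  return b;
}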

Note: I rearranged it a bit so that the signature is easier to read. However, it's still a very compact function that might be hard to understand for some.

You can check its results with this tool: https://mothereff.in/byte-counter

One correction to your question: the example string you provided, i ♥ u, is actually 7 bytes, and this function counts it correctly.
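The 7-byte figure can be confirmed by listing the encoded bytes. This is a minimal sketch assuming TextEncoder is available; utf8Hex is a hypothetical helper name:

```javascript
// Hypothetical helper: returns the UTF-8 bytes of a string as space-separated hex.
function utf8Hex(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join(' ');
}

console.log(utf8Hex('i ♥ u'));
// 69 20 e2 99 a5 20 75
```

The heart (U+2665) accounts for the three bytes e2 99 a5, and the other four characters are one byte each.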
