Split Large String in N-Size Chunks in JavaScript

You can do something like this:

"1234567890".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "90"]

The method also works with strings whose length is not an exact multiple of the chunk size; the last chunk is simply shorter:

"123456789".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "9"]

In general, to split any string into substrings of at most n characters, use:

str.match(/.{1,n}/g); // Replace n with the size of the substring

The . character does not match line terminators such as \n and \r, so if your string can contain newlines or carriage returns, use:

str.match(/(.|[\r\n]){1,n}/g); // Replace n with the size of the substring
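As an alternative to the (.|[\r\n]) alternation, engines that support ES2018 offer the s (dotAll) flag, which makes . match line terminators as well. The sketch below assumes such an engine; the function name chunkAnyChars is hypothetical. It also adds the u flag so that . consumes a whole code point, which keeps emoji and other astral characters intact. Note that with u, chunk sizes are counted in code points rather than UTF-16 code units.

```javascript
// Hypothetical helper: chunk a string of any content into pieces of at most n characters.
//   s flag (dotAll, ES2018): "." also matches line terminators such as \n and \r
//   u flag (unicode): "." matches a whole code point, so surrogate pairs are not split
function chunkAnyChars(str, n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  // match() returns null when nothing matches (empty string), so fall back to []
  return str.match(new RegExp(".{1," + n + "}", "gsu")) || [];
}

console.log(chunkAnyChars("😀😀😀", 2)); // [ '😀😀', '😀' ]
```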

As for performance, I tried the regex approach on approximately 10k characters and it took a little over a second in Chrome. YMMV.
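To check the timing on your own engine, a rough measurement along these lines may help. It is only a sketch: timeRegexChunking is a hypothetical name, it assumes the global performance.now() (available in browsers and in Node.js 16+), and a single unwarmed run is indicative at best.

```javascript
// Hypothetical helper: time one regex-based chunking run on a synthetic string.
// Results depend on engine, machine and JIT warm-up; treat them as a rough indication.
function timeRegexChunking(length, size) {
  const str = "x".repeat(length);
  const re = new RegExp(".{1," + size + "}", "g");
  const start = performance.now();
  const chunks = str.match(re) || [];
  const ms = performance.now() - start;
  return { chunks: chunks.length, ms };
}

const { chunks, ms } = timeRegexChunking(10000, 2);
console.log(`${chunks} chunks in ${ms.toFixed(2)} ms`);
```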

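The regex can also be wrapped in a reusable function:

// Split str into chunks of at most `length` characters.
// match() returns null when nothing matches (e.g. for an empty string), so fall back to [].
function chunkString(str, length) {
  return str.match(new RegExp('.{1,' + length + '}', 'g')) || [];
}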
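If you prefer not to build a RegExp from a number, a plain loop over String.prototype.slice produces the same chunks for single-line input. This is a minimal sketch and chunkBySlice is a hypothetical name. Because no match() call is involved, there is no null case for an empty string, and newlines are treated like any other character. Like the regex without the u flag, it counts UTF-16 code units, so it can split an emoji in half.

```javascript
// Hypothetical regex-free alternative: consecutive slices of `length` code units.
function chunkBySlice(str, length) {
  if (!Number.isInteger(length) || length < 1) {
    throw new RangeError("length must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < str.length; i += length) {
    // slice() clamps the end index at str.length, so the last chunk may be shorter
    chunks.push(str.slice(i, i + length));
  }
  return chunks;
}

console.log(chunkBySlice("123456789", 2)); // [ '12', '34', '56', '78', '9' ]
```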

How to split a string into chunks of a particular byte size?

Using a Buffer does seem to be the right direction. Given that:

  • the Buffer prototype has indexOf and lastIndexOf methods, and
  • 32 is the ASCII code of a space, and
  • in UTF-8, 32 can never occur as part of a multi-byte character, since every byte of a multi-byte sequence has its most significant bit set (a value of 0x80 or above).

... you can proceed as follows:

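// Split s into chunks of at most maxBytes UTF-8 bytes, cutting only at spaces.
// A single word longer than maxBytes is kept whole, so its chunk can exceed maxBytes.
function chunk(s, maxBytes) {
  let buf = Buffer.from(s); // UTF-8 by default
  const result = [];
  while (buf.length) {
    let i;
    if (buf.length <= maxBytes) {
      // The remainder fits: take it all
      i = buf.length;
    } else {
      // Last space at or before index maxBytes, so the chunk has at most maxBytes bytes
      i = buf.lastIndexOf(32, maxBytes);
      // If no space found, try forward search
      if (i < 0) i = buf.indexOf(32, maxBytes);
      // If there's no space at all, take the rest of the buffer
      if (i < 0) i = buf.length;
    }
    // A space byte is a safe cut-off point; never halfway through a multi-byte character
    result.push(buf.subarray(0, i).toString());
    buf = buf.subarray(i + 1); // Skip the space (if any)
  }
  return result;
}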

console.log(chunk("Hey there! € 100 to pay", 12));
// -> [ 'Hey there!', '€ 100 to', 'pay' ]

You could consider extending this to also look for TAB, LF, or CR as split characters. If so, and your input text can contain CRLF sequences, you would need to detect those as well to avoid orphaned CR or LF characters in the chunks.
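The extension described above might be sketched as follows. This is an illustration under assumptions, not a drop-in replacement: chunkWs and SEPARATORS are hypothetical names, it follows the same structure as the chunk() function above, and it relies on TAB, LF, CR and space all being single ASCII bytes in UTF-8. When the cut lands on either byte of a CRLF pair, both bytes are dropped so neither ends up in a chunk.

```javascript
// Hypothetical variant of chunk(): cut at space, TAB, LF or CR, treating CRLF as one separator.
const SEPARATORS = new Set([0x20, 0x09, 0x0a, 0x0d]); // space, TAB, LF, CR
const CR = 0x0d;
const LF = 0x0a;

function chunkWs(s, maxBytes) {
  let buf = Buffer.from(s); // UTF-8; none of the separators can occur inside a multi-byte character
  const result = [];
  while (buf.length) {
    let i = -1;
    if (buf.length <= maxBytes) {
      i = buf.length; // The remainder fits: take it all
    } else {
      // Last separator at or before index maxBytes...
      for (let j = maxBytes; j >= 0 && i < 0; j--) {
        if (SEPARATORS.has(buf[j])) i = j;
      }
      // ...otherwise the first separator after it...
      for (let j = maxBytes + 1; j < buf.length && i < 0; j++) {
        if (SEPARATORS.has(buf[j])) i = j;
      }
      // ...otherwise the whole remainder
      if (i < 0) i = buf.length;
    }
    let end = i;  // The chunk is buf[0..end)
    let skip = 1; // Number of separator bytes to drop after the chunk
    if (buf[i] === LF && i > 0 && buf[i - 1] === CR) end = i - 1; // Cut landed on the LF of a CRLF
    if (buf[i] === CR && buf[i + 1] === LF) skip = 2;              // Cut landed on the CR of a CRLF
    result.push(buf.subarray(0, end).toString());
    buf = buf.subarray(i + skip);
  }
  return result;
}

console.log(chunkWs("ab\r\ncd ef", 4)); // [ 'ab', 'cd', 'ef' ]
```

As in the original, consecutive separators (for example two spaces in a row) can produce empty chunks; filter them out afterwards if that matters.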

You can turn the above function into a generator, so that each chunk is only computed when the caller asks for the next one:

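function* chunk(s, maxBytes) {
  let buf = Buffer.from(s); // UTF-8 by default
  while (buf.length) {
    let i;
    if (buf.length <= maxBytes) {
      // The remainder fits: take it all
      i = buf.length;
    } else {
      // Last space at or before index maxBytes, so the chunk has at most maxBytes bytes
      i = buf.lastIndexOf(32, maxBytes);
      // If no space found, try forward search
      if (i < 0) i = buf.indexOf(32, maxBytes);
      // If there's no space at all, take all
      if (i < 0) i = buf.length;
    }
    // A space byte is a safe cut-off point; never halfway through a multi-byte character
    yield buf.subarray(0, i).toString();
    buf = buf.subarray(i + 1); // Skip the space (if any)
  }
}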

for (let s of chunk("Hey there! € 100 to pay", 12)) console.log(s);

Browsers

Buffer is specific to Node.js. Browsers, however, implement TextEncoder and TextDecoder, which leads to similar code:

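function* chunk(s, maxBytes) {
  const decoder = new TextDecoder("utf-8");
  // TextEncoder takes no encoding argument: it always produces UTF-8
  let buf = new TextEncoder().encode(s);
  while (buf.length) {
    let i;
    if (buf.length <= maxBytes) {
      // The remainder fits: take it all
      i = buf.length;
    } else {
      // Last space at or before index maxBytes, so the chunk has at most maxBytes bytes
      i = buf.lastIndexOf(32, maxBytes);
      // If no space found, try forward search
      if (i < 0) i = buf.indexOf(32, maxBytes);
      // If there's no space at all, take all
      if (i < 0) i = buf.length;
    }
    // A space byte is a safe cut-off point; never halfway through a multi-byte character
    yield decoder.decode(buf.subarray(0, i));
    buf = buf.subarray(i + 1); // Skip the space (if any)
  }
}

for (let s of chunk("Hey there! € 100 to pay", 12)) console.log(s);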
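To confirm that the chunks really fit, you can measure each one's UTF-8 length with TextEncoder, which works in browsers and in Node.js. The helper names utf8Length and allChunksFit are hypothetical. A word longer than maxBytes is kept whole by the chunker, so such a chunk will legitimately fail this check.

```javascript
// Hypothetical helpers for checking chunk sizes in bytes.
// TextEncoder always encodes to UTF-8, so the length of its output is the byte count.
const encoder = new TextEncoder();

function utf8Length(s) {
  return encoder.encode(s).length;
}

function allChunksFit(chunks, maxBytes) {
  return chunks.every((c) => utf8Length(c) <= maxBytes);
}

console.log(utf8Length("€"));        // 3
console.log(utf8Length("€ 100 to")); // 10
console.log(allChunksFit(["Hey there!", "€ 100 to", "pay"], 12)); // true
```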

How can I split a string into segments of n characters?

var str = 'abcdefghijkl';
console.log(str.match(/.{1,3}/g)); // [ 'abc', 'def', 'ghi', 'jkl' ]

Split a given string into equal parts, where the substrings are of equal size and the size is dynamic?

You could pass in the length of the substrings and iterate until the end of the adjusted string (the input with all whitespace removed).

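function split(string, size) {
  const parts = [];
  let i = 0;
  // Remove all whitespace first; unlike match(/\S+/g).join(''), this cannot return null
  string = string.replace(/\s+/g, '');
  // slice(i, i += size) takes `size` characters, then advances i
  while (i < string.length) parts.push(string.slice(i, i += size));
  return parts;
}

console.log(...split('Hello World', 2));  // He ll oW or ld
console.log(...split('Hello Worlds', 2)); // He ll oW or ld s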
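The answer above fixes the size of each substring. If you know the number of parts instead, one way is to spread the remainder over the first few parts so that lengths differ by at most one. This is a sketch under that assumption; splitIntoParts is a hypothetical name, and when parts exceeds the string length the trailing parts are empty strings.

```javascript
// Hypothetical helper: split str into exactly `parts` pieces whose lengths differ by at most 1.
// The first (str.length % parts) pieces get one extra character.
function splitIntoParts(str, parts) {
  if (!Number.isInteger(parts) || parts < 1) {
    throw new RangeError("parts must be a positive integer");
  }
  const base = Math.floor(str.length / parts);
  const extra = str.length % parts;
  const result = [];
  let start = 0;
  for (let k = 0; k < parts; k++) {
    const size = base + (k < extra ? 1 : 0);
    result.push(str.slice(start, start + size));
    start += size;
  }
  return result;
}

console.log(splitIntoParts("HelloWorld", 3)); // [ 'Hell', 'oWo', 'rld' ]
```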

Split a JavaScript string into fixed-length pieces

You can try this:

var a = 'aaaabbbbccccee';
var b = a.match(/.{1,4}/g); // ["aaaa", "bbbb", "cccc", "ee"]

