How to Split a String into an Array of Chunks of a Given Size

You can group your collection elements (in this case Characters) every n elements as follows:

extension Collection {
    func unfoldSubSequences(limitedTo maxLength: Int) -> UnfoldSequence<SubSequence, Index> {
        sequence(state: startIndex) { start in
            guard start < self.endIndex else { return nil }
            let end = self.index(start, offsetBy: maxLength, limitedBy: self.endIndex) ?? self.endIndex
            defer { start = end }
            return self[start..<end]
        }
    }
    func subSequences(of n: Int) -> [SubSequence] {
        .init(unfoldSubSequences(limitedTo: n))
    }
}


let numbers = "1234567"
let subSequences = numbers.subSequences(of: 2)
print(subSequences) // ["12", "34", "56", "7"]

edit/update:

If you would like to append the exceeding characters to the last group:

extension Collection {
    func unfoldSubSequencesWithTail(length: Int) -> UnfoldSequence<SubSequence, Index> {
        // Number of full groups; the last of them also takes the leftover elements.
        let n = count / length
        var counter = 0
        return sequence(state: startIndex) { start in
            guard start < endIndex else { return nil }
            let end = index(start, offsetBy: length, limitedBy: endIndex) ?? endIndex
            counter += 1
            if counter == n {
                defer { start = endIndex }
                return self[start...]
            } else {
                defer { start = end }
                return self[start..<end]
            }
        }
    }
    func subSequencesWithTail(n: Int) -> [SubSequence] {
        .init(unfoldSubSequencesWithTail(length: n))
    }
}


let numbers = "1234567"
let subSequencesWithTail = numbers.subSequencesWithTail(n: 2)
print(subSequencesWithTail) // ["12", "34", "567"]
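
For comparison, the same idea of folding the leftover into the last group can be sketched in JavaScript. This is only an illustration, not a port of the Swift extension: the name `chunksWithTail` is hypothetical, it assumes a positive integer size, and it counts UTF-16 code units rather than Swift Characters.

```javascript
// Hypothetical helper: split `str` into pieces of `size` code units and
// merge a short leftover piece into the last full piece.
function chunksWithTail(str, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < str.length; i += size) {
    chunks.push(str.slice(i, i + size));
  }
  // A trailing piece shorter than `size` is appended to its predecessor.
  if (chunks.length > 1 && chunks[chunks.length - 1].length < size) {
    const tail = chunks.pop();
    chunks[chunks.length - 1] += tail;
  }
  return chunks;
}

console.log(chunksWithTail("1234567", 2)); // [ '12', '34', '567' ]
```

Building the plain chunks first and merging afterwards keeps the loop simple, at the cost of one extra string concatenation.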

Splitting a string into evenly-sized chunks

You are confusing the number of chunks with the chunk size.

You must calculate the chunk size with:

int chunkSize = s.Length / chunks;

If the length of the string is not divisible by chunks, the result is truncated because this is integer division. E.g., if the string length is 7 and chunks = 3, the chunk size is 2 with a remainder of 1. If the string length were 8, the chunk size would still be 2, but the remainder would be 2. You must then distribute this remainder among the chunks.

You can get the remainder with the modulo operator %:

int remainder = s.Length % chunks;

Since you want the first chunks to be bigger, we now attribute this remainder to the first chunks:

int start = 0;
while (start < s.Length)
{
    int thisChunkSize = chunkSize;
    if (remainder > 0)
    {
        thisChunkSize++;
        remainder--;
    }
    yield return s.Substring(start, thisChunkSize);
    start += thisChunkSize;
}
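
The three steps above (chunk size, remainder, distribution) can be put together in one function. Here is a minimal sketch in JavaScript rather than C#; the name `splitIntoChunks` is hypothetical, and it assumes `chunks` is a positive integer.

```javascript
// Hypothetical helper: split `s` into `chunks` pieces whose lengths differ
// by at most one, longer pieces first. If `s` is shorter than `chunks`,
// fewer (one-character) pieces are returned.
function splitIntoChunks(s, chunks) {
  if (!Number.isInteger(chunks) || chunks <= 0) {
    throw new RangeError("chunks must be a positive integer");
  }
  const chunkSize = Math.floor(s.length / chunks); // integer division
  let remainder = s.length % chunks;               // characters left over
  const result = [];
  let start = 0;
  while (start < s.length) {
    let thisChunkSize = chunkSize;
    if (remainder > 0) {
      // Hand one leftover character to this chunk.
      thisChunkSize++;
      remainder--;
    }
    result.push(s.slice(start, start + thisChunkSize));
    start += thisChunkSize;
  }
  return result;
}

console.log(splitIntoChunks("abcdefg", 3));  // [ 'abc', 'de', 'fg' ]
console.log(splitIntoChunks("abcdefgh", 3)); // [ 'abc', 'def', 'gh' ]
```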

If you need an even better distribution, you can use floating point arithmetic and round. The MidpointRounding tells what happens when rounding a value with a .5 fraction.

public static IEnumerable<string> EvenIterator(string s, int chunks)
{
    int start = 0;
    var rounding = new[] { MidpointRounding.ToPositiveInfinity,
                           MidpointRounding.ToNegativeInfinity };
    int r = 0;
    while (start < s.Length) {
        int chunkSize = (int)Math.Round((double)(s.Length - start) / chunks, rounding[r]);
        r = 1 - r; // Swap the rounding
        yield return s.Substring(start, chunkSize);
        start += chunkSize;
        chunks--;
    }
}

A test with "abcdefghijklmno" and chunks = 6 gives:

[ "abc", "de", "fgh", "ij", "klm", "no" ]
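
The alternating-rounding idea can also be sketched in JavaScript. This is an illustration under assumptions, not a translation of the .NET API: `Math.ceil` and `Math.floor` stand in for the two rounding modes, the name `evenChunks` is hypothetical, and `chunks` is assumed to be a positive integer no larger than the string length.

```javascript
// Hypothetical generator: recompute the ideal size of each remaining chunk
// and alternate between rounding it up and rounding it down.
function* evenChunks(s, chunks) {
  let start = 0;
  let roundUp = true;
  while (start < s.length && chunks > 0) {
    const exact = (s.length - start) / chunks; // ideal size of the next chunk
    const size = roundUp ? Math.ceil(exact) : Math.floor(exact);
    roundUp = !roundUp; // swap the rounding
    yield s.slice(start, start + size);
    start += size;
    chunks--;
  }
}

console.log([...evenChunks("abcdefghijklmno", 6)]);
// [ 'abc', 'de', 'fgh', 'ij', 'klm', 'no' ]
```

Because the size is recomputed from what is left, the last chunk always takes exactly the remaining characters.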

Splitting a string into chunks of a certain size

static IEnumerable<string> Split(string str, int chunkSize)
{
    return Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize));
}

Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0, input string length not divisible by chunkSize, etc.). The original question doesn't specify any requirements for these edge cases and in real life the requirements might vary so they are out of scope of this answer.
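
To show what such edge-case handling could look like, here is a sketch in JavaScript. The choices it makes are assumptions, not requirements from the question: null or empty input yields an empty array, a non-positive chunk size throws, and a shorter final chunk is kept. The name `splitBySize` is hypothetical.

```javascript
// Hypothetical helper with one possible set of edge-case policies.
function splitBySize(str, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  // Policy choice: no input means no chunks.
  if (str == null || str.length === 0) return [];
  // Round up so that a shorter final chunk is included.
  const count = Math.ceil(str.length / chunkSize);
  return Array.from({ length: count }, (_, i) =>
    str.slice(i * chunkSize, (i + 1) * chunkSize)
  );
}

console.log(splitBySize("abcdefg", 3)); // [ 'abc', 'def', 'g' ]
```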

How to split a string into chunks of a particular byte size?

Using Buffer seems indeed the right direction. Given that:

  • Buffer prototype has indexOf and lastIndexOf methods, and
  • 32 is the ASCII code of a space, and
  • 32 can never occur as part of a multi-byte character since all the bytes that make up a multi-byte sequence always have the most significant bit set.

... you can proceed as follows:

function chunk(s, maxBytes) {
    let buf = Buffer.from(s);
    const result = [];
    while (buf.length) {
        // Last space at index <= maxBytes, so the chunk has at most maxBytes bytes
        let i = buf.lastIndexOf(32, maxBytes);
        // If no space found, try forward search
        if (i < 0) i = buf.indexOf(32, maxBytes);
        // If there's no space at all, take the whole string
        if (i < 0) i = buf.length;
        // This is a safe cut-off point; never half-way a multi-byte
        result.push(buf.slice(0, i).toString());
        buf = buf.slice(i+1); // Skip space (if any)
    }
    return result;
}

console.log(chunk("Hey there! € 100 to pay", 12));
// -> [ 'Hey there!', '€ 100 to', 'pay' ]

You can consider extending this to also look for TAB, LF, or CR as split-characters. If so, and your input text can have CRLF sequences, you would need to detect those as well to avoid getting orphaned CR or LF characters in the chunks.
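
As a starting point for that extension, the single-byte search could be replaced by a scan over a small set of break bytes. The sketch below is an assumption about how you might structure it: `BREAK_BYTES` and `lastBreakIndex` are hypothetical names, and CRLF pairs are not treated specially here.

```javascript
// Byte values treated as break points: space, TAB, LF, CR (all ASCII,
// so they can never be part of a multi-byte UTF-8 sequence).
const BREAK_BYTES = new Set([32, 9, 10, 13]);

// Index of the last break byte at or before `from`, or -1 if there is none.
function lastBreakIndex(bytes, from) {
  for (let i = Math.min(from, bytes.length - 1); i >= 0; i--) {
    if (BREAK_BYTES.has(bytes[i])) return i;
  }
  return -1;
}

const bytes = new TextEncoder().encode("Hey\tthere! € 100");
console.log(lastBreakIndex(bytes, 12)); // 10 (the space before "€")
console.log(lastBreakIndex(bytes, 5));  // 3  (the TAB)
```

The function works on any `Uint8Array`, so it accepts a Node `Buffer` as well.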

You can turn the above function into a generator, so that you control when you want to start the processing for getting the next chunk:

function * chunk(s, maxBytes) {
    let buf = Buffer.from(s);
    while (buf.length) {
        // Last space at index <= maxBytes, so the chunk has at most maxBytes bytes
        let i = buf.lastIndexOf(32, maxBytes);
        // If no space found, try forward search
        if (i < 0) i = buf.indexOf(32, maxBytes);
        // If there's no space at all, take all
        if (i < 0) i = buf.length;
        // This is a safe cut-off point; never half-way a multi-byte
        yield buf.slice(0, i).toString();
        buf = buf.slice(i+1); // Skip space (if any)
    }
}

for (let s of chunk("Hey there! € 100 to pay", 12)) console.log(s);

Browsers

Buffer is specific to Node. Browsers however implement TextEncoder and TextDecoder, which leads to similar code:

function * chunk(s, maxBytes) {
    const decoder = new TextDecoder("utf-8");
    let buf = new TextEncoder().encode(s);
    while (buf.length) {
        // Last space at index <= maxBytes, so the chunk has at most maxBytes bytes
        let i = buf.lastIndexOf(32, maxBytes);
        // If no space found, try forward search
        if (i < 0) i = buf.indexOf(32, maxBytes);
        // If there's no space at all, take all
        if (i < 0) i = buf.length;
        // This is a safe cut-off point; never half-way a multi-byte
        yield decoder.decode(buf.slice(0, i));
        buf = buf.slice(i+1); // Skip space (if any)
    }
}

for (let s of chunk("Hey there! € 100 to pay", 12)) console.log(s);

Split large string in n-size chunks in JavaScript

You can do something like this:

"1234567890".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "90"]

The method will still work with strings whose size is not an exact multiple of the chunk-size:

"123456789".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "9"]

In general, for any string out of which you want to extract at-most n-sized substrings, you would do:

str.match(/.{1,n}/g); // Replace n with the size of the substring

If your string can contain newlines or carriage returns, you would do:

str.match(/(.|[\r\n]){1,n}/g); // Replace n with the size of the substring

As for performance, I tried this out with a string of approximately 10k characters and it took a little over a second on Chrome. YMMV.

This can also be used in a reusable function:

function chunkString(str, length) {
    return str.match(new RegExp('.{1,' + length + '}', 'g'));
}
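
Two details are worth keeping in mind with the regex approach: `.` does not match line terminators, and `match()` returns null (not an empty array) when nothing matches, e.g. for an empty string. A sketch that covers both, assuming a positive integer length (the name `chunkStringSafe` is hypothetical):

```javascript
// Hypothetical variant of the reusable function above.
function chunkStringSafe(str, length) {
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError("length must be a positive integer");
  }
  // [\s\S] matches any character, including \r and \n.
  const pattern = new RegExp("[\\s\\S]{1," + length + "}", "g");
  // match() returns null when there is no match, so fall back to [].
  return str.match(pattern) || [];
}

console.log(chunkStringSafe("12\n345", 2)); // [ '12', '\n3', '45' ]
console.log(chunkStringSafe("", 2));        // []
```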

Split array into chunk of different size on the basis of their attribute

You can simply use Array.reduce() to group items by semester. Object.values() on the map gives you the desired result.

var array = [
    { semster: 1, name: "Book1" },
    { semster: 1, name: "Book2" },
    { semster: 2, name: "Book4" },
    { semster: 3, name: "Book5" },
    { semster: 3, name: "Book6" },
    { semster: 4, name: "Book7" }
];
var result = Object.values(array.reduce((a, curr) => {
    (a[curr.semster] = a[curr.semster] || []).push(curr);
    return a;
}, {}));
console.log(result);
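
One thing to be aware of: with integer-like keys such as these, Object.values() returns the groups in ascending key order, not in the order the keys first appeared. If first-seen order matters, a Map is one alternative. A minimal sketch (the `groupBy` helper and the sample data are illustrative assumptions):

```javascript
// Hypothetical helper: group items by a key, keeping groups in the order
// in which each key was first seen.
function groupBy(items, keyOf) {
  const groups = new Map();
  for (const item of items) {
    const key = keyOf(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return [...groups.values()];
}

const books = [
  { semster: 2, name: "Book4" },
  { semster: 1, name: "Book1" },
  { semster: 2, name: "Book5" },
];
const grouped = groupBy(books, (b) => b.semster);
console.log(grouped.map((g) => g.map((b) => b.name)));
// [ [ 'Book4', 'Book5' ], [ 'Book1' ] ]
```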

Split array into different size chunks (4, 3, 3, 3, 4, 3, 3, 3, etc)

You could take two indices, one for the data array and one for sizes. Then slice the array with a given length and push the chunk to the chunks array.

Proceed until end of data.

var data = Array.from({ length: 26 }, (_, i) => i + 1),
    sizes = [4, 3, 3, 3],
    i = 0,
    j = 0,
    chunks = [];

while (i < data.length) chunks.push(data.slice(i, i += sizes[j++ % sizes.length]));
console.log(chunks);
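
If you need this in more than one place, the loop can be wrapped in a function. The following is a sketch with an added guard; the name `chunkBySizes` is hypothetical, and rejecting empty or non-positive sizes is an assumption made so that the loop is guaranteed to advance.

```javascript
// Hypothetical helper: cut `data` into chunks whose lengths cycle through `sizes`.
function chunkBySizes(data, sizes) {
  if (sizes.length === 0 || sizes.some((n) => !Number.isInteger(n) || n <= 0)) {
    throw new RangeError("sizes must be a non-empty list of positive integers");
  }
  const chunks = [];
  let i = 0; // index into data
  let j = 0; // index into sizes, wrapped with %
  while (i < data.length) {
    const size = sizes[j % sizes.length];
    chunks.push(data.slice(i, i + size));
    i += size;
    j++;
  }
  return chunks;
}

const sample = Array.from({ length: 10 }, (_, k) => k + 1);
console.log(chunkBySizes(sample, [4, 3]));
// [ [ 1, 2, 3, 4 ], [ 5, 6, 7 ], [ 8, 9, 10 ] ]
```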

Split array into chunks

The array.slice() method can extract a slice from the beginning, middle, or end of an array for whatever purposes you require, without changing the original array.

const chunkSize = 10;
for (let i = 0; i < array.length; i += chunkSize) {
    const chunk = array.slice(i, i + chunkSize);
    // do whatever
}

The last chunk may be smaller than chunkSize. For example when given an array of 12 elements the first chunk will have 10 elements, the second chunk only has 2.

Note that a chunkSize of 0 will cause an infinite loop.
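
One way to rule out that infinite loop is to validate the size before looping. A sketch, assuming you want an exception rather than a silent fallback (the name `chunkArray` is hypothetical):

```javascript
// Hypothetical helper: collect the slices into an array of chunks and
// reject sizes that would prevent the loop from advancing.
function chunkArray(array, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < array.length; i += chunkSize) {
    chunks.push(array.slice(i, i + chunkSize));
  }
  return chunks;
}

const twelve = Array.from({ length: 12 }, (_, k) => k);
console.log(chunkArray(twelve, 10).map((c) => c.length)); // [ 10, 2 ]
```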


