Split a String to Even Sized Chunks

Split a string to even sized chunks

Use textwrap.wrap:

>>> import textwrap
>>> s = 'a' * 23
>>> textwrap.wrap(s, 4)
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']

Splitting a string into evenly-sized chunks

You are confusing the number of chunks with the chunk size.

You must calculate the chunk size with:

int chunkSize = s.Length / chunks;

If the length of the string is not divisible by chunks, the result is truncated because this is integer division. E.g., if the string length is 7 and chunks = 3, the chunk size is 2 with a remainder of 1. If the string length were 8, the chunk size would still be 2, but the remainder would be 2. This remainder must be distributed among the chunks.

You can get the remainder with the modulo operator %:

int remainder = s.Length % chunks;

Since you want the first chunks to be bigger, we now attribute this remainder to the first chunks:

int start = 0;
while (start < s.Length)
{
    int thisChunkSize = chunkSize;
    if (remainder > 0)
    {
        thisChunkSize++;
        remainder--;
    }
    yield return s.Substring(start, thisChunkSize);
    start += thisChunkSize;
}
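The remainder distribution described above can also be sketched in Python. This is an illustrative port, not part of the original answer; the name even_chunks is made up here, and divmod is used to get the chunk size and the remainder in one step.

```python
def even_chunks(s, chunks):
    """Yield `chunks` pieces of s; the first pieces absorb the remainder."""
    # base is the truncated chunk size, remainder the leftover characters.
    base, remainder = divmod(len(s), chunks)
    start = 0
    while start < len(s):
        size = base
        if remainder > 0:
            size += 1        # hand out one leftover character
            remainder -= 1
        yield s[start:start + size]
        start += size


print(list(even_chunks("abcdefg", 3)))   # ['abc', 'de', 'fg']
print(list(even_chunks("abcdefgh", 3)))  # ['abc', 'def', 'gh']
```

With a length of 7 and 3 chunks, only the first chunk gets an extra character; with a length of 8, the first two do.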

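If you need an even better distribution, you can use floating point arithmetic and round. The MidpointRounding argument selects the rounding mode; alternating between rounding up and rounding down spreads the larger chunks across the result instead of grouping them at the start.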

public static IEnumerable<string> EvenIterator(string s, int chunks)
{
    int start = 0;
    var rounding = new[] { MidpointRounding.ToPositiveInfinity,
                           MidpointRounding.ToNegativeInfinity };
    int r = 0;
    while (start < s.Length) {
        int chunkSize = (int)Math.Round((double)(s.Length - start) / chunks, rounding[r]);
        r = 1 - r; // Swap the rounding
        yield return s.Substring(start, chunkSize);
        start += chunkSize;
        chunks--;
    }
}

A test with "abcdefghijklmno" and 6 chunks gives:

[ "abc", "de", "fgh", "ij", "klm", "no" ]
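The same alternating idea can be sketched in Python. This is an assumption-laden port rather than the original code: it uses math.ceil and math.floor explicitly in place of Math.Round with a rounding mode, and the function name is hypothetical.

```python
import math


def even_chunks_alternating(s, chunks):
    """Split s into `chunks` pieces, alternating between rounding up and down."""
    start = 0
    round_up = True
    while start < len(s):
        # Ideal size for the next chunk, given what is left to distribute.
        ideal = (len(s) - start) / chunks
        size = math.ceil(ideal) if round_up else math.floor(ideal)
        round_up = not round_up  # swap the rounding direction
        yield s[start:start + size]
        start += size
        chunks -= 1


print(list(even_chunks_alternating("abcdefghijklmno", 6)))
# ['abc', 'de', 'fgh', 'ij', 'klm', 'no']
```

Recomputing the ideal size from the remaining length on every step keeps the total correct: when one chunk is left, the ideal size equals the remaining length exactly.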

Split a string into N equal parts?

import textwrap
print(textwrap.wrap("123456789", 2))
#prints ['12', '34', '56', '78', '9']

Note: be careful with whitespace. textwrap.wrap breaks lines at whitespace and drops it at line boundaries, so this may or may not be what you want. From its docstring:

"""Wrap a single paragraph of text, returning a list of wrapped lines.

Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
"""
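To make the whitespace caveat concrete, here is a small comparison between textwrap.wrap and plain slicing. The sample string is made up for illustration; slicing is shown only as one possible alternative when the input may contain spaces.

```python
import textwrap

text = "ab cd ef"

# textwrap.wrap breaks at whitespace and drops it at line boundaries.
wrapped = textwrap.wrap(text, 4)

# Plain slicing keeps every character, including spaces.
sliced = [text[i:i + 4] for i in range(0, len(text), 4)]

print(wrapped)  # ['ab', 'cd', 'ef']
print(sliced)   # ['ab c', 'd ef']
```

The wrapped result has three pieces of two characters each, even though the requested width was 4.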

Splitting a string into chunks of a certain size

static IEnumerable<string> Split(string str, int chunkSize)
{
    return Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize));
}

Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0, input string length not divisible by chunkSize, etc.). The original question doesn't specify any requirements for these edge cases and in real life the requirements might vary so they are out of scope of this answer.
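As a rough illustration of what handling those edge cases might look like, here is a Python sketch. The choices made here (return an empty list for null or empty input, raise on a non-positive size, keep a shorter final chunk) are assumptions, since the requirements are left open above. Note that the Enumerable.Range version drops a trailing partial chunk, whereas this sketch keeps it.

```python
def split_fixed(text, chunk_size):
    """Split text into pieces of chunk_size; the last piece may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not text:
        return []  # assumed behaviour for None or empty input
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


print(split_fixed("abcdefg", 3))  # ['abc', 'def', 'g']
```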

Split a string into unevenly sized chunks in a repeating pattern

I did this similarly to the other answer posted about a minute ago but I didn't use a class to track state.

import itertools

def alternating_size_chunks(iterable, steps):
    n = 0
    step = itertools.cycle(steps)
    while n < len(iterable):
        next_step = next(step)
        yield iterable[n:n + next_step]
        n += next_step

Testing:

>>> import random
>>> test_string = ''.join(random.choice('01') for _ in range(50))
>>> print(list(alternating_size_chunks(test_string, (1, 8, 2))))
['1', '01111010', '01', '1', '00111011', '11', '0', '11010100', '01', '0', '10011101', '00', '0', '11111']

Note that both these methods (mine and Mark's answer) will take an arbitrary set of lengths (whether it's 1, 8, 2 or anything else), and will work even if the length of the bit stream doesn't precisely add up to a multiple of the sum of the lengths. (You can see in my example it ran out of bits and the last chunk only has five.) This may or may not be desirable in your case, so you might want to check that you have enough data to convert once you get ready to do that.
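If a short final chunk is not acceptable, one option is to stop as soon as the data runs out. The following is a sketch of that variant, not part of either answer; the function name is hypothetical and it assumes every size in steps is positive.

```python
import itertools


def full_pattern_chunks(data, steps):
    """Yield chunks following the repeating sizes in steps, dropping a short tail."""
    position = 0
    for size in itertools.cycle(steps):
        chunk = data[position:position + size]
        if len(chunk) < size:
            break  # not enough data left for a complete chunk
        yield chunk
        position += size


print(list(full_pattern_chunks("1011110100", (1, 8, 2))))
# ['1', '01111010']
```

Here the input has ten characters, so the final single '0' is discarded because the pattern expects a two-character chunk at that point.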

Reference: itertools.cycle

Split large string in n-size chunks in JavaScript

You can do something like this:

"1234567890".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "90"]

The method will still work with strings whose size is not an exact multiple of the chunk-size:

"123456789".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "9"]

In general, for any string out of which you want to extract at-most n-sized substrings, you would do:

str.match(/.{1,n}/g); // Replace n with the size of the substring

If your string can contain newlines or carriage returns, you would do:

str.match(/(.|[\r\n]){1,n}/g); // Replace n with the size of the substring

As far as performance, I tried this out with approximately 10k characters and it took a little over a second on Chrome. YMMV.

This can also be used in a reusable function:

function chunkString(str, length) {
    return str.match(new RegExp('.{1,' + length + '}', 'g'));
}
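For comparison, the same regex approach can be sketched in Python, which is used here only to illustrate the idea outside JavaScript. The re.DOTALL flag makes . match newlines too, which covers the newline case mentioned above; the function name is made up.

```python
import re


def chunk_string(text, length):
    """Return pieces of at most `length` characters, newlines included."""
    pattern = '.{1,%d}' % length
    return re.findall(pattern, text, flags=re.DOTALL)


print(chunk_string("123456789", 2))  # ['12', '34', '56', '78', '9']
print(chunk_string("ab\ncd", 2))     # ['ab', '\nc', 'd']
```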

How to split a string into chunks per number of characters and delimiter?

You could use functools.reduce to accomplish this.

import functools

def splitter(s, n):
    def helper(acc, v):
        tmp1 = acc[-1]
        tmp2 = len(tmp1)
        if tmp2 >= n or tmp2 + len(v) >= n:
            acc.append(v)
        else:
            acc[-1] = tmp1 + ',' + v

        return acc

    tmp1 = s.split(',')
    if len(tmp1) == 1:
        return tmp1

    return list(functools.reduce(helper, tmp1[1:], [tmp1[0]]))
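For readers who find reduce hard to follow, the same grouping rule can be written as a plain loop. This is a sketch that mirrors the condition used above, with a hypothetical function name and a made-up sample input.

```python
def split_on_commas(s, n):
    """Group comma-separated fields so each group stays within n characters."""
    fields = s.split(',')
    groups = [fields[0]]
    for field in fields[1:]:
        current = groups[-1]
        # Same test as the reduce version: start a new group once the current
        # one has n characters, or would reach n when this field is added.
        if len(current) >= n or len(current) + len(field) >= n:
            groups.append(field)
        else:
            groups[-1] = current + ',' + field
    return groups


print(split_on_commas("a,bb,ccc,d", 4))  # ['a,bb', 'ccc', 'd']
```

Because fields are joined only when their combined length is below n, the joined group including the comma never exceeds n. A single field longer than n is kept whole.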

Split the string into different lengths chunks

>>> s = '25c319f75e3fbed5a9f0497750ea12992b30d565'
>>> n = [8, 4, 4, 4, 4, 12]
>>> print('-'.join([s[sum(n[:i]):sum(n[:i+1])] for i in range(len(n))]))

Output

25c319f7-5e3f-bed5-a9f0-4977-50ea12992b30
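The one-liner recomputes sum(n[:i]) for every chunk. A possible alternative, sketched here with a hypothetical helper name, computes the running end offsets once with itertools.accumulate.

```python
from itertools import accumulate


def split_by_lengths(s, lengths):
    """Cut s into consecutive pieces with the given lengths."""
    ends = list(accumulate(lengths))  # running totals: end offset of each piece
    starts = [0] + ends[:-1]          # each piece starts where the previous ended
    return [s[a:b] for a, b in zip(starts, ends)]


s = '25c319f75e3fbed5a9f0497750ea12992b30d565'
print('-'.join(split_by_lengths(s, [8, 4, 4, 4, 4, 12])))
# 25c319f7-5e3f-bed5-a9f0-4977-50ea12992b30
```

As in the original, any characters beyond the total of the lengths are ignored.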

