Split a string to even sized chunks
Use textwrap.wrap:
>>> import textwrap
>>> s = 'a' * 23
>>> textwrap.wrap(s, 4)
['aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaaa', 'aaa']
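One caveat worth illustrating: textwrap.wrap is meant for wrapping prose, so it breaks at whitespace instead of cutting at a fixed offset. The short sketch below compares it with plain slicing on a made-up input; the variable names are only for illustration.

```python
import textwrap

text = "ab cd ef"

# textwrap.wrap breaks at whitespace and drops it at line boundaries.
wrapped = textwrap.wrap(text, 4)

# Plain slicing cuts every 4 characters and keeps the spaces as they are.
sliced = [text[i:i + 4] for i in range(0, len(text), 4)]

print(wrapped)  # ['ab', 'cd', 'ef']
print(sliced)   # ['ab c', 'd ef']
```

For strings without whitespace, such as the one above, both approaches give the same chunks.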
Splitting a string into evenly-sized chunks
You are confusing the number of chunks with the chunk size.
You must calculate the chunk size with:
int chunkSize = s.Length / chunks;
If the length of the string is not divisible by chunks, this truncates the result because integer arithmetic is performed here. E.g., if the string length is 7 and chunks = 3, this yields 2 and you have a remainder of 1. If the string length were 8, the chunk size would still be 2, but the remainder would be 2. Now, you must distribute this remainder among the chunks.
You can get the remainder with the modulo operator %:
int remainder = s.Length % chunks;
Since you want the first chunks to be bigger, give each of the first chunks one extra character until the remainder is used up:
int start = 0;
while (start < s.Length)
{
    int thisChunkSize = chunkSize;
    if (remainder > 0)
    {
        thisChunkSize++;
        remainder--;
    }
    yield return s.Substring(start, thisChunkSize);
    start += thisChunkSize;
}
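The same remainder distribution can be sketched in Python, the language most of the other answers here use. This is only an illustration of the idea above, not a translation of the full method: the name split_into_chunks is made up for the example, and divmod is used to get the chunk size and the remainder in one step.

```python
def split_into_chunks(s, chunks):
    """Split s into `chunks` pieces whose lengths differ by at most one.

    Hypothetical helper: the first len(s) % chunks pieces get one extra character.
    """
    if chunks <= 0:
        raise ValueError("chunks must be a positive integer")
    chunk_size, remainder = divmod(len(s), chunks)
    result = []
    start = 0
    for i in range(chunks):
        # The first `remainder` pieces are one character longer.
        this_size = chunk_size + (1 if i < remainder else 0)
        result.append(s[start:start + this_size])
        start += this_size
    return result


print(split_into_chunks("abcdefg", 3))   # ['abc', 'de', 'fg']
print(split_into_chunks("abcdefgh", 3))  # ['abc', 'def', 'gh']
```

Unlike the loop above, this sketch always returns exactly `chunks` pieces, so it produces empty strings when there are more chunks than characters.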
If you need an even better distribution, you can use floating point arithmetic and round. The MidpointRounding argument selects the rounding strategy, including what happens when rounding a value with a .5 fraction.
public static IEnumerable<string> EvenIterator(string s, int chunks)
{
    int start = 0;
    var rounding = new[] { MidpointRounding.ToPositiveInfinity,
                           MidpointRounding.ToNegativeInfinity };
    int r = 0;
    while (start < s.Length) {
        int chunkSize = (int)Math.Round((double)(s.Length - start) / chunks, rounding[r]);
        r = 1 - r; // Swap the rounding
        yield return s.Substring(start, chunkSize);
        start += chunkSize;
        chunks--;
    }
}
A test with "abcdefghijklmno" and chunks = 6 gives:
[ "abc", "de", "fgh", "ij", "klm", "no" ]
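For comparison, here is a rough Python sketch of the alternating-rounding idea. It assumes that math.ceil and math.floor are acceptable stand-ins for the two rounding modes used above, and that chunks is at least 1; the name even_chunks is invented for this example.

```python
import math


def even_chunks(s, chunks):
    """Yield pieces of s, alternating ceiling and floor of the average remaining size."""
    # Assumed stand-ins for ToPositiveInfinity and ToNegativeInfinity.
    roundings = (math.ceil, math.floor)
    r = 0
    start = 0
    while start < len(s):
        # Average size of the pieces still to be produced.
        size = roundings[r]((len(s) - start) / chunks)
        r = 1 - r  # swap the rounding direction
        yield s[start:start + size]
        start += size
        chunks -= 1


print(list(even_chunks("abcdefghijklmno", 6)))
# ['abc', 'de', 'fgh', 'ij', 'klm', 'no']
```

The last piece always takes whatever is left, because dividing by the one remaining chunk returns the full remaining length.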
Split a string into N equal parts?
import textwrap
print(textwrap.wrap("123456789", 2))
#prints ['12', '34', '56', '78', '9']
Note: be careful with whitespace. textwrap.wrap treats the input as words separated by whitespace, which may or may not be what you want. From its docstring:
"""Wrap a single paragraph of text, returning a list of wrapped lines.
Reformat the single paragraph in 'text' so it fits in lines of no
more than 'width' columns, and return a list of wrapped lines. By
default, tabs in 'text' are expanded with string.expandtabs(), and
all other whitespace characters (including newline) are converted to
space. See TextWrapper class for available keyword args to customize
wrapping behaviour.
"""
Splitting a string into chunks of a certain size
static IEnumerable<string> Split(string str, int chunkSize)
{
    return Enumerable.Range(0, str.Length / chunkSize)
        .Select(i => str.Substring(i * chunkSize, chunkSize));
}
Please note that additional code might be required to gracefully handle edge cases (null or empty input string, chunkSize == 0, input string length not divisible by chunkSize, etc.). The original question doesn't specify any requirements for these edge cases, and in real life the requirements might vary, so they are out of scope of this answer.
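To make those edge cases concrete, here is one possible way to handle them, sketched in Python. The choices are assumptions, not requirements from the question: None and non-positive sizes are rejected, an empty string yields an empty list, and a shorter final piece is kept. The name split_fixed is hypothetical.

```python
def split_fixed(s, chunk_size):
    """Hypothetical helper: split s into pieces of chunk_size characters.

    Assumed policy: reject None and non-positive sizes, keep a short last piece.
    """
    if s is None:
        raise TypeError("s must be a string, not None")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # range() steps through the start offsets 0, chunk_size, 2 * chunk_size, ...
    return [s[i:i + chunk_size] for i in range(0, len(s), chunk_size)]


print(split_fixed("abcdefg", 3))  # ['abc', 'def', 'g']
print(split_fixed("", 3))         # []
```

Whether the trailing 'g' should be kept, padded, or dropped depends on the requirements; the C# snippet above drops it.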
Split a string into unevenly sized chunks in a repeating pattern
I did this similarly to the other answer posted about a minute ago but I didn't use a class to track state.
import itertools

def alternating_size_chunks(iterable, steps):
    n = 0
    step = itertools.cycle(steps)
    while n < len(iterable):
        next_step = next(step)
        yield iterable[n:n + next_step]
        n += next_step
Testing:
>>> import random
>>> test_string = ''.join(random.choice('01') for _ in range(50))
>>> print(list(alternating_size_chunks(test_string, (1, 8, 2))))
['1', '01111010', '01', '1', '00111011', '11', '0', '11010100', '01', '0', '10011101', '00', '0', '11111']
Note that both these methods (mine and Mark's answer) will take an arbitrary set of lengths (whether it's 1, 8, 2 or anything else), and will work even if the length of the bit stream doesn't precisely add up to a multiple of the sum of the lengths. (You can see in my example it ran out of bits and the last chunk only has five.) This may or may not be desirable in your case, so you might want to check that you have enough data to convert once you get ready to do that.
Reference: itertools.cycle
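If a short final chunk is not acceptable, one way to check for it up front is to compare the data length with the length of one full pattern. The helper below is a sketch with an invented name, and it assumes steps is a non-empty sequence of positive lengths.

```python
def has_whole_cycles(data, steps):
    """Return True if len(data) is an exact multiple of sum(steps).

    Hypothetical helper; assumes steps holds positive chunk lengths.
    """
    cycle_length = sum(steps)
    return cycle_length > 0 and len(data) % cycle_length == 0


print(has_whole_cycles("0" * 50, (1, 8, 2)))  # False: 50 is not a multiple of 11
print(has_whole_cycles("0" * 44, (1, 8, 2)))  # True: 44 == 4 * 11
```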
Split large string in n-size chunks in JavaScript
You can do something like this:
"1234567890".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "90"]
The method will still work with strings whose size is not an exact multiple of the chunk-size:
"123456789".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "9"]
In general, for any string out of which you want to extract at-most n-sized substrings, you would do:
str.match(/.{1,n}/g); // Replace n with the size of the substring
If your string can contain newlines or carriage returns, you would do:
str.match(/(.|[\r\n]){1,n}/g); // Replace n with the size of the substring
As far as performance, I tried this out with approximately 10k characters and it took a little over a second on Chrome. YMMV.
This can also be used in a reusable function:
function chunkString(str, length) {
    return str.match(new RegExp('.{1,' + length + '}', 'g'));
}
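The same regex idea carries over to other languages. As a rough Python counterpart (not part of the JavaScript answer), the sketch below uses re.findall with the re.DOTALL flag so that "." also matches newlines; chunk_string is a made-up name, and length is assumed to be at least 1.

```python
import re


def chunk_string(s, length):
    """Hypothetical Python counterpart of chunkString, assuming length >= 1."""
    # re.DOTALL lets "." match newlines, so no separate newline alternative is needed.
    return re.findall(".{1,%d}" % length, s, flags=re.DOTALL)


print(chunk_string("123456789", 2))  # ['12', '34', '56', '78', '9']
print(chunk_string("ab\ncd", 2))     # ['ab', '\nc', 'd']
```

For an empty string this returns an empty list.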
How to split a string into chunks per number of characters and delimiter?
You could use functools.reduce to accomplish this.
import functools

def splitter(s, n):
    def helper(acc, v):
        tmp1 = acc[-1]
        tmp2 = len(tmp1)
        if tmp2 >= n or tmp2 + len(v) >= n:
            acc.append(v)
        else:
            acc[-1] = tmp1 + ',' + v
        return acc
    tmp1 = s.split(',')
    if len(tmp1) == 1:
        return tmp1
    return list(functools.reduce(helper, tmp1[1:], [tmp1[0]]))
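For readers less used to reduce, the same accumulation can be written as a plain loop. The sketch below is meant to follow the same grouping rule as the code above; the name split_by_delimiter and the sep parameter are additions for illustration only.

```python
def split_by_delimiter(s, n, sep=","):
    """Hypothetical loop-based variant of the reduce approach above."""
    pieces = s.split(sep)
    groups = [pieces[0]]
    for piece in pieces[1:]:
        current = groups[-1]
        # Start a new group when the current one is full, or when joining
        # (with a one-character separator) would make it longer than n.
        if len(current) >= n or len(current) + len(piece) >= n:
            groups.append(piece)
        else:
            groups[-1] = current + sep + piece
    return groups


print(split_by_delimiter("a,b,c,d", 3))      # ['a,b', 'c,d']
print(split_by_delimiter("aa,bbbb,c,d", 5))  # ['aa', 'bbbb', 'c,d']
```

A single piece that is already longer than n is kept whole, since pieces are never cut in the middle.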
Split the string into different lengths chunks
>>> s = '25c319f75e3fbed5a9f0497750ea12992b30d565'
>>> n = [8, 4, 4, 4, 4, 12]
>>> print('-'.join([s[sum(n[:i]):sum(n[:i+1])] for i in range(len(n))]))
Output
25c319f7-5e3f-bed5-a9f0-4977-50ea12992b30
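The repeated sum(n[:i]) calls recompute the offsets for every chunk. As an alternative sketch, itertools.accumulate can produce the running totals once; the names offsets and parts are only for this example.

```python
from itertools import accumulate

s = '25c319f75e3fbed5a9f0497750ea12992b30d565'
n = [8, 4, 4, 4, 4, 12]

# Running totals give the slice boundaries: [0, 8, 12, 16, 20, 24, 36].
offsets = [0, *accumulate(n)]

# Pair each boundary with the next one to get (start, end) for every chunk.
parts = [s[a:b] for a, b in zip(offsets, offsets[1:])]

print('-'.join(parts))  # 25c319f7-5e3f-bed5-a9f0-4977-50ea12992b30
```

As in the one-liner, any characters beyond sum(n) are ignored.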