How to Split a Long String Without Breaking Words

How to split a long string without breaking words?

This code avoids breaking words, which you won't get using wordwrap().

The maximum length is defined using $maxLineLength. I've done some tests and it works fine.

$longString = 'I like apple. You like oranges. We like fruit. I like meat, also.';

$words = explode(' ', $longString);

$maxLineLength = 18;

$currentLength = 0;
$index = 0;
// Start with one empty line so that .= never touches an undefined offset
$output = [''];

foreach ($words as $word) {
    // +1 because the word gets back the trailing space that it lost in explode()
    $wordLength = strlen($word) + 1;

    if (($currentLength + $wordLength) <= $maxLineLength) {
        $output[$index] .= $word . ' ';
        $currentLength += $wordLength;
    } else {
        // Start a new line; keep the trailing space so the next word is not glued on
        $index += 1;
        $currentLength = $wordLength;
        $output[$index] = $word . ' ';
    }
}

Splitting string without breaking words or ignoring any characters

Start by breaking the input string into "words". This could be easy or hard depending on how you define a "word". For words simply separated by any amount of white space, something like this will do the job nicely:

 String[] words = nfAddr.split("\\s+");

Now that you've got the individual words, reassemble them into lines of the desired maximum length, adding spaces between them, and then string the resulting lines together with line breaks between them. Here's an example of a simple routine to do this:

public static String formatParagraph(String text, int maxWidth)
{
    String[] words = text.split("\\s+");

    StringBuilder pp = new StringBuilder();
    StringBuilder line = new StringBuilder();
    for (String w : words) {
        if (line.length() + w.length() + 1 > maxWidth) {
            if (pp.length() > 0) {
                pp.append(System.lineSeparator());
            }
            pp.append(line.toString());
            line.setLength(0);
        }
        if (line.length() > 0) {
            line.append(' ');
        }
        line.append(w);
    }
    if (line.length() > 0) {
        if (pp.length() > 0) {
            pp.append(System.lineSeparator());
        }
        pp.append(line);
    }
    return pp.toString();
}
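For comparison, here is a minimal Python sketch of the same greedy idea: split on whitespace, then keep adding words to the current line until the next one would exceed the limit. The name format_paragraph and its parameters are made up for this illustration, and the sketch assumes a word longer than the limit should simply stay whole on its own line.

```python
def format_paragraph(text, max_width):
    """Greedy word wrap: fill each line with as many whole words as fit."""
    lines = []
    line = ""
    for word in text.split():  # split() with no argument splits on any whitespace run
        if line and len(line) + 1 + len(word) > max_width:
            # The word (plus one joining space) does not fit: close the current line
            lines.append(line)
            line = word
        elif line:
            line += " " + word
        else:
            # First word of the text; kept whole even if it exceeds max_width
            line = word
    if line:
        lines.append(line)
    return "\n".join(lines)


print(format_paragraph("I like apple. You like oranges. We like fruit.", 18))
```

With a limit of 18 this should put "I like apple. You" on the first line, since that is 17 characters and the next word no longer fits.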

Split string every n characters but without splitting a word

You can use the textwrap.wrap function from the standard library:

orig_string = 'I am a string in python'

from textwrap import wrap

print(wrap(orig_string, 10))

Prints:

['I am a', 'string in', 'python']
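One caveat worth illustrating: by default wrap() will cut a single word that is longer than the width. A small sketch, assuming you would rather keep such a word whole on an over-long line, is to pass break_long_words=False (the sample text is made up for this example):

```python
from textwrap import wrap

text = "I am a supercalifragilistic string"

# break_long_words=False keeps a word longer than the width in one piece,
# on a line of its own, instead of cutting it to fit.
lines = wrap(text, 10, break_long_words=False)
print(lines)
```

Hyphenated compounds can still be wrapped right after a hyphen unless break_on_hyphens=False is passed as well.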

How to break string by character and line length, without breaking words?

You can't split the string on the | characters, as you would lose the information about where they were in the original string. You also can't do this with foreach, because you need to look ahead to the next word when calculating the length. Taking your original code, you can do this:

int partLength = 35;
string sentence = "Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Etc";
string[] words = sentence.Split(' ');
var parts = new Dictionary<int, string>();
string part = string.Empty;
int partCounter = 0;

for (int i = 0; i < words.Length; i++)
{
    var newLength = part.Length + words[i].Length;

    if (words[i] == "|" && i + 1 < words.Length)
    {
        newLength += words[i + 1].Length;
    }

    if (newLength < partLength)
    {
        part += string.IsNullOrEmpty(part) ? words[i] : " " + words[i];
    }
    else
    {
        parts.Add(partCounter, part);
        part = words[i];
        partCounter++;
    }
}
parts.Add(partCounter, part);
foreach (var item in parts)
{
    Console.WriteLine(item.Value);
}

We still split on a space, but we use a for loop to iterate through the words. Before we check whether the current word fits, we check whether it is a |. If it is, we add the length of the next word as well (if one exists), so that a | is never left stranded at the end of a line. This should produce the output you are looking for.
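The look-ahead described above can also be sketched in Python. This is only an illustration under stated assumptions: the function name split_keeping_separator and its parameters are hypothetical, a line may be exactly part_length characters long, and, like the C# code, it only looks one word ahead, so a multi-word item can still be split after its first word.

```python
def split_keeping_separator(sentence, part_length, sep="|"):
    """Greedy split on spaces that never leaves `sep` at the end of a line."""
    words = sentence.split(" ")
    parts = []
    part = ""
    for i, word in enumerate(words):
        candidate = word if not part else part + " " + word
        needed = len(candidate)
        # Look ahead: the separator must fit together with the word that follows it
        if word == sep and i + 1 < len(words):
            needed += 1 + len(words[i + 1])
        if needed <= part_length or not part:
            part = candidate
        else:
            parts.append(part)
            part = word
    if part:
        parts.append(part)
    return parts


sentence = "Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Etc"
for line in split_keeping_separator(sentence, 35):
    print(line)
```

For the sample sentence and a limit of 35, the fourth | would fit on the first line by itself, but not together with the following "Item", so it moves to the start of the second line.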

Splitting long string without breaking words fulfilling lines

Non-optimal offline fast 1D bin packing Python algorithm

def binPackingFast(words, limit, sep=" "):
    if not words:
        return []
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    # Longest words first; note that this sorts the caller's list in place
    words.sort(key=len, reverse=True)
    res, part, others = [], words[0], words[1:]
    for word in others:
        # Start a new part when the separator plus the word no longer fits
        if len(sep)+len(word) > limit-len(part):
            res.append(part)
            part = word
        else:
            part += sep+word
    if part:
        res.append(part)
    return res

Performance

Tested over /usr/share/dict/words (provided by words-3.0-20.fc18.noarch), it can process half a million words in a second on my slow dual-core laptop, with an efficiency of at least 90% with these parameters:

limit = max(map(len, words))
sep = ""

With limit *= 1.5 I get 92%, with limit *= 2 I get 96% (same execution time).

The theoretical optimum (a lower bound on the number of parts) is calculated with: math.ceil(len(sep.join(words))/limit)

no efficient bin-packing algorithm can be guaranteed to do better

Source: http://mathworld.wolfram.com/Bin-PackingProblem.html
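To make the efficiency figures concrete, here is a small sketch of how such a percentage could be computed. It assumes that "efficiency" means the theoretical lower bound divided by the number of parts actually produced; the answer does not spell this out, and the helper name packing_efficiency is made up for the illustration.

```python
import math


def packing_efficiency(lines, words, limit, sep=" "):
    """Return (lower_bound, lower_bound / number of lines actually used)."""
    # Fewest parts that could possibly hold all the characters
    lower_bound = math.ceil(len(sep.join(words)) / limit)
    return lower_bound, lower_bound / len(lines)


words = ["aaaa", "bbb", "cc", "d"]
lines = ["aaaa", "bbb", "ccd"]  # a hand-made packing for limit=5 and sep=""
lower_bound, efficiency = packing_efficiency(lines, words, limit=5, sep="")
print(lower_bound, efficiency)
```

Here the ten characters need at least two parts of five, and the packing uses three, so the ratio is two thirds.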

Moral of the story

While it's interesting to find the best solution, I think that in most cases it would be much better to use this algorithm for 1D offline bin packing problems.

Resources

  • http://mathworld.wolfram.com/Bin-PackingProblem.html
  • https://github.com/hudora/pyShipping/

Notes

  • I didn't use textwrap for my implementation because it's slower than my simple Python code.
    Maybe it's related to: Why are textwrap.wrap() and textwrap.fill() so slow?
  • It seems to work perfectly even if the sorting is not reversed.

Best way to split string into lines with maximum length, without breaking words

How about this as a solution:

IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    var words = stringToSplit.Split(' ').Concat(new [] { "" });
    return
        words
            .Skip(1)
            .Aggregate(
                words.Take(1).ToList(),
                (a, w) =>
                {
                    var last = a.Last();
                    while (last.Length > maximumLineLength)
                    {
                        a[a.Count() - 1] = last.Substring(0, maximumLineLength);
                        last = last.Substring(maximumLineLength);
                        a.Add(last);
                    }
                    var test = last + " " + w;
                    if (test.Length > maximumLineLength)
                    {
                        a.Add(w);
                    }
                    else
                    {
                        a[a.Count() - 1] = test;
                    }
                    return a;
                });
}

I reworked this, as I prefer this version:

IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    var words = stringToSplit.Split(' ');
    var line = words.First();
    foreach (var word in words.Skip(1))
    {
        var test = $"{line} {word}";
        if (test.Length > maximumLineLength)
        {
            yield return line;
            line = word;
        }
        else
        {
            line = test;
        }
    }
    yield return line;
}

chunk/split a string in JavaScript without breaking words

Something like this?

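var n = 80;

// Advance from position 80 to just past the next space, or to the end of the input
while (n < input.length) {
    if (input[n++] == ' ') {
        break;
    }
}

var output = input.substring(0, n).split(' ');
console.log(output);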

UPDATED

Now that I re-read the question, here's an updated solution:

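var len = 80;
var curr = len;
var prev = 0;

var output = [];

// Cut just after the first space found at or after each len-character mark,
// so a chunk can be longer than len and keeps its trailing space
while (input[curr]) {
    if (input[curr++] == ' ') {
        output.push(input.substring(prev, curr));
        prev = curr;
        curr += len;
    }
}
output.push(input.substring(prev));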

How to split a string to List string without splitting words?

You can give it a max value like 64, and then use that as an index to search backwards, find the first space and split there. Repeat on the remaining string in a loop and you're done.

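// Extension method: must be declared inside a static class
public static IEnumerable<string> SmartSplit(this string input, int maxLength)
{
    int i = 0;
    while (i + maxLength < input.Length)
    {
        // Last space at or before position i + maxLength
        int index = input.LastIndexOf(' ', i + maxLength);
        if (index <= i) // no space in this chunk: the word is longer than maxLength
        {
            // Hard split after maxLength characters; there is no space to skip
            yield return input.Substring(i, maxLength);
            i += maxLength;
        }
        else
        {
            yield return input.Substring(i, index - i);
            i = index + 1; // skip the space
        }
    }

    yield return input.Substring(i);
}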
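The backwards search for the last space that still fits can be sketched in Python with str.rfind. This is a rough illustration, not a port of the code above: the name smart_split is hypothetical, words are assumed to be separated by single spaces, and a word longer than max_length is cut hard after max_length characters.

```python
def smart_split(text, max_length):
    """Yield chunks of at most max_length characters, cut at the last space that fits."""
    i = 0
    while i + max_length < len(text):
        # Search text[i : i + max_length + 1] from the right; a space sitting
        # exactly at i + max_length still leaves a chunk of max_length characters
        index = text.rfind(" ", i, i + max_length + 1)
        if index <= i:
            # No usable space: hard cut, nothing to skip
            yield text[i:i + max_length]
            i += max_length
        else:
            yield text[i:index]
            i = index + 1  # skip the space itself
    yield text[i:]


print(list(smart_split("I like apple. You like oranges.", 18)))
```

Writing it as a generator keeps the loop structure close to the yield return version, and list() collects the chunks when a list is needed.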

