How to split a long string without breaking words?
This code avoids breaking words, which you won't get using wordwrap().
The maximum length is defined using $maxLineLength. I've done some tests and it works fine.
$longString = 'I like apple. You like oranges. We like fruit. I like meat, also.';
$words = explode(' ', $longString);
$maxLineLength = 18;
$currentLength = 0;
$index = 0;
// Start with one empty line so the first .= does not hit an undefined index
$output = [''];
foreach ($words as $word) {
    // +1 because the word will receive back the space in the end that it loses in explode()
    $wordLength = strlen($word) + 1;
    if (($currentLength + $wordLength) <= $maxLineLength) {
        $output[$index] .= $word . ' ';
        $currentLength += $wordLength;
    } else {
        $index += 1;
        $currentLength = $wordLength;
        // Keep the trailing space so the next word is not glued to this one
        $output[$index] = $word . ' ';
    }
}
Splitting string without breaking words or ignoring any characters
Start by breaking the input string into "words". This could be easy or hard depending on how you define a "word". For words simply separated by any amount of white space, something like this will do the job nicely:
String[] words = nfAddr.split("\\s+");
Now that you've got the individual words, reassemble them into lines of the desired maximum length, adding spaces between them, and then string the resulting lines together with line breaks between them. Here's an example of a simple routine to do this:
public static String formatParagraph(String text, int maxWidth)
{
    String[] words = text.split("\\s+");
    StringBuilder pp = new StringBuilder();
    StringBuilder line = new StringBuilder();
    for (String w : words) {
        if (line.length() + w.length() + 1 > maxWidth) {
            if (pp.length() > 0) {
                pp.append(System.lineSeparator());
            }
            pp.append(line.toString());
            line.setLength(0);
        }
        if (line.length() > 0) {
            line.append(' ');
        }
        line.append(w);
    }
    if (line.length() > 0) {
        if (pp.length() > 0) {
            pp.append(System.lineSeparator());
        }
        pp.append(line);
    }
    return pp.toString();
}
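For comparison, the same two-step idea (split on whitespace, then greedily refill lines) can be sketched in Python. The function name format_paragraph and the choice to leave an over-long word alone on its own line are assumptions of this sketch, not part of the answer above.

```python
def format_paragraph(text, max_width):
    """Greedy word wrap: split on whitespace, then refill lines up to max_width."""
    lines = []
    line = ""
    # split() with no argument collapses any run of whitespace
    for word in text.split():
        if not line:
            # First word on a line; it may exceed max_width if the word itself is too long
            line = word
        elif len(line) + 1 + len(word) <= max_width:
            line += " " + word
        else:
            lines.append(line)
            line = word
    if line:
        lines.append(line)
    return "\n".join(lines)


print(format_paragraph("I like apple. You like oranges. We like fruit.", 18))
```

Unlike the version that appends a space after every word, this one only adds a space between two words, so lines carry no trailing whitespace.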
Split string every n characters but without splitting a word
You can use the built-in textwrap.wrap function (doc):
orig_string = 'I am a string in python'
from textwrap import wrap
print(wrap(orig_string, 10))
Prints:
['I am a', 'string in', 'python']
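One detail worth knowing: by default textwrap.wrap will still break a single word that is longer than the width. A small sketch of the break_long_words option follows; the sample text is made up for illustration.

```python
from textwrap import fill, wrap

# Hypothetical sample text containing one word longer than the width
text = "I am a string in python with a supercalifragilistic word"

# Default behaviour: a word longer than the width is broken to fit
print(wrap(text, 10))

# break_long_words=False keeps over-long words intact on their own line,
# so that line may exceed the requested width
lines = wrap(text, 10, break_long_words=False)
print(lines)

# fill() returns one newline-joined string instead of a list
print(fill(text, 10, break_long_words=False))
```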
How to break string by character and line length, without breaking words?
You can't split the string on the | character, as you would lose the information about where the separators exist in the original string. Also, you won't be able to do this with foreach, as you need to look ahead when calculating the length of the next string. Taking your original code, you can do this:
int partLength = 35;
string sentence = "Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Etc";
string[] words = sentence.Split(' ');
var parts = new Dictionary<int, string>();
string part = string.Empty;
int partCounter = 0;
for (int i = 0; i < words.Length; i++)
{
    var newLength = part.Length + words[i].Length;
    if (words[i] == "|" && i + 1 < words.Length)
    {
        newLength += words[i + 1].Length;
    }
    if (newLength < partLength)
    {
        part += string.IsNullOrEmpty(part) ? words[i] : " " + words[i];
    }
    else
    {
        parts.Add(partCounter, part);
        part = words[i];
        partCounter++;
    }
}
parts.Add(partCounter, part);
foreach (var item in parts)
{
    Console.WriteLine(item.Value);
}
We still split on a space, but we use a for loop to iterate through the strings. Before we check whether the current word fits, we need to check if it is a |. If it is, then add the length of the next word as well (if one exists). This should produce the output you are looking for.
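The look-ahead idea can also be sketched in Python. This is a variant under stated assumptions rather than a translation: the name split_parts is hypothetical, the joining spaces are counted in the length, and a part may be exactly part_length characters long. Like the approach described above, it only looks one word ahead of each |.

```python
def split_parts(sentence, part_length):
    """Split into parts of at most part_length, never ending a part on a '|'."""
    words = sentence.split(" ")
    parts = []
    part = ""
    for i, word in enumerate(words):
        # Length of the part if this word is appended (plus a joining space)
        new_length = len(part) + len(word) + (1 if part else 0)
        # Look ahead: a "|" is only accepted if the next word fits as well
        if word == "|" and i + 1 < len(words):
            new_length += 1 + len(words[i + 1])
        if new_length <= part_length or not part:
            part = part + " " + word if part else word
        else:
            parts.append(part)
            part = word
    if part:
        parts.append(part)
    return parts


for p in split_parts("Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Etc", 20):
    print(p)
```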
Splitting long string without breaking words fulfilling lines
Non-optimal offline fast 1D bin packing Python algorithm
def binPackingFast(words, limit, sep=" "):
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    # Longest words first; note that this sorts the caller's list in place
    words.sort(key=len, reverse=True)
    res, part, others = [], words[0], words[1:]
    for word in others:
        # Start a new part when the separator plus the word no longer fits
        if len(sep)+len(word) > limit-len(part):
            res.append(part)
            part = word
        else:
            part += sep+word
    if part:
        res.append(part)
    return res
Performance
Tested over /usr/share/dict/words (provided by words-3.0-20.fc18.noarch), it can do half a million words in a second on my slow dual-core laptop, with an efficiency of at least 90% with these parameters:
limit = max(map(len, words))
sep = ""
With limit *= 1.5 I get 92%; with limit *= 2 I get 96% (same execution time).
Optimal (theoretical) value is calculated with: math.ceil(len(sep.join(words))/limit)
"no efficient bin-packing algorithm can be guaranteed to do better"
Source: http://mathworld.wolfram.com/Bin-PackingProblem.html
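The answer does not spell out how the efficiency percentage is computed. One plausible reading, assumed here, is the ratio of the theoretical minimum number of lines to the number of lines actually produced. The helper name packing_efficiency and the sample data are hypothetical.

```python
import math


def packing_efficiency(lines, words, limit, sep=" "):
    """Ratio of the theoretical minimum line count to the line count actually used."""
    # Lower bound from the text: total joined length divided by the limit, rounded up
    optimal = math.ceil(len(sep.join(words)) / limit)
    return optimal / len(lines)


words = ["alpha", "beta", "gamma", "delta", "pi"]
# Hypothetical packing of those words with limit = 11
lines = ["alpha beta", "gamma delta", "pi"]
print(packing_efficiency(lines, words, limit=11))
```

A value of 1.0 means the packing uses no more lines than the lower bound allows; a packing that needs one extra line here would score 0.75.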
Moral of the story
While it's interesting to find the best solution, I think that in most cases it would be much better to use this algorithm for 1D offline bin packing problems.
Resources
- http://mathworld.wolfram.com/Bin-PackingProblem.html
- https://github.com/hudora/pyShipping/
Notes
- I didn't use textwrap for my implementation because it's slower than my simple Python code. Maybe it's related to: Why are textwrap.wrap() and textwrap.fill() so slow?
- It seems to work perfectly even if the sorting is not reversed.
Best way to split string into lines with maximum length, without breaking words
How about this as a solution:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    var words = stringToSplit.Split(' ').Concat(new [] { "" });
    return
        words
            .Skip(1)
            .Aggregate(
                words.Take(1).ToList(),
                (a, w) =>
                {
                    var last = a.Last();
                    while (last.Length > maximumLineLength)
                    {
                        a[a.Count - 1] = last.Substring(0, maximumLineLength);
                        last = last.Substring(maximumLineLength);
                        a.Add(last);
                    }
                    var test = last + " " + w;
                    if (test.Length > maximumLineLength)
                    {
                        a.Add(w);
                    }
                    else
                    {
                        a[a.Count - 1] = test;
                    }
                    return a;
                });
}
I reworked this, as I prefer this:
IEnumerable<string> SplitToLines(string stringToSplit, int maximumLineLength)
{
    var words = stringToSplit.Split(' ');
    var line = words.First();
    foreach (var word in words.Skip(1))
    {
        var test = $"{line} {word}";
        if (test.Length > maximumLineLength)
        {
            yield return line;
            line = word;
        }
        else
        {
            line = test;
        }
    }
    yield return line;
}
chunk/split a string in Javascript without breaking words
Something like this?
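var n = 80;
// Advance from position n to just past the next space, or to the end of the input
while (n < input.length) {
    if (input[n++] == ' ') {
        break;
    }
}
var output = input.substring(0, n).split(' ');
console.log(output);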
UPDATED
Now that I re-read the question, here's an updated solution:
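var len = 80;
var curr = len;
var prev = 0;
var output = [];
// Each chunk ends at the first space found at or after len characters,
// so a chunk can be longer than len
while (input[curr]) {
    if (input[curr++] == ' ') {
        output.push(input.substring(prev, curr));
        prev = curr;
        curr += len;
    }
}
output.push(input.substring(prev));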
How to split a string to List string without splitting words?
You can give it a max value like 64, and then use that as an index to search backwards, find the first space, and split there. Repeat on the remaining string until what is left fits within the limit, and you're done.
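// Extension method: must be declared inside a static class
public static IEnumerable<string> SmartSplit(this string input, int maxLength)
{
    int i = 0;
    while (i + maxLength < input.Length)
    {
        // Last space at or before the limit
        int index = input.LastIndexOf(' ', i + maxLength);
        if (index <= i) // no space inside the current chunk: word length > maxLength
        {
            // Hard-split the long word at the limit without dropping a character
            yield return input.Substring(i, maxLength);
            i += maxLength;
        }
        else
        {
            yield return input.Substring(i, index - i);
            i = index + 1; // skip the space
        }
    }
    yield return input.Substring(i);
}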
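The backward-search idea can be sketched in Python with str.rfind. The name smart_split and the decision to hard-split a word longer than the limit are assumptions of this sketch.

```python
def smart_split(text, max_length):
    """Yield chunks of at most max_length characters, splitting at spaces when possible."""
    i = 0
    while i + max_length < len(text):
        # Last space within text[i : i + max_length + 1]; -1 if there is none
        index = text.rfind(" ", i, i + max_length + 1)
        if index <= i:
            # No usable space: hard-split the over-long word at the limit
            yield text[i:i + max_length]
            i += max_length
        else:
            yield text[i:index]
            i = index + 1  # skip the space itself
    yield text[i:]


print(list(smart_split("I like apple. You like oranges.", 10)))
```

Bounding the search to the current chunk matters: a space found before position i would otherwise produce a negative chunk length.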