Split String to Equal Length Substrings in Java

Split string to equal length substrings in Java

Here's the regex one-liner version:

import java.util.Arrays;

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
)); // [Theq, uick, brow, nfox, jump, s]

\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.
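If the chunk size needs to be a parameter, the same lookbehind can be built at runtime from a number. This is a minimal sketch: the class and method names (SplitEvery, splitEvery) are hypothetical, and the inline (?s) flag is an addition not present in the one-liner, included so that `.` also matches line terminators. The Android caveat mentioned below applies to it as well.

```java
import java.util.Arrays;

public class SplitEvery {
    // Splits s into pieces of n characters (the last piece may be shorter)
    // with the \G-lookbehind trick. (?s) lets '.' match line terminators too.
    static String[] splitEvery(String s, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return s.split("(?s)(?<=\\G.{" + n + "})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEvery("Thequickbrownfoxjumps", 4)));
        System.out.println(Arrays.toString(splitEvery("abcdef", 3)));
    }
}
```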

Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.

split a string in java into equal length substrings while maintaining word boundaries

If I understand your problem correctly, this code should do what you need (but it assumes that maxLenght is greater than or equal to the length of the longest word):

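import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "Hello there, my name is not importnant right now."
        + " I am just simple sentecne used to test few things.";
int maxLenght = 10;
// each match must start where the previous one ended (\G); group 1 holds one line
Pattern p = Pattern.compile("\\G\\s*(.{1," + maxLenght + "})(?=\\s|$)", Pattern.DOTALL);
Matcher m = p.matcher(data);
while (m.find()) {
    System.out.println(m.group(1));
}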

Output:

Hello
there, my
name is
not
importnant
right now.
I am just
simple
sentecne
used to
test few
things.

Short (or not) explanation of "\\G\\s*(.{1,"+maxLenght+"})(?=\\s|$)" regex:

(Let's just remember that in Java \ is special not only in regex but also in String literals, so to use predefined character classes like \d we need to write "\\d", because the \ also has to be escaped in the string literal.)

  • \G - an anchor representing the end of the previous match or, if there is no match yet (when we have just started searching), the beginning of the string (the same position ^ matches)
  • \s* - zero or more whitespace characters (\s represents a whitespace character, * is the "zero-or-more" quantifier)
  • (.{1,"+maxLenght+"}) - let's split it into smaller parts (at runtime maxLenght will hold a numeric value like 10, so the regex will see it as .{1,10})

    • . represents any character (by default it matches any character except line terminators such as \n or \r, but thanks to the Pattern.DOTALL flag it can now match any character; you may drop this flag if you want the text after each line break to be wrapped separately, since it will start on a new line anyway)
    • {1,10} - a quantifier that lets the previously described element appear 1 to 10 times (by default it tries to match as many repetitions as possible)
    • .{1,10} - so, based on what we just said, it simply represents "1 to 10 of any characters"
    • ( ) - parentheses create capturing groups, which let us hold specific parts of the match (here we placed the parentheses after \\s* because we want only the part after the whitespace)
  • (?=\\s|$) - a lookahead which makes sure that the text matched by .{1,10} is followed by:

    • a whitespace character (\\s)

      OR (written as |)

    • the end of the string ($).

So thanks to .{1,10} we can match up to 10 characters, but the (?=\\s|$) after it requires that the last character matched by .{1,10} is not part of an unfinished word (there must be whitespace or the end of the string after it).
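One consequence of \G is worth seeing in action: if a word is longer than maxLenght, no match can start at the position \G demands, so find() returns false and the rest of the text is silently dropped. This is why the answer assumes maxLenght is at least the length of the longest word. A small sketch, with a hypothetical wrap helper around the same pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordWrapDemo {
    // Same pattern as in the answer, collecting group 1 of every match.
    static List<String> wrap(String data, int maxLength) {
        Pattern p = Pattern.compile("\\G\\s*(.{1," + maxLength + "})(?=\\s|$)", Pattern.DOTALL);
        Matcher m = p.matcher(data);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        // "friend" has 6 characters: with maxLength 5 the \G chain breaks
        // after "there" and the word is lost without any error.
        System.out.println(wrap("Hi there friend", 5)); // [Hi, there]
        System.out.println(wrap("Hi there friend", 6)); // [Hi, there, friend]
    }
}
```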

How to split a string after a certain length?

I wouldn't use String.split for this at all:

String message = "Who Framed Roger Rabbit";
for (int i = 0; i < message.length(); i += 10) {
    System.out.println(message.substring(i, Math.min(i + 10, message.length())));
}

Addition 2018/5/8:

If you are simply printing the parts of the string, there is a more efficient option, in that it avoids creating the substrings explicitly:

import java.io.PrintWriter;

PrintWriter w = new PrintWriter(System.out);
for (int i = 0; i < message.length(); i += 10) {
    // write(String, off, len) takes a length, not an end index
    w.write(message, i, Math.min(10, message.length() - i));
    w.write(System.lineSeparator());
}
w.flush();

splitting string in java into fixed length chunks

Since your Strings are not in an array or List, you need to assign them explicitly.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Matcher m = Pattern.compile(".{1,30}").matcher(s);
String s1 = m.find() ? m.group() : "";
String s2 = m.find() ? m.group() : "";
String s3 = m.find() ? m.group() : "";
String s4 = m.find() ? m.group() : "";
String s5 = m.find() ? m.group() : "";
String s6 = m.find() ? m.group() : "";
String s7 = m.find() ? m.group() : "";
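If the pieces can live in a List instead, the same Matcher can simply be looped until find() fails. A sketch with a hypothetical chunks helper; it adds Pattern.DOTALL, which the snippet above does not use, so that line terminators end up inside chunks instead of being skipped and acting as chunk boundaries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Chunks {
    // Collects consecutive runs of 1..size characters; the last may be shorter.
    static List<String> chunks(String s, int size) {
        Matcher m = Pattern.compile(".{1," + size + "}", Pattern.DOTALL).matcher(s);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chunks("abcdefg", 3)); // [abc, def, g]
    }
}
```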

Java: How to split a string by a number of characters?

I think what the asker wants is to split a string into substrings of size 4. I would do this in a loop:

import java.util.ArrayList;
import java.util.List;

List<String> strings = new ArrayList<>();
int index = 0;
while (index < text.length()) {
    strings.add(text.substring(index, Math.min(index + 4, text.length())));
    index += 4;
}
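On Java 8 or later, the same slicing can be written with IntStream instead of a while loop. This is a swapped-in technique, not part of the original answer; StreamChunks and chunk are hypothetical names, and the code assumes size > 0:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamChunks {
    // Chunk i covers [i * size, min((i + 1) * size, length)).
    static List<String> chunk(String text, int size) {
        int count = (text.length() + size - 1) / size; // ceiling division
        return IntStream.range(0, count)
                .mapToObj(i -> text.substring(i * size, Math.min((i + 1) * size, text.length())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghij", 4)); // [abcd, efgh, ij]
    }
}
```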

Splitting a string into n-length chunks in Java

You can do this with Guava's Splitter:

import com.google.common.base.Splitter;
Splitter.fixedLength(chunkSize).split(s)

...which returns an Iterable<String>.


Split a given string into equal parts where number of sub strings will be of equal size and dynamic in nature?

You could pass the length of the substrings and iterate until the end of the string, after removing its whitespace.

function split(string, size) {
    var splitted = [],
        i = 0;
    // remove all whitespace; `|| []` avoids a TypeError when match() finds nothing
    string = (string.match(/\S+/g) || []).join('');
    while (i < string.length) splitted.push(string.slice(i, i += size));
    return splitted;
}
console.log(...split('Hello World', 2));
console.log(...split('Hello Worlds', 2));
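For readers working in Java, here is a possible equivalent of the JavaScript above. It is a sketch: EqualParts and split are hypothetical names, and Java's \s matches only ASCII whitespace by default, whereas JavaScript's \s also covers Unicode spaces.

```java
import java.util.ArrayList;
import java.util.List;

public class EqualParts {
    // Strip all whitespace, then cut the rest into pieces of `size` characters
    // (the last piece may be shorter).
    static List<String> split(String string, int size) {
        String compact = string.replaceAll("\\s+", "");
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < compact.length(); i += size) {
            parts.add(compact.substring(i, Math.min(i + size, compact.length())));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello World", 2));  // [He, ll, oW, or, ld]
        System.out.println(split("Hello Worlds", 2)); // [He, ll, oW, or, ld, s]
    }
}
```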

I want to split a string after a specific length without cutting any words, not into equal Strings

You may not be able to do it exactly, but you can use String.indexOf() to find the first space at or after index 35, then use the substring method to divide the string.

String text = "Rupees Two Hundred Forty One and Sixty Eight only";
int i = text.indexOf(" ", 35);
if (i < 0) {
    i = text.length();
}
String part1 = text.substring(0, i).trim(); // "Rupees Two Hundred Forty One and Sixty" (38 chars)
String part2 = text.substring(i).trim();    // "Eight only"

Here is an alternative method. It has not been fully checked for border cases.

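// reuses text, part1 and part2 from the snippet above
String[] words = text.split(" ");
int k;
part1 = words[0];
part2 = "";
for (k = 1; k < words.length; k++) {
    // stop before part1 + " " + words[k] would exceed 35 characters
    if (part1.length() >= 35 - words[k].length()) {
        break;
    }
    part1 += " " + words[k];
}
if (k < words.length) {
    part2 = words[k++];
    while (k < words.length) {
        part2 += " " + words[k++];
    }
}
System.out.println(part1); // Rupees Two Hundred Forty One and
System.out.println(part2); // Sixty Eight only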
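A third variant, not from the original answers, uses String.lastIndexOf(int, int) to find the last space at or before index 35, so the first part never exceeds the limit unless its first word alone is longer. A sketch with hypothetical names (WordSplit, splitAtWord); when no such space exists it falls back to the indexOf approach:

```java
public class WordSplit {
    // Splits text into two parts at a space so that, where possible,
    // the first part has at most `limit` characters.
    static String[] splitAtWord(String text, int limit) {
        if (text.length() <= limit) {
            return new String[] {text.trim(), ""};
        }
        // Last space at or before index `limit`: cutting there keeps part 1 within the limit.
        int i = text.lastIndexOf(' ', limit);
        if (i <= 0) {
            // First word alone is longer than the limit: use the first space after it.
            i = text.indexOf(' ', limit);
            if (i < 0) {
                i = text.length();
            }
        }
        return new String[] {text.substring(0, i).trim(), text.substring(i).trim()};
    }

    public static void main(String[] args) {
        String[] parts = splitAtWord("Rupees Two Hundred Forty One and Sixty Eight only", 35);
        System.out.println(parts[0]); // Rupees Two Hundred Forty One and
        System.out.println(parts[1]); // Sixty Eight only
    }
}
```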


