Java split is eating my characters
Use zero-width matching assertions:
String str = "la$le\\$li$lo";
System.out.println(java.util.Arrays.toString(
str.split("(?<!\\\\)\\$")
)); // prints "[la, le\$li, lo]"
The regex is essentially (?<!\\)\$, which uses negative lookbehind to assert that there is no preceding \.
See also
- regular-expressions.info/Lookarounds
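As a rough sketch of how the lookbehind generalizes, the helper below builds the same regex for an arbitrary delimiter. The name splitUnescaped is hypothetical and not part of the original answer. It uses Pattern.quote so the delimiter is taken literally, and it assumes a single backslash is the only escape mechanism, so a doubled backslash in front of the delimiter still counts as an escape.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class UnescapedSplit {
    // Hypothetical helper: split on `delimiter` unless a backslash directly precedes it.
    static String[] splitUnescaped(String input, String delimiter) {
        // (?<!\\) is the negative lookbehind; Pattern.quote makes the delimiter literal.
        String regex = "(?<!\\\\)" + Pattern.quote(delimiter);
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Same input as the example above: the escaped \$ stays inside its piece.
        System.out.println(Arrays.toString(splitUnescaped("la$le\\$li$lo", "$")));
    }
}
```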
More examples of splitting on assertions
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
Splitting a long string into fixed-length parts, using \G:
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
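The \G-based pattern is compact but not obvious to read. If that is a concern, a plain loop over substring produces the same parts. This is a minimal sketch of that alternative, not the regex technique itself; the chunk helper is an illustrative name and it assumes size is positive.

```java
import java.util.ArrayList;
import java.util.List;

class FixedLengthChunks {
    // Illustrative helper: cut `s` into pieces of at most `size` characters.
    // Assumes size > 0; the last piece may be shorter.
    static List<String> chunk(String s, int size) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += size) {
            parts.add(s.substring(i, Math.min(s.length(), i + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunk("012345678901234567890", 4));
    }
}
```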
Using a lookbehind/lookahead combo:
String str = "HelloThereHowAreYou";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[a-z])(?=[A-Z])")
)); // prints "[Hello, There, How, Are, You]"
Related questions
- Can you use zero-width matching regex in String split?
- Backreferences in lookbehind
- How do I convert CamelCase into human-readable names in Java?
How to split a string, but also keep the delimiters?
You can use lookahead and lookbehind, which are features of regular expressions.
System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));
And you will get:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
The last one is what you want.
((?<=;)|(?=;)) matches the empty string just before ; or just after ;.
EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make them more readable is to store the expression in a variable whose name says what it does. You can even put placeholders (e.g. %1$s) in it and use Java's String.format to replace them with the actual string you need; for example:
public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";
public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    ...
}
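One caveat with the format approach: the delimiter is pasted into the regex as-is, so a delimiter such as | or . would be read as regex syntax. The sketch below wraps the delimiter in Pattern.quote before formatting. The splitKeeping helper is a hypothetical name used only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class KeepDelimiters {
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: split around `delimiter`, keeping it as its own element.
    static String[] splitKeeping(String input, String delimiter) {
        // Quote first so characters like | or . are matched literally.
        String quoted = Pattern.quote(delimiter);
        return input.split(String.format(WITH_DELIMITER, quoted));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("a;b;c;d", ";")));
        System.out.println(Arrays.toString(splitKeeping("a|b", "|")));
    }
}
```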
Regex to split a string (in Java) so that spaces are preserved?
Thanks guys, that gave me the lead I needed ... I'm using (?<=[\\s]) and it works exactly the way I want!
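To make the effect concrete, here is a small sketch with a made-up input string. Splitting after each whitespace character means no whitespace is consumed, so joining the pieces gives back the original string.

```java
class KeepSpaces {
    // Split after every whitespace character; the lookbehind consumes nothing.
    static String[] splitAfterWhitespace(String input) {
        return input.split("(?<=\\s)");
    }

    public static void main(String[] args) {
        String input = "a b  c"; // made-up sample with a double space
        String[] parts = splitAfterWhitespace(input);
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
        // The pieces concatenate back to the input because nothing was removed.
        System.out.println(String.join("", parts).equals(input));
    }
}
```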
problem with java split()
Don't use split for this!
split gives you the pieces BETWEEN the separators.
Here you want to eliminate the unwanted '-' characters instead.
The solution is simple:
out = in.replaceAll("-", "");
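Since the dash here is a literal character rather than a pattern, String.replace is an option too: it treats both arguments literally, so nothing needs escaping. A minimal sketch, with a made-up input value and a hypothetical helper name:

```java
class StripDashes {
    // Hypothetical helper: remove every '-' from the input.
    static String stripDashes(String in) {
        // String.replace takes literal text, not a regex.
        return in.replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDashes("12-34-56"));
    }
}
```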
Need to split a string into two parts in java
Use lookarounds: str.split("(?<=\\d)(?=\\D)")
String[] parts = "123XYZ".split("(?<=\\d)(?=\\D)");
System.out.println(parts[0] + "-" + parts[1]);
// prints "123-XYZ"
\d is the character class for digits; \D is its negation. So this zero-width assertion matches the position where the preceding character is a digit, (?<=\d), and the following character is a non-digit, (?=\D).
References
- regular-expressions.info/Lookarounds and Character Class
Related questions
- Java split is eating my characters.
- Is there a way to split strings with String.split() and include the delimiters?
Alternate solution using limited split
The following also works:
String[] parts = "123XYZ".split("(?=\\D)", 2);
System.out.println(parts[0] + "-" + parts[1]);
This splits just before we see a non-digit. This is much closer to your original solution, except that since it doesn't actually match the non-digit character, it doesn't "eat it up". Also, it uses a limit of 2, which is really what you want here.
API links
String.split(String regex, int limit)
- If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
- If the limit
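To see what the limit changes, the sketch below runs the same lookahead with and without it on a made-up input that has several letters in the middle. Without a limit the string is split before every non-digit; with a limit of 2 the pattern is applied only once. The class and method names are illustrative.

```java
import java.util.Arrays;

class LimitDemo {
    // No limit: split before every non-digit character.
    static String[] splitAll(String s) {
        return s.split("(?=\\D)");
    }

    // Limit 2: the pattern is applied at most once, so at most two parts come back.
    static String[] splitFirst(String s) {
        return s.split("(?=\\D)", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAll("123XYZ456")));
        System.out.println(Arrays.toString(splitFirst("123XYZ456")));
    }
}
```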
How to split the string in java by \?
While you could escape the backslash in the regular expression passed to String.split with the somewhat surprising
String str = "a\\b\\c";
str.split("\\\\");
it is also possible to compile a Pattern with Pattern.LITERAL and then use Pattern.split(CharSequence), like
String str = "a\\b\\c";
Pattern p = Pattern.compile("\\", Pattern.LITERAL);
String[] arr = p.split(str);
System.out.println(Arrays.toString(arr));
Which outputs
[a, b, c]
Java split String at |
Escape the character, since | is the regex alternation operator and String.split takes a regex:
String[] parts = match.split("\\|");
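If hand-escaping feels error-prone, Pattern.quote builds a literal pattern from any string. A minimal sketch, where splitLiteral is a hypothetical helper name and the inputs are made up:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Hypothetical helper: split on `delimiter` taken as plain text, not as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));
        System.out.println(Arrays.toString(splitLiteral("1.2.3", ".")));
    }
}
```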
Java Split String Consecutive Delimiters
String.split leaves an empty string ("") where it encounters consecutive delimiters, as long as you use the right regex. If you want to replace it with "empty", you'd have to do so yourself:
String[] split = barcodeFields.split("\\^");
for (int i = 0; i < split.length; ++i) {
    if (split[i].length() == 0) {
        split[i] = "empty";
    }
}
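A self-contained sketch of the same idea follows. The input value and the fillEmpty name are made up for illustration, and the limit of -1 is an addition so that empty fields at the end of the string are kept as well.

```java
import java.util.Arrays;

class EmptyFields {
    // Hypothetical helper: split on '^' and label every empty field.
    static String[] fillEmpty(String barcodeFields) {
        // Limit -1 keeps trailing empty strings, which the default limit would drop.
        String[] split = barcodeFields.split("\\^", -1);
        for (int i = 0; i < split.length; ++i) {
            if (split[i].isEmpty()) {
                split[i] = "empty";
            }
        }
        return split;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillEmpty("a^^b^")));
    }
}
```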
Java String's split method ignores empty substrings
Use String.split(String regex, int limit) with a negative limit (e.g. -1).
"aa,bb,cc,dd,,,,".split(",", -1)
String.split(String regex) calls the two-argument version with limit = 0, which removes all trailing empty strings from the array (in most cases, see below).
The actual behavior of String.split(String regex) is quite confusing:
- Splitting an empty string always results in an array of length 1 containing the empty string.
- Splitting ";" or ";;;" with regex being ";" results in an empty array. For a non-empty input, all trailing empty strings are removed from the array.
The behavior above can be observed from at least Java 5 to Java 8.
There was an attempt in JDK-6559590 to change the behavior so that splitting an empty string returns an empty array. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.
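The cases described above can be checked with a few lines. This is only a sketch that counts the resulting array lengths; the class and method names are illustrative.

```java
class SplitEdgeCases {
    // Default split (limit 0): trailing empty strings are removed.
    static int count(String input, String regex) {
        return input.split(regex).length;
    }

    // Negative limit: trailing empty strings are kept.
    static int countKeepingTrailing(String input, String regex) {
        return input.split(regex, -1).length;
    }

    public static void main(String[] args) {
        System.out.println(count("aa,bb,cc,dd,,,,", ","));                // 4
        System.out.println(countKeepingTrailing("aa,bb,cc,dd,,,,", ",")); // 8
        System.out.println(count("", ";"));                               // 1
        System.out.println(count(";;;", ";"));                            // 0
    }
}
```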