How to split a string, but also keep the delimiters?
You can use lookahead and lookbehind, which are features of regular expressions.
System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))")));
And you will get:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
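The last one is what you want. The pattern ((?<=;)|(?=;)) matches the empty string just before a ; or just after a ;, so the split happens on both sides of each delimiter and the delimiter survives as its own element.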
EDIT: Fabian Steeg's comments on readability are valid. Readability is always a problem with regular expressions. One thing I do to make regular expressions more readable is to store them in a variable whose name says what the expression does. You can even put placeholders (e.g. %1$s) in the expression and use Java's String.format to replace them with the actual string you need; for example:
public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    ...
}
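One caveat with the placeholder approach: the delimiter is pasted into the regex as-is, so a delimiter such as | would be read as a metacharacter. A minimal sketch of one way around that, assuming a hypothetical helper named splitKeeping (not part of the original answer) that runs the delimiter through Pattern.quote first:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class KeepDelimiters {
    // Same template as in the answer: split just after or just before the delimiter.
    static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: quotes the delimiter so regex metacharacters are taken literally.
    static String[] splitKeeping(String input, String delimiter) {
        String regex = String.format(WITH_DELIMITER, Pattern.quote(delimiter));
        return input.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeeping("a;b;c;d", ";"))); // [a, ;, b, ;, c, ;, d]
        System.out.println(Arrays.toString(splitKeeping("a|b|c", "|")));   // [a, |, b, |, c]
    }
}
```

Quoting inside the helper keeps callers from having to remember which characters need escaping.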
How do I split a string in Java?
Use the appropriately named method String#split().
String string = "004-034556";
String[] parts = string.split("-");
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556
Note that split's argument is assumed to be a regular expression, so remember to escape special characters if necessary.
There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
For instance, to split on a period/dot . (which means "any character" in regex), either use a backslash \ to escape the individual special character, like so: split("\\."), or use a character class [] to represent the literal character(s), like so: split("[.]"), or use Pattern#quote() to escape the entire string, like so: split(Pattern.quote(".")).
String[] parts = string.split(Pattern.quote(".")); // Split on the exact string.
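To make the three escaping options concrete, here is a small sketch that applies each of them to the same input. The class name, the helper splitOnDot and the sample string "1.8.0" are chosen only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnDot {
    // Hypothetical helper: splits on a literal dot using Pattern.quote().
    static String[] splitOnDot(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String version = "1.8.0"; // sample input for illustration

        System.out.println(Arrays.toString(version.split("\\."))); // [1, 8, 0]
        System.out.println(Arrays.toString(version.split("[.]"))); // [1, 8, 0]
        System.out.println(Arrays.toString(splitOnDot(version)));  // [1, 8, 0]

        // Unescaped, every character is a delimiter and only empty strings remain,
        // all of which are dropped as trailing empties.
        System.out.println(version.split(".").length); // 0
    }
}
```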
To test beforehand if the string contains certain character(s), just use String#contains().
if (string.contains("-")) {
    // Split it.
} else {
    throw new IllegalArgumentException("String " + string + " does not contain -");
}
Note that contains() does not take a regular expression. For that, use String#matches() instead, which tests whether the whole string matches the pattern.
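The difference between the two checks can be sketched as follows. The helper looksLikeCode and its pattern (digits, one hyphen, digits) are assumptions made up for this example, not something the original question specifies.

```java
public class ContainsVsMatches {
    // Hypothetical validation: one or more digits, a hyphen, one or more digits.
    static boolean looksLikeCode(String s) {
        return s.matches("\\d+-\\d+");
    }

    public static void main(String[] args) {
        String string = "004-034556";

        System.out.println(string.contains("-"));     // true: plain substring test
        System.out.println(looksLikeCode(string));    // true: the whole string fits the pattern
        System.out.println(string.matches("-"));      // false: matches() must cover the entire string
        System.out.println(string.matches(".*-.*"));  // true: regex equivalent of contains("-")
    }
}
```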
If you'd like to retain the split character in the resulting parts, then make use of positive lookaround. If you want the split character to end up on the left-hand side, use a positive lookbehind by putting the pattern in a group prefixed with ?<=.
String string = "004-034556";
String[] parts = string.split("(?<=-)");
String part1 = parts[0]; // 004-
String part2 = parts[1]; // 034556
If you want the split character to end up on the right-hand side, use a positive lookahead by putting the pattern in a group prefixed with ?=.
String string = "004-034556";
String[] parts = string.split("(?=-)");
String part1 = parts[0]; // 004
String part2 = parts[1]; // -034556
If you'd like to limit the number of resulting parts, then you can supply the desired number as the second argument of the split() method.
String string = "004-034556-42";
String[] parts = string.split("-", 2);
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556-42
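The limit argument also controls what happens to trailing empty strings, which is easy to overlook. A short sketch, using a made-up input with two trailing delimiters and a hypothetical helper partCount:

```java
import java.util.Arrays;

public class SplitLimit {
    // Hypothetical helper: number of parts produced for a given limit.
    static int partCount(String s, int limit) {
        return s.split("-", limit).length;
    }

    public static void main(String[] args) {
        String string = "a-b--"; // sample input ending in two delimiters

        // No limit (same as limit 0): trailing empty strings are removed.
        System.out.println(Arrays.toString(string.split("-")));     // [a, b]
        // Positive limit: at most that many parts, the last one holds the remainder.
        System.out.println(Arrays.toString(string.split("-", 2)));  // [a, b--]
        // Negative limit: no cap, and trailing empty strings are kept.
        System.out.println(partCount(string, -1));                  // 4
    }
}
```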
String delimiter in string.split method
There is no need to set the delimiter by breaking it up in pieces like you have done.
Here is a complete program you can compile and run:
import java.util.Arrays;

public class SplitExample {
    public static final String PLAYER = "1||1||Abdul-Jabbar||Karim||1996||1974";

    public static void main(String[] args) {
        String[] data = PLAYER.split("\\|\\|");
        System.out.println(Arrays.toString(data));
    }
}
If you want to use split with a pattern, you can use Pattern.compile or Pattern.quote. To see compile and quote in action, here is an example using all three approaches:
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitExample {
    public static final String PLAYER = "1||1||Abdul-Jabbar||Karim||1996||1974";

    public static void main(String[] args) {
        // 1. String.split with a hand-escaped regex
        String[] data = PLAYER.split("\\|\\|");
        System.out.println(Arrays.toString(data));

        // 2. A compiled Pattern with the same hand-escaped regex
        Pattern pattern = Pattern.compile("\\|\\|");
        data = pattern.split(PLAYER);
        System.out.println(Arrays.toString(data));

        // 3. A compiled Pattern with the delimiter escaped by Pattern.quote
        pattern = Pattern.compile(Pattern.quote("||"));
        data = pattern.split(PLAYER);
        System.out.println(Arrays.toString(data));
    }
}
Using a compiled Pattern is recommended if you are going to split often with the same pattern, since the regex is then compiled only once. BTW the output is:
[1, 1, Abdul-Jabbar, Karim, 1996, 1974]
[1, 1, Abdul-Jabbar, Karim, 1996, 1974]
[1, 1, Abdul-Jabbar, Karim, 1996, 1974]
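To illustrate the "split often" case, here is a rough sketch that compiles the delimiter once into a constant and reuses it for several lines. The class name, the splitAll helper and the second record are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ReusePattern {
    // Compiled once and shared by every call.
    private static final Pattern DOUBLE_PIPE = Pattern.compile(Pattern.quote("||"));

    // Hypothetical helper: splits every line with the precompiled pattern.
    static List<String[]> splitAll(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(DOUBLE_PIPE.split(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1||1||Abdul-Jabbar||Karim||1996||1974",
                "2||1||Doe||Jane||2001||1980"); // second record is made up
        for (String[] row : splitAll(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```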
Split string with | separator in java
| is treated as an OR in RegEx. So you need to escape it:
String[] separated = line.split("\\|");
Java string split with . (dot)
You need to escape the dot if you want to split on a literal dot:
String extensionRemoved = filename.split("\\.")[0];
Otherwise you are splitting on the regex ., which means "any character".
Note the double backslash needed to create a single backslash in the regex.
You're getting an ArrayIndexOutOfBoundsException because your input string is just a dot, i.e. ".", which is an edge case that produces an empty array when split on dot. split(regex) removes all trailing empty strings from the result, and since splitting a dot on a dot leaves only two empty strings, you're left with an empty array once they are removed.
To avoid getting an ArrayIndexOutOfBoundsException for this edge case, use the overloaded version split(regex, limit), whose second parameter is the size limit for the resulting array. When limit is negative, the removal of trailing empty strings from the resulting array is disabled:
".".split("\\.", -1) // returns an array of two blanks, ie ["", ""]
That is, when filename is just a dot ".", calling filename.split("\\.", -1)[0] will return an empty string, but calling filename.split("\\.")[0] will throw an ArrayIndexOutOfBoundsException.
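The edge case can be reproduced in a few lines. This is only a sketch; beforeFirstDot is a hypothetical helper name, and it simply wraps the split(regex, -1) call described above.

```java
public class ExtensionEdgeCase {
    // Hypothetical helper: text before the first dot, or the whole name when there is no dot.
    // The negative limit keeps trailing empty strings, so index 0 always exists.
    static String beforeFirstDot(String filename) {
        return filename.split("\\.", -1)[0];
    }

    public static void main(String[] args) {
        System.out.println(".".split("\\.").length);     // 0: both empty strings were dropped
        System.out.println(".".split("\\.", -1).length); // 2: both empty strings were kept

        System.out.println(beforeFirstDot("notes.txt")); // notes
        System.out.println(beforeFirstDot(".").isEmpty()); // true
    }
}
```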
Use String.split() with multiple delimiters
I think you need to include the regex OR operator:
String[] tokens = pdfName.split("-|\\.");
What you have will match a DASH followed by a DOT together (-.), not a DASH or a DOT individually (- or .).
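A quick sketch of the difference, using a made-up file name. It also shows a character class, [-.], as an equivalent way to write "dash or dot"; the helper name tokenize is hypothetical.

```java
import java.util.Arrays;

public class MultipleDelimiters {
    // Hypothetical helper: splits on a dash or a dot using alternation.
    static String[] tokenize(String pdfName) {
        return pdfName.split("-|\\.");
    }

    public static void main(String[] args) {
        String pdfName = "report-2020.pdf"; // sample input for illustration

        System.out.println(Arrays.toString(tokenize(pdfName)));        // [report, 2020, pdf]
        System.out.println(Arrays.toString(pdfName.split("[-.]")));    // [report, 2020, pdf]

        // Dash immediately followed by dot never occurs here, so nothing is split.
        System.out.println(Arrays.toString(pdfName.split("-\\.")));    // [report-2020.pdf]
    }
}
```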
Why can't I use . as a delimiter in split() function?
The split function uses regular expressions, so you have to escape your "." with a "\". In a regular expression, "." means any character. Try this:
String delimiter = "\\.x";
It should also be mentioned that \ is a special character in Java string literals as well, used to create other special characters. Therefore you have to escape your \ with another \, hence the "\\.x".
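The two layers of escaping (Java string literal first, regex second) can be seen by printing the delimiter itself. A small sketch with made-up inputs and a hypothetical helper splitOnDotX:

```java
import java.util.Arrays;

public class EscapeLevels {
    // Hypothetical helper: splits on the literal two-character sequence ".x".
    static String[] splitOnDotX(String s) {
        return s.split("\\.x");
    }

    public static void main(String[] args) {
        String delimiter = "\\.x";

        // The Java compiler turns \\ into one backslash, so the regex engine sees \.x
        System.out.println(delimiter);          // \.x
        System.out.println(delimiter.length()); // 3

        System.out.println(Arrays.toString(splitOnDotX("a.xb.xc"))); // [a, b, c]

        // Without the escape, "." matches any character, so "Qx" is also a delimiter.
        System.out.println(Arrays.toString("aQxb".split(".x")));     // [a, b]
        System.out.println(Arrays.toString(splitOnDotX("aQxb")));    // [aQxb]
    }
}
```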
There's some great documentation in the Java docs about all the special characters and what they do:
Java 8 Docs
Java 7 Docs
Java 6 Docs
Java String Split with multiple delimiter using pipe '|'
You can use the given link to understand how delimiters work.
How do I use a delimiter in Java Scanner?
Another alternative way
You can use the useDelimiter(String pattern) method of the Scanner class. In the example below, the slash (/) is used as the delimiter to tokenize the String passed to the Scanner constructor.
There are three possible tokens in the String "Anna Mills/Female/18": name, gender and age. The Scanner class is used to split the String and print the tokens to the console.
import java.util.Scanner;

/*
 * This is a Java example that shows how to use the useDelimiter(String pattern)
 * method of the Scanner class. We use the string / as the delimiter
 * to tokenize a String input passed to the Scanner constructor.
 */
public class ScannerUseDelimiterDemo {
    public static void main(String[] args) {
        // Initialize the Scanner object
        Scanner scan = new Scanner("Anna Mills/Female/18");
        // Set the string delimiter
        scan.useDelimiter("/");
        // Print the delimiter used
        System.out.println("The delimiter used is " + scan.delimiter());
        // Print the tokenized Strings
        while (scan.hasNext()) {
            System.out.println(scan.next());
        }
        // Close the scanner
        scan.close();
    }
}
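Since useDelimiter takes a regular expression, the delimiter can also absorb surrounding whitespace. The following is a minimal sketch under that assumption; the tokens helper and the spaced-out input are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerRegexDelimiter {
    // Hypothetical helper: collects tokens separated by a slash with optional whitespace around it.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the scanner automatically
        try (Scanner scan = new Scanner(input)) {
            scan.useDelimiter("\\s*/\\s*");
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Anna Mills / Female / 18")); // [Anna Mills, Female, 18]
    }
}
```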
Java splitting string using delimiter and store to different array
Here is one approach which splits the input on the following regex pattern:
_(?!.*_)
The negative lookahead (?!.*_) rejects any underscore that is followed by another one, so the input string is split only on the last underscore character. We can iterate over your collection of inputs and populate the two arrays.
List<String> inputs = Arrays.asList("1564095_SINGLE_true", "1564096_SINGLE_true");
String[] arrayA = new String[inputs.size()];
String[] arrayB = new String[inputs.size()];
int index = 0;
for (String input : inputs) {
    // Split once on the last underscore and reuse both halves
    String[] parts = input.split("_(?!.*_)");
    arrayA[index] = parts[0];
    arrayB[index] = parts[1];
    ++index;
}
System.out.println("A[]: " + Arrays.toString(arrayA));
System.out.println("B[]: " + Arrays.toString(arrayB));
This prints:
A[]: [1564095_SINGLE, 1564096_SINGLE]
B[]: [true, true]
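If a regex feels heavy for this, the same "split on the last underscore" idea can be sketched with lastIndexOf and substring. This is a swapped-in alternative, not the approach from the answer above, and splitOnLast is a hypothetical helper name.

```java
import java.util.Arrays;

public class SplitOnLastUnderscore {
    // Hypothetical helper: splits around the last occurrence of the delimiter.
    // Returns a single-element array when the delimiter is absent.
    static String[] splitOnLast(String input, char delimiter) {
        int i = input.lastIndexOf(delimiter);
        if (i < 0) {
            return new String[] { input };
        }
        return new String[] { input.substring(0, i), input.substring(i + 1) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnLast("1564095_SINGLE_true", '_')));
        // [1564095_SINGLE, true]
    }
}
```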