Java Split on Spaces and Special Characters
Just use:
String[] terms = input.split("[\\s@&.?$+-]+");
You can put a shorthand character class inside a character class (note the \s), and most metacharacters lose their meaning inside a character class, except for [, ], -, &, and \. However, & is meaningful only when it comes in a pair (&&), and - is treated as a literal character if it is placed at the beginning or the end of the character class.
Other languages may have different rules for parsing the pattern, but the rule about - applies to most engines.
As @Sean Patrick Floyd mentioned in his answer, the important thing boils down to defining what constitutes a word. \w in Java is equivalent to [a-zA-Z0-9_] (English letters in upper and lower case, digits, and underscore), and therefore \W consists of all other characters. If you want to consider Unicode letters and digits, you may want to look at Unicode character classes.
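As a rough illustration of the two points above, the sketch below splits once with the explicit character class from the answer and once with a Unicode-aware class. The class name SplitDemo, the method names, the sample inputs, and the choice of [^\p{L}\p{N}_]+ as the Unicode delimiter are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of whitespace and the listed special characters.
    // '-' comes last in the class, so it is a literal hyphen rather than a range.
    static String[] splitOnSpecials(String input) {
        return input.split("[\\s@&.?$+-]+");
    }

    // Assumed Unicode-aware variant: any run of characters that are not
    // letters, digits, or underscores is treated as a delimiter.
    static String[] splitOnNonWordUnicode(String input) {
        return input.split("[^\\p{L}\\p{N}_]+");
    }

    public static void main(String[] args) {
        // prints [foo, bar, com, baz, qux]
        System.out.println(Arrays.toString(splitOnSpecials("foo@bar.com & baz?qux")));

        // Accented letters are kept inside the words (written as escapes to avoid encoding issues).
        System.out.println(Arrays.toString(splitOnNonWordUnicode("na\u00EFve caf\u00E9, r\u00E9sum\u00E9!")));
    }
}
```

With plain \W as the delimiter, the accented letters in the second input would themselves be treated as separators, which is why the Unicode properties are used there.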
Java regex - split string with leading special characters
Split is behaving as expected by splitting off a zero-length string at the start before the first comma.
To fix, first remove all splitting chars from the start:
String[] sArr = s.replaceAll("^([^a-zA-Z]*\\s*)*", "").split("[^a-zA-Z]+\\s*");
Note that I’ve altered the removal regex to trim any sequence of spaces and non-letters from the front.
You don’t need to remove from the tail because split discards empty trailing elements from the result.
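To make the leading-empty-string behavior concrete, here is a minimal sketch. It assumes, for simplicity, that any run of non-letters is a delimiter; since whitespace is already a non-letter, it uses the shorter pattern [^a-zA-Z]+ instead of the exact expressions above. The class name, method names, and sample input are hypothetical.

```java
import java.util.Arrays;

class LeadingDelimiterDemo {
    // Plain split: a delimiter at index 0 yields an empty first element.
    static String[] naiveSplit(String s) {
        return s.split("[^a-zA-Z]+");
    }

    // Strip leading delimiters first, then split.
    static String[] trimmedSplit(String s) {
        return s.replaceAll("^[^a-zA-Z]+", "").split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String s = ", ,apple, banana;cherry!!";

        // prints [, apple, banana, cherry]  (note the empty first element)
        System.out.println(Arrays.toString(naiveSplit(s)));

        // prints [apple, banana, cherry]  (trailing "!!" needs no special handling)
        System.out.println(Arrays.toString(trimmedSplit(s)));
    }
}
```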
Split Java string on spaces with special characters and complications
You may use this regex for matching with a lookahead assertion:
-?[a-z_]\w*(?:=".*?"(?=\h+(?:-[a-z](?=\h|$)|[a-z]\w*=)|$)|\S+)?
RegEx Explanation:
-? : Start with an optional hyphen
[a-z_]\w* : Match a variable name that starts with a lowercase letter or underscore, followed by 0+ word characters
(?: : Start non-capture group
=".*?"(?=...<expression>) : Match = followed by a quoted string that starts and ends with a double quote. The lookahead asserts that another variable or the end of the line follows.
| : OR
\S+ : Match 1+ non-whitespace characters
)? : End non-capture group (the whole group is optional)
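The regex above is meant for matching tokens rather than for split(), so in Java it would typically be driven with a find() loop. The following is only a sketch: the class name OptionTokenizer, the tokenize method, and the sample input are assumptions, since the original input string is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionTokenizer {
    // The regex from the answer, escaped for a Java string literal.
    private static final Pattern TOKEN = Pattern.compile(
            "-?[a-z_]\\w*(?:=\".*?\"(?=\\h+(?:-[a-z](?=\\h|$)|[a-z]\\w*=)|$)|\\S+)?");

    // Collects every match of the token pattern, in order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical input with a quoted value that itself contains quotes.
        String input = "-a msg=\"say \"hi\" now\" level=3 -v";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
        // prints:
        // -a
        // msg="say "hi" now"
        // level=3
        // -v
    }
}
```

The lookahead is what lets the quoted value run past the inner quotes: a closing quote is accepted only when it is followed by whitespace and another option or key=, or by the end of the line.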
How to split a string on whitespace and on special char while getting their offset values in Java
Use Matcher#start to get the offset at which each match begins:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\b\\S+\\b|\\p{Punct}");
        Matcher matcher = pattern.matcher("I live, in India.");
        while (matcher.find()) {
            System.out.println(matcher.group() + " => " + matcher.start());
        }
    }
}
Output:
I => 0
live => 2
, => 6
in => 8
India => 11
. => 16
Explanation of regex:
\b specifies a word boundary.
| specifies OR.
\p{Punct} specifies punctuation.
\S+ specifies one or more non-whitespace characters.
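If the end offset is needed as well, Matcher#end can be read in the same loop. The sketch below uses the same pattern; the class name OffsetTokens, the method name, and the "token@start-end" output format are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OffsetTokens {
    private static final Pattern TOKEN = Pattern.compile("\\b\\S+\\b|\\p{Punct}");

    // Returns each token as "text@start-end", where end is exclusive.
    static List<String> tokensWithOffsets(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [I@0-1, live@2-6, ,@6-7, in@8-10, India@11-16, .@16-17]
        System.out.println(tokensWithOffsets("I live, in India."));
    }
}
```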
Splitting a string using special characters and keeping them
So you want to use split() to get every character separately, except for spaces and commas, so split by spaces/commas and by "nothing", i.e. the zero-width "space" between non-space/comma characters.
String str = "g, i+, w+ | (d | (u+, f))+";
String[] chunks = str.split("[\\s,]+|(?<![\\s,])(?![\\s,])");
System.out.println(String.join(",", chunks));
Output
g,i,+,w,+,|,(,d,|,(,u,+,f,),),+
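To see what each half of that pattern contributes, the sketch below runs the delimiter-only alternative and the full pattern on a shorter input. The class and method names and the shortened input are hypothetical, and Java 8 or later is assumed, where a zero-width match at the very start of the string does not produce a leading empty element.

```java
import java.util.Arrays;

class ZeroWidthSplitDemo {
    // First alternative only: runs of whitespace/commas are removed,
    // but multi-character chunks such as "i+" stay together.
    static String[] splitOnDelimitersOnly(String s) {
        return s.split("[\\s,]+");
    }

    // Full pattern: also splits at every zero-width position that is
    // neither preceded nor followed by whitespace or a comma.
    static String[] splitEveryCharacter(String s) {
        return s.split("[\\s,]+|(?<![\\s,])(?![\\s,])");
    }

    public static void main(String[] args) {
        String s = "g, i+, w+";

        // prints [g, i+, w+]
        System.out.println(Arrays.toString(splitOnDelimitersOnly(s)));

        // prints [g, i, +, w, +]
        System.out.println(Arrays.toString(splitEveryCharacter(s)));
    }
}
```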
Alternative: Search for what you want, and collect it into an array or List (requires Java 9):
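import java.util.regex.MatchResult;
import java.util.regex.Pattern;

String str = "g, i+, w+ | (d | (u+, f))+";
// Match every character that is not whitespace or a comma
String[] chunks = Pattern.compile("[^\\s,]").matcher(str).results()
        .map(MatchResult::group).toArray(String[]::new);
System.out.println(String.join(",", chunks));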
Same output.
For older versions of Java, use a find() loop:
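import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "g, i+, w+ | (d | (u+, f))+";
List<String> chunkList = new ArrayList<>();
for (Matcher m = Pattern.compile("[^\\s,]").matcher(str); m.find(); ) {
    chunkList.add(m.group());
}
System.out.println(chunkList);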
Output
[g, i, +, w, +, |, (, d, |, (, u, +, f, ), ), +]
You can always convert the List to an array:
String[] chunks = chunkList.toArray(new String[0]);