How to Split a String With Any Whitespace Chars as Delimiters

Something along the lines of:

myString.split("\\s+");

This treats any run of consecutive whitespace characters as a single delimiter.

So if I have the string:

"Hello[space character][tab character]World"

This should yield the strings "Hello" and "World", without an empty string between the [space] and the [tab].
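A quick way to check that claim is to compare a single-character \\s delimiter with the \\s+ run delimiter on "Hello", a space, a tab, then "World". This is a minimal sketch; the class and method names are illustrative only.

```java
import java.util.Arrays;

public class WhitespaceSplitDemo {
    // Splits on any run of one or more whitespace characters.
    public static String[] splitRuns(String s) {
        return s.split("\\s+");
    }

    // Splits on each individual whitespace character, so adjacent
    // whitespace characters leave an empty string between them.
    public static String[] splitEach(String s) {
        return s.split("\\s");
    }

    public static void main(String[] args) {
        String input = "Hello \tWorld"; // a space followed by a tab
        System.out.println(Arrays.toString(splitRuns(input))); // [Hello, World]
        System.out.println(Arrays.toString(splitEach(input))); // [Hello, , World]
    }
}
```

The empty middle element from \\s is exactly what the + quantifier avoids.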

As VonC pointed out, the backslash must be escaped: Java first processes the escape sequences in the string literal and only then hands the result to the regex parser. What you want the regex engine to see is \s, which means you need to write "\\s" in the Java source. It can get a bit confusing.

In Java's default mode, \\s is equivalent to [ \\t\\n\\x0B\\f\\r], so it matches only ASCII whitespace.
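Because the default \\s is ASCII-only, a no-break space (U+00A0) is not treated as a delimiter. A minimal sketch, assuming you want every Unicode whitespace character to split: the embedded flag (?U) enables Pattern.UNICODE_CHARACTER_CLASS, under which \\s matches the Unicode White_Space property. Class and method names are illustrative.

```java
public class UnicodeWhitespaceDemo {
    // Default mode: \s matches only [ \t\n\x0B\f\r], so U+00A0 is not a delimiter.
    public static String[] splitAscii(String s) {
        return s.split("\\s+");
    }

    // (?U) turns on UNICODE_CHARACTER_CLASS, so \s matches any Unicode White_Space char.
    public static String[] splitUnicode(String s) {
        return s.split("(?U)\\s+");
    }

    public static void main(String[] args) {
        String input = "Hello\u00A0World"; // joined by a no-break space
        System.out.println(splitAscii(input).length);   // 1
        System.out.println(splitUnicode(input).length); // 2
    }
}
```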

How to split a String by space

What you have should work. If, however, the separators in your input are not plain spaces (tabs or other whitespace, for example), you can use the whitespace regex:

String str = "Hello I'm your String";
String[] parts = str.split("\\s+");

This treats any run of one or more consecutive whitespace characters (spaces, tabs, newlines) as a single delimiter, splitting the string into tokens.
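One caveat: String.split drops trailing empty strings but keeps a leading one, so input that starts with whitespace produces an empty first token. A minimal sketch of the usual workaround, trimming first; the tokens helper is a hypothetical name, not a JDK method.

```java
import java.util.Arrays;

public class TrimThenSplit {
    // Trims first so leading whitespace cannot produce an empty first token;
    // a blank input returns an empty array instead of [""].
    public static String[] tokens(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String str = "  Hello I'm your String";
        System.out.println(Arrays.toString(str.split("\\s+"))); // [, Hello, I'm, your, String]
        System.out.println(Arrays.toString(tokens(str)));       // [Hello, I'm, your, String]
    }
}
```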

How to split a string on whitespace and on special chars while getting their offset values in Java

Use Matcher#start to get the offset of each match:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\b\\S+\\b|\\p{Punct}");
        Matcher matcher = pattern.matcher("I live, in India.");
        while (matcher.find()) {
            System.out.println(matcher.group() + " => " + matcher.start());
        }
    }
}

Output:

I => 0
live => 2
, => 6
in => 8
India => 11
. => 16

Explanation of regex:

  1. \b specifies word boundary.
  2. | specifies OR.
  3. \p{Punct} specifies a punctuation character (one of the ASCII characters !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~).
  4. \S+ specifies one or more non-whitespace characters.
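If you also need where each token ends, Matcher#end gives the exclusive end offset. A minimal sketch reusing the same pattern; the "text[start,end)" string format and the tokensWithSpans name are illustrative choices, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOffsets {
    // Same pattern as above: a word-bounded run of non-whitespace, or one punctuation char.
    private static final Pattern TOKEN = Pattern.compile("\\b\\S+\\b|\\p{Punct}");

    // Returns each token as "text[start,end)"; end is exclusive, like String.substring.
    public static List<String> tokensWithSpans(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // [I[0,1), live[2,6), ,[6,7), in[8,10), India[11,16), .[16,17)]
        System.out.println(tokensWithSpans("I live, in India."));
    }
}
```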

split string with whitespace AND multiple other operation signs?

You need to understand regular expressions first; for basic use they are very simple :) For your requirement you can split with String[] split = mystr.split("[-+*/#_^]");. The square brackets define a character class: if any one of the characters inside the brackets matches, the class matches. The - is placed first so that it is a literal rather than a range.
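The class above does not include whitespace itself. A minimal sketch, assuming you want whitespace and the operator signs to act as one combined delimiter: add \\s inside the class and a + quantifier after it so that runs such as " + " collapse into a single split point. The operators are discarded, and the OperatorSplit name is illustrative.

```java
import java.util.Arrays;

public class OperatorSplit {
    // Character class: the operator signs from the answer plus \s for whitespace.
    // '-' stays first so it is a literal, not a range; the trailing '+' merges
    // runs like " + " into a single delimiter.
    public static String[] split(String expr) {
        return expr.trim().split("[-+*/#_^\\s]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("3 + 4*5 ^ 2"))); // [3, 4, 5, 2]
    }
}
```

Note that an expression starting with an operator (for example a leading minus sign) still yields an empty first element.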

c#: how to split string using default whitespaces + set of additional delimiters?

Just use the appropriate overload of string.Split if you're at least on .NET 2.0:

char[] separator = new[] { ' ', '.', ',', ';' };
string[] parts = text.Split(separator, StringSplitOptions.RemoveEmptyEntries);

I guess I was downvoted because the answer was incomplete. The OP asked for a way to split on all whitespace characters (25 of them on my PC) and also on other delimiters:

using System;
using System.Collections.Generic;

public static class StringExtensions
{
    static StringExtensions()
    {
        // Collect every char for which char.IsWhiteSpace is true, once
        var whiteSpaceList = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = Convert.ToChar(i);
            if (char.IsWhiteSpace(c))
            {
                whiteSpaceList.Add(c);
            }
        }
        WhiteSpaces = whiteSpaceList.ToArray();
    }

    public static readonly char[] WhiteSpaces;

    public static string[] SplitWhiteSpacesAndMore(this string str, IEnumerable<char> otherDelimiters, StringSplitOptions options = StringSplitOptions.None)
    {
        // Separators = all whitespace chars + the caller's extra delimiters
        var separatorList = new List<char>(WhiteSpaces);
        separatorList.AddRange(otherDelimiters);
        return str.Split(separatorList.ToArray(), options);
    }
}

Now you can use this extension method in this way:

string str = "word1 word2\tword3.word4,word5;word6";
char[] separator = { '.', ',', ';' };
string[] split = str.SplitWhiteSpacesAndMore(separator, StringSplitOptions.RemoveEmptyEntries);
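For comparison, a rough Java analogue of the extension method, written as a minimal sketch: it scans the string once, treats any char for which Character.isWhitespace is true or any char in a caller-supplied set as a delimiter, and drops empty tokens the way RemoveEmptyEntries does. The class and method names are hypothetical. One difference to be aware of: Java's Character.isWhitespace deliberately excludes the no-break spaces U+00A0, U+2007 and U+202F, whereas .NET's char.IsWhiteSpace includes them, so the two sets are not identical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitWhitespaceAndMore {
    // Splits on any Character.isWhitespace char or any char in extraDelimiters;
    // empty tokens are dropped (like StringSplitOptions.RemoveEmptyEntries).
    public static List<String> split(String input, String extraDelimiters) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c) || extraDelimiters.indexOf(c) >= 0) {
                // Delimiter: flush the token collected so far, if any
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString()); // last token, if input did not end on a delimiter
        }
        return parts;
    }

    public static void main(String[] args) {
        // [word1, word2, word3, word4, word5, word6]
        System.out.println(split("word1 word2\tword3.word4,word5;word6", ".,;"));
    }
}
```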

How to split a string with runs of more than one whitespace character as delimiters?

Split on two spaces, then trim any leftover whitespace from the results (a leftover space appears when a run has an odd number of spaces):

// Select/ToList need: using System.Linq;
List<string> splitStrings = myString.Split(new[] { "  " }, StringSplitOptions.RemoveEmptyEntries)
    .Select(s => s.Trim())
    .ToList();
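A regex quantifier avoids the odd-count trimming step entirely. A minimal sketch in Java, assuming "more than one whitespace" means two or more consecutive whitespace characters: \\s{2,} matches any such run, so single spaces inside a field survive. The MultiSpaceSplit name is illustrative.

```java
import java.util.Arrays;

public class MultiSpaceSplit {
    // Splits only on runs of two or more whitespace characters, so single spaces
    // inside a field ("New York") survive and odd-length runs leave no residue.
    public static String[] split(String line) {
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("New York   NY  10001"))); // [New York, NY, 10001]
    }
}
```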

Split string on multiple delimiters but not on white space alone

With the third-party regex module (the standard re module rejects variable-width lookbehinds like the one used here) you could use:

import regex
arr = ['abc,, def', 'abc, def geh', 'abc  def', 'abc, def , geh,, , ijk      lmn \n opq']
res = [regex.split(r'\b(?=\W)(?! +\b)|(?<=\b\W*[^\w ]+\W*\b)',x) for x in arr]
print(res)

Prints:

[['abc', ',, ', 'def'], ['abc', ', ', 'def geh'], ['abc  def'], ['abc', ', ', 'def', ' , ', 'geh', ',, , ', 'ijk      lmn', ' \n ', 'opq']]

The pattern matches:

  • \b - Word boundary.
  • (?=\W) - A positive lookahead of a non-word character.
  • (?! +\b) - Negative lookahead for one or more spaces followed by a word boundary.
  • | - Or
  • (?<=\b\W*[^\w ]+\W*\b) - A positive lookbehind for a word boundary, zero or more non-word characters, at least one character that is neither a word character nor a space, then (greedily) zero or more non-word characters and a word boundary.
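Rather than port that lookbehind, here is a Java sketch that swaps in a different technique with the same intent: scan for runs of non-word characters (\\W+), treat a run as a delimiter only if it contains something other than a plain space, and keep the delimiter runs in the output as the Python version does. This is an approximation under those assumptions, not a translation of the regex; empty leading tokens are skipped, and the SplitKeepDelimiters name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitKeepDelimiters {
    // A run of non-word characters; it is a delimiter only if it contains
    // something other than a plain space, so "abc  def" stays in one piece.
    private static final Pattern NON_WORD_RUN = Pattern.compile("\\W+");

    public static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int tokenStart = 0;
        Matcher m = NON_WORD_RUN.matcher(input);
        while (m.find()) {
            String run = m.group();
            if (run.replace(" ", "").isEmpty()) {
                continue; // spaces only: leave them inside the current token
            }
            if (m.start() > tokenStart) {
                parts.add(input.substring(tokenStart, m.start())); // token before the run
            }
            parts.add(run); // keep the delimiter run itself
            tokenStart = m.end();
        }
        if (tokenStart < input.length()) {
            parts.add(input.substring(tokenStart)); // trailing token
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("abc,, def")); // [abc, ,, , def]
        System.out.println(split("abc  def"));  // [abc  def]
    }
}
```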

JavaScript split String with white space

You could split the string on the whitespace and then re-add it, since you know it's between every pair of adjacent entries.

var text = "text to split";
var words = text.split(" ");
var stringArray = [];
for (var i = 0; i < words.length; i++) {
    stringArray.push(words[i]);
    // re-add the separator between entries, but not after the last one
    if (i != words.length - 1) {
        stringArray.push(" ");
    }
}

Update: Removed trailing space.
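Re-adding a single " " loses the original whitespace when entries are separated by tabs or several spaces. A minimal Java sketch of an alternative, swapping in zero-width lookaround split points at every boundary between whitespace and non-whitespace so that each whitespace run comes back unchanged as its own element. The KeepWhitespaceSplit name is illustrative.

```java
public class KeepWhitespaceSplit {
    // Zero-width split points: between whitespace and non-whitespace, in either order.
    // Each whitespace run is returned as its own element, exactly as it appeared.
    public static String[] split(String s) {
        return s.split("(?<=\\s)(?=\\S)|(?<=\\S)(?=\\s)");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", split("text to split"))); // text| |to| |split
    }
}
```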