Split String by Single Spaces

Split a string at a single space

I think this might be what you are looking for (you can extend it to all whitespace, such as tabs and newlines, by using the \\s class instead of a literal space):

String data = "This_sentence__gets___split".replace('_', ' ');
// System.out.println(data);
String[] arr = data.split("(?<! ) |(?<= {2})");
for (String s : arr)
    System.out.println("\"" + s + "\"");

Output:

"This"
"sentence"
" "
"gets"
" "
" "
"split"

Explanation:

  • "(?<! ) " splits only on a space that is not preceded by another space
  • "(?<= {2})" splits at any position that is preceded by two spaces (a zero-width match, so no character is consumed).
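For comparison, the same two-alternative pattern can be tried in Python. This is only a sketch: it assumes Python 3.7 or later, where re.split also splits on zero-width matches, and the helper name split_keep_extra_spaces is made up for illustration.

```python
import re

# Hypothetical helper; the pattern is the one explained above.
# Assumes Python 3.7+, where re.split honours zero-width matches.
SINGLE_SPACE = re.compile(r"(?<! ) |(?<= {2})")

def split_keep_extra_spaces(text):
    """Split on the first space of each run; every additional space
    in the run comes back as its own " " token."""
    return SINGLE_SPACE.split(text)

result = split_keep_extra_spaces("This sentence  gets   split")
print(result)
```

A run of n spaces therefore produces n - 1 single-space tokens between the neighbouring words.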

Python split string exactly on one space. if double space make word not word

Just another regex solution: if you need to split with a single left-most whitespace char, use \s? to match one or zero whitespaces, and then capture 0+ remaining whitespaces and the subsequent non-whitespace chars.

One very important step: run rstrip on the input string before running the regex to remove all trailing whitespace; otherwise, performance will degrade greatly.

import re
words = "this is a  book and i  like it"
print(re.findall(r'\s?(\s*\S+)', words.rstrip()))
# => ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

re.findall returns only the captured substrings; since the pattern has a single capturing group, the result is a list of those captures.

Details:

  • \s? - one or zero (due to the ? quantifier) whitespace characters
  • (\s*\S+) - Capturing group #1 matching

    • \s* - zero or more (due to the * quantifier) whitespace characters
    • \S+ - one or more (due to the + quantifier) non-whitespace characters.
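The pieces above can be wrapped in a small function. This is a minimal sketch; the name split_on_leftmost_space and the sample input are assumptions made for illustration.

```python
import re

def split_on_leftmost_space(text):
    """Drop one separating whitespace character per gap and keep the
    remaining whitespace attached to the front of the next word."""
    # rstrip first, as recommended above, so trailing whitespace
    # does not slow the matching down.
    return re.findall(r"\s?(\s*\S+)", text.rstrip())

tokens = split_on_leftmost_space("a  b   c   ")
print(tokens)
```

Here the double space before b leaves one space attached to it, and the triple space before c leaves two.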

How to split a String by space

What you have should work. If, however, the separators turn out to be something other than plain spaces (tabs, for instance), you can use the whitespace regex:

String str = "Hello I'm your String";
String[] splited = str.split("\\s+");

This treats any run of consecutive whitespace characters as a single delimiter when splitting your string into tokens.
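The same idea can be sketched in Python for comparison. The sample strings are made up; the point is that splitting on \s+ and calling str.split() with no argument agree on inner whitespace but differ when the string starts with whitespace.

```python
import re

text = "Hello \t I'm  your\nString"

by_regex = re.split(r"\s+", text)   # split on runs of any whitespace
by_default = text.split()           # no argument: also splits on whitespace runs

# With leading whitespace the regex version yields an empty first token,
# while str.split() with no argument discards it.
padded_regex = re.split(r"\s+", "  a b")
padded_default = "  a b".split()

print(by_regex, by_default, padded_regex, padded_default)
```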

Split string by single spaces

You can even develop your own split function (I know, a little old-fashioned):

#include <string>
#include <vector>

// Splits txt at every occurrence of ch; consecutive delimiters yield empty tokens.
size_t split(const std::string &txt, std::vector<std::string> &strs, char ch)
{
    size_t pos = txt.find( ch );
    size_t initialPos = 0;
    strs.clear();

    // Decompose statement
    while( pos != std::string::npos ) {
        strs.push_back( txt.substr( initialPos, pos - initialPos ) );
        initialPos = pos + 1;

        pos = txt.find( ch, initialPos );
    }

    // Add the last one (everything after the final delimiter)
    strs.push_back( txt.substr( initialPos ) );

    return strs.size();
}

Then you just need to invoke it with a vector<string> as argument:

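#include <iostream>

int main()
{
    std::vector<std::string> v;

    split( "This is a test", v, ' ' );

    // Print each token on its own line, wrapped in quotes
    for( const std::string &s : v )
        std::cout << '"' << s << '"' << '\n';

    return 0;
}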

Find the code for splitting a string in IDEone.

Hope this helps.
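For readers following the Python answers on this page, the same find-and-slice loop can be sketched in Python. The name split_on_char is hypothetical; the built-in str.split(" ") already behaves this way, so the function only illustrates the algorithm.

```python
def split_on_char(text, ch):
    """Split text at every occurrence of ch, keeping empty tokens."""
    parts = []
    start = 0
    pos = text.find(ch)
    while pos != -1:
        parts.append(text[start:pos])
        start = pos + 1
        pos = text.find(ch, start)
    parts.append(text[start:])  # the last token
    return parts

tokens = split_on_char("This is  a test", " ")
print(tokens)
```

The double space produces an empty token, which is the behaviour that distinguishes splitting on a single space from splitting on runs of whitespace.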

Split string into a list on whitespace, excluding single spaces when the next character is not a dash

A less haphazard approach would be to interpret the headers on the first line as column indicators, and split on those widths.

import sys
import re

def col_widths(s):
    # Shamelessly adapted from https://stackoverflow.com/a/33090071/874188
    cols = re.findall(r'\S+\s+', s)
    return [len(col) for col in cols]

widths = col_widths(next(sys.stdin))

for line in sys.stdin:
    line = line.rstrip('\n')
    fields = []
    for col_max in widths[:-1]:
        fields.append(line[0:col_max].strip())
        line = line[col_max:]
    fields.append(line)
    print(fields)

Demo: https://ideone.com/ASANjn

This seems to provide a better interpretation of e.g. the LDate column, where the dates are sometimes padded with more than one space. The penultimate column preserves the final dash as part of the column value; this seems more consistent with the apparent intent of the author of the original table, though you could split the dash off from that specific column separately if that's not to your liking.

If you don't want to read sys.stdin, just wrap this in with open(filename) as handle: and replace sys.stdin with handle everywhere.
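A minimal sketch of that variant, which takes any iterable of lines (an open file handle, or io.StringIO as here). The function name split_columns and the sample table are invented for illustration; they are not the table from the original question.

```python
import io
import re

def col_widths(header):
    # Width of each header word plus the whitespace that follows it
    return [len(col) for col in re.findall(r'\S+\s+', header)]

def split_columns(handle):
    """Read a header line, then cut every following line at the header widths."""
    widths = col_widths(next(handle))
    rows = []
    for line in handle:
        line = line.rstrip('\n')
        fields = []
        for col_max in widths[:-1]:
            fields.append(line[:col_max].strip())
            line = line[col_max:]
        fields.append(line)  # the last column takes the rest of the line
        rows.append(fields)
    return rows

# Hypothetical sample table; the second date is padded with an extra space.
sample = io.StringIO(
    "Name   LDate       Note\n"
    "alpha  1 Jan 2020  first row\n"
    "beta    2 Feb 2020 second\n"
)
rows = split_columns(sample)
print(rows)
```

With a real file, passing the handle from with open(filename) as handle: in place of the StringIO object should work the same way.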

Python - string.split() but ignore single spaces (e.g. between words)

This is the sort of problem where regular expressions excel. So let's construct a regex that matches every run of more than one whitespace character. \s matches a single whitespace character, so let's start with that:

\s

To match N or more repetitions of something in a regex, you put {N,} after the expression. So, let's add {2,} to match two or more:

\s{2,}

Now that we have our regular expression, we need a regular expression engine. Python comes with one built in: the re module, which also provides a function that splits a string wherever the regular expression matches. So, we do:

import re # This is the built-in regex module
string = "ABC DEF GHI JK LMNO P"
my_list = re.split(r"\s{2,}", string)

Unrelated to this question, note how I changed your variable name from list to my_list. This is because list is the name of a built-in type in Python, which you don't want to overwrite.
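To see the effect, here is a short sketch with a made-up input in which some gaps are single spaces and others are runs of two or more. The sample string is an assumption, not the data from the original question.

```python
import re

# Hypothetical input: single spaces inside fields, wider gaps between fields
sample = "ABC DEF  GHI JK    LMNO P"

my_list = re.split(r"\s{2,}", sample)
print(my_list)
```

Single spaces stay inside the resulting items; only the wider gaps act as separators.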

Splitting a string on space except for single space

Like this:

myString.split("\\s{2,}");

or like this,

myString.split(" \\s+"); // notice the blank at the beginning.

It depends on what you really want, which is not clear from the question.

You can check the quantifier syntax in the documentation of the Pattern class.
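The two patterns are not interchangeable: the first accepts any two or more whitespace characters, while the second requires the run to start with a literal space. A small Python sketch with a made-up input shows the difference (Python's re module treats both patterns the same way for this case).

```python
import re

# Hypothetical input: a tab-tab gap, a space-space gap, and a space-tab gap
text = "a\t\tb  c \td"

two_or_more = re.split(r"\s{2,}", text)   # any run of 2+ whitespace characters
space_then_ws = re.split(r" \s+", text)   # a literal space, then 1+ whitespace

print(two_or_more)
print(space_then_ws)
```

The tab-only gap is a separator for the first pattern but not for the second.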

Split string on multiple white space not on single space?

Here is an example of what I was talking about.

  • You need to ensure that your tabs are converted to spaces while preserving the column locations.
  • Because tabs and spaces are intermixed, the easiest solution is to eye the column starts and manually enter them into an array. If you make a guide as shown below this is trivial to do.
  • Then just read in the lines and split them using the column locations.
  • Each sample value is named "Data" followed by its column number or, where several values share a column, by the number and a letter.

String[] data = {
//             1111111111222222222233333333334444444444555555555566666666667777777777
//   01234567890123456789012345678901234567890123456789012345678901234567890123456789
    "Data1  Data2      Data3   Data4   Data5a Data5b Data5c    Data6 Data7     Data8",
    "Data1  Data2      Data3   Data4   Data5a Data5b Data5c    Data6 Data7          ",
    "Data1  Data2      Data3                                   Data6 Data7     Data8",
    "Data1  Data2              Data4   Data5a Data5b Data5c    Data6 Data7     Data8",
};

// last entry is string length of the line
int[] columnStarts = { 0, 7, 18, 26, 34, 58, 64, 74, 79 };
for (String line : data) {
    int columnNumber = 0;
    for (int i = 0; i < columnStarts.length - 1; i++) {
        System.out.printf("%3d : %3d -- '%s'%n",
                (columnNumber + 1),
                columnStarts[columnNumber],
                line.substring(columnStarts[i],
                        columnStarts[i + 1]).trim());
        columnNumber++;
    }
    System.out.println();
}

Prints

  1 :   0  -- 'Data1'
2 : 7 -- 'Data2'
3 : 18 -- 'Data3'
4 : 26 -- 'Data4'
5 : 34 -- 'Data5a Data5b Data5c'
6 : 58 -- 'Data6'
7 : 64 -- 'Data7'
8 : 74 -- 'Data8'

1 : 0 -- 'Data1'
2 : 7 -- 'Data2'
3 : 18 -- 'Data3'
4 : 26 -- 'Data4'
5 : 34 -- 'Data5a Data5b Data5c'
6 : 58 -- 'Data6'
7 : 64 -- 'Data7'
8 : 74 -- ''

1 : 0 -- 'Data1'
2 : 7 -- 'Data2'
3 : 18 -- 'Data3'
4 : 26 -- ''
5 : 34 -- ''
6 : 58 -- 'Data6'
7 : 64 -- 'Data7'
8 : 74 -- 'Data8'

1 : 0 -- 'Data1'
2 : 7 -- 'Data2'
3 : 18 -- ''
4 : 26 -- 'Data4'
5 : 34 -- 'Data5a Data5b Data5c'
6 : 58 -- 'Data6'
7 : 64 -- 'Data7'
8 : 74 -- 'Data8'

Note that the Data is trimmed and printed to show just the data portion of the column. Without the white space trimming, the data would show trailing white space for each column.

The above should be enough for you to store the information in an array or list and modify it based on column number.
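The column-start technique is not specific to Java. As a rough sketch in Python, with a hypothetical helper name and a made-up three-column layout, slicing between consecutive offsets gives one field per column, and an empty column simply comes back as an empty string.

```python
def split_by_column_starts(line, starts):
    """Slice line between consecutive offsets; the last offset is the line length."""
    return [line[a:b].strip() for a, b in zip(starts, starts[1:])]

# Hypothetical layout: columns begin at 0, 4 and 12; each line is 16 characters long
starts = [0, 4, 12, 16]
full_row = "id".ljust(4) + "name".ljust(8) + "city"
gap_row = "id".ljust(4) + "".ljust(8) + "city"   # middle column left blank

full_fields = split_by_column_starts(full_row, starts)
gap_fields = split_by_column_starts(gap_row, starts)
print(full_fields, gap_fields)
```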

Splitting a string with multiple spaces

Since the argument to split() is a regular expression, you can look for one or more spaces (" +") instead of just one space (" ").

String[] array = s.split(" +");

Split a string only by first space in python

Just pass the maximum number of splits as the second argument (maxsplit) to str.split.

>>> s = "238 NEO Sports"
>>> s.split(" ", 1)
['238', 'NEO Sports']
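A closely related option is str.partition, which always returns three parts. The sketch below compares the two on the same input and on a made-up string that contains no space.

```python
s = "238 NEO Sports"

by_split = s.split(" ", 1)            # at most one split
first, sep, rest = s.partition(" ")   # text before, the separator, text after

# When there is no space, split returns a single item,
# while partition pads the result with empty strings.
no_space_split = "NEO".split(" ", 1)
no_space_partition = "NEO".partition(" ")

print(by_split, (first, sep, rest), no_space_split, no_space_partition)
```

partition can be convenient when the code always unpacks into a fixed number of variables.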

