How to Split a String into a [String] and Not [Substring]

How do you split a String into a [String] and not [Substring]?

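As Leo suggested in the comments, you can use components(separatedBy:) instead, which returns a [String]. It is a Foundation method, so the file needs import Foundation:

import Foundation

let string = "This|is|a|test"
let words = string.components(separatedBy: "|") // words is a [String]
words.foo()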

If you want to stick with split() (e.g. because it has more options, such as omitting empty subsequences), you'll have to create a new array by converting each Substring to a String:

let string = "This|is|a|test"
let words = string.split(separator: "|").map(String.init)
words.foo()

Alternatively, if possible, make the array extension method more general so that it accepts elements conforming to StringProtocol, which covers both String and Substring:

extension Array where Element: StringProtocol {
    func foo() {
        // Works for both [String] and [Substring]
    }
}

How do I split a string in Java?

Use the appropriately named method String#split().

String string = "004-034556";
String[] parts = string.split("-");
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556

Note that split's argument is assumed to be a regular expression, so remember to escape special characters if necessary.

There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".

For instance, to split on a period/dot . (which means "any character" in regex), either escape the individual special character with a backslash, as in split("\\."), use a character class [] to represent the literal character(s), as in split("[.]"), or use Pattern#quote() to escape the entire string, as in split(Pattern.quote(".")).

import java.util.regex.Pattern;

String[] parts = string.split(Pattern.quote(".")); // Split on the exact string.
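The same pitfall exists in any regex-based split. As a cross-language sketch (Python's re module, not part of the original Java answer), splitting on an unescaped . treats every character as a separator, while the three escaping styles above all split on a literal dot:

```python
import re

text = "a.b.c"

# Unescaped "." matches any character, so every character is a separator
unescaped = re.split(".", text)

# The three escaping styles from the Java answer, written for Python's re
escaped = re.split(r"\.", text)              # backslash escape
char_class = re.split("[.]", text)           # character class
quoted = re.split(re.escape("."), text)      # escape the whole string, like Pattern.quote()

print(unescaped)   # ['', '', '', '', '', '']
print(escaped)     # ['a', 'b', 'c']
print(char_class)  # ['a', 'b', 'c']
print(quoted)      # ['a', 'b', 'c']
```

In Java the unescaped case is even more confusing: split() also removes trailing empty strings, so "a.b.c".split(".") returns an empty array.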

To test beforehand if the string contains certain character(s), just use String#contains().

if (string.contains("-")) {
    // Split it.
} else {
    throw new IllegalArgumentException("String " + string + " does not contain -");
}

Note that contains() does not take a regular expression. For a regex test, use String#matches() instead, keeping in mind that matches() must match the entire string.

If you'd like to retain the split character in the resulting parts, use a positive lookaround. If you want the split character to end up on the left-hand side, use a positive lookbehind by wrapping it in a (?<=...) group.

String string = "004-034556";
String[] parts = string.split("(?<=-)");
String part1 = parts[0]; // 004-
String part2 = parts[1]; // 034556

If you want the split character to end up on the right-hand side, use a positive lookahead by wrapping it in a (?=...) group.

String string = "004-034556";
String[] parts = string.split("(?=-)");
String part1 = parts[0]; // 004
String part2 = parts[1]; // -034556
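For comparison, the same lookaround trick works with Python's re.split, as a sketch assuming Python 3.7 or later (earlier versions refuse to split on a pattern that matches only the empty string):

```python
import re

s = "004-034556"

# Lookbehind: split right after each "-", so the dash stays on the left part
left = re.split(r"(?<=-)", s)

# Lookahead: split right before each "-", so the dash starts the right part
right = re.split(r"(?=-)", s)

print(left)   # ['004-', '034556']
print(right)  # ['004', '-034556']
```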

If you'd like to limit the number of resulting parts, pass the desired number as the second argument of split().

String string = "004-034556-42";
String[] parts = string.split("-", 2);
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556-42

Split string on delimiter while ignoring substrings

I would recommend re.split here; it's a lot more flexible with patterns.

>>> import re
>>> re.split(r',\s*(?!.*?])', string)
["'test_1[off,on]hello'", '200', '300']

Details

,       # comma
\s*     # whitespace (any number of chars)
(?!     # negative lookahead
   .*?  # anything
   ]    # closing bracket
)       # end of lookahead

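The pattern will not split on a comma that is located inside [...]. Strictly speaking, it skips any comma that has a ] anywhere after it in the string, so it works as intended only when no [...] group follows the commas you do want to split on.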
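A sketch of a tighter variant (my own, not from the original answer): refuse to split only when a ] is reached before any [, i.e. when the comma really sits inside a non-nested [...] group. Nested brackets are not handled.

```python
import re

# Original pattern: skips any comma that has a "]" anywhere after it
loose = r",\s*(?!.*?])"

# Tighter pattern: skips a comma only if a "]" follows before any "["
tight = r",\s*(?![^\[]*\])"

sample = "a,b,[x,y],c"
print(re.split(loose, sample))  # ['a,b,[x,y]', 'c']
print(re.split(tight, sample))  # ['a', 'b', '[x,y]', 'c']
```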

How to split a string by a string except when the string is in quotes in python?

You can use the following regex with re.findall:

((?:(?!\band\b)[^'])*(?:'[^'\\]*(?:\\.[^'\\]*)*'(?:(?!\band\b)[^'])*)*)(?:and|$)


The regular expression consists of an unrolled sequence of two kinds of chunks: anything but a ' up to the first standalone and (matched with the tempered greedy token (?:(?!\band\b)[^'])*), and anything between and including single apostrophes, with escaped characters supported (matched with '[^'\\]*(?:\\.[^'\\]*)*', which is an unrolled version of '([^'\\]|\\.)*').

Python code:

import re
p = re.compile(r'((?:(?!\band\b)[^\'])*(?:\'[^\'\\]*(?:\\.[^\'\\]*)*\'(?:(?!\band\b)[^\'])*)*)(?:and|$)')
s = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
# findall can also yield empty matches, so keep only the non-empty ones
print([x for x in p.findall(s) if x])
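A simpler alternative, if you prefer re.split, is a quote-parity lookahead: split on a whitespace-delimited and only when an even number of single quotes follows it, which means the and is outside any '...' literal. This is a different technique from the answer above and, unlike it, it does not support escaped quotes inside the literals:

```python
import re

s = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

# " and " counts as a separator only if the rest of the string
# contains an even number of single quotes (zero, two, four, ...)
pattern = r"\s+and\s+(?=(?:[^']*'[^']*')*[^']*$)"
parts = re.split(pattern, s)

for part in parts:
    print(part)
```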

Split string into two parts only

You could use a, b = s.split(' ', 1).

The second argument, 1, is the maximum number of splits to perform.

s = 'abcd efgh hijk'
a,b = s.split(' ', 1)
print(a) #abcd
print(b) #efgh hijk

For more information on the string split function, see str.split in the manual.
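The limit can also be passed as the maxsplit keyword. One caveat worth a quick sketch: if the separator is missing, split() returns a single item and unpacking into two names raises ValueError (str.partition() avoids that):

```python
s = 'abcd efgh hijk'

# Same as s.split(' ', 1), with the limit named explicitly
a, b = s.split(' ', maxsplit=1)
print(a)  # abcd
print(b)  # efgh hijk

# No space in the string: split() returns ['abcd'], so unpacking fails
try:
    x, y = 'abcd'.split(' ', maxsplit=1)
except ValueError as exc:
    print('cannot unpack:', exc)
```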

Regex to split string except specified substring

Rather than splitting, you should match these strings. Here is a possible solution:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "apple \"banana\"";
Pattern p = Pattern.compile("\"([^\"]*)\"|\\S+");
Matcher m = p.matcher(input);
List<String> results = new ArrayList<>(); // Declare a list
while (m.find()) {
    if (m.group(1) != null) {
        results.add(m.group(1)); // Add Group 1 to the list
    } else {
        results.add(m.group()); // Add the whole match value to the list
    }
}
System.out.println(results); // Prints the resulting list

NOTE: If you plan to match all chars from the first matched " to the last ", you may use Pattern p = Pattern.compile("\"(.*)\"|\\S+");.


Output:

[apple, banana]

The "([^"]*)"|\S+ pattern matches:

  • " - a " char
  • ([^"]*) - Group 1: any 0 or more chars other than "
  • " - a " char
  • | - or
  • \S+ - 1+ non-whitespace chars.
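The same match-instead-of-split approach in Python, as a sketch (the function name tokens is hypothetical). It uses re.finditer rather than re.findall, because findall with one capturing group would return only Group 1, which is empty for the \S+ alternative:

```python
import re

def tokens(text):
    """Return quoted substrings (without the quotes) and bare non-whitespace runs."""
    pattern = re.compile(r'"([^"]*)"|\S+')
    result = []
    for m in pattern.finditer(text):
        # Group 1 is None when the \S+ alternative matched
        result.append(m.group(1) if m.group(1) is not None else m.group(0))
    return result

print(tokens('apple "banana"'))         # ['apple', 'banana']
print(tokens('say "hello world" now'))  # ['say', 'hello world', 'now']
```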

Splitting strings in Python without split()

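sentence = 'This is a sentence'
split_value = []
tmp = ''  # characters of the word currently being built
for c in sentence:
    if c == ' ':
        split_value.append(tmp)  # a space ends the current word
        tmp = ''
    else:
        tmp += c
if tmp:
    split_value.append(tmp)  # add the last word (no space follows it)
print(split_value)  # ['This', 'is', 'a', 'sentence']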
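The loop above splits on single spaces only, so consecutive or leading spaces produce empty strings. A sketch of a reusable function (split_whitespace is a hypothetical name) that instead mimics str.split() with no arguments, treating any run of whitespace as one separator:

```python
def split_whitespace(text):
    """Split on runs of whitespace and drop empty pieces, like text.split()."""
    words = []
    current = []  # characters of the word being built
    for ch in text:
        if ch.isspace():
            # Whitespace ends a word, but only if one has started
            if current:
                words.append(''.join(current))
                current = []
        else:
            current.append(ch)
    if current:  # the last word has no whitespace after it
        words.append(''.join(current))
    return words

print(split_whitespace('  This is   a\tsentence\n'))  # ['This', 'is', 'a', 'sentence']
```

Collecting characters in a list and joining once is the usual way to build strings piece by piece in Python.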

Split a string at every 4-th character?

This ought to do it:

String[] split = myString.split("(?=(....)+$)");
// or
String[] split = myString.split("(?=(.{4})+$)");

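What it does is this: it splits on the empty string only if that position is followed by a multiple of 4 chars up to the end of the input. Because the count is taken from the end, the chunks are aligned to the right: for "abcdefghij" the result is ["ab", "cdef", "ghij"].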

Of course, this has a bad runtime (O(n^2)), since the lookahead rescans the rest of the string at every position. You can get a linear-time algorithm by simply splitting the string yourself.
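A sketch of that linear-time approach with plain slicing (Python here; the function names are hypothetical). chunks_from_right reproduces the regex's end-aligned grouping, while chunks_from_left puts the short chunk last, which is often what "every 4th character" means:

```python
def chunks_from_left(s, n=4):
    """n-char chunks starting at the front; any short chunk comes last."""
    return [s[i:i + n] for i in range(0, len(s), n)]

def chunks_from_right(s, n=4):
    """n-char chunks aligned to the end of the string, like the regex split."""
    first = len(s) % n  # length of the short leading chunk (0 if none)
    head = [s[:first]] if first else []
    return head + [s[i:i + n] for i in range(first, len(s), n)]

print(chunks_from_left("abcdefghij"))   # ['abcd', 'efgh', 'ij']
print(chunks_from_right("abcdefghij"))  # ['ab', 'cdef', 'ghij']
```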

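As mentioned by @anubhava, use

(?!^)(?=(?:.{4})+$)

to avoid an empty first element when the string length is a multiple of 4. (Since Java 8, String.split() already drops a leading empty string produced by a zero-width match at the beginning, so this mainly matters on older Java versions.)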

Split a string by a substring except for brackets

Even if you have nested balanced parentheses, you can use the following .NET regex (it relies on balancing groups, which are specific to .NET):

\s*\band\b\s* # whole word and enclosed with 0+ whitespaces
(?= # start of a positive lookahead:
(?:
[^()]* # 0 or more chars other than ( and )
\((?>[^()]+|(?<o>\()|(?<-o>\)))*(?(o)(?!))\) # a (...) substring with nested parens support
)* # repeat the sequence of above two patterns 0 or more times
[^()]*$ # 0 or more chars other than ( and ) and end of string
) # end of the positive lookahead


Here is a C# snippet:

using System;
using System.Text.RegularExpressions;

var text = "a > b and b = 0 and (f = 1 and (g = 2 and j = 68) and v = 566) and a > b and b = 0 and (f = 1 and g = 2)";
var pattern = @"(?x)
\s*\band\b\s* # whole word and enclosed with 0+ whitespaces
(?= # start of a positive lookahead:
(?:
[^()]* # 0 or more chars other than ( and )
\((?>[^()]+|(?<o>\()|(?<-o>\)))*(?(o)(?!))\) # a (...) substring with nested parens support
)* # repeat the sequence of above two patterns 0 or more times
[^()]*$ # 0 or more chars other than ( and ) and end of string
) # end of the positive lookahead";
var results = Regex.Split(text, pattern);
Console.WriteLine(string.Join(Environment.NewLine, results));

Output:

a > b
b = 0
(f = 1 and (g = 2 and j = 68) and v = 566)
a > b
b = 0
(f = 1 and g = 2)
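Python's built-in re module has no balancing groups, so there a small scanner that tracks the parenthesis depth is an easier route. This is a different technique from the regex above, sketched under the assumptions that parentheses are balanced and never appear inside quoted literals (split_top_level_and is a hypothetical name):

```python
import re

def split_top_level_and(text):
    """Split text on the word "and", but only outside parentheses."""
    parts = []
    start = 0  # index where the current part begins
    depth = 0  # current parenthesis nesting level
    # Visit every "(", ")" and whole-word "and" from left to right
    for m in re.finditer(r"\band\b|[()]", text):
        token = m.group()
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif depth == 0:
            # A top-level "and": close the current part here
            parts.append(text[start:m.start()].strip())
            start = m.end()
    parts.append(text[start:].strip())
    return parts

text = "a > b and b = 0 and (f = 1 and (g = 2 and j = 68) and v = 566) and a > b and b = 0 and (f = 1 and g = 2)"
for part in split_top_level_and(text):
    print(part)
```

This prints the same six parts as the C# version, in a single left-to-right pass.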

How do I reliably split a string in Python, when it may not contain the pattern, or all n elements?

If you're splitting into just two parts (like in your example), you can use str.partition() to get a result that can always be unpacked into three names:

>>> a, sep, b = 'foo'.partition(':')
>>> a, sep, b
('foo', '', '')

str.partition() always returns a 3-tuple, whether the separator is found or not.

Another alternative for Python 3.x is to use extended iterable unpacking:

>>> a, *b = 'foo'.split(':')
>>> a, b
('foo', [])

This assigns the first split item to a and the list of remaining items (if any) to b.
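For the "all n elements" part of the question, one option is to pad the result of a limited split. A minimal sketch with a hypothetical helper name, assuming n is at least 1 and that missing items should become empty strings:

```python
def split_exact(text, sep, n, fill=''):
    """Split text on sep into exactly n items.

    Extra separators stay inside the last item; missing items are padded with fill.
    """
    parts = text.split(sep, n - 1)  # at most n items
    return parts + [fill] * (n - len(parts))

a, b, c = split_exact('foo', ':', 3)
print((a, b, c))                         # ('foo', '', '')
print(split_exact('a:b:c:d', ':', 3))    # ['a', 'b', 'c:d']
```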


