Split a string into words by multiple delimiters
Assuming one of the delimiters is a newline, the following reads the input line by line and then splits each line on the remaining delimiters. For this example I've chosen space, apostrophe, and semicolon as the delimiters.
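#include <sstream>
#include <string>
#include <vector>

// inputString is the std::string to split; wordVector collects the words.
std::vector<std::string> wordVector;
std::stringstream stringStream(inputString);
std::string line;
while (std::getline(stringStream, line))
{
    std::size_t prev = 0, pos;
    while ((pos = line.find_first_of(" ';", prev)) != std::string::npos)
    {
        // Skip empty tokens produced by adjacent delimiters.
        if (pos > prev)
            wordVector.push_back(line.substr(prev, pos - prev));
        prev = pos + 1;
    }
    // Keep the trailing word after the last delimiter, if any.
    if (prev < line.length())
        wordVector.push_back(line.substr(prev, std::string::npos));
}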
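For comparison with the Python answers on this page, here is a rough sketch of the same scan-and-cut loop in Python. The function name split_words and the sample text are made up for illustration; the logic mirrors the loop above (split on newlines first, then cut each line at any delimiter character and skip empty pieces).

```python
def split_words(text, delimiters=" ';"):
    """Hypothetical helper: split text into words on newlines and delimiters."""
    words = []
    for line in text.splitlines():
        start = 0  # index where the current word begins
        for i, ch in enumerate(line):
            if ch in delimiters:
                if i > start:  # skip empty tokens between adjacent delimiters
                    words.append(line[start:i])
                start = i + 1
        if start < len(line):  # trailing word after the last delimiter
            words.append(line[start:])
    return words


print(split_words("it's a;test\nsecond line"))
```

In practice a regular expression, as in the answers below, is usually shorter in Python; the explicit loop is shown only to make the control flow visible.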
Split Strings into words with multiple word boundary delimiters
A case where regular expressions are justified:
import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
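The apostrophe inside the character class matters when the text contains contractions. A small sketch, using a made-up sample sentence that is not from the original question:

```python
import re

# Hypothetical sample text, chosen to contain contractions.
text = "Don't stop; it's fine!"

# [\w']+ matches runs of word characters and apostrophes, so contractions
# stay in one piece while all other punctuation acts as a boundary.
words = re.findall(r"[\w']+", text)
print(words)
```

Note that this collects the words rather than splitting on the delimiters, so any character outside the class is treated as a separator without having to be listed.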
Split string with multiple delimiters in Python
Luckily, Python has this built-in :)
import re
re.split('; |, ', string_to_split)
Update:
Following your comment:
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
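When the delimiters come from a list rather than being typed into the pattern by hand, the alternation can be built programmatically. The following is a minimal sketch; split_on_any is a hypothetical helper name, and it assumes the delimiters are literal strings rather than regex fragments.

```python
import re


def split_on_any(text, delimiters):
    """Hypothetical helper: split text on any of the literal delimiter strings."""
    # Try longer delimiters first so that '; ' wins over ';' if both are listed.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape makes characters such as '*' match literally.
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)


a = 'Beautiful, is; better*than\nugly'
parts = split_on_any(a, ['; ', ', ', '*', '\n'])
print(parts)
```

Escaping each delimiter avoids having to remember which characters (here the asterisk) are special in a regular expression.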
Split String with multiple delimiters and keep delimiters
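Wrap the pattern in parentheses (a capturing group); re.split then keeps the matched delimiters in the result:
>>> import re
>>> input_str = 'X < -500 & Y > 3000 / Z > 50'
>>> split_str = re.split("(and | or | & | /)", input_str)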
>>> split_str
['X < -500', ' & ', 'Y > 3000', ' /', ' Z > 50']
>>>
If you want to remove extra spaces:
>>> split_str = [i.strip() for i in re.split("(and | or | & | /)", input_str)]
>>> split_str
['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
>>>
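As a variation, the surrounding whitespace can be consumed by the pattern itself so that no strip() pass is needed. This is only a sketch: the input string is an assumption reconstructed to be consistent with the output shown above, and the \b word boundaries are an addition so that "and" or "or" inside a longer word is not treated as a delimiter.

```python
import re

# Assumed input, consistent with the output shown in the answer above.
input_str = 'X < -500 & Y > 3000 / Z > 50'

# Only the operator is inside the capturing group, so the whitespace around
# it is matched (and discarded) while the operator itself is kept.
pattern = r"\s*(\band\b|\bor\b|&|/)\s*"

# Drop empty strings that appear if the input starts or ends with a delimiter.
tokens = [t for t in re.split(pattern, input_str) if t]
print(tokens)
```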
Python split string by multiple delimiters following a hierarchy
Try:
import re
tests = [
    ["121 34 adsfd", ["121 34 adsfd"]],
    ["dsfsd and adfd", ["dsfsd ", " adfd"]],
    ["dsfsd & adfd", ["dsfsd ", " adfd"]],
    ["dsfsd - adfd", ["dsfsd ", " adfd"]],
    ["dsfsd and adfd and adsfa", ["dsfsd ", " adfd and adsfa"]],
    ["dsfsd and adfd - adsfa", ["dsfsd ", " adfd - adsfa"]],
    ["dsfsd - adfd and adsfa", ["dsfsd - adfd ", " adsfa"]],
]
for s, result in tests:
    res = re.split(r"and|&(?!.*and)|-(?!.*and|.*&)", s, maxsplit=1)
    print(res)
    assert res == result
Prints:
['121 34 adsfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd and adsfa']
['dsfsd ', ' adfd - adsfa']
['dsfsd - adfd ', ' adsfa']
Explanation:
The regex and|&(?!.*and)|-(?!.*and|.*&) uses three alternatives:
- and is matched always.
- & is matched only if there is no and ahead of it in the string (checked with the negative look-ahead (?!...)).
- The hyphen - is matched only if there is neither and nor & ahead of it.
The pattern is passed to re.split with maxsplit=1, so the string is split only on the first match.
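The same hierarchy can also be expressed without look-aheads by trying the delimiters in priority order. The sketch below is an alternative to the regex, not the answer's method; split_by_priority is a hypothetical name, and it assumes single-line input and literal delimiters.

```python
def split_by_priority(text, delimiters=("and", "&", "-")):
    """Hypothetical helper: split once on the highest-priority delimiter present."""
    for delim in delimiters:
        # partition cuts at the first occurrence; sep is '' if delim is absent.
        head, sep, tail = text.partition(delim)
        if sep:
            return [head, tail]
    return [text]  # none of the delimiters occurs


print(split_by_priority("dsfsd - adfd and adsfa"))
```

This version makes the priority order explicit as a tuple, which may be easier to extend than nested look-aheads when more delimiters are added.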
Use String.split() with multiple delimiters
I think you need to include the regex OR operator:
String[] tokens = pdfName.split("-|\\.");
What you have will match a DASH followed by a DOT together (the sequence -.), not a DASH or a DOT on its own (either - or .).
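The answer above is about Java's String.split, but the difference between a sequence and an alternation is the same in any regex engine. A short Python illustration, with a made-up file name standing in for pdfName:

```python
import re

# Hypothetical file name, used only to show the difference.
pdf_name = "report-2020.final.pdf"

# "-\." requires a dash immediately followed by a dot; that sequence
# never occurs here, so nothing is split.
as_sequence = re.split(r"-\.", pdf_name)

# "-|\." matches a dash OR a dot, so every separator splits the name.
as_alternation = re.split(r"-|\.", pdf_name)

# A character class is an equivalent way to write the alternation.
as_char_class = re.split(r"[-.]", pdf_name)

print(as_sequence)
print(as_alternation)
print(as_char_class)
```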
python split string by multiple delimiters and/or combination of multiple delimiters
Combining @Johnny Mopp's and @alfinkel24's comments:
re.split(r"[\s,]+", x)
This splits the string as required into
['121', '1238', 'xyz', '123abc', 'abc123']
Explanation:
- [...] matches any one of the characters listed inside the brackets.
- + matches one or more repetitions of the preceding element.
- \s matches any whitespace character, including \n, \r, and \t.
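A runnable sketch of this pattern follows. The original input string is not shown in the answer, so the value of x below is an assumption chosen to produce the quoted result. The second part shows a detail worth knowing: if the string starts or ends with a delimiter, re.split returns empty strings at the edges, which can be filtered out.

```python
import re

# Assumed input; the answer above only shows the expected output.
x = "121, 1238\nxyz,123abc \r\n abc123"

# Any run of whitespace and/or commas counts as a single delimiter.
tokens = re.split(r"[\s,]+", x)
print(tokens)

# Leading or trailing delimiters yield empty strings at the ends.
raw = re.split(r"[\s,]+", " a, b ,")
cleaned = [t for t in raw if t]
print(raw)
print(cleaned)
```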
Official documentation:
\s
For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.
For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].
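The difference described in the documentation can be seen with a non-breaking space (U+00A0). The sample string here is made up for illustration:

```python
import re

# "one" and "two" are separated by a non-breaking space, "two" and "three"
# by an ordinary space.
text = "one\u00a0two three"

# For str patterns, \s matches Unicode whitespace by default.
unicode_split = re.split(r"\s+", text)

# With re.ASCII, \s matches only [ \t\n\r\f\v], so U+00A0 is not a delimiter.
ascii_split = re.split(r"\s+", text, flags=re.ASCII)

print(unicode_split)
print(ascii_split)
```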