Split string with multiple delimiters in Python
Luckily, Python has this built-in :)
import re
re.split('; |, ', string_to_split)
Update:
Following your comment:
>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
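If the delimiters come from a list rather than a hand-written pattern, one option is to build the pattern with re.escape. This is only a sketch; the helper name split_on_any is made up for illustration and is not part of the re module:

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiters."""
    # Try longer delimiters first so that '; ' wins over ';' if both are given.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape makes characters such as '*' match literally.
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

a = 'Beautiful, is; better*than\nugly'
print(split_on_any(a, ['; ', ', ', '*', '\n']))
# ['Beautiful', 'is', 'better', 'than', 'ugly']
```

Escaping means you don't have to remember which delimiter characters are special in a regex.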
Split Strings into words with multiple word boundary delimiters
A case where regular expressions are justified:
import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
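To see what the character class actually keeps, here is a small sketch with a made-up sentence (the function name words and the sample text are assumptions for illustration). Note that \w also covers digits and underscores, and the apostrophe is kept because it is listed in the class:

```python
import re

def words(text):
    # [\w']+ matches runs of letters, digits, underscores and apostrophes
    return re.findall(r"[\w']+", text)

print(words("Don't panic - it's snake_case, v2!"))
# ["Don't", 'panic', "it's", 'snake_case', 'v2']
```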
Split String with multiple delimiters and keep delimiters
Try with parentheses. A capturing group makes re.split keep the delimiters in the result:
>>> split_str = re.split("(and | or | & | /)", input_str)
>>> split_str
['X < -500', ' & ', 'Y > 3000', ' /', ' Z > 50']
>>>
If you want to remove extra spaces:
>>> split_str = [i.strip() for i in re.split("(and | or | & | /)", input_str)]
>>> split_str
['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
>>>
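The snippets above never define input_str. Here is a self-contained sketch, assuming an input that is consistent with the output shown. It also moves the whitespace handling into the pattern, which is a variation on the answer and not the original pattern:

```python
import re

# Assumed input, chosen to be consistent with the output shown above.
input_str = "X < -500 & Y > 3000 / Z > 50"

# The capturing group keeps the delimiters in the result;
# \s* on both sides absorbs the surrounding spaces, so no strip() is needed.
parts = re.split(r"\s*(\band\b|\bor\b|&|/)\s*", input_str)
print(parts)
# ['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
```

The \b word boundaries keep "and" and "or" from matching inside longer words.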
python split string by multiple delimiters and/or combination of multiple delimiters
Combining @Johnny Mopp's and @alfinkel24's comments:
re.split(r"[\s,]+", x)
This splits the string as required into:
['121', '1238', 'xyz', '123abc', 'abc123']
Explanation:
- [...]: any of the characters.
- +: one or more repetitions of the previous characters.
- \s: any white space characters, including "\n, \r, \t".
Official documentation:
\s
For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.
For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].
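A quick sketch of both points. The value of x is a made-up stand-in, since the original input string isn't shown here:

```python
import re

# Hypothetical input: the question's original string is not shown above.
x = "121, 1238\nxyz\t123abc , abc123"
print(re.split(r"[\s,]+", x))
# ['121', '1238', 'xyz', '123abc', 'abc123']

# For str patterns, a non-breaking space (\xa0) counts as whitespace...
print(re.split(r"[\s,]+", "a\xa0b"))
# ['a', 'b']

# ...unless the ASCII flag is used.
print(re.split(r"[\s,]+", "a\xa0b", flags=re.ASCII))
# ['a\xa0b']
```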
Python split string by multiple delimiters following a hierarchy
Try:
import re
tests = [
    ["121 34 adsfd", ["121 34 adsfd"]],
    ["dsfsd and adfd", ["dsfsd ", " adfd"]],
    ["dsfsd & adfd", ["dsfsd ", " adfd"]],
    ["dsfsd - adfd", ["dsfsd ", " adfd"]],
    ["dsfsd and adfd and adsfa", ["dsfsd ", " adfd and adsfa"]],
    ["dsfsd and adfd - adsfa", ["dsfsd ", " adfd - adsfa"]],
    ["dsfsd - adfd and adsfa", ["dsfsd - adfd ", " adsfa"]],
]
for s, result in tests:
    res = re.split(r"and|&(?!.*and)|-(?!.*and|.*&)", s, maxsplit=1)
    print(res)
    assert res == result
Prints:
['121 34 adsfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd and adsfa']
['dsfsd ', ' adfd - adsfa']
['dsfsd - adfd ', ' adsfa']
Explanation:
The regex and|&(?!.*and)|-(?!.*and|.*&) uses 3 alternatives:
- We match "and" always, or:
- We match "&" only if there isn't "and" ahead (using the negative look-ahead (?! )), or:
- We match "-" only if there isn't "and" or "&" ahead.
We're using this pattern in re.split with maxsplit=1 -> splitting only on the first match.
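If the look-aheads feel hard to maintain, the same hierarchy can also be written without a regex. This is a different technique from the answer above (plain str.partition, tried in priority order), and the function name split_by_priority is made up for illustration. It should give the same results for single-line inputs like the tests above:

```python
def split_by_priority(s, delimiters=("and", "&", "-")):
    """Split s once, on the highest-priority delimiter it contains."""
    for delim in delimiters:
        head, sep, tail = s.partition(delim)
        if sep:  # delimiter found: split on its first occurrence
            return [head, tail]
    return [s]  # none of the delimiters is present

print(split_by_priority("dsfsd - adfd and adsfa"))
# ['dsfsd - adfd ', ' adsfa']
print(split_by_priority("dsfsd & adfd"))
# ['dsfsd ', ' adfd']
```

Adding another level to the hierarchy is then just one more entry in the tuple.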
Splitting strings using multiple delimiters- in Python. Getting TypeError: expected string or bytes-like object
re is a library that receives a str, not a pandas DataFrame column. You should use the .str accessor in this case:
df['A'] = df['Sport'].str.split(r'[;,]')
I hope it resolves your problem
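The error itself can be reproduced with plain re, no pandas needed. In this sketch the list values is a made-up stand-in for a column of strings:

```python
import re

# Hypothetical stand-in for a column of strings.
values = ["Football;Tennis", "Golf,Chess"]

error = None
try:
    re.split(r"[;,]", values)  # passing the whole collection, not a string
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# Applying the split to each string individually works.
split_values = [re.split(r"[;,]", v) for v in values]
print(split_values)
# [['Football', 'Tennis'], ['Golf', 'Chess']]
```

The .str accessor does the per-element step for you on a pandas column.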
How to split string with multiple delimiters in Python?
You have probably got an answer already, but if you want a generic method for any string data, you can slice between the positions of the delimiters.
This way you won't be restricted to one string, and you can loop over the data as well:
csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"
first_index = csv.find("-")
second_index = csv.find("-d")
result = csv[first_index+1:second_index]
print(result)
# OUTPUT:
# bonding_err_bond0-if_eth2
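To loop over several strings, the same slicing can be wrapped in a function. A minimal sketch; the name between and the second file name are made up for illustration:

```python
def between(text, start_marker, end_marker):
    """Return the part of text between the first start_marker
    and the first end_marker that follows it, or None."""
    start = text.find(start_marker)
    if start == -1:
        return None  # start marker missing
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None  # end marker missing
    return text[start:end]

names = [
    "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv",
    "no_markers_here.csv",  # hypothetical name without any "-"
]
for name in names:
    print(between(name, "-", "-d"))
# bonding_err_bond0-if_eth2
# None
```

Checking for -1 matters because str.find returns -1 when nothing is found, and slicing with -1 would silently give the wrong substring.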
Split string by multiple delimiters, ignore repeating delimiters
Use re.findall:
re.findall(r'[^-,]+', string)
Python code:
import re
regex = r"[^,-]+"
string = "-abc,-def,ghi-jkl,mno"
print(re.findall(regex, string))
Result: ['abc', 'def', 'ghi', 'jkl', 'mno']
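For comparison, here is what re.split gives on the same string. It leaves empty strings wherever delimiters sit at the start or next to each other, which is why re.findall is the simpler choice here:

```python
import re

string = "-abc,-def,ghi-jkl,mno"

# One delimiter per split: leading and adjacent delimiters give ''.
print(re.split(r"[,-]", string))
# ['', 'abc', '', 'def', 'ghi', 'jkl', 'mno']

# Runs of delimiters collapsed with +: the leading '' is still there.
print(re.split(r"[,-]+", string))
# ['', 'abc', 'def', 'ghi', 'jkl', 'mno']

# Matching the non-delimiter runs directly avoids empty strings.
print(re.findall(r"[^,-]+", string))
# ['abc', 'def', 'ghi', 'jkl', 'mno']
```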
How Do I Split A String Using Multiple Delimiters (Python)
re.split can split a string on every match for your regex:
>>> import re
>>> re.split(r'[/\.]', 'https://expressjs.com/en/starter/hello-world.html')
['https:', '', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']
[/\.] matches any forward-slash or period character
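The empty string in that result comes from the two slashes in "//". If you don't want it, one option is to filter the list afterwards (a small sketch, not part of the original answer):

```python
import re

url = 'https://expressjs.com/en/starter/hello-world.html'

# Drop the empty strings produced by adjacent delimiters such as '//'.
parts = [p for p in re.split(r'[/\.]', url) if p]
print(parts)
# ['https:', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']
```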