Split String With Multiple Delimiters in Python

Split string with multiple delimiters in Python

Luckily, Python has this built-in :)

import re
re.split('; |, ', string_to_split)

Update:
Following your comment:

>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
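
If the delimiters come from a list instead of being hard-coded, one option is to build the pattern with re.escape so that characters such as * are treated literally. This is only a sketch; the helper name split_on_any is made up for illustration and is not part of the re module:

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiter strings."""
    # Try longer delimiters first so '; ' wins over ';' if both are listed.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape makes regex metacharacters such as '*' match literally.
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

parts = split_on_any("Beautiful, is; better*than\nugly", ["; ", ", ", "*", "\n"])
print(parts)
# ['Beautiful', 'is', 'better', 'than', 'ugly']
```

The helper assumes at least one non-empty delimiter is passed.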

Split Strings into words with multiple word boundary delimiters

A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
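
To see why findall is convenient here, compare it with splitting on the complementary character class. A small sketch using the same DATA string; the variable names words and pieces are just for illustration:

```python
import re

DATA = "Hey, you - what are you doing here!?"

# findall collects the runs of word characters (and apostrophes) directly.
words = re.findall(r"[\w']+", DATA)

# Splitting on everything else yields the same words, plus an empty string
# at the end, because DATA ends with delimiter characters ("!?").
pieces = re.split(r"[^\w']+", DATA)

print(words)
# ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
print(pieces)
# ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here', '']
```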

Split String with multiple delimiters and keep delimiters

Try with parentheses. A capturing group in the pattern makes re.split return the delimiters as well:

>>> import re
>>> input_str = 'X < -500 & Y > 3000 / Z > 50'
>>> split_str = re.split("(and | or | & | /)", input_str)
>>> split_str
['X < -500', ' & ', 'Y > 3000', ' /', ' Z > 50']
>>>

If you want to remove extra spaces:

>>> split_str = [i.strip() for i in re.split("(and | or | & | /)", input_str)]
>>> split_str
['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
>>>
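
An alternative that avoids the separate strip() pass is to let the pattern consume the whitespace around each operator while capturing only the operator itself. This is a sketch, not the original answer's method; the input string is an assumption, reconstructed from the output shown above:

```python
import re

# Assumed input, reconstructed from the pieces in the output above.
input_str = "X < -500 & Y > 3000 / Z > 50"

# \s* on both sides swallows surrounding whitespace; only the operator
# inside the parentheses is captured and therefore kept in the result.
tokens = re.split(r"\s*(\band\b|\bor\b|&|/)\s*", input_str)
print(tokens)
# ['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
```

The \b word boundaries keep "and" and "or" from matching inside longer words such as "band" or "order".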

python split string by multiple delimiters and/or combination of multiple delimiters

Combining @Johnny Mopp's and @alfinkel24's comments:

re.split(r"[\s,]+", x)

This splits the string as required into:

['121', '1238', 'xyz', '123abc', 'abc123']

Explanation:

  • [...] matches any one of the characters inside the brackets.
  • + matches one or more repetitions of the preceding element (here, the character class).
  • \s matches any whitespace character, including \n, \r and \t.

Official documentation:

\s

For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched.

For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].
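
The difference between the Unicode and ASCII behaviour of \s can be seen with a non-breaking space. A short sketch; the sample text is invented for illustration:

```python
import re

# "\u00a0" is a non-breaking space: Unicode whitespace, but not ASCII whitespace.
text = "121\u00a01238, xyz"

unicode_split = re.split(r"[\s,]+", text)
ascii_split = re.split(r"[\s,]+", text, flags=re.ASCII)

print(unicode_split)  # the non-breaking space acts as a delimiter
# ['121', '1238', 'xyz']
print(ascii_split)    # the non-breaking space stays inside the first item
# ['121\xa01238', 'xyz']
```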

Python split string by multiple delimiters following a hierarchy

Try:

import re

tests = [
["121 34 adsfd", ["121 34 adsfd"]],
["dsfsd and adfd", ["dsfsd ", " adfd"]],
["dsfsd & adfd", ["dsfsd ", " adfd"]],
["dsfsd - adfd", ["dsfsd ", " adfd"]],
["dsfsd and adfd and adsfa", ["dsfsd ", " adfd and adsfa"]],
["dsfsd and adfd - adsfa", ["dsfsd ", " adfd - adsfa"]],
["dsfsd - adfd and adsfa", ["dsfsd - adfd ", " adsfa"]],
]

for s, result in tests:
    res = re.split(r"and|&(?!.*and)|-(?!.*and|.*&)", s, maxsplit=1)
    print(res)
    assert res == result

Prints:

['121 34 adsfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd']
['dsfsd ', ' adfd and adsfa']
['dsfsd ', ' adfd - adsfa']
['dsfsd - adfd ', ' adsfa']

Explanation:

The regex and|&(?!.*and)|-(?!.*and|.*&) uses 3 alternatives:

  1. We always match and, or:
  2. We match & only if there is no and ahead (using the negative look-ahead (?!...)), or:
  3. We match - only if there is no and or & ahead.

We're using this pattern in re.split with maxsplit=1, so the string is split only on the first match.
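
If the look-aheads feel hard to maintain, the same hierarchy can be sketched without a regex by trying each delimiter in priority order with str.partition. The function name split_by_priority and its default delimiter tuple are illustrative assumptions; for single-line inputs like the tests above it should give the same results:

```python
def split_by_priority(text, delimiters=("and", "&", "-")):
    """Split once on the highest-priority delimiter present in text."""
    for delim in delimiters:
        head, sep, tail = text.partition(delim)
        if sep:  # delimiter found: split at its first occurrence
            return [head, tail]
    return [text]  # none of the delimiters is present

print(split_by_priority("dsfsd - adfd and adsfa"))
# ['dsfsd - adfd ', ' adsfa']
print(split_by_priority("dsfsd & adfd"))
# ['dsfsd ', ' adfd']
```

Adding a new level to the hierarchy then only means extending the tuple, rather than editing every look-ahead.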

Splitting strings using multiple delimiters- in Python. Getting TypeError: expected string or bytes-like object

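re.split expects a single string (or bytes-like object), not a pandas DataFrame column. Use the .str accessor in this case, with a character class if the values should be split on either ; or , (the pattern ';,' only matches a semicolon immediately followed by a comma):

df['A'] = df['Sport'].str.split(r'[;,]')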

I hope it resolves your problem
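
The same error can be reproduced without pandas, which may make the cause easier to see. In this sketch the list sports and its values are hypothetical stand-ins for the column data:

```python
import re

sports = ["Tennis;Golf", "Rugby,Football;Cricket"]  # hypothetical values

try:
    re.split(r"[;,]", sports)  # passing the whole collection, not a string
except TypeError as exc:
    print("TypeError:", exc)

# Applying re.split to each string individually works.
split_rows = [re.split(r"[;,]", s) for s in sports]
print(split_rows)
# [['Tennis', 'Golf'], ['Rugby', 'Football', 'Cricket']]
```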

How to split string with multiple delimiters in Python?

You have probably got your answer already, but if you want a generic method that works for any string data, you can use str.find and slicing. This way you won't be restricted to one string, and you can loop over your data as well:

csv = "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv"

first_index = csv.find("-")
second_index = csv.find("-d")

result = csv[first_index+1:second_index]
print(result)
# OUTPUT:
# bonding_err_bond0-if_eth2
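
To loop over several strings, the lookup can be wrapped in a small function that also handles the case where str.find returns -1. This is a sketch: the name between and the second sample string are made up, and unlike the snippet above it searches for the end marker only after the start marker:

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    start = text.find(start_marker)
    if start == -1:
        return None  # start marker missing
    start += len(start_marker)
    end = text.find(end_marker, start)  # search only after the start marker
    if end == -1:
        return None  # end marker missing
    return text[start:end]

names = [
    "xxx.xxx.com-bonding_err_bond0-if_eth2-d.rrd.csv",
    "no_markers_here.csv",  # hypothetical name without the markers
]
results = [between(name, "-", "-d") for name in names]
print(results)
# ['bonding_err_bond0-if_eth2', None]
```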

Split string by multiple delimiters, ignore repeating delimiters

Use re.findall:

re.findall(r'[^-,]+', string)

Python code:

import re
regex = r"[^,-]+"
string = "-abc,-def,ghi-jkl,mno"
print(re.findall(regex, string))

Result: ['abc', 'def', 'ghi', 'jkl', 'mno']
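
For comparison, here is what re.split returns on the same string when runs of delimiters are treated as one separator. A minimal sketch; the variable names are illustrative:

```python
import re

string = "-abc,-def,ghi-jkl,mno"

# Splitting on runs of delimiters still leaves an empty string at the
# front, because the input starts with a delimiter.
with_split = re.split(r"[,-]+", string)

# findall returns only the non-delimiter runs, so nothing needs filtering.
with_findall = re.findall(r"[^,-]+", string)

print(with_split)
# ['', 'abc', 'def', 'ghi', 'jkl', 'mno']
print(with_findall)
# ['abc', 'def', 'ghi', 'jkl', 'mno']
```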

How Do I Split A String Using Multiple Delimiters (Python)

re.split can split a string on every match of your regex:

>>> import re
>>> re.split(r'[/\.]', 'https://expressjs.com/en/starter/hello-world.html')
['https:', '', 'expressjs', 'com', 'en', 'starter', 'hello-world', 'html']

[/\.] matches any forward-slash or period character


