Split Strings into Words With Multiple Word Boundary Delimiters

Split Strings into words with multiple word boundary delimiters

A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
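
For comparison, here is a rough sketch of the same tokenization done with re.split instead of re.findall. Splitting on runs of delimiter characters leaves an empty string when the text ends with a delimiter, so the pieces have to be filtered. The helper name words_via_split is made up for this illustration.

```python
import re

DATA = "Hey, you - what are you doing here!?"

def words_via_split(text):
    # Split on runs of characters that are neither word characters nor
    # apostrophes, then drop the empty strings that leading or trailing
    # delimiters leave behind.
    return [w for w in re.split(r"[^\w']+", text) if w]

print(words_via_split(DATA))
```

Both approaches should give the same word list here; findall simply avoids the filtering step.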

Split string with multiple delimiters in Python

Luckily, Python has this built-in :)

import re
re.split('; |, ', string_to_split)

Update:
Following your comment:

>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
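
When every delimiter is a single character, a character class can stand in for the alternation. The sketch below is one possible variation, not the original answer's pattern; the trailing \s* is an added assumption that any whitespace after a delimiter should be discarded.

```python
import re

a = 'Beautiful, is; better*than\nugly'

# A character class lists the single-character delimiters; inside the class
# the asterisk is literal and needs no escaping. \s* absorbs any whitespace
# that follows a delimiter.
parts = re.split(r'[;,*\n]\s*', a)
print(parts)
```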

How to split a string with multiple delimiters using string.split()?

This code takes each string in the list, replaces " at " with " | ", splits the result on " | ", and rebinds the name to the resulting list of sub-lists.

Side note: don't use list as a variable name. It is not a keyword, but it is the name of a built-in type, and reusing it shadows that type.

lis = ['Sep 10, 2020 at 17:36 | Kate', 'Sep 10, 2020 at 17:13 | Charles']
lis = [string.replace(" at ", " | ").split(" | ") for string in lis]
print(lis)

Output:

[['Sep 10, 2020', '17:36', 'Kate'], ['Sep 10, 2020', '17:13', 'Charles']]
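
If a regular expression is acceptable, the replace-then-split step can be collapsed into a single split. This is only a sketch of that alternative; the names pattern and rows are introduced here for illustration.

```python
import re

lis = ['Sep 10, 2020 at 17:36 | Kate', 'Sep 10, 2020 at 17:13 | Charles']

# Split each entry on either " at " or " | " in one pass.
# The pipe is escaped because it is the alternation operator in a regex.
pattern = re.compile(r' at | \| ')
rows = [pattern.split(entry) for entry in lis]
print(rows)
```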

Split String with multiple delimiters and keep delimiters

Wrap the pattern in parentheses. With a capturing group, re.split also returns the matched delimiters:

>>> split_str = re.split("(and | or | & | /)", input_str)
>>> split_str
['X < -500', ' & ', 'Y > 3000', ' /', ' Z > 50']
>>>

If you want to remove extra spaces:

>>> split_str = [i.strip() for i in re.split("(and | or | & | /)", input_str)]
>>> split_str
['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
>>>
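
A self-contained sketch of the same idea follows. The value of input_str is an assumption reconstructed from the output shown above, and the pattern is a variation rather than the one in the answer: it puts \s* outside the capturing group so the whitespace is consumed by the split, and it adds word boundaries around and/or so they do not match inside longer words.

```python
import re

# Assumed input, reconstructed from the output shown above.
input_str = 'X < -500 & Y > 3000 / Z > 50'

# The capturing group makes re.split return the delimiters as well.
# \s* on both sides sits outside the group, so surrounding whitespace is
# dropped and no strip() pass is needed afterwards.
tokens = re.split(r'\s*(\band\b|\bor\b|&|/)\s*', input_str)
print(tokens)
```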

Splitting strings using multiple delimiters in Python: getting TypeError: expected string or bytes-like object

The re functions expect a string (or bytes-like object), not a pandas DataFrame column. Use the .str accessor in this case:

# [;,] is a character class, so this splits on either ';' or ','
df['A'] = df['Sport'].str.split(r'[;,]')

I hope it resolves your problem

How to split a string by multiple punctuations with Python?

You can use a regex character class to do this:

>>> import re
>>> s = 'a,b,c d!e.f\ngood\tmorning&night'

>>> re.split('[?.,\n\t&! ]', s)
['a', 'b', 'c', 'd', 'e', 'f', 'good', 'morning', 'night']

If you are looking for a solution using split(), then here's a workaround:

>>> identifiers = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n\t '

>>> "".join((' ' if c in identifiers else c for c in s)).split()
['a', 'b', 'c', 'd', 'e', 'f', 'good', 'morning', 'night']

Here, every character listed in identifiers is replaced with a space, and split() with no arguments then splits the string on runs of whitespace.
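
A possible alternative to the join-based workaround is str.translate, sketched below. It relies on string.punctuation from the standard library for the set of ASCII punctuation characters; the name table is introduced here for illustration.

```python
import string

s = 'a,b,c d!e.f\ngood\tmorning&night'

# Map every ASCII punctuation character to a space, then let split()
# with no arguments handle runs of whitespace (spaces, tabs, newlines).
table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
print(s.translate(table).split())
```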

How do I split a string with multiple word delimiters in Python?

You could use re.split with an alternation built from the delimiter words.

Updated to use word boundaries (\b) instead of whitespace, as suggested by @pault:

>>> import re
>>> words = ['hello world', 'hello my name is jolloopp', 'my jolloopp name is hello']
>>> splitters = ['my', 'is']

# Split each string on any of the splitter words, then flatten the results
>>> [z for y in (re.split('|'.join(r'\b{}\b'.format(x) for x in splitters), word) for word in words) for z in y]
['hello world', 'hello ', ' name ', ' jolloopp', '', ' jolloopp name ', ' hello']
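
The output above keeps stray spaces and an empty string. A hedged sketch of one way to tidy that up is shown below. The splitters value is an assumption chosen because it reproduces the output above, and re.escape is added in case a delimiter word ever contains regex metacharacters.

```python
import re

# Assumed delimiter words; they reproduce the output shown above.
splitters = ['my', 'is']
words = ['hello world', 'hello my name is jolloopp', 'my jolloopp name is hello']

# re.escape guards against delimiters that contain regex metacharacters.
pattern = re.compile('|'.join(r'\b{}\b'.format(re.escape(x)) for x in splitters))

# Split each phrase, strip surrounding whitespace, and drop empty pieces.
pieces = [part.strip()
          for phrase in words
          for part in pattern.split(phrase)
          if part.strip()]
print(pieces)
```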

