In Python, How to Split a String and Keep the Separators

In Python, how do I split a string and keep the separators?

>>> import re
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
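
The parentheses are what make the difference: when the pattern contains a capturing group, re.split returns the captured text as its own list item. A minimal sketch comparing the two forms (the variable names are only illustrative):

```python
import re

text = 'foo/bar spam\neggs'

# Without a capturing group the separators are discarded.
without_group = re.split(r'\W', text)

# With a capturing group each separator becomes its own list item.
with_group = re.split(r'(\W)', text)

print(without_group)  # ['foo', 'bar', 'spam', 'eggs']
print(with_group)     # ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
```

Because nothing is thrown away in the second form, ''.join(with_group) gives back the original string.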

Split String with multiple delimiters and keep delimiters

Wrap the delimiters in parentheses (a capturing group) so that re.split keeps them:

>>> import re
>>> input_str = 'X < -500 & Y > 3000 / Z > 50'
>>> split_str = re.split("(and | or | & | /)", input_str)
>>> split_str
['X < -500', ' & ', 'Y > 3000', ' /', ' Z > 50']

If you want to remove extra spaces:

>>> split_str = [i.strip() for i in re.split("(and | or | & | /)", input_str)]
>>> split_str
['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
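
If the list of delimiters changes often, the same idea can be wrapped in a small function. This is only a sketch: split_keep is a hypothetical helper name, not something from the answer above, and it assumes the delimiters are plain strings rather than regex fragments.

```python
import re

def split_keep(text, delimiters):
    """Split text on any delimiter and keep the delimiters as items.

    Hypothetical helper for illustration only.
    """
    # Try longer delimiters first, and escape each one so that characters
    # such as '|' or '.' are matched literally.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(d) for d in ordered) + ')'
    # Strip surrounding whitespace and drop items that end up empty.
    parts = (part.strip() for part in re.split(pattern, text))
    return [part for part in parts if part]

tokens = split_keep('X < -500 & Y > 3000 / Z > 50', ['and', 'or', '&', '/'])
print(tokens)  # ['X < -500', '&', 'Y > 3000', '/', 'Z > 50']
```

Note that word delimiters such as 'and' would also match inside longer words (for example 'band'); this sketch does not guard against that.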

Python: Split string without losing split character

If you want to do this in a single line:


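string = "HELLO.WORLD.AGAIN."
pattern = "."
# Pad each separator with spaces, then split on the spaces.
# Note: this only works if the string itself contains no spaces.
result = string.replace(pattern, f" {pattern} ").split(" ")
# ['HELLO', '.', 'WORLD', '.', 'AGAIN', '.', '']
# The trailing '' comes from the separator at the end of the string;
# uncomment the next line to drop it.
# result = result[:-1]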
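
The replace-and-split trick relies on the input containing no spaces. A sketch of an alternative that does not have this restriction, using re.split with an escaped separator (the names separator, parts and non_empty are illustrative):

```python
import re

string = "HELLO.WORLD.AGAIN."
separator = "."

# re.escape turns '.' into a literal dot instead of "any character".
parts = re.split(f"({re.escape(separator)})", string)
print(parts)      # ['HELLO', '.', 'WORLD', '.', 'AGAIN', '.', '']

# Drop the empty string produced by the separator at the end of the input.
non_empty = [p for p in parts if p]
print(non_empty)  # ['HELLO', '.', 'WORLD', '.', 'AGAIN', '.']
```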

Python RE library String Split but keep the delimiters/separators as part of the next string

If you are using Python 3.7 or later, re.split can split on zero-length matches, so a positive lookahead keeps each separator at the start of the next item:

import re

string = 'a+0b-2a+b-b'
re.split(r'(?=[+-])', string)
# ['a', '+0b', '-2a', '+b', '-b']

Demo: https://regex101.com/r/AB6UBa/1
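
For comparison, a lookbehind attaches each separator to the end of the previous item instead. A short sketch showing both, again assuming Python 3.7+ (the names before and after are illustrative):

```python
import re

string = 'a+0b-2a+b-b'

# Lookahead: split before each sign, so the sign starts the next item.
before = re.split(r'(?=[+-])', string)

# Lookbehind: split after each sign, so the sign ends the previous item.
after = re.split(r'(?<=[+-])', string)

print(before)  # ['a', '+0b', '-2a', '+b', '-b']
print(after)   # ['a+', '0b-', '2a+', 'b-', 'b']
```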

How do I split a string and keep the separators using python re library?

You can use re.findall to capture each parenthesis group:

import re
string = r"('Option A' | 'Option B') & ('Option C' | 'Option D')"
pattern = r"(\([^\)]+\))"
re.findall(pattern, string)
# ["('Option A' | 'Option B')", "('Option C' | 'Option D')"]

The same pattern also works with re.split, which additionally returns the text between the groups:

re.split(pattern, string)
# ['', "('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')", '']

If you want to remove the empty strings that re.split produces, filter them out:

[s for s in re.split(pattern, string) if s]
# ["('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')"]

How the pattern works:

  • ( begin capture group
  • \( matches the character ( literally
  • [^\)]+ matches one or more characters that are not )
  • \) matches the character ) literally
  • ) end capture group
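
The breakdown above can also be written directly into the pattern with the re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. A sketch using the same sample string:

```python
import re

string = r"('Option A' | 'Option B') & ('Option C' | 'Option D')"

pattern = re.compile(r"""
    (          # begin capture group
      \(       # a literal opening parenthesis
      [^\)]+   # one or more characters that are not a closing parenthesis
      \)       # a literal closing parenthesis
    )          # end capture group
""", re.VERBOSE)

groups = pattern.findall(string)
print(groups)  # ["('Option A' | 'Option B')", "('Option C' | 'Option D')"]
```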

PYTHON Split String at space but keep the spaces

I'm not sure why you need this, but it can be done like so:

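import re
a = 'bla bla bla bla'
# Surround each space with tabs, then split on the tabs.
# This assumes the input contains no tab characters.
temp = re.sub(' ', '\t \t', a)
result = temp.split('\t')
# ['bla', ' ', 'bla', ' ', 'bla', ' ', 'bla']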
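
A capturing group gives the same list without the intermediate tab step, and so without the assumption that the input is free of tabs. A minimal sketch:

```python
import re

a = 'bla bla bla bla'

# The capturing group makes re.split return each space as its own item.
result = re.split(r'( )', a)
print(result)  # ['bla', ' ', 'bla', ' ', 'bla', ' ', 'bla']
```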

Split a string by regex and keep the separator AS A PART OF ITEMS in python

That happens because re.split, when the pattern contains a capturing group, returns the captured chunks as separate items in the resulting list.

Your regex only makes sense if a match can span several lines; otherwise, extracting every line that starts with a date-like pattern would be enough.

That is why I'd suggest:

regex = r"\b\d+/\d+/\d.*?(?=\s*\b\d+/\d+/\d+|$)"
results = re.findall(regex, chat, re.S)

See the Python demo:

import re

chat = '''27/01/2019, 08:58 - Member 01 created group "Python Lovers ❤️"
27/01/2019, 08:58 - You were added
19/03/2019, 19:29 - Member 02: Hello guys,,,
19/03/2019, 19:29 - Member 03: Hi there..'''

regex = r"\b\d+/\d+/\d.*?(?=\s*\b\d+/\d+/\d+|$)"
results = re.findall(regex, chat, re.S)
for r in results:
    print(r)

Output:

27/01/2019, 08:58 - Member 01 created group "Python Lovers ❤️"
27/01/2019, 08:58 - You were added
19/03/2019, 19:29 - Member 02: Hello guys,,,
19/03/2019, 19:29 - Member 03: Hi there..

Note that the redundant capturing group is gone, and that there is no * after the positive lookahead, which had made it optional. Whitespace at the end of each match is stripped by the \s* pattern inside the lookahead.

The re.S flag allows . to match any character, including line breaks.
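
The effect of re.S only shows up when a message continues on a second line. The sketch below uses a made-up chat sample (not the one from the question) with such a message and runs the same regex with and without the flag:

```python
import re

# Hypothetical sample: the first message continues on a second line.
chat = (
    "19/03/2019, 19:29 - Member 02: Hello guys,\n"
    "how are you?\n"
    "19/03/2019, 19:30 - Member 03: Hi there"
)

regex = r"\b\d+/\d+/\d.*?(?=\s*\b\d+/\d+/\d+|$)"

# With re.S, . also matches newlines, so the two-line message stays whole.
with_flag = re.findall(regex, chat, re.S)

# Without re.S, . stops at the newline and the two-line message is missed.
without_flag = re.findall(regex, chat)

print(len(with_flag))     # 2
print(len(without_flag))  # 1
```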

Split string into 2 columns, but keep the separator

You can wrap the pattern in () to keep the separators, for example:

df['column1'].str.split('(sep1|sep2|sep3)')

How would I split a python string and keep the separator, but the separator isn't a separate list item?

Use an if condition to check that each string is non-empty (its length is greater than 0), and only append the separator when it is.

split = [_ + str(char) for _ in split if len(_)>0]
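
A self-contained sketch of the same idea, with made-up sample data (text, char, pieces and joined are illustrative names, not taken from the question):

```python
text = 'one.two.three.'
char = '.'

# str.split drops the separator and leaves '' after the final one.
pieces = text.split(char)

# Re-attach the separator to every non-empty piece.
joined = [piece + char for piece in pieces if len(piece) > 0]
print(joined)  # ['one.', 'two.', 'three.']
```

Keep in mind that this appends the separator to every non-empty piece, so a string that does not end with the separator gains one on its last item.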

