Split a String by a Delimiter in Python

Split a string by a delimiter in python

You can use the str.split method: string.split('__')

>>> "MATCHES__STRING".split("__")
['MATCHES', 'STRING']
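str.split also accepts an optional maxsplit argument, and str.partition splits once around the first occurrence. A minimal sketch of both, using a made-up sample string that is not from the original question:

```python
# Sample value is made up for illustration.
text = "MATCHES__STRING__EXTRA"

# Split on every occurrence of the delimiter.
all_parts = text.split("__")

# maxsplit=1: split at most once, counting from the left.
first_split = text.split("__", 1)

# partition always returns a 3-tuple: (before, delimiter, after).
before, delim, after = text.partition("__")

print(all_parts)               # ['MATCHES', 'STRING', 'EXTRA']
print(first_split)             # ['MATCHES', 'STRING__EXTRA']
print((before, delim, after))  # ('MATCHES', '__', 'STRING__EXTRA')
```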

Split string at delimiter '\' in python

You need to escape the backslash:

 S.split('\\')

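If the string came from a normal (non-raw) literal, sequences such as \t have already been turned into control characters, so you may also need to escape the string again before splitting. On Python 2 this is s.encode("string_escape"); on Python 3, encode with unicode_escape and decode the result:

In [10]: s = 'greenland.gdb\topology_check\t_buildings'

In [11]: s.split("\\")
Out[11]: ['greenland.gdb\topology_check\t_buildings']

In [12]: s.encode("unicode_escape").decode("ascii").split("\\")
Out[12]: ['greenland.gdb', 'topology_check', 't_buildings']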

\t would be interpreted as a tab character unless you were using a raw string:

In [18]: s = 'greenland.gdb\topology_check\t_buildings'

In [19]: print(s)
greenland.gdb opology_check _buildings

In [20]: s = r'greenland.gdb\topology_check\t_buildings'

In [21]: print(s)
greenland.gdb\topology_check\t_buildings

See the Python documentation on escape characters in string literals for the full list.
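As a sketch of the raw-string approach: once the backslashes are preserved, a plain split works. If the value really is a Windows-style path (an assumption, the question does not say so), pathlib.PureWindowsPath can split it as well, on any operating system:

```python
from pathlib import PureWindowsPath

# Raw string literal: the backslashes are kept as-is.
path = r'greenland.gdb\topology_check\t_buildings'

# Plain split on a single backslash (written as the escaped literal '\\').
parts = path.split('\\')

# Assumption: the string is a Windows-style path, so pathlib can split it too.
path_parts = PureWindowsPath(path).parts

print(parts)       # ['greenland.gdb', 'topology_check', 't_buildings']
print(path_parts)  # ('greenland.gdb', 'topology_check', 't_buildings')
```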

How to split a string with many delimiters in Python?

For performance, you should use regex as per the marked duplicate. See benchmarking below.

groupby + str.isalnum

You can use itertools.groupby with str.isalnum to group by characters which are alphanumeric.

With this solution you do not have to worry about splitting by explicitly specified characters.

from itertools import groupby

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."

res = [''.join(j) for i, j in groupby(x, key=str.isalnum) if i]

print(res)

['has', '15', 'science', 'and', 'engineering', 'departments',
'affiliated', 'centers', 'Bandar', 'Abbas', 'and', 'Mahshahr']

Benchmarking vs regex

Some performance benchmarking versus regex solutions (tested on Python 3.6.5):

from itertools import groupby
import re

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."

z = x*10000
%timeit [''.join(j) for i, j in groupby(z, key=str.isalnum) if i] # 184 ms
%timeit list(filter(None, re.sub(r'\W+', ',', z).split(','))) # 82.1 ms
%timeit list(filter(None, re.split(r'\W+', z))) # 63.6 ms
%timeit [_ for _ in re.split(r'\W', z) if _] # 62.9 ms
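%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The function names and the smaller input size below are choices made for this illustration, and the timings will differ from the figures quoted above:

```python
import re
import timeit
from itertools import groupby

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
z = x * 100  # smaller than the benchmark above, to keep the run short

def with_groupby(s):
    # Group consecutive characters by whether they are alphanumeric.
    return [''.join(j) for i, j in groupby(s, key=str.isalnum) if i]

def with_re_split(s):
    # Split on runs of non-word characters and drop empty strings.
    return [part for part in re.split(r'\W+', s) if part]

# Both approaches give the same words for this input.
assert with_groupby(z) == with_re_split(z)

for func in (with_groupby, with_re_split):
    seconds = timeit.timeit(lambda: func(z), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

The two approaches are not interchangeable in general: \W treats the underscore as a word character, while str.isalnum does not.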

Python split string with delimiter

One way with regex:

import re

def findUrlFromString(string):
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex, string)
    return [x[0] for x in url]

string = """
- https://site1 # site1
- https://site2 # site2
- https://site3 # site3
- https://site4 # site4
"""
print(findUrlFromString(string))

WORKING DEMO: https://rextester.com/LEHDE94008

Another way, with a list comprehension:

list_of_urls = ['-https://site1#site1', '-https://site2#site2', '-https://site3#site3', '-https://site4#site4']
result = [i.split('#')[0].lstrip('-') for i in list_of_urls]
print(result)

WORKING DEMO: https://rextester.com/VNW41814
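The same idea can be applied directly to the multi-line string from the regex example, without building a list first. This is a rough sketch that assumes every line has the shape "- URL # comment"; a URL that itself contains a '#' fragment would be cut short:

```python
text = """
- https://site1 # site1
- https://site2 # site2
"""

urls = []
for line in text.splitlines():
    # Keep the text before the first '#', then drop the list marker and spaces.
    candidate = line.partition('#')[0].strip().lstrip('-').strip()
    if candidate:  # skip blank lines
        urls.append(candidate)

print(urls)  # ['https://site1', 'https://site2']
```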

Splitting a python string at a delimiter but a specific one

How about something like this, which splits the string into words and rejoins each half:

s = "The cat jumped over the moon very quickly"

l = s.split()

s1 = ' '.join(l[:len(l)//2])
s2 = ' '.join(l[len(l)//2 :])

print(s1)
print(s2)
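If the goal is to split at one specific occurrence of the delimiter rather than at the midpoint, the same split-and-rejoin idea generalizes. split_at_nth below is a hypothetical helper written for this illustration, not a built-in:

```python
def split_at_nth(text, delimiter, n):
    """Split text in two at the n-th occurrence (1-based) of delimiter.

    Hypothetical helper for illustration. Returns (text, '') when there
    are fewer than n occurrences.
    """
    parts = text.split(delimiter)
    if n < 1 or n >= len(parts):
        return text, ''
    return delimiter.join(parts[:n]), delimiter.join(parts[n:])

s = "The cat jumped over the moon very quickly"
print(split_at_nth(s, ' ', 4))  # ('The cat jumped over', 'the moon very quickly')
```

Unlike s.split() with no argument, this splits on the exact delimiter, so runs of spaces are not collapsed.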

Split string using a newline delimiter with Python

The str.splitlines method should give you exactly that.

>>> data = """a,b,c
... d,e,f
... g,h,i
... j,k,l"""
>>> data.splitlines()
['a,b,c', 'd,e,f', 'g,h,i', 'j,k,l']
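For comparison, a short sketch of how splitlines differs from split('\n') when the data has Windows line endings or a trailing newline. The sample data is adapted from the example above with those endings added for illustration:

```python
# Mixed line endings and a trailing newline, added for illustration.
data = "a,b,c\r\nd,e,f\ng,h,i\n"

# splitlines handles \r\n and does not produce a trailing empty string.
print(data.splitlines())  # ['a,b,c', 'd,e,f', 'g,h,i']

# split('\n') leaves the '\r' in place and yields a final empty string.
print(data.split('\n'))   # ['a,b,c\r', 'd,e,f', 'g,h,i', '']

# Each line can then be split on its own delimiter.
rows = [line.split(',') for line in data.splitlines()]
print(rows)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
```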

Split Strings into words with multiple word boundary delimiters

A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

Splitting on last delimiter in Python string?

Use .rsplit() or .rpartition() instead:

s.rsplit(',', 1)
s.rpartition(',')

str.rsplit() lets you specify how many times to split, while str.rpartition() only splits once but always returns a fixed number of elements (prefix, delimiter & postfix) and is faster for the single split case.

Demo:

>>> s = "a,b,c,d"
>>> s.rsplit(',', 1)
['a,b,c', 'd']
>>> s.rsplit(',', 2)
['a,b', 'c', 'd']
>>> s.rpartition(',')
('a,b,c', ',', 'd')

Both methods start splitting from the right-hand-side of the string; by giving str.rsplit() a maximum as the second argument, you get to split just the right-hand-most occurrences.

If you only need the last element, but there is a chance that the delimiter is not present in the input string or is the very last character in the input, use the following expressions:

# last element, or the original if no `,` is present or is the last character
s.rsplit(',', 1)[-1] or s
s.rpartition(',')[-1] or s
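To see how the two expressions behave, here is a small check over the three cases mentioned: delimiter present, delimiter absent, and delimiter as the last character. The sample strings are made up for illustration:

```python
samples = ["a,b,c", "abc", "a,b,"]

for s in samples:
    via_rsplit = s.rsplit(',', 1)[-1] or s
    via_rpartition = s.rpartition(',')[-1] or s
    print(f"{s!r}: {via_rsplit!r} {via_rpartition!r}")

# 'a,b,c': 'c' 'c'
# 'abc': 'abc' 'abc'
# 'a,b,': 'a,b,' 'a,b,'
```

In the last case the original string comes back with its trailing delimiter still attached.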

If you need the delimiter gone even when it is the last character, I'd use:

def last(string, delimiter):
    """Return the last element from string, after the delimiter

    If string ends in the delimiter or the delimiter is absent,
    returns the original string without the delimiter.

    """
    prefix, delim, last = string.rpartition(delimiter)
    # When the delimiter is absent, rpartition puts the whole string in `last`.
    return prefix if (delim and not last) else last

This uses the fact that string.rpartition() returns the delimiter as the second element of the tuple only if it was present, and an empty string otherwise.

Split string with multiple delimiters in Python

Luckily, Python has this built-in :)

import re
re.split('; |, ', string_to_split)

Update:
Following your comment:

>>> a='Beautiful, is; better*than\nugly'
>>> import re
>>> re.split(r'; |, |\*|\n', a)
['Beautiful', 'is', 'better', 'than', 'ugly']
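When the delimiters come from a list rather than being typed into the pattern by hand, re.escape can build the alternation safely. split_on_any is a hypothetical helper for this sketch and assumes a non-empty list of literal delimiters:

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiters (hypothetical helper)."""
    # Longest first, so a longer delimiter wins over one that is its prefix.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape makes characters such as '*' match literally.
    pattern = '|'.join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

a = 'Beautiful, is; better*than\nugly'
print(split_on_any(a, ['; ', ', ', '*', '\n']))
# ['Beautiful', 'is', 'better', 'than', 'ugly']
```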

