How to Split and Parse a String in Python

How can I split and parse a string in Python?

"2.7.0_bf4fda703454".split("_") gives a list of strings:

In [1]: "2.7.0_bf4fda703454".split("_")
Out[1]: ['2.7.0', 'bf4fda703454']

This splits the string at every underscore. If you want it to stop after the first split, use "2.7.0_bf4fda703454".split("_", 1).

If you know for a fact that the string contains an underscore, you can even unpack the LHS and RHS into separate variables:

In [8]: lhs, rhs = "2.7.0_bf4fda703454".split("_", 1)

In [9]: lhs
Out[9]: '2.7.0'

In [10]: rhs
Out[10]: 'bf4fda703454'

An alternative is to use partition(). The usage is similar to the last example, except that it returns three components instead of two. The principal advantage is that this method doesn't fail if the string doesn't contain the separator.
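As a minimal sketch of the partition() behaviour described above, the following shows what comes back with and without the separator. The helper name split_version is hypothetical and only there for illustration; it is not part of the original answer.

```python
def split_version(tag, sep="_"):
    """Split tag at the first sep using str.partition.

    Returns (head, tail); tail is None when sep does not occur in tag.
    """
    head, found, tail = tag.partition(sep)
    # partition() always returns a 3-tuple; 'found' is '' if sep is absent.
    return head, (tail if found else None)


print("2.7.0_bf4fda703454".partition("_"))  # ('2.7.0', '_', 'bf4fda703454')
print("2.7.0".partition("_"))               # ('2.7.0', '', '')
print(split_version("2.7.0"))               # ('2.7.0', None)
```

Unlike the unpacking form of split("_", 1), nothing here raises when the underscore is missing, so the caller can decide how to treat that case.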

Python split string and add characters wherever parse occurs

Try this:

  1. First replace 'B' with 'x-x'

  2. Split the string with separator '-'.

    string = 'AAABA'
    string = string.replace('B', 'x-x')
    print(string.split('-'))

    OUT: ['AAAx', 'xA']

    string = 'AABBA'
    string = string.replace('B', 'x-x')
    print(string.split('-'))

    OUT: ['AAx', 'xx', 'xA']
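The two steps above can be wrapped in a small function. This is only a sketch: the name split_and_pad and its parameters are made up for illustration, and it assumes the temporary separator does not already occur in the input.

```python
def split_and_pad(text, marker="B", pad="x", sep="-"):
    """Split text at every marker, leaving pad on both sides of each cut.

    Assumes sep does not already appear in text or pad (checked below).
    """
    if sep in text or sep in pad:
        raise ValueError("temporary separator already present in input")
    # Step 1: turn each marker into pad + sep + pad.
    # Step 2: split on the temporary separator.
    return text.replace(marker, pad + sep + pad).split(sep)


print(split_and_pad("AAABA"))  # ['AAAx', 'xA']
print(split_and_pad("AABBA"))  # ['AAx', 'xx', 'xA']
```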

Parse string in python

As far as I can understand, this is consistent with what you want, and is pretty simple. It just uses some slicing to isolate the first word and the part between parentheses. It also has to use strip a couple of times due to the extra spaces. It may seem a little verbose, but to be honest if the task can be accomplished with such simple string operations I feel like complicated parsing is unnecessary (although I may have gotten it wrong). Note that this is flexible in the amount of whitespace to split by.

mystr = '  foo1   (foo2 foo3 (foo4))' 
mystr = mystr.strip()
i = mystr.index(' ')
a = mystr[:i].strip()
b = mystr[i:].strip()[1:-1]
print([a, b])

with output

['foo1', 'foo2 foo3 (foo4)']

Although I'm still not entirely clear if this is what you want. Let me know if it works or what needs changing.
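For comparison, here is a sketch of the same idea using split(None, 1) instead of index and slicing. The function name head_and_group is hypothetical, and the sketch assumes the input has at least two whitespace-separated parts (otherwise the unpacking raises ValueError).

```python
def head_and_group(text):
    """Return [first_word, rest], dropping one pair of outer parentheses from rest."""
    # split(None, 1) splits once on any run of whitespace and ignores
    # leading whitespace, so no explicit strip() is needed for the head.
    head, rest = text.split(None, 1)
    rest = rest.strip()
    # Only remove the outer parentheses when both are actually present.
    if rest.startswith("(") and rest.endswith(")"):
        rest = rest[1:-1]
    return [head, rest]


print(head_and_group("  foo1   (foo2 foo3 (foo4))"))  # ['foo1', 'foo2 foo3 (foo4)']
```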

Efficient way to convert strings from split function to ints in Python

My original suggestion uses a list comprehension:

test = '8743-12083-15'
lst_int = [int(x) for x in test.split("-")]

EDIT:

Which approach is most efficient (in CPU cycles) is something that should always be tested.
Some quick testing on my Python 2.6 install indicates that map is probably the most efficient candidate here (building a list of integers from a split string). Note that the difference is so small that it does not really matter unless you are doing this millions of times and it is a proven bottleneck. (In Python 3, map returns a lazy iterator rather than a list, so v2 would need list(map(...)) for a fair comparison.)

def v1():
    return [int(x) for x in '8743-12083-15'.split('-')]

def v2():
    return map(int, '8743-12083-15'.split('-'))

import timeit
print "v1", timeit.Timer('v1()', 'from __main__ import v1').timeit(500000)
print "v2", timeit.Timer('v2()', 'from __main__ import v2').timeit(500000)

> output v1 3.73336911201
> output v2 3.44717001915
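The timings above come from Python 2.6. A rough Python 3 version of the same measurement might look like the sketch below; DATA and REPEAT are names introduced here for illustration, and no timings are quoted because they depend on the machine and interpreter.

```python
import timeit

DATA = "8743-12083-15"
REPEAT = 100000  # arbitrary repeat count chosen for this sketch


def v1():
    # List comprehension variant.
    return [int(x) for x in DATA.split("-")]


def v2():
    # list() forces evaluation; in Python 3 map() alone returns a lazy iterator.
    return list(map(int, DATA.split("-")))


for name, func in (("v1", v1), ("v2", v2)):
    # timeit.timeit accepts a callable directly, so no setup string is needed.
    print(name, timeit.timeit(func, number=REPEAT))
```

Both functions return the same list, so whichever reads better is usually the right choice unless profiling says otherwise.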

Split string into strings by length?

>>> x = "qwertyui"
>>> chunks, chunk_size = len(x), len(x)//4
>>> [ x[i:i+chunk_size] for i in range(0, chunks, chunk_size) ]
['qw', 'er', 'ty', 'ui']
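The snippet above derives the chunk size from the number of pieces wanted. If the chunk length itself is known, the same slicing can be wrapped as follows; chunk_string is a hypothetical helper name used only for this sketch.

```python
def chunk_string(text, size):
    """Return consecutive pieces of text, each size characters long.

    The last piece is shorter when len(text) is not a multiple of size.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Step through the string in strides of 'size' and slice each piece.
    return [text[i:i + size] for i in range(0, len(text), size)]


print(chunk_string("qwertyui", 2))    # ['qw', 'er', 'ty', 'ui']
print(chunk_string("qwertyuiop", 4))  # ['qwer', 'tyui', 'op']
```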

Split string at delimiter '\' in python

You need to escape the backslash:

 S.split('\\')

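If the string was written without escaping the backslashes, you may also need to encode it with string_escape (a codec that exists only in Python 2):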

In [10]: s = 'greenland.gdb\topology_check\t_buildings'

In [11]: s.split("\\")
Out[11]: ['greenland.gdb\topology_check\t_buildings']

In [12]: s.encode("string_escape").split("\\")
Out[12]: ['greenland.gdb', 'topology_check', 't_buildings']

\t would be interpreted as a tab character unless you were using a raw string:

In [18]: s = 'greenland.gdb\topology_check\t_buildings'

In [19]: print(s)
greenland.gdb opology_check _buildings

In [20]: s = r'greenland.gdb\topology_check\t_buildings'

In [21]: print(s)
greenland.gdb\topology_check\t_buildings
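Putting the two points together, a minimal Python 3 sketch (the variable names path and tabbed are chosen here for illustration) shows that splitting on a backslash works once the string really contains backslashes:

```python
# A raw string keeps each backslash as a literal character.
path = r"greenland.gdb\topology_check\t_buildings"

# The separator is a single backslash, written "\\" in a normal literal
# (r"\" is not valid syntax).
parts = path.split("\\")
print(parts)  # ['greenland.gdb', 'topology_check', 't_buildings']

# Without the r prefix, "\t" is a tab, so there is no backslash to split on.
tabbed = "greenland.gdb\topology_check\t_buildings"
print(tabbed.split("\\") == [tabbed])  # True
print(tabbed.split("\t"))              # ['greenland.gdb', 'opology_check', '_buildings']
```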


How to split a string with many delimiters in python?

For performance, you should use regex as per the marked duplicate. See benchmarking below.

groupby + str.isalnum

You can use itertools.groupby with str.isalnum to group by characters which are alphanumeric.

With this solution you do not have to worry about splitting by explicitly specified characters.

from itertools import groupby

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."

res = [''.join(j) for i, j in groupby(x, key=str.isalnum) if i]

print(res)

['has', '15', 'science', 'and', 'engineering', 'departments',
'affiliated', 'centers', 'Bandar', 'Abbas', 'and', 'Mahshahr']

Benchmarking vs regex

Some performance benchmarking versus regex solutions (tested on Python 3.6.5):

from itertools import groupby
import re

x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."

z = x*10000
%timeit [''.join(j) for i, j in groupby(z, key=str.isalnum) if i] # 184 ms
%timeit list(filter(None, re.sub(r'\W+', ',', z).split(','))) # 82.1 ms
%timeit list(filter(None, re.split(r'\W+', z))) # 63.6 ms
%timeit [_ for _ in re.split(r'\W', z) if _] # 62.9 ms
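Before swapping one approach for the other, it is worth checking that they agree on your data. The sketch below (words_groupby and words_regex are hypothetical helper names) confirms they match on the sample string, and shows one case where they differ: \W treats the underscore as a word character, while str.isalnum does not.

```python
import re
from itertools import groupby


def words_groupby(text):
    # Keep only the runs of characters for which str.isalnum is True.
    return ["".join(group)
            for is_alnum, group in groupby(text, key=str.isalnum)
            if is_alnum]


def words_regex(text):
    # Split on runs of non-word characters and drop empty strings.
    return [w for w in re.split(r"\W+", text) if w]


x = " has 15 science@and^engineering--departments, affiliated centers, Bandar Abbas&&and Mahshahr."
print(words_groupby(x) == words_regex(x))  # True

print(words_groupby("snake_case"))  # ['snake', 'case']
print(words_regex("snake_case"))    # ['snake_case']
```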

How to split a string based on either a colon or a hyphen?

To split on more than one delimiter, you can use re.split and a character set:

import re
re.split('[-:]', a)

Demo:

>>> import re
>>> a = '4-6'
>>> b = '7:10'
>>> re.split('[-:]', a)
['4', '6']
>>> re.split('[-:]', b)
['7', '10']

Note however that - is also used to specify a range of characters in a character set. For example, [A-Z] will match all uppercase letters. To avoid this behavior, you can put the - at the start of the set as I did above (putting it at the end of the set or escaping it as \- also works). For more information on Regex syntax, see Regular Expression Syntax in the docs.
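To make the point about hyphen placement concrete, here is a short sketch. The function name split_on_colon_or_hyphen and the extra patterns are illustrative choices, not part of the original answer.

```python
import re


def split_on_colon_or_hyphen(text):
    # A leading '-' inside [...] is a literal hyphen, not a range operator.
    return re.split(r"[-:]", text)


print(split_on_colon_or_hyphen("4-6"))   # ['4', '6']
print(split_on_colon_or_hyphen("7:10"))  # ['7', '10']

# Escaping the hyphen works regardless of its position in the set.
print(re.split(r"[:\-+]", "1+2-3:4"))    # ['1', '2', '3', '4']

# Unescaped between two characters, '-' defines a range: [+-:] covers every
# character from '+' to ':', which includes the digits, so every character
# of '4-6' is treated as a delimiter.
print(re.split(r"[+-:]", "4-6"))         # ['', '', '', '']
```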


