Splitting a List Based on a Delimiter Word

I would use a generator:

def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

ex = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
result = list(group(ex, 'WORD'))
print(result)

This prints

[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]

The code accepts any iterable, and produces an iterable (which you don't have to flatten into a list if you don't want to).
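As a small sketch of that last point, the generator can be fed from another generator and consumed one group at a time. The tokens() helper and its data are made up for illustration, and group() is repeated so the snippet runs on its own.

```python
def group(seq, sep):
    """Yield sublists of seq, starting a new sublist at each sep."""
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g


def tokens():
    # Hypothetical source: any iterable works, not just a list.
    yield from ("A", "WORD", "B", "C", "WORD", "D")


# Consume the groups lazily; the full list of groups is never built.
group_lengths = [len(chunk) for chunk in group(tokens(), "WORD")]
print(group_lengths)  # [1, 3, 2]
```

One thing to be aware of: if the input starts with the separator, the first group yielded is an empty list.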

Split list into lists based on a character occurring inside of an element

First, a quick one-liner. It is not optimal in terms of memory, since it builds one large intermediate string, but it's short and sweet:

>>> smallerlist = [l.split(',') for l in ','.join(biglist).split('|')]
>>> smallerlist
[['X', '1498393178', '1'],
['Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2', ''],
['Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2', ''],
['Y', '11098646289856', '-91', '-88', '-89', '3', '3', '2', ''],
['Y', '35521515162112', '-82', '-74', '-79', '3', '3', '2', ''],
['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']]

Here we join all elements of the big list with a separator that does not appear in the data (for example ','), then split the result on '|', and finally split each piece on ',' again to get a sublist of the original elements.

If you're looking for a somewhat more efficient solution, you can use itertools.groupby on an intermediate sequence generated on the fly by the breakby() generator. Elements without the | separator are yielded as is; elements containing it are split into three items: the first part, a list delimiter (e.g. None), and the second part.

from itertools import groupby

def breakby(biglist, sep, delim=None):
    for item in biglist:
        p = item.split(sep)
        yield p[0]
        if len(p) > 1:
            yield delim
            yield p[1]

smallerlist = [list(g) for k, g in groupby(breakby(biglist, '|', None),
                                           lambda x: x is not None) if k]
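To see the two steps work together, here is a self-contained sketch. The biglist value is a shortened, made-up sample (the original input is not shown above), and the code assumes each element contains the separator at most once, since only p[0] and p[1] are used.

```python
from itertools import groupby


def breakby(items, sep, delim=None):
    """Yield items as is; an item containing sep becomes: left, delim, right."""
    for item in items:
        p = item.split(sep)
        yield p[0]
        if len(p) > 1:
            # Assumption: at most one separator per element.
            yield delim
            yield p[1]


# Hypothetical sample data, invented for this example.
biglist = ['X', '1498393178', '1|Y', '15496686585007', '-82', '2|Z', '0.0', '154']

# groupby() collects runs of non-delimiter items; runs of delimiters are skipped.
smallerlist = [list(g)
               for k, g in groupby(breakby(biglist, '|'), lambda x: x is not None)
               if k]
print(smallerlist)
# [['X', '1498393178', '1'], ['Y', '15496686585007', '-82', '2'], ['Z', '0.0', '154']]
```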

How to split a list into sublists based on a separator, similar to str.split()?

A simple generator will work for all of the cases in your question:

def split(sequence, sep):
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk
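A short usage sketch with made-up inputs shows how this mirrors str.split(sep): the separator is dropped, and adjacent or trailing separators produce empty chunks. The generator is repeated here so the snippet runs on its own.

```python
def split(sequence, sep):
    """Split sequence into lists at each occurrence of sep (sep is dropped)."""
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk


# Adjacent separators give an empty chunk, like 'a,,b'.split(',').
middle = list(split([1, 2, 0, 3, 0, 0, 4], 0))
print(middle)  # [[1, 2], [3], [], [4]]

# A trailing separator gives a trailing empty chunk, like 'a,'.split(',').
trailing = list(split(['a', 'X'], 'X'))
print(trailing)  # [['a'], []]
```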

How do I split a string into a list of words?

Given a string sentence, this stores each word in a list called words:

words = sentence.split()
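For illustration, with a made-up sentence: calling split() with no arguments splits on runs of any whitespace and ignores leading and trailing whitespace, while punctuation stays attached to the words.

```python
sentence = "  The quick  brown\tfox, jumps.\n"

# No argument: split on any run of whitespace, no empty strings in the result.
words = sentence.split()
print(words)  # ['The', 'quick', 'brown', 'fox,', 'jumps.']

# An explicit separator behaves differently: every single space is a boundary.
pieces = "a  b".split(" ")
print(pieces)  # ['a', '', 'b']
```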

How to split a list of strings based on delimiter string that ends with specific character in Python?

You can use itertools.groupby:

import itertools
data = [[a, list(b)] for a, b in itertools.groupby(content.split('\n'), key=lambda x:x.endswith(':'))]
final_result = [' '.join(b) for a, b in data if not a]

Output:

['Hi', 'London UK USA', 'here there', 'something somethin2']
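The input is not shown above, so the following self-contained sketch uses a made-up content string, chosen so that it produces the same output. It assumes header lines end with ':' and that the lines between headers should be joined with spaces.

```python
import itertools

# Hypothetical input: lines ending with ':' act as delimiters.
content = "Hi\nCity:\nLondon\nUK\nUSA\nPlaces:\nhere\nthere\nOther:\nsomething\nsomethin2"

# groupby() yields (is_header, run_of_lines) pairs for consecutive lines.
data = [[is_header, list(lines)]
        for is_header, lines in itertools.groupby(
            content.split('\n'), key=lambda line: line.endswith(':'))]

# Keep only the non-header runs and join each into one string.
final_result = [' '.join(lines) for is_header, lines in data if not is_header]
print(final_result)
# ['Hi', 'London UK USA', 'here there', 'something somethin2']
```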

Python: Split a list into multiple lists based on a subset of elements

Consider using one of the many helpful tools from a third-party library, e.g. more_itertools.split_at:

Given

import more_itertools as mit

lst = [
    "abcd 1233", "cdgfh3738", "hryg21", "**L**",
    "gdyrhr657", "abc31637", "**R**",
    "7473hrtfgf"
]

Code

result = list(mit.split_at(lst, pred=lambda x: set(x) & {"L", "R"}))

Demo

sublist_1, sublist_2, sublist_3 = result

sublist_1
# ['abcd 1233', 'cdgfh3738', 'hryg21']
sublist_2
# ['gdyrhr657', 'abc31637']
sublist_3
# ['7473hrtfgf']

Details

The more_itertools.split_at function splits an iterable at every item for which a given condition holds. The conditional function (predicate) here is a lambda, which is equivalent to and can be replaced with the following regular function:

def pred(x):
    a = set(x)
    b = {"L", "R"}
    return a.intersection(b)

Whenever the characters of string x include L or R, the predicate returns a non-empty set, which is truthy, and the split occurs at that position. Otherwise it returns an empty set, which is falsy.

Install this package from the command line with pip install more_itertools.
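If adding a dependency is not an option, a small generator can stand in for this particular use. This is only a rough sketch, not the library's implementation: the name split_when is made up here, and it assumes that the items matching the predicate should be dropped from the result.

```python
def split_when(iterable, pred):
    """Yield lists of items, splitting at (and dropping) items where pred is truthy."""
    chunk = []
    for item in iterable:
        if pred(item):
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    yield chunk


lst = [
    "abcd 1233", "cdgfh3738", "hryg21", "**L**",
    "gdyrhr657", "abc31637", "**R**",
    "7473hrtfgf",
]

# Same predicate as above: truthy when the string contains "L" or "R".
result = list(split_when(lst, lambda x: set(x) & {"L", "R"}))
print(result)
# [['abcd 1233', 'cdgfh3738', 'hryg21'], ['gdyrhr657', 'abc31637'], ['7473hrtfgf']]
```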

Split Strings into words with multiple word boundary delimiters

A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
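Two small variations, as a sketch with made-up inputs. The first shows why the apostrophe is in the character class: contractions stay in one piece. The second uses re.split with an explicit set of delimiters, which is an alternative technique rather than the approach above, and needs a filter because a trailing delimiter leaves an empty string.

```python
import re

DATA = "Hey, you - what are you doing here!?"

# The apostrophe in the class keeps contractions together.
contraction = re.findall(r"[\w']+", "don't stop")
print(contraction)  # ["don't", 'stop']

# Alternative: name the delimiters explicitly and split on runs of them.
# The delimiter set here is an assumption chosen to fit this sample string.
parts = [w for w in re.split(r"[\s,\-!?]+", DATA) if w]
print(parts)  # ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```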

