Splitting a list based on a delimiter word
I would use a generator:
def group(seq, sep):
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

ex = ['A', 'WORD', 'B', 'C', 'WORD', 'D']
result = list(group(ex, 'WORD'))
print(result)
This prints
[['A'], ['WORD', 'B', 'C'], ['WORD', 'D']]
The code accepts any iterable and returns a generator, which you don't have to convert into a list if you don't want to.
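As a small illustration of that laziness, the sketch below feeds the same generator a one-shot iterator instead of a list and pulls the groups out one at a time. The variable names tokens, chunks, first and rest are only chosen for this example.

```python
def group(seq, sep):
    """Yield sublists of seq, starting a new sublist at every sep."""
    g = []
    for el in seq:
        if el == sep:
            yield g
            g = []
        g.append(el)
    yield g

# Assumed sample input: an iterator that can only be consumed once.
tokens = iter(['A', 'WORD', 'B', 'C', 'WORD', 'D'])

chunks = group(tokens, 'WORD')
first = next(chunks)   # reads the input only up to the first 'WORD'
rest = list(chunks)    # reads the remainder

print(first)  # ['A']
print(rest)   # [['WORD', 'B', 'C'], ['WORD', 'D']]
```

Note that if the input starts with the separator, the first group yielded is an empty list.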
Split list into lists based on a character occurring inside of an element
First, a quick one-liner. It is not optimal in terms of memory, because it builds one large intermediate string, but it's short and sweet:
>>> smallerlist = [l.split(',') for l in ','.join(biglist).split('|')]
>>> smallerlist
[['X', '1498393178', '1'],
 ['Y', '15496686585007', '-82', '-80', '-80', '3', '3', '2', ''],
 ['Y', '145292534176372', '-87', '-85', '-85', '3', '3', '2', ''],
 ['Y', '11098646289856', '-91', '-88', '-89', '3', '3', '2', ''],
 ['Y', '35521515162112', '-82', '-74', '-79', '3', '3', '2', ''],
 ['Z', '0.0', '0.0', '0', '0', '0', '0', '0', '4', '0', '154']]
Here we join all elements of the big list with a separator that does not occur in any element (for example ','), then split the joined string on '|', and finally split each piece on ',' again to recover the original elements.
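To make the three steps visible, here is the same one-liner unrolled on a small made-up input. The contents of biglist below are an assumption for illustration only, and the sketch relies on ',' not occurring inside any element.

```python
# Assumed sample input: '|' inside an element marks a sublist boundary.
biglist = ['X', '1', '2|Y', '3', '4|Z', '5']

joined = ','.join(biglist)    # 'X,1,2|Y,3,4|Z,5'
rows = joined.split('|')      # ['X,1,2', 'Y,3,4', 'Z,5']
smallerlist = [row.split(',') for row in rows]

print(smallerlist)
# [['X', '1', '2'], ['Y', '3', '4'], ['Z', '5']]
```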
If you're looking for a somewhat more efficient solution, you can use itertools.groupby on an intermediate stream generated on the fly by the breakby() generator. Elements without the | separator are yielded as is; elements containing it are split into three items: the first part, a list delimiter (None by default), and the second part.
from itertools import groupby

def breakby(biglist, sep, delim=None):
    for item in biglist:
        p = item.split(sep)
        yield p[0]
        if len(p) > 1:
            yield delim
            yield p[1]

smallerlist = [list(g) for k, g in groupby(breakby(biglist, '|', None),
                                           lambda x: x is not None) if k]
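The following sketch runs the groupby approach on a small made-up biglist (an assumption for illustration) and prints the intermediate stream, so you can see where the None delimiters end up before grouping.

```python
from itertools import groupby

def breakby(biglist, sep, delim=None):
    """Yield elements unchanged, inserting delim where an element contains sep."""
    for item in biglist:
        p = item.split(sep)
        yield p[0]
        if len(p) > 1:
            yield delim
            yield p[1]

# Assumed sample input, not the data from the original question.
biglist = ['X', '1', '2|Y', '3', '4|Z', '5']

stream = list(breakby(biglist, '|'))
print(stream)
# ['X', '1', '2', None, 'Y', '3', '4', None, 'Z', '5']

# Group consecutive non-None items; the groups of None delimiters are dropped.
smallerlist = [list(g) for k, g in groupby(stream, lambda x: x is not None) if k]
print(smallerlist)
# [['X', '1', '2'], ['Y', '3', '4'], ['Z', '5']]
```

As written, breakby() only looks at p[0] and p[1], so it assumes each element contains the separator at most once.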
How to split a list into sublists based on a separator, similar to str.split()?
A simple generator will work for all of the cases in your question:
def split(sequence, sep):
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk
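A few sample calls show how this generator mirrors str.split() with an explicit separator, including the empty chunks produced by leading or consecutive separators. The inputs are made up for illustration.

```python
def split(sequence, sep):
    """Split sequence into lists at each sep, dropping the separators."""
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk

# Consecutive separators give an empty chunk, like 'a,,b'.split(',').
print(list(split([1, 2, 0, 3, 0, 0, 4], 0)))  # [[1, 2], [3], [], [4]]

# A leading separator gives an empty first chunk.
print(list(split([0, 1], 0)))                 # [[], [1]]

# An empty input gives a single empty chunk, like ''.split(',') == [''].
print(list(split([], 0)))                     # [[]]
```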
How do I split a string into a list of words?
Given a string sentence, this stores each word in a list called words:
words = sentence.split()
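For example, with a made-up sentence containing mixed whitespace, calling split() without arguments splits on any run of whitespace and drops empty strings, while passing an explicit ' ' does not:

```python
# Assumed sample input with a double space, a tab and a trailing newline.
sentence = "The quick  brown\tfox\n"

words = sentence.split()
print(words)                # ['The', 'quick', 'brown', 'fox']

# With an explicit separator, only single spaces are split on.
by_space = sentence.split(' ')
print(by_space)             # ['The', 'quick', '', 'brown\tfox\n']
```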
How to split a list of strings based on delimiter string that ends with specific character in Python?
You can use itertools.groupby:
import itertools
data = [[a, list(b)] for a, b in itertools.groupby(content.split('\n'), key=lambda x:x.endswith(':'))]
final_result = [' '.join(b) for a, b in data if not a]
Output:
['Hi', 'London UK USA', 'here there', 'something somethin2']
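Since content is not defined in the snippet above, here is a self-contained sketch with a made-up value for it. It assumes the text consists of header lines ending in ':' followed by the lines that belong to each header.

```python
import itertools

# Assumed sample input: lines ending with ':' act as delimiters.
content = "greeting:\nHi\nplaces:\nLondon\nUK\nUSA"

# Group consecutive lines by whether they end with ':'.
data = [
    [is_header, list(lines)]
    for is_header, lines in itertools.groupby(
        content.split('\n'), key=lambda x: x.endswith(':')
    )
]

# Keep only the non-header groups and join each into one string.
final_result = [' '.join(lines) for is_header, lines in data if not is_header]
print(final_result)  # ['Hi', 'London UK USA']
```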
Python: Split a list into multiple lists based on a subset of elements
Consider using one of the many helpful tools from a third-party library, e.g. more_itertools.split_at:
Given
import more_itertools as mit
lst = [
    "abcd 1233", "cdgfh3738", "hryg21", "**L**",
    "gdyrhr657", "abc31637", "**R**",
    "7473hrtfgf"
]
Code
result = list(mit.split_at(lst, pred=lambda x: set(x) & {"L", "R"}))
Demo
sublist_1, sublist_2, sublist_3 = result
sublist_1
# ['abcd 1233', 'cdgfh3738', 'hryg21']
sublist_2
# ['gdyrhr657', 'abc31637']
sublist_3
# ['7473hrtfgf']
Details
The more_itertools.split_at function splits an iterable at positions that meet a special condition. The conditional function (predicate) happens to be a lambda function, which is equivalent to and substitutable with the following regular function:
def pred(x):
    a = set(x)
    b = {"L", "R"}
    return a.intersection(b)
Whenever the characters of string x intersect with L or R, the predicate returns a non-empty set, which is truthy, and the split occurs at that position. The matching element itself is dropped from the output.
Install this package from the command line with pip install more_itertools.
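If adding a dependency is not an option, the same behaviour can be approximated with a short generator that uses only the standard library. This is a rough sketch, not the more_itertools implementation, and split_where is a hypothetical name chosen for this example.

```python
def split_where(iterable, pred):
    """Yield lists of items, splitting at (and dropping) items where pred is truthy."""
    chunk = []
    for item in iterable:
        if pred(item):
            yield chunk
            chunk = []
        else:
            chunk.append(item)
    yield chunk

lst = [
    "abcd 1233", "cdgfh3738", "hryg21", "**L**",
    "gdyrhr657", "abc31637", "**R**",
    "7473hrtfgf"
]

# Same predicate as above: truthy when the string contains 'L' or 'R'.
result = list(split_where(lst, lambda x: set(x) & {"L", "R"}))
print(result)
# [['abcd 1233', 'cdgfh3738', 'hryg21'], ['gdyrhr657', 'abc31637'], ['7473hrtfgf']]
```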
Split Strings into words with multiple word boundary delimiters
A case where regular expressions are justified:
import re
DATA = "Hey, you - what are you doing here!?"
print(re.findall(r"[\w']+", DATA))
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']
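The apostrophe in the character class is what keeps contractions together. A quick comparison on a made-up sentence (the variable names here are only for illustration):

```python
import re

# Assumed sample input containing contractions.
text = "Don't panic - it's fine, really!"

with_apostrophe = re.findall(r"[\w']+", text)
print(with_apostrophe)     # ["Don't", 'panic', "it's", 'fine', 'really']

# Without the apostrophe in the class, contractions are broken apart.
without_apostrophe = re.findall(r"\w+", text)
print(without_apostrophe)  # ['Don', 't', 'panic', 'it', 's', 'fine', 'really']
```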