How to Group a List of Tuples/Objects by Similar Index/Attribute in Python

How to group a list of tuples/objects by similar index/attribute in python?

collections.defaultdict is the usual way to do this.

You still need a for loop, but not an if statement: defaultdict creates the empty list the first time a key is seen.

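from collections import defaultdict

groups = defaultdict(list)

for obj in old_list:
    # A missing key starts out as an empty list, so no "if key in groups" check is needed.
    groups[obj.some_attr].append(obj)

# values() is a view in Python 3; wrap it in list() to get an actual list.
new_list = list(groups.values())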
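The same pattern covers both halves of the question (objects by attribute, tuples by index) if the grouping key is passed in as a function. This is a minimal sketch: group_by and the Item class are hypothetical names for illustration, and operator.attrgetter / operator.itemgetter supply the key.

```python
from collections import defaultdict
from dataclasses import dataclass
from operator import attrgetter, itemgetter


def group_by(items, key):
    """Hypothetical helper: group items into lists by key(item).

    Groups come out in the order their key was first seen.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return list(groups.values())


@dataclass
class Item:  # hypothetical record type; some_attr is the attribute to group on
    name: str
    some_attr: str


objs = [Item("x", "red"), Item("y", "blue"), Item("z", "red")]
by_attr = group_by(objs, attrgetter("some_attr"))
print([[o.name for o in g] for g in by_attr])
# [['x', 'z'], ['y']]

tuples = [(1, "a"), (2, "b"), (1, "c")]
by_index = group_by(tuples, itemgetter(0))
print(by_index)
# [[(1, 'a'), (1, 'c')], [(2, 'b')]]
```

Unlike itertools.groupby, this does not require the input to be sorted first.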

How to group objects (tuples) based on having consecutive attribute value?

You can do the following:

from itertools import groupby, count
from operator import itemgetter

data = [('a', 12), ('b', 13), ('c', 15), ('c', 16), ('c', 17)]

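def key(i, cursor=count(0)):
    """Generate the same key for consecutive numbers"""
    # Note: the count() default is created once and shared by every call,
    # so this key function only works for a single pass over the data.
    return i[1] - next(cursor)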

ordered = sorted(data, key=itemgetter(1))

result = [list(group) for _, group in groupby(ordered, key=key)]
print(result)

Output

[[('a', 12), ('b', 13)], [('c', 15), ('c', 16), ('c', 17)]]

The above is based on an old example from the Python 2.6 documentation.

To illustrate what is happening, consider the following example:

lst = [12, 13, 15, 16, 17]
print([v - i for i, v in enumerate(lst)])

The generated keys are:

[12, 12, 13, 13, 13]

As can be seen, each run of consecutive numbers gets the same key.
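Because the count() default argument is shared between calls, key above can only be used once. A variant that takes the position from enumerate instead of a shared counter can be called any number of times. This is a sketch: group_consecutive is a hypothetical name, and it assumes the grouped values are integers.

```python
from itertools import groupby
from operator import itemgetter


def group_consecutive(data, value=itemgetter(1)):
    """Hypothetical helper: split data into runs of consecutive integer values.

    A repeated value starts a new run.
    """
    ordered = sorted(data, key=value)
    # value - position stays constant along a run of consecutive integers.
    runs = groupby(enumerate(ordered), key=lambda pair: value(pair[1]) - pair[0])
    return [[item for _, item in run] for _, run in runs]


data = [('a', 12), ('b', 13), ('c', 15), ('c', 16), ('c', 17)]
print(group_consecutive(data))
# [[('a', 12), ('b', 13)], [('c', 15), ('c', 16), ('c', 17)]]
```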

I want to group tuples based on similar attributes

As we fill in the components, at each stage there are three cases to consider (as you will have to match up overlapping groups):

  1. Neither x nor y is in any component found so far.
  2. Both are already in different sets, x in set_i and y in set_j.
  3. Only one of them is already in a component (x in set_i or y in set_j). If both are already in the same component, nothing needs to change.

We can use the built-in set type to help (see @jwpat's and @DSM's trickier examples):

def connected_components(lst):
    components = []  # list of sets
    for (x, y) in lst:
        # Find which components (if any) already contain x and y.
        i = j = set_i = set_j = None
        for k, c in enumerate(components):
            if x in c:
                i, set_i = k, c
            if y in c:
                j, set_j = k, c

        # case 1 (or already in same set)
        if i == j:
            if i is None:
                components.append({x, y})
            continue

        # case 2: merge the two components containing x and y
        if i is not None and j is not None:
            components = [c for k, c in enumerate(components) if k != i and k != j]
            components.append(set_i | set_j)
            continue

        # case 3: add the missing element to the other one's component
        if j is not None:
            components[j].add(x)
        if i is not None:
            components[i].add(y)

    return components

lst = [(1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2)]
connected_components(lst)
# [{1, 2, 3, 4, 8}, {5, 6, 7}]
[sorted(c) for c in connected_components(lst)]
# [[1, 2, 3, 4, 8], [5, 6, 7]]

connected_components([(1, 2), (4, 3), (2, 3), (5, 6), (6, 7), (8, 2)])
# [{1, 2, 3, 4, 8}, {5, 6, 7}]  # @jwpat's example

connected_components([[1, 3], [2, 4], [3, 4]])
# [{1, 2, 3, 4}]  # @DSM's example

This certainly won't be the most efficient method, but it is perhaps similar to what they would expect. As Jon Clements points out, there is a library for this type of calculation, networkx, which will be much more efficient.
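Another common way to avoid rescanning every component for each pair, without an extra library, is a union-find (disjoint-set) structure. This is a swapped-in technique, not the set-merging method above; the sketch below uses a hypothetical function name and does not guarantee any particular order of groups or elements.

```python
def connected_components_uf(pairs):
    """Hypothetical helper: connected components via union-find."""
    parent = {}

    def find(x):
        # Unseen elements become their own root.
        parent.setdefault(x, x)
        # Path halving: re-point nodes closer to the root while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    # Collect every element under its final root.
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


result = connected_components_uf([(1, 2), (2, 3), (4, 3), (5, 6), (6, 7), (8, 2)])
print(sorted(sorted(c) for c in result))
# [[1, 2, 3, 4, 8], [5, 6, 7]]
```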

Group elements in python-list by type

Use collections.defaultdict:

from collections import defaultdict

l = [[], 1, 2, 'a', 3, 'b', [5, 6]]

accumulation = defaultdict(list)
for e in l:
    accumulation[type(e)].append(e)

result = list(accumulation.values())
print(result)

Output

[[[], [5, 6]], [1, 2, 3], ['a', 'b']]

As an alternative, you could use dict.setdefault:

accumulation = {}
for e in l:
    accumulation.setdefault(type(e), []).append(e)
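itertools.groupby can do the same grouping, but only after sorting, and type objects cannot be compared with "<". A sketch, assuming it is acceptable to sort by the type's name (two distinct types that happen to share a name would be merged):

```python
from itertools import groupby

l = [[], 1, 2, 'a', 3, 'b', [5, 6]]


def type_name(e):
    # Types are not orderable, so sort and group on the type's name instead.
    return type(e).__name__


# groupby only merges adjacent items, so sort by the same key first.
ordered = sorted(l, key=type_name)
result = [list(g) for _, g in groupby(ordered, key=type_name)]
print(result)
# [[1, 2, 3], [[], [5, 6]], ['a', 'b']]
```

The groups come out in alphabetical order of type name rather than in first-seen order as with defaultdict.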

grouping list of tuples with itertools

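This has tripped me up in the past as well. groupby only groups consecutive items that share a key, so if you want it to group globally, sort the list by the same key first (itertools and operator are assumed to be imported):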

In [163]: test = [(1,1),(3,1),(5,0),(3,0),(2,1)]

In [164]: crit = operator.itemgetter(1)

In [165]: test.sort(key=crit)

In [166]: result = [list(group) for key, group in itertools.groupby(test, crit)]

In [167]: result
Out[167]: [[(5, 0), (3, 0)], [(1, 1), (3, 1), (2, 1)]]
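To make the pitfall concrete, here is the same data grouped with and without sorting. This sketch uses sorted() rather than test.sort() so the original list stays unchanged.

```python
from itertools import groupby
from operator import itemgetter

test = [(1, 1), (3, 1), (5, 0), (3, 0), (2, 1)]
crit = itemgetter(1)

# Without sorting, groupby starts a new group every time the key changes.
unsorted_groups = [list(g) for _, g in groupby(test, crit)]
print(unsorted_groups)
# [[(1, 1), (3, 1)], [(5, 0), (3, 0)], [(2, 1)]]

# Sorting on the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(test, key=crit), crit)]
print(sorted_groups)
# [[(5, 0), (3, 0)], [(1, 1), (3, 1), (2, 1)]]
```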

Grouping lists based on a certain value in python and then returning the minimum of the group

If lst is sorted by the first element (if not, sort it first with lst.sort(key=lambda x: x[0])), you can use itertools.groupby to group the tuples by their first element, then call min on each group with a key that compares the last elements:

from itertools import groupby
out = [min(g, key=lambda x: x[-1]) for k, g in groupby(lst, lambda x: x[0])]

Output:

[(1, 42, 15, 5), (2, 72, 39, 6), (3, 12, 15, 1)]

Or, if every first-element value appears the same number of times (three here, hence the slice step of 3), you can get the desired outcome with sorted and list slicing:

out = sorted(lst, key=lambda x: (x[0], x[-1]))[::3]
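If the input is unsorted and the groups can have different sizes, a single pass with a dictionary avoids both the sort and the equal-size assumption. A sketch: min_per_group is a hypothetical name and the sample rows are made up for illustration, not the question's data.

```python
def min_per_group(rows):
    """Hypothetical helper: per first element, keep the row with the smallest last element.

    Groups appear in first-seen order; ties keep the earliest row, as min() does.
    """
    best = {}
    for row in rows:
        key = row[0]
        if key not in best or row[-1] < best[key][-1]:
            best[key] = row
    return list(best.values())


# Hypothetical, unsorted sample rows.
rows = [(2, 10, 20, 7), (1, 42, 15, 5), (1, 50, 18, 9), (2, 72, 39, 6)]
print(min_per_group(rows))
# [(2, 72, 39, 6), (1, 42, 15, 5)]
```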

How to combine int values within the same group in a list of tuples?

You can use a dictionary to get the desired result without importing any extra module:

lst = [('a', 1),('a', 2),('b', 0),('b', 1),('c', 0)]

Dict = {}

for tup in lst:
    first = tup[0]
    second = tup[1]
    if first not in Dict:
        Dict[first] = 0
    Dict[first] += second

secondList = []

for key in Dict.keys():
    secondList.append((key, Dict[key]))

print(secondList)
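If importing from the standard library is acceptable, collections.defaultdict(int) removes the membership check, since a missing key starts at 0. A minimal sketch on the same data:

```python
from collections import defaultdict

lst = [('a', 1), ('a', 2), ('b', 0), ('b', 1), ('c', 0)]

# Each missing key starts at int() == 0, so it can be incremented directly.
totals = defaultdict(int)
for name, value in lst:
    totals[name] += value

second_list = list(totals.items())
print(second_list)
# [('a', 3), ('b', 1), ('c', 0)]
```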

List of Tuples to List of List of Tuples

Use itertools.groupby to group the tuples by their first element (the integer):

from itertools import groupby

lst = [list(g) for _, g in groupby(tuple_list, lambda x: x[0])]
print(lst)

[[(1, 'hello', 'apple'), (1, 'no', 'orange')], 
[(2, 'bye', 'grape')],
[(3, 'okay', 'banana')],
[(4, 'how are you?', 'raisin'), (4, "I'm doing well", 'watermelon')]]
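The snippet above relies on tuple_list already having equal integers next to each other. A self-contained sketch that sorts first; the input is an assumption inferred from the printed output (each tuple taken to have three fields):

```python
from itertools import groupby
from operator import itemgetter

# Assumed input, inferred from the output shown above.
tuple_list = [
    (1, 'hello', 'apple'),
    (1, 'no', 'orange'),
    (2, 'bye', 'grape'),
    (3, 'okay', 'banana'),
    (4, 'how are you?', 'raisin'),
    (4, "I'm doing well", 'watermelon'),
]

# groupby only merges adjacent items, so sort on the grouping key first;
# sorted() is stable, so tuples keep their relative order within a group.
ordered = sorted(tuple_list, key=itemgetter(0))
lst = [list(g) for _, g in groupby(ordered, key=itemgetter(0))]
print(lst)
```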

Group a list of tuples on two values, and return a list of all the third value

There is no need to nest two groupby calls that each group by a single field. Instead, use itemgetter with two arguments (or a lambda) to group by the first two values at once, then a list comprehension to collect the third values.

>>> from itertools import groupby
>>> from operator import itemgetter
>>> lst = [(1, 1, 4), (1, 1, 9), (1, 1, 14), (2, 1, 12), (2, 1, 99), (2, 6, 14), (2, 6, 19)]
>>> [(*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]

If, for whatever reason, you want to use two separate groupby calls, you can use this:

>>> [(k1, k2, [x[2] for x in g2]) for k1, g1 in groupby(lst, key=itemgetter(0))
... for k2, g2 in groupby(g1, key=itemgetter(1))]
[(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]

Of course, this also works as a regular (nested) loop, more in line with your original code:

def sorter(lst):
    for k1, g1 in groupby(lst, key=itemgetter(0)):
        for k2, g2 in groupby(g1, key=itemgetter(1)):
            yield (k1, k2, [x[2] for x in g2])

Or with the single groupby, returning a generator object:

def sorter(lst):
    return ((*k, [x[2] for x in g]) for k, g in groupby(lst, key=itemgetter(0, 1)))

As always, this assumes that lst is already sorted by the same key. If it is not, sort it first.
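A sketch that folds the sort into the helper so unsorted input also works; group_third_values is a hypothetical name. Sorting only on the first two fields keeps the third values in their original relative order, because sorted() is stable.

```python
from itertools import groupby
from operator import itemgetter


def group_third_values(rows):
    """Hypothetical helper: group rows by their first two fields and
    collect the third field, sorting first so unsorted input works."""
    key = itemgetter(0, 1)
    ordered = sorted(rows, key=key)  # groupby needs equal keys to be adjacent
    return [(*k, [row[2] for row in g]) for k, g in groupby(ordered, key=key)]


shuffled = [(2, 6, 14), (1, 1, 4), (2, 1, 12), (1, 1, 9), (2, 6, 19), (2, 1, 99), (1, 1, 14)]
print(group_third_values(shuffled))
# [(1, 1, [4, 9, 14]), (2, 1, [12, 99]), (2, 6, [14, 19])]
```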


