What's a Zip Join? Have You Ever Heard of That, or a Pairwise Join

What's a zip join? Have you ever heard of that, or a pairwise join?

Zip joins are only meaningful when talking about ordered sets. Instead of joining based on the value of a column, you are joining based on the row number.

Table1

[λ]   [color]
400   violet
415   indigo
475   blue
510   green
570   yellow
590   orange
650   red

Table2

[flame]  [element]
green    boron
yellow   sodium
white    magnesium
red      calcium
blue     indium

Table1 INNER JOIN Table2 ON [color] = [flame] : only matching rows

[λ]   [color]  [flame]  [element]
475   blue     blue     indium
510   green    green    boron
570   yellow   yellow   sodium
650   red      red      calcium

Table1 FULL OUTER JOIN Table2 ON [color] = [flame] : all rows from both tables, matched where possible

[λ]   [color]  [flame]  [element]
400   violet   NULL     NULL
415   indigo   NULL     NULL
475   blue     blue     indium
510   green    green    boron
570   yellow   yellow   sodium
590   orange   NULL     NULL
650   red      red      calcium
NULL  NULL     white    magnesium

Table1 "zip joined" to Table2 : all rows, regardless of match

[λ]   [color]  [flame]  [element]
400   violet   green    boron
415   indigo   yellow   sodium
475   blue     white    magnesium
510   green    red      calcium
570   yellow   blue     indium
590   orange   NULL     NULL
650   red      NULL     NULL

A zip join combines the data like a zipper, pairing the first row of one table with the first row of the other, the second with the second, and so on. It never looks at the values in the rows. Zip joins can be generated very quickly, but the result means nothing unless your data already has a meaningful order, or you just want to generate arbitrary pairings.
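As a rough illustration, the zip join above can be sketched in Python with itertools.zip_longest, which pads the shorter input the same way the NULL rows are padded. The helper name zip_join and the tuple-per-row layout are assumptions made for this example, not part of any SQL dialect or library, and both tables are assumed to be non-empty.

```python
from itertools import zip_longest

# Rows copied from Table1 and Table2 above, one tuple per row.
table1 = [(400, "violet"), (415, "indigo"), (475, "blue"), (510, "green"),
          (570, "yellow"), (590, "orange"), (650, "red")]
table2 = [("green", "boron"), ("yellow", "sodium"), ("white", "magnesium"),
          ("red", "calcium"), ("blue", "indium")]


def zip_join(left, right):
    """Pair rows by position only; pad the shorter table with None (NULL).

    Hypothetical helper: assumes both tables have at least one row.
    """
    left_pad = (None,) * len(left[0])
    right_pad = (None,) * len(right[0])
    joined = []
    for left_row, right_row in zip_longest(left, right):
        if left_row is None:
            left_row = left_pad
        if right_row is None:
            right_row = right_pad
        joined.append(left_row + right_row)
    return joined


result = zip_join(table1, table2)
for row in result:
    print(row)
```

The first row printed is (400, 'violet', 'green', 'boron') and the last is (650, 'red', None, None), matching the table above. No column values are compared at any point.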

python3 join lists that have same value in list of lists

This can be seen as a graph problem in which you merge subgraphs and need to find the connected components.

Here is your graph:

[image: graph]

networkx

Using networkx you can do:

import networkx as nx
from itertools import chain, pairwise
# python < 3.10 use this recipe for pairwise instead
# from itertools import tee
# def pairwise(iterable):
#     a, b = tee(iterable)
#     next(b, None)
#     return zip(a, b)

# l is the input list of lists
# link consecutive items of each sublist with an edge
G = nx.from_edgelist(chain.from_iterable(pairwise(e) for e in l))
G.add_nodes_from(set.union(*map(set, l))) # adding single items

list(nx.connected_components(G))

output:

[{1, 2, 3, 4}, {5, 6, 7, 8, 9}]
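To see what the edge list passed to from_edgelist looks like, here is a small sketch that needs no networkx. The input named sample is a made-up list chosen for this example, and the fallback pairwise is only a stand-in for Python versions older than 3.10.

```python
from itertools import chain

try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    def pairwise(iterable):
        # Fallback for older Pythons: overlapping consecutive pairs.
        items = list(iterable)
        return zip(items, items[1:])

# Hypothetical input, chosen only for illustration.
sample = [[1, 2, 3], [3, 4], [5]]

# Each sublist contributes one edge per pair of consecutive items.
edges = list(chain.from_iterable(pairwise(e) for e in sample))
print(edges)  # [(1, 2), (2, 3), (3, 4)]
```

The single-item sublist [5] produces no edge at all, which is why the add_nodes_from call is needed to keep such items in the graph.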

python

Now, you can use pure python to perform the same thing, finding the connected components and merging them.

Example code is nicely described in this post (archive.org link for the long term).

In summary, the first step is building the set of neighbors of each item. A helper generator then walks the neighbors of neighbors with a loop, keeping track of the items already seen.

from collections import defaultdict

# merge function to merge all sublists having common elements.
def merge_common(lists):
    neigh = defaultdict(set)
    visited = set()
    # every item is a neighbor of all items it shares a sublist with
    for each in lists:
        for item in each:
            neigh[item].update(each)
    # yield every node reachable from the starting node
    def comp(node, neigh = neigh, visited = visited, vis = visited.add):
        nodes = set([node])
        next_node = nodes.pop
        while nodes:
            node = next_node()
            vis(node)
            nodes |= neigh[node] - visited
            yield node
    # one sorted component per node that has not been visited yet
    for node in neigh:
        if node not in visited:
            yield sorted(comp(node))

example:

list(merge_common(l))  # merge_common is a generator
# [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
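For a version that runs on its own, the two steps described above (build the neighbors, then walk them) can also be sketched with an explicit stack. This is a variant written for illustration, not the code from the linked post; the function name merge_groups and the input sample are assumptions, since the original input list is not shown here.

```python
from collections import defaultdict


def merge_groups(lists):
    """Merge sublists that share at least one element (connected components)."""
    # Step 1: every item is a neighbor of all items it shares a sublist with.
    neighbors = defaultdict(set)
    for group in lists:
        for item in group:
            neighbors[item].update(group)

    # Step 2: depth-first walk from each item that has not been seen yet.
    seen = set()
    merged = []
    for start in neighbors:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        merged.append(sorted(component))
    return merged


# Hypothetical input, chosen only for illustration.
sample = [[1, 2], [2, 3], [3, 4], [5, 6, 7], [7, 8], [8, 9]]
print(merge_groups(sample))  # [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
```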

Python 3: pairwise iterating through list

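Use zip(*[iter(it)] * 2), as seen in this answer. Both arguments to zip are the same iterator, so the list is consumed in non-overlapping pairs, and a trailing unpaired item is dropped.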

it = [1, 2, 3, 4, 5, 6]
for x, y in zip(*[iter(it)] * 2):
    print(x, y)
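"Pairwise" can also mean overlapping, sliding-window pairs, which is what itertools.pairwise produces. The short sketch below contrasts the two readings on a made-up list with an odd number of items; the variable names are chosen for this example only.

```python
it = [1, 2, 3, 4, 5, 6, 7]

# Non-overlapping pairs: the same iterator is advanced twice per step,
# so the unpaired last item (7) is silently dropped.
chunks = list(zip(*[iter(it)] * 2))
print(chunks)   # [(1, 2), (3, 4), (5, 6)]

# Overlapping pairs: each item is paired with its successor.
sliding = list(zip(it, it[1:]))
print(sliding)  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
```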

Return Alternating Letters With the Same Length From two Strings

str.join with zip is possible, since zip only iterates pairwise up to the shortest iterable. You can combine with itertools.chain to flatten an iterable of tuples:

from itertools import chain

def one_each(st, dum):
    return ''.join(chain.from_iterable(zip(st, dum)))

x = one_each("bofa", "BOFAAAA")

print(x)

bBoOfFaA

