Obtaining Connected Components of Neighboring Values

Obtaining connected components of neighboring values

You can turn your binary matrix into a raster object, and use the raster::clumps function to "Detect clumps (patches) of connected cells. Each clump gets a unique ID". Then it is just data management to return the exact format you want. Example below:

library(igraph)
library(raster)

mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))
Rmat <- raster(mat)
Clumps <- as.matrix(clump(Rmat, directions=4))

#turn the clumps into a list
tot <- max(Clumps, na.rm=TRUE)
res <- vector("list",tot)
for (i in 1:tot){
  res[i] <- list(which(Clumps == i, arr.ind = TRUE))
}

Which then res prints out at the console:

> res
[[1]]
     row col
[1,]   1   1
[2,]   2   1

[[2]]
     row col
[1,]   2   4

[[3]]
     row col
[1,]   3   3

[[4]]
     row col
[1,]   5   1
[2,]   5   2
[3,]   5   3
[4,]   5   4
[5,]   5   5

I wouldn't be surprised if there is a better way to go from the raster object to your end goal though. Again a 2000 by 2000 matrix should not be a big deal for this.

Old (wrong answer) but should be useful for people who want connected components of a graph.

You can use the igraph package to turn your adjacency matrix into a network and return the components. Your example graph is one component, so I removed one edge for illustration.

library(igraph)
mat = rbind(c(1,0,0,0,0),
            c(1,0,0,1,0),
            c(0,0,1,0,0),
            c(0,0,0,0,0),
            c(1,1,1,1,1))
g  <- graph.adjacency(mat) %>% delete_edges("5|3")
plot(g)
clu <- components(g)
groups(clu)

The final line then returns at the prompt:

> groups(clu)
$`1`
[1] 1 2 4 5

$`2`
[1] 3

My experience with this algorithm it is pretty fast - so I don't think 2,000 by 2,000 will be a problem.

A fast way to find connected component in a 1-NN graph?

The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold the pointer to a node in the same set, with all edges converging to the same node, if you find a path of length at least 2, reconnect the bottom node upwards.

This will definitely run in linear time:

- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
    and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).

Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:

for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop, 
   collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets

The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.

You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.

Connected Component in Undirected Graph in python

Your condition if not node: doesn't make sense: you're not modifying nodes or making them Falsey. Depth first search explores a forest of rooted trees. Your outer loop over nodes looks for the roots of these trees, so you should just pass a reference to an empty list into dfs, to track all the nodes explored in the search tree for that root:

class Solution:
    def connectedSet(self, nodes):
        res = []
        visited = set()

        def dfs(node, path):
            path.append(node.label)
            visited.add(node)
            for neighbor in node.neighbors:
                if neighbor not in visited:
                    dfs(neighbor, path)

        for node in nodes:
            if node not in visited:
                path = []
                dfs(node, path)
                res.append(sorted(path.copy()))
        return res

Finding connected components in a pixel-array

Considering that the groups should never touch each other, you can use scipy.ndimage.measurements.label to label the groups:

In [1]: import numpy as np

In [2]: from scipy.ndimage.measurements import label

In [3]: array = np.array(...)  # your example

In [4]: structure = np.ones((3, 3), dtype=np.int)  # this defines the connection filter

In [5]: structure  # in this case we allow any kind of connection
Out[5]: 
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

In [6]: labeled, ncomponents = label(array, structure)

In [7]: labeled
Out[7]: 
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)

In [7]: ncomponents
Out[7]: 2

Although I haven't read the particular implementation, SciPy tends to use highly efficient algorithms implemented in C, hence the performance should be relatively high. You can then extract the indices for each group using NumPy:

In [8]: indices = np.indices(array.shape).T[:,:,[1, 0]]

In [9]: indices[labeled == 1]
Out[9]: 
array([[ 1,  6],
       [ 1,  7],
       [ 2,  6],
       [ 2,  7],
       [ 2,  8],
       [ 2,  9],
       [ 2, 10],
       [ 2, 11],
       [ 2, 12],
       [ 2, 13],
       [ 3, 11],
       [ 3, 12],
       [ 3, 13]])

In [10]: indices[labeled == 2]
Out[10]: 
array([[ 5,  1],
       [ 6,  1],
       [ 7,  1],
       [ 7,  2],
       [ 8,  1],
       [ 8,  2],
       [ 9,  2],
       [10,  2],
       [10,  3],
       [11,  2],
       [11,  3],
       [12,  3],
       [13,  3]])

How to find connected components?

Let's simplify the graph representation:

myGraph = {0: [1,2,3], 1: [], 2: [1], 3: [4,5],4: [3,5], 5: [3,4,7], 6: [8], 7: [],8: [9], 9: []}

Here we have the function returning a dictionary whose keys are the roots and whose values are the connected components:

def getRoots(aNeigh):
    def findRoot(aNode,aRoot):
        while aNode != aRoot[aNode][0]:
            aNode = aRoot[aNode][0]
        return (aNode,aRoot[aNode][1])
    myRoot = {} 
    for myNode in aNeigh.keys():
        myRoot[myNode] = (myNode,0)  
    for myI in aNeigh: 
        for myJ in aNeigh[myI]: 
            (myRoot_myI,myDepthMyI) = findRoot(myI,myRoot) 
            (myRoot_myJ,myDepthMyJ) = findRoot(myJ,myRoot) 
            if myRoot_myI != myRoot_myJ: 
                myMin = myRoot_myI
                myMax = myRoot_myJ 
                if  myDepthMyI > myDepthMyJ: 
                    myMin = myRoot_myJ
                    myMax = myRoot_myI
                myRoot[myMax] = (myMax,max(myRoot[myMin][1]+1,myRoot[myMax][1]))
                myRoot[myMin] = (myRoot[myMax][0],-1) 
    myToRet = {}
    for myI in aNeigh: 
        if myRoot[myI][0] == myI:
            myToRet[myI] = []
    for myI in aNeigh: 
        myToRet[findRoot(myI,myRoot)[0]].append(myI) 
    return myToRet

Let's try it:

print getRoots(myGraph)

{8: [6, 8, 9], 1: [0, 1, 2, 3, 4, 5, 7]}

Finding the largest connected component in an adj matrix graph?

Pick a starting point and start "walking" to other nodes until you have exhausted. Do this until you have found all components. This will run in O(n) where n is the size of the graph.

A skeleton of a solution in Python:

class Node(object):
    def __init__(self, index, neighbors):
        self.index = index
        # A set of indices (integers, not Nodes)
        self.neighbors = set(neighbors)

    def add_neighbor(self, neighbor):
        self.neighbors.add(neighbor)

class Component(object):
    def __init__(self, nodes):
        self.nodes = nodes
        self.adjacent_nodes = set()
        for node in nodes:
            self.adjacent_nodes.add(node.index)
            self.adjacent_nodes.update(node.neighbors)

    @property
    def size(self):
        return len(self.nodes)

    @property
    def node_indices(self):
        return set(node.index for node in self.nodes)

    @property
    def is_saturated(self):
        return self.node_indices == self.adjacent_nodes

    def add_node(self, node):
        self.nodes.append(node)
        self.adjacent_nodes.update(node.neighbors)

adj_matrix = [[0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
matrix_size = len(adj_matrix)

nodes = {}
for index in range(matrix_size):
    neighbors = [neighbor for neighbor, value in enumerate(adj_matrix[index])
                 if value == 1]
    nodes[index] = Node(index, neighbors)

components = []
index, node = nodes.popitem()
component = Component([node])
while nodes:
    if not component.is_saturated:
        missing_node_index = component.adjacent_nodes.difference(component.node_indices).pop()
        missing_node = nodes.pop(missing_node_index)
        component.add_node(missing_node)
    else:
        components.append(component)

        index, node = nodes.popitem()
        component = Component([node])

# Final component needs to be appended as well
components.append(component)

print max([component.size for component in components])

Two pass connected component, Number of components issue

I'm not sure I fully understand your code, but I think there might be a simple fix to your problem of too many variables and if statements. Instead of using separate variables and code to save each of your images, you should put them in lists and index those lists to get at the values to update and save.

Here's how that might look for your code:

# at some point above, create an "images" list instead of separate Zero, One, etc variables
# also create a "counts" list instead of count, count1, etc

    for (x, y) in labels:
        component = uf.find(labels[(x, y)])
        labels[(x, y)] = component

        # optionally, add code here to create a new image if needed

        images[component][y][x] = 255   # update image
        counts[component] += 1          # update count

        if counts[component] > 43:      # if count is high enough, save out image
            img = images[component] = Image.fromarray(Zero)
            img.save(os.path.join(dirs, 'image{:02d}.png'.format(component), 'png')

Note that you need to generate the image filenames programmatically, so instead of Zero.png and One.png, etc, I went for image00.png and image01.png. You could probably create a mapping from numbers to English names if you wanted to keep the same name system, but I suspect using digits will be more convenient for your later use anyway.

If you don't know how many images and counts you need ahead of time, you could add some extra logic to the loop that would create a new ones as needed, appending them to the images and counts lists. I put a comment in the code above showing where you'd want that logic.

Merging neighboring connected components with different labels

You can do it as follows. Let target denote the target value, corresponding to orange.

Identify regions defined as connected components without regard to color, using bwlabel.
For each region, if it contains at least a pixel equal to target, set the whole region to target.

target = 1; % value corresponding to orange
O = I;
[regions, numRegions] = bwlabel(I);
for regionId = 1:numRegions
    ind = regions==regionId;
    if any(I(ind)==target)
        O(ind) = target;
    end
end

Obtaining Connected Components of Neighboring Values