How to find all connected subgraphs of an undirected graph
Here is a variant that doesn't use cursor, but uses a single recursive query.
Essentially, it treats the data as edges in a graph and traverses recursively all edges of the graph, stopping when the loop is detected. Then it puts all found loops in groups and gives each group a number.
See the detailed explanations of how it works below. I recommend you to run the query CTE-by-CTE and examine each intermediate result to understand what it does.
Sample 1
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(2, 'b', 'b'),
(3, 'c', 'a'),
(4, 'c', 'b'),
(5, 'c', 'c');
Sample 2
I added one more row with z
value to have multiple rows with unpaired values.
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(1, 'a', 'c'),
(2, 'b', 'f'),
(3, 'a', 'g'),
(4, 'c', 'h'),
(5, 'b', 'j'),
(6, 'd', 'f'),
(7, 'e', 'k'),
(8, 'i', NULL),
(88, 'z', 'z'),
(9, 'l', 'h');
Sample 3
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'f'),
(2, 'a', 'g'),
(3, 'a', NULL),
(4, 'b', 'c'),
(5, 'b', 'a'),
(6, 'b', 'h'),
(7, 'b', 'j'),
(8, 'b', NULL),
(9, 'b', NULL),
(10, 'b', 'g'),
(11, 'c', 'k'),
(12, 'c', 'b'),
(13, 'd', 'l'),
(14, 'd', 'f'),
(15, 'd', 'g'),
(16, 'd', 'm'),
(17, 'd', 'a'),
(18, 'd', NULL),
(19, 'd', 'a'),
(20, 'e', 'c'),
(21, 'e', 'b'),
(22, 'e', NULL);
Query
WITH
CTE_Idents
AS
(
SELECT Ident1 AS Ident
FROM @T
UNION
SELECT Ident2 AS Ident
FROM @T
)
,CTE_Pairs
AS
(
SELECT Ident1, Ident2
FROM @T
WHERE Ident1 <> Ident2
UNION
SELECT Ident2 AS Ident1, Ident1 AS Ident2
FROM @T
WHERE Ident1 <> Ident2
)
,CTE_Recursive
AS
(
SELECT
CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent
, Ident1
, Ident2
, CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
UNION ALL
SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult
UNION
SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
SELECT
CTE_Idents.Ident
,CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
,DENSE_RANK() OVER(ORDER BY
CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
) AS GroupID
FROM
CTE_Idents
CROSS APPLY
(
SELECT CTE_CleanResult.Ident+','
FROM CTE_CleanResult
WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
) AS CA_XML(XML_Value)
CROSS APPLY
(
SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
) AS CA_Data(XML_Value)
WHERE
CTE_Idents.Ident IS NOT NULL
ORDER BY Ident;
Result 1
+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a | a,b,c, | 1 |
| b | a,b,c, | 1 |
| c | a,b,c, | 1 |
+-------+--------------+---------+
Result 2
+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a | a,c,g,h,l, | 1 |
| b | b,d,f,j, | 2 |
| c | a,c,g,h,l, | 1 |
| d | b,d,f,j, | 2 |
| e | e,k, | 3 |
| f | b,d,f,j, | 2 |
| g | a,c,g,h,l, | 1 |
| h | a,c,g,h,l, | 1 |
| i | i | 4 |
| j | b,d,f,j, | 2 |
| k | e,k, | 3 |
| l | a,c,g,h,l, | 1 |
| z | z | 5 |
+-------+--------------+---------+
Result 3
+-------+--------------------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------------------+---------+
| a | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| b | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| c | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| d | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| e | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| f | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| g | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| h | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| j | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| k | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| l | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| m | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
+-------+--------------------------+---------+
How it works
I'll use the second set of sample data for this explanation.
CTE_Idents
CTE_Idents
gives the list of all Identifiers that appear in both Ident1
and Ident2
columns.
Since they can appear in any order we UNION
both columns together. UNION
also removes any duplicates.
+-------+
| Ident |
+-------+
| NULL |
| a |
| b |
| c |
| d |
| e |
| f |
| g |
| h |
| i |
| j |
| k |
| l |
| z |
+-------+
CTE_Pairs
CTE_Pairs
gives the list of all edges of the graph in both directions. Again, UNION
is used to remove any duplicates.
+--------+--------+
| Ident1 | Ident2 |
+--------+--------+
| a | c |
| a | g |
| b | f |
| b | j |
| c | a |
| c | h |
| d | f |
| e | k |
| f | b |
| f | d |
| g | a |
| h | c |
| h | l |
| j | b |
| k | e |
| l | h |
+--------+--------+
CTE_Recursive
CTE_Recursive
is the main part of the query that recursively traverses the graph starting from each unique Identifier.
These starting rows are produced by the first part of UNION ALL
.
The second part of UNION ALL
recursively joins to itself linking Ident2
to Ident1
.
Since we pre-made CTE_Pairs
with all edges written in both directions, we can always link only Ident2
to Ident1
and we'll get all paths in the graph.
At the same time the query builds IdentPath
- a string of comma-delimited Identifiers that have been traversed so far.
It is used in the WHERE
filter:
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
As soon as we come across the Identifier that had been included in the Path before, the recursion stops as the list of connected nodes is exhausted.AnchorIdent
is the starting Identifier for the recursion, it will be used later to group results. Lvl
is not really used, I included it for better understanding of what is going on.
+-------------+--------+--------+-------------+-----+
| AnchorIdent | Ident1 | Ident2 | IdentPath | Lvl |
+-------------+--------+--------+-------------+-----+
| a | a | c | ,a,c, | 1 |
| a | a | g | ,a,g, | 1 |
| b | b | f | ,b,f, | 1 |
| b | b | j | ,b,j, | 1 |
| c | c | a | ,c,a, | 1 |
| c | c | h | ,c,h, | 1 |
| d | d | f | ,d,f, | 1 |
| e | e | k | ,e,k, | 1 |
| f | f | b | ,f,b, | 1 |
| f | f | d | ,f,d, | 1 |
| g | g | a | ,g,a, | 1 |
| h | h | c | ,h,c, | 1 |
| h | h | l | ,h,l, | 1 |
| j | j | b | ,j,b, | 1 |
| k | k | e | ,k,e, | 1 |
| l | l | h | ,l,h, | 1 |
| l | h | c | ,l,h,c, | 2 |
| l | c | a | ,l,h,c,a, | 3 |
| l | a | g | ,l,h,c,a,g, | 4 |
| j | b | f | ,j,b,f, | 2 |
| j | f | d | ,j,b,f,d, | 3 |
| h | c | a | ,h,c,a, | 2 |
| h | a | g | ,h,c,a,g, | 3 |
| g | a | c | ,g,a,c, | 2 |
| g | c | h | ,g,a,c,h, | 3 |
| g | h | l | ,g,a,c,h,l, | 4 |
| f | b | j | ,f,b,j, | 2 |
| d | f | b | ,d,f,b, | 2 |
| d | b | j | ,d,f,b,j, | 3 |
| c | h | l | ,c,h,l, | 2 |
| c | a | g | ,c,a,g, | 2 |
| b | f | d | ,b,f,d, | 2 |
| a | c | h | ,a,c,h, | 2 |
| a | h | l | ,a,c,h,l, | 3 |
+-------------+--------+--------+-------------+-----+
CTE_CleanResult
CTE_CleanResult
leaves only relevant parts from CTE_Recursive
and again merges both Ident1
and Ident2
using UNION
.
+-------------+-------+
| AnchorIdent | Ident |
+-------------+-------+
| a | a |
| a | c |
| a | g |
| a | h |
| a | l |
| b | b |
| b | d |
| b | f |
| b | j |
| c | a |
| c | c |
| c | g |
| c | h |
| c | l |
| d | b |
| d | d |
| d | f |
| d | j |
| e | e |
| e | k |
| f | b |
| f | d |
| f | f |
| f | j |
| g | a |
| g | c |
| g | g |
| g | h |
| g | l |
| h | a |
| h | c |
| h | g |
| h | h |
| h | l |
| j | b |
| j | d |
| j | f |
| j | j |
| k | e |
| k | k |
| l | a |
| l | c |
| l | g |
| l | h |
| l | l |
+-------------+-------+
Final SELECT
Now we need to build a string of comma-separated Ident
values for each AnchorIdent
. CROSS APPLY
with FOR XML
does it.DENSE_RANK()
calculates the GroupID
numbers for each AnchorIdent
.
Find all complete sub-graphs within a graph
This is known as the clique problem; it's hard and is in NP-complete in general, and yes there are many algorithms to do this.
If the graph has additional properties (e.g. it's bipartite), then the problem becomes considerably easier and is solvable in polynomial time, but otherwise it's very hard, and is completely solvable only for small graphs.
From Wikipedia
In computer science, the clique problem refers to any of the problems related to finding particular complete subgraphs ("cliques") in a graph, i.e., sets of elements where each pair of elements is connected.
Clique problems include:
- finding the maximum clique (a clique with the largest number of vertices),
- finding a maximum weight clique in a weighted graph,
- listing all maximal cliques (cliques that cannot be enlarged)
- solving the decision problem of testing whether a graph contains a clique larger than a given size.
These problems are all hard: the clique decision problem is NP-complete (one of Karp's 21 NP-complete problems), the problem of finding the maximum clique is both fixed-parameter intractable and hard to approximate, and listing all maximal cliques may require exponential time as there exist graphs with exponentially many maximal cliques. Nevertheless, there are algorithms for these problems that run in exponential time or that handle certain more specialized input graphs in polynomial time.
See also
- Bron–Kerbosch algorithm
Finding All Connected Components of an Undirected Graph
This can be solved using a Breadth First Search.
The idea is to traverse all reachable vertices from a source vertex, by hopping to adjacent vertices. Vertices right next to the source vertex are first visited, followed by vertices that are 2 hops away, etc.
The code here is not very efficient due to the graph representation used, which is an edge list . To obtain better performance, you might want to use an adjacency list.
Here's some working code in JavaScript. I used node.js
to run this:
// Breadth First Search function
// v is the source vertex
// all_pairs is the input array, which contains length 2 arrays
// visited is a dictionary for keeping track of whether a node is visited
var bfs = function(v, all_pairs, visited) {
var q = [];
var current_group = [];
var i, nextVertex, pair;
var length_all_pairs = all_pairs.length;
q.push(v);
while (q.length > 0) {
v = q.shift();
if (!visited[v]) {
visited[v] = true;
current_group.push(v);
// go through the input array to find vertices that are
// directly adjacent to the current vertex, and put them
// onto the queue
for (i = 0; i < length_all_pairs; i += 1) {
pair = all_pairs[i];
if (pair[0] === v && !visited[pair[1]]) {
q.push(pair[1]);
} else if (pair[1] === v && !visited[pair[0]]) {
q.push(pair[0]);
}
}
}
}
// return everything in the current "group"
return current_group;
};
var pairs = [
["a2", "a5"],
["a3", "a6"],
["a4", "a5"],
["a7", "a9"]
];
var groups = [];
var i, k, length, u, v, src, current_pair;
var visited = {};
// main loop - find any unvisited vertex from the input array and
// treat it as the source, then perform a breadth first search from
// it. All vertices visited from this search belong to the same group
for (i = 0, length = pairs.length; i < length; i += 1) {
current_pair = pairs[i];
u = current_pair[0];
v = current_pair[1];
src = null;
if (!visited[u]) {
src = u;
} else if (!visited[v]) {
src = v;
}
if (src) {
// there is an unvisited vertex in this pair.
// perform a breadth first search, and push the resulting
// group onto the list of all groups
groups.push(bfs(src, pairs, visited));
}
}
// show groups
console.log(groups);
UPDATE: I have updated my answer to demonstrate how to convert an edge list to an adjacency list. The code is commented and should illustrate the concept rather well. The Breadth First Search function is modified to make use of an adjacency list, along with another slight modification (regarding marking vertices as visited).
// Converts an edgelist to an adjacency list representation
// In this program, we use a dictionary as an adjacency list,
// where each key is a vertex, and each value is a list of all
// vertices adjacent to that vertex
var convert_edgelist_to_adjlist = function(edgelist) {
var adjlist = {};
var i, len, pair, u, v;
for (i = 0, len = edgelist.length; i < len; i += 1) {
pair = edgelist[i];
u = pair[0];
v = pair[1];
if (adjlist[u]) {
// append vertex v to edgelist of vertex u
adjlist[u].push(v);
} else {
// vertex u is not in adjlist, create new adjacency list for it
adjlist[u] = [v];
}
if (adjlist[v]) {
adjlist[v].push(u);
} else {
adjlist[v] = [u];
}
}
return adjlist;
};
// Breadth First Search using adjacency list
var bfs = function(v, adjlist, visited) {
var q = [];
var current_group = [];
var i, len, adjV, nextVertex;
q.push(v);
visited[v] = true;
while (q.length > 0) {
v = q.shift();
current_group.push(v);
// Go through adjacency list of vertex v, and push any unvisited
// vertex onto the queue.
// This is more efficient than our earlier approach of going
// through an edge list.
adjV = adjlist[v];
for (i = 0, len = adjV.length; i < len; i += 1) {
nextVertex = adjV[i];
if (!visited[nextVertex]) {
q.push(nextVertex);
visited[nextVertex] = true;
}
}
}
return current_group;
};
var pairs = [
["a2", "a5"],
["a3", "a6"],
["a4", "a5"],
["a7", "a9"]
];
var groups = [];
var visited = {};
var v;
// this should look like:
// {
// "a2": ["a5"],
// "a3": ["a6"],
// "a4": ["a5"],
// "a5": ["a2", "a4"],
// "a6": ["a3"],
// "a7": ["a9"],
// "a9": ["a7"]
// }
var adjlist = convert_edgelist_to_adjlist(pairs);
for (v in adjlist) {
if (adjlist.hasOwnProperty(v) && !visited[v]) {
groups.push(bfs(v, adjlist, visited));
}
}
console.log(groups);
An algorithm to get all connected subgraphs from graph, is it correct?
Here is an Python implementation of what I believe to be your original algorithm:
from collections import defaultdict
D=defaultdict(list)
def addedge(a,b):
D[a].append(b)
D[b].append(a)
addedge(1,2)
addedge(2,3)
addedge(3,4)
V=D.keys()
k=2
def F(X,Y):
if len(X)==k:
return
if X:
T = set(a for x in X for a in D[x] if a not in Y and a not in X)
else:
T = V
Y1=set(Y)
for v in T:
X.add(v)
print X
F(X,Y1)
X.remove(v)
Y1.add(v)
print 'original method'
F(set(),set())
F generates all connected subgraphs of size <=k where the subgraph must include vertices in X (a connected subgraph itself), and must not include vertices in Y.
We know that to include another vertex in the subgraph we must use a connected vertex so we can recurse based on the identity of the first connected vertex v that is inside the final subgraph. The forbidden set means that we ensure that a second copy of subgraph cannot be generated as this copy would have to use v, but v is in the forbidden set so cannot be used again.
So at this superficial level of analysis, this algorithm appears efficient and correct.
Efficient way for finding all the complete subgraphs of a given graph (Python)?
Ok, I found it. It's simply list(nx.find_cliques(G))
, just because I didn't know that in graph theory a clique is a fully connected subgraph.
EDIT
More precisely, list(nx.find_cliques(G))
finds the maximal cliques, therefore it's not what I need. I found a similar post at this link.
So the correct answer is to use list(nx.enumerate_all_cliques(G))
. However, this function returns also cliques of size 1, which I don't like since I don't have self-loops in my graph. Therefore the final solution is to use the following line of code:
[s for s in nx.enumerate_all_cliques(G) if len(s) > 1]
Related Topics
Get Day of Week in SQL Server 2005/2008
How to Do a Case Sensitive Search in Where Clause (I'M Using SQL Server)
MySQL Delete from With Subquery as Condition
MySQL: Invalid Use of Group Function
Creating Table Names That Are Reserved Words/Keywords in Ms SQL Server
Fastest Way to Count Exact Number of Rows in a Very Large Table
Sql: If Clause Within Where Clause
Foreign Key Referring to Primary Keys Across Multiple Tables
T-SQL Datetime Rounded to Nearest Minute and Nearest Hours With Using Functions
SQL Server: Group by Clause to Get Comma-Separated Values
Sql: Find Missing Ids in a Table
MySQL - Subtracting Value from Previous Row, Group By