NOTE
This tutorial is deprecated. It will be updated eventually, but in the mean time please use the files trunk/mtgl/subiso_triangles.hpp and trunk/test/subiso_triangles.cpp as examples. Much of the complexity described below has been encapsulated into a function called subgraph_isomorphism_wrapper.
Description
The subgraph isomorphism problem is the problem of trying to find a match to a small graph, called a pattern or target graph, in a much larger graph. The problem is known to be NPcomplete. Berry, Hendrickson, Kahan, and Konecny presented a heuristic algorithm in the paper "Software and Algorithms for Graph Queries on Multithreaded Architectures" for solving the subgraph isomorphism problem. The algorithm creates a walk through a target graph and matches that walk to a walk in the big graph. In the paper the algorithm is described as matching on type quadruples. Each vertex is given a type, and each edge is given two types where each type represents traveling the edge in a particular direction. The type quadruple is the concatenation of the edge types and the types of its endpoints, namely (source vertex type, forward edge type, reverse edge type, target vertex type). A match for this algorithm is not limited to type quadruples, however. The algorithm will work just as well with any matching scheme a user can imagine.
Prior to performing the matching, the algorithm defines a filtering step to reduce the size of the large graph. The matching algorithm is then run against the smaller filtered graph. The filtering step compares each edge in the big graph to the edges in the target graph. If the big graph edge matches any edge in the target graph based on the matching criteria for subgraph isomorphism, the edge is kept in the filtered graph. Nonmatching edges are not kept. The hope is that for small enough target graphs the cost of the filtering process will be less than the additional cost of running the subgraph isomorphism algorithm through the entire big graph. In practice, this is often the case.
After the graph is filtered, the algorithm creates a directed bipartite graph that represents every possible walk through the big graph that matches the walk through the target graph based on the userdefined matching criteria. The vertices in the bipartite graph are divided into levels where level n represents all the vertices in the big graph that are endpoints for a matching walk through the big graph up to the nth vertex in the target graph's walk. Edges connect consecutive levels where the edges connecting levels n and n+1 are the matches for the nth edge in the walk. The below figure shows a walk with 5 vertices and 4 edges, and its bipartite graph. Notice that some paths don't make it to max level vertices and that sometimes paths converge together and then split apart. This makes the search space represented by the bipartite graph quite complex.
The paper describes the next step as performing some kind of search, perhaps a branch and bound search, through the result space. We have found that in practice a branch and bound algorithm leads to an exponential running time. To better filter the result space, the subgraph algorithm has been modified to use the following method.
A max level vertex is a vertex in the bipartite graph that matches the last vertex in the walk. The max level vertices are shown in the column for v5 in the above figure. A zero level vertex is a vertex in the bipartite graph that matches the first vertex in the walk. The zero level vertices are shown in the column for v1 in the figure above. We want to consider only the parts of the graph that represent complete walks. These are the parts of the graph that are reachable from traveling backwards on the edges from the max level vertices. We will remove all vertices and edges from the graph that are not on a walk from a max level vertex to a zero level vertex. For instance, the bottom vertex in the v4 column and the edge entering it can't be reached from any max level vertex. Thus, we remove it. We call this new graph the colored graph.
Any path on the colored graph from the left side to the right represents a walk through the graph. However, the search space of the colored graph is most likely still too large to consider every possibility. The next part of the algorithm is designed to give a statistically significant subset of the walks through the graph. We add a supersource and a supersink to the graph. The supersource has outgoing edges that connect it to each of the zero level vertices. The supersink has incoming edges that connect it to each of the max level vertices.
On this new graph, we will perform a number of weighted random walks from the supersource to the supersink where the weights are assigned to the edges. Each walk (minus the supersource and supersink) represents a walk through the big graph that is a match for the walk through the target graph. We set the weights to be a modified "edge betweenness". This modified edge betweenness will represent the number of paths from the supersource to the supersink that pass through each edge. In the diagram above, the numbers next to each edge represent the edge betweenness. There are 9 total paths from the supersource to the supersink. Notice that the sum of the betweenness of the edges at each level is 9.
The user passes a visitor to the subgraph isomorphism function that will be applied to each path as it is found.
How to Use the Subgraph Isomorphism Code
The subgraph isomorphism algorithm is implemented in mtgl in the file mtgl/subgraph_isomorphism.hpp. It will work for any graph that defines the MTGL graph API and with all three types of directedness: undirected, directed, and bidirectional. The walk of a graph that has directed edges can contain reverse edges. By reverse edge, we mean a directed edge that is traveled backwards.
There are several steps to using the subgraph isomorphism code.
 Define a big graph.
 Define a target graph.
 Define a walk.
 Define the comparator that describes the matching criteria for the filter graph algorithm.
 Filter the graph.
 Define the comparator that describes the matching criteria for the subgraph isomorphism algorithm.
 Define the match visitor that is applied to matches found by the algorithm.
 Run the subgraph isomorphism algorithm.
The Example Graph
This tutorial will use the following graph as the big graph. Vertices and edges each have a single type. Matching will be performed using the vertex and edge types. Numbers in boxes represent types. All other numbers represent ids.
The example will use this graph as the target graph.
The walk through the target graph will be (0, 0, 1, 2, 2, 1, 0) which is a list of alternating vertex and edge ids. The walk starts at vertex 0, goes over edge 0 to vertex 1, and so on. This walk includes reverse edges.
Notice that there are two matches to the target graph in the big graph. One is highlighted in red and the other in green.
This example shows how the algorithm matches reverse edges. If we were to make both the big and target graphs undirected, in addition to the two matches shown above, there would be another match to the walk shown below in blue. Notice that this isn't a match for the case of directed graphs. Even though all the types match, the direction of edge 3 is wrong.
Basic Example
We start with the most basic usage of the subgraph isomorphism algorithm. We will statically define a big graph, a target graph, and the walk through the target graph. We will skip the filtering step and use the default comparator for the subgraph isomorphism algorithm. The complete code for this example is at tutorial/subgraph_isomorphism1.cpp.
We start by including subgraph_isomorphism and compressed_sparse_row_graph (since we'll be using it as the graph type) and indicating that we are using the MTGL namespace:
#include <mtgl/subgraph_isomorphism.hpp> #include <mtgl/compressed_sparse_row_graph.hpp> using namespace mtgl;
We next define a couple of types, define a directed compressed_sparse_row_graph, and initialize the graph.
typedef compressed_sparse_row_graph<directedS> Graph; typedef graph_traits<Graph>::size_type size_type; // Initialize the original graph. Graph g; const size_type numVerts = 6; const size_type numEdges = 7; size_type sources[numEdges] = { 1, 2, 1, 2, 3, 3, 4 }; size_type targets[numEdges] = { 0, 0, 2, 3, 4, 5, 5 }; init(numVerts, numEdges, sources, targets, g);
We will be using the default comparator for subgraph isomorphism which expects one array property and one edge property. So, the code next defines two array property map types, one for vertices and one for edges.
// Create the big graph vertex and edge type property maps. typedef array_property_map<int, vertex_id_map<Graph> > vertex_property_map; typedef array_property_map<int, edge_id_map<Graph> > edge_property_map; int vTypes[numVerts] = { 1, 2, 3, 2, 3, 1 }; int eTypes[numEdges] = { 1, 2, 3, 3, 3, 1, 2 }; vertex_id_map<Graph> g_vid_map = get(_vertex_id_map, g); edge_id_map<Graph> g_eid_map = get(_edge_id_map, g); vertex_property_map vTypemap(vTypes, g_vid_map); edge_property_map eTypemap(eTypes, g_eid_map);
Now, we need to initialize the target graph and create a vertex and an edge property map for its types.
// Initialize target graph. Graph target; const size_type numVertsTarget = 3; const size_type numEdgesTarget = 3; size_type sourcesTarget[numEdgesTarget] = { 1, 2, 1 }; size_type targetsTarget[numEdgesTarget] = { 0, 0, 2 }; init(numVertsTarget, numEdgesTarget, sourcesTarget, targetsTarget, target); // Create the target graph vertex and edge type property maps. int vTypesTarget[numVertsTarget] = { 1, 2, 3 }; int eTypesTarget[numEdgesTarget] = { 1, 2, 3 }; vertex_id_map<Graph> trg_vid_map = get(_vertex_id_map, target); edge_id_map<Graph> trg_eid_map = get(_edge_id_map, target); vertex_property_map vTypemapTarget(vTypesTarget, trg_vid_map); edge_property_map eTypemapTarget(eTypesTarget, trg_eid_map);
The next step is to define a walk through the target graph. The subgraph isomorphism algorithm takes an array of alternating vertex and edge ids representing the walk. For example, a two edge walk would have the form (v1, e1, v2, e2, v3). The walk length is the number of total ids (vertex and edge) in the walk which is 2 * num_walk_edges + 1. We statically declare a walk through our example graph.
// Set up walk through target graph. const size_type walk_length = 7; size_type walk_ids[walk_length] = { 0, 0, 1, 2, 2, 1, 0 };
Next, we need to define the comparator that defines a match. We declare an instance of the default subgraph isomorphism comparator, which is located in mtgl/subgraph_isomorphism.hpp. The default comparator is an example of how a comparator should be written. The user will normally write a custom comparator tailored to their exact needs. The default comparator returns a match if the type of the edge and the types of the source and target vertices from the big graph match those of the currently considered walk edge and if the out degrees of the source and target vertices in the big graph are at least as big as those from the currently considered walk edge from the target graph.
The default comparator takes six template parameters of which the last three are optional. The first three give the graph type, the vertex property map type, and the edge property map type for the big graph. The last three give the same values for the target graph. If the target graph is the same type as the big graph, the last three parameters can be omitted as they will default to the same values as those for the big graph. We omitted the last three template parameters as our big and target graphs are the same type. The parameters passed to the comparator constructor are the vertex and edge type property maps for the big graph and the vertex and edge type property maps for the target graph.
// Declare the comparator to define the matching and the visitor to apply // to matches. si_default_comparator<Graph, vertex_property_map, edge_property_map> si_comp(vTypemap, eTypemap, vTypemapTarget, eTypemapTarget); match_visitor<Graph> mvisit;
We also define the visitor that is applied to matches. The match_visitor is a function object whose () operator takes a reference to a dynamic_array of the big graph's size_type. The array represents a single match to the target graph. The match is given as a sequence of alternating vertex and edge ids just like the walk passed to the subgraph isomorphism algorithm. We define a simple match comparator that just prints out the edge ids for a match to the walk in the target graph.
template <typename Graph> class match_visitor { public: typedef typename graph_traits<Graph>::size_type size_type; void operator()(dynamic_array<size_type>& path) { typedef typename dynamic_array<size_type>::size_type d_size_type; printf("Path:"); for (d_size_type i = 1; i < path.size(); i += 2) printf("%6lu", path[i]); printf("\n"); } };
The subgraph_isomorphism() function requires six parameters: the big graph, the target graph, the walk ids array, the length of the walk ids array, the comparator, and the match visitor.
subgraph_isomorphism(g, target, walk_ids, walk_length, si_comp, mvisit);
To compile and run this example on the XMT, issue the following commands from the tutorial directory:
c++ I.. o subgraph_isomorphism1 subgraph_isomorphism1.cpp mtarun m <num processors> subgraph_isomorphism1
Running this example in serial on my desktop, I get the following output:
Big graph (6, 7) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] 3: (2, 3) [3, 3, 2] 4: (3, 4) [2, 3, 3] 5: (3, 5) [2, 1, 1] 6: (4, 5) [3, 2, 1] Target graph (3, 3) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] Path: 0 2 1
The first line printed for the graph gives the graph name and the number of vertices and edges in the graph. The graphs are printed with each edge on a line with the following format
<edge id>: (<source id>, <target id>) [<source type>, <edge type>, <target type>]
Looking at the paths printed by the match visitor, we see that the algorithm returned one match comprised of the edges 0, 2, and 1. The algorithm is designed to return a statistically significant subset of the matches to the walk. In the case of our simple example, it returned only one of the matches.
Filtering Example
Now, let's add the filtering step. This example builds on the previous example, and its complete code is at tutorial/subgraph_isomorphism2.cpp.
The basic steps for using a filtered graph after the big graph, target graph, and walk through the target graph have been defined are:
 Declare an empty subgraph for the filtered graph.
 Define the filter graph comparator.
 Run the filter graph algorithm to populate the filtered graph.
 Declare property maps for the filtered graph types.
 Copy the corresponding big graph types for vertices and edges in the filtered graph to the filtered graph types.
 Define the subgraph isomorphism comparator.
 Run the subgraph isomorphism algorithm on the filtered graph.
 Convert the edge results for the filtered graph into results in terms of the big graph edges.
The filtering algorithm filters the big graph down to only edges that could be possible matches for the target graph edges. This is done by calling the filter_graph_by_edges() function which is defined in mtgl/filter_graph.hpp. The algorithm compares each edge in the big graph to the edges in the target graph. If a big graph edge matches any edge in the target graph based on the comparator, the edge is added to the filtered graph. Nonmatching edges are not added.
Before anything else, you must include the filtering algorithm and the subgraph adapter since the filtered graph will be returned as a subgraph of the big graph. See WrapperAdapters for a detailed discussion of the subgraph_adapter.
#include <mtgl/filter_graph.hpp> #include <mtgl/subgraph_adapter.hpp>
The code is the same as the previous example all the way through setting up the walk through the target graph. At this point, we want to filter the big graph to contain only edges that "match" edges in the target graph.
// Filter large graph to only edges matching edges in target graph. typedef subgraph_adapter<Graph> FiltGraph; typedef graph_traits<FiltGraph>::size_type size_type_filt; FiltGraph filteredG(g); f_default_comparator<Graph, vertex_property_map, edge_property_map> f_comp(vTypemap, eTypemap, vTypemapTarget, eTypemapTarget); filter_graph_by_edges(g, target, f_comp, filteredG);
We define a couple of types and define a subgraph adapter that wraps around the big graph. Note that you pass the graph you want a subgraph of to the subgraph_adapter constructor.
Next, we define an instance of the default filter graph comparator, which is defined in mtgl/filter_graph.hpp. The filter graph default comparator is the same as the subgraph isomorphism default comparator except that it doesn't consider the out degree of the vertices. It only matches on the vertex and edge types. This is an example of the comparator used to filter the graph being different from the comparator passed to the subgraph isomorphism algorithm. The comparators can be the same, or they can be different. It takes the same template and constructor parameters as the subgraph isomorphism default comparator. Again, we only give the first three template parameters because the type of the big graph and the target graph are the same.
Then, we call the filter_graph_by_edges() function. It takes the big graph, the target graph, the comparator, and the result subgraph as parameters.
Now, we get the number of vertices and edges in the filtered graph and define vertex and edge property maps for the filtered graph.
// Create the filtered graph vertex and edge type property maps. typedef array_property_map<int, vertex_id_map<FiltGraph> > vertex_property_map_filt; typedef array_property_map<int, edge_id_map<FiltGraph> > edge_property_map_filt; size_type_filt numVertsFiltered = num_vertices(filteredG); size_type_filt numEdgesFiltered = num_edges(filteredG); int* vTypesFiltered = new int[numVertsFiltered]; int* eTypesFiltered = new int[numEdgesFiltered]; vertex_id_map<FiltGraph> filteredG_vid_map = get(_vertex_id_map, filteredG); edge_id_map<FiltGraph> filteredG_eid_map = get(_edge_id_map, filteredG); vertex_property_map_filt vTypemapFiltered(vTypesFiltered, filteredG_vid_map); edge_property_map_filt eTypemapFiltered(eTypesFiltered, filteredG_eid_map);
Now, we need to get the types of the vertices and edges of the filtered graph which are the types of the corresponding vertices and edges from the big graph. The subgraph adapter provides functions that convert both ways between vertices and edges in the subgraph and their corresponding vertices and edges in the big graph. We will use these functions to copy the big graph vertex and edge types to the property maps for the filtered graph vertex and edge types.
// Copy the vertex types for the filtered graph. graph_traits<FiltGraph>::vertex_iterator verts_filt = vertices(filteredG); for (size_type_filt i = 0; i < numVertsFiltered; ++i) { graph_traits<FiltGraph>::vertex_descriptor fg_v = verts_filt[i]; graph_traits<Graph>::vertex_descriptor g_v = filteredG.local_to_global(fg_v); vTypemapFiltered[fg_v] = vTypemap[g_v]; } // Copy the edge types for the filtered graph. graph_traits<FiltGraph>::edge_iterator edgs_filt = edges(filteredG); for (size_type_filt i = 0; i < numEdgesFiltered; ++i) { graph_traits<FiltGraph>::edge_descriptor fg_e = edgs_filt[i]; graph_traits<Graph>::edge_descriptor g_e = filteredG.local_to_global(fg_e); eTypemapFiltered[fg_e] = eTypemap[g_e]; }
First, we will loop through the vertices in the filtered graph, copying their types from their corresponding vertices in the big graph. The first line in the loop body gets the current filtered graph vertex. The next line gets the corresponding big graph vertex using the filtered graph's local_to_global_v() member function. The final line copies the type value of the big graph vertex to the filtered graph vertex.
The next loop does the same for the edges.
Now, we need to define the comparator that defines a match. This will be a little different from the previous example because the filtered graph type is different from the target graph type. Because of this, we must give all six template parameters when declaring the default subgraph isomorphism comparator.
si_default_comparator<FiltGraph, vertex_property_map_filt, edge_property_map_filt, Graph, vertex_property_map, edge_property_map> si_comp(vTypemapFiltered, eTypemapFiltered, vTypemapTarget, eTypemapTarget); match_visitor<Graph, FiltGraph> mvisit(g, filteredG);
We also define the visitor that is applied to matches. We will once again define a visitor to print out the edge ids for a match to the walk in the target graph. This visitor is more complicated than the previous example because the match the visitor will receive is from the filtered graph. However, we want to print out the edge ids of the original graph. To do this, we need to get the edge descriptors in the filtered graph for the edge ids given in the walk, convert them to their corresponding edge descriptors in the original graph, and get their ids. The following code describes the new match visitor.
template <typename Graph, typename Subgraph> class match_visitor { public: typedef typename graph_traits<Graph>::size_type size_type; typedef typename graph_traits<Subgraph>::size_type size_type_sg; typedef typename graph_traits<Subgraph>::edge_iterator edge_iterator_sg; match_visitor(Graph& gg, Subgraph& sgg) : g(gg), sg(sgg) {} void operator()(dynamic_array<size_type_sg>& path) { typedef typename dynamic_array<size_type_sg>::size_type d_size_type; edge_id_map<Graph> eid_map = get(_edge_id_map, g); edge_iterator_sg edgs_sg = edges(sg); printf("Path:"); for (d_size_type i = 1; i < path.size(); i += 2) { size_type eid = get(eid_map, sg.local_to_global(edgs_sg[path[i]])); printf("%6lu", eid); } printf("\n"); } private: Graph& g; Subgraph& sg; };
To compile and run this example on the XMT, issue the following commands from the tutorial directory:
c++ I.. o subgraph_isomorphism2 subgraph_isomorphism2.cpp mtarun m <num processors> subgraph_isomorphism2
Running this example in serial on my desktop, I get the following output:
Big graph (6, 7) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] 3: (2, 3) [3, 3, 2] 4: (3, 4) [2, 3, 3] 5: (3, 5) [2, 1, 1] 6: (4, 5) [3, 2, 1] Target graph (3, 3) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] Filtered graph (6, 6) 0: (1, 0) [2, 1, 1] 1: (4, 5) [3, 2, 1] 2: (2, 0) [3, 2, 1] 3: (1, 2) [2, 3, 3] 4: (3, 4) [2, 3, 3] 5: (3, 5) [2, 1, 1] Path: 0 2 1
Notice that the filtered graph has one less edge than the big graph. Edge 3 does not match any of the edges from the target graph, so it isn't included in the filtered graph.
Types of Filtering
Besides using property maps, a user could employ other types of filtering. Users are free to put whatever types of matching criteria they desire inside the comparators. Here are some example of types of filtering that might prove useful.
Degree Filtering
One of the easiest type of filtering is to filter by degrees. When searching in the big graph for a match to the target graph, it makes sense to try to match the structure of the target graph. The degree of a vertex indicates something about the structure of a graph, namely the number of edges adjacent to that vertex.
One example of a degree filter is to match a vertex in the big graph to a vertex in the target graph only if the big graph vertex's degree is at least that of the target graph vertex. This prevents a high degree target graph vertex from matching a small degree big graph vertex. Notice that in most cases it doesn't make sense to match in the other direction, matching only if the big graph vertex's degree is less than or equal to that of the target graph vertex. Since the target graph is generally much smaller than the big graph, the vertices in the target graph usually have fewer vertices than their counterparts in the big graph.
Topological Order Filtering
You could also filter based on topological ordering. MTGL includes code that performs a topological sort. Unfortunately, the topsort code has not been update yet to work with the current version of MTGL, so this type of filtering is not available. This section will be completed when the code is fixed.
Target Graph Walk Generation Example
Ideally, the walk would touch every edge and vertex in the target graph. That will allow for the best match as all the information contained in the target graph will be matched against. MTGL provides two algorithms for automatically generating a walk through a graph: random walk (mtgl/random_walk.hpp) and euler tour (mtgl/euler_tour.hpp).
Now for some definitions. An undirected graph is connected if a path exists from each vertex in the graph to every other vertex in the graph. A directed graph with this property is strongly connected. If a directed graph's underlying undirected graph is connected, the directed graph is weakly connected. Ideally, a target graph would be either a connected undirected graph or a strongly connected directed graph as a walk through either of those types of graphs has the potential to touch every vertex and edge in the graph.
MTGL provides some help if your target graph is not a connected undirected graph or a strongly connected directed graph. If the target graph is a weakly connected directed graph, it can be made strongly connected by creating a new graph that contains each edge from the original graph and its reverse. The duplicate_adapter (mtgl/duplicate_adapter.hpp) is an easy way to do this. The duplicate adapter is a wrapper adapter that takes an existing graph and creates a new one that has two copies of every edge in the original graph. For undirected graphs each edge is just duplicated. For directed graphs each edge is duplicated, but the duplicate's direction is the reverse of the original edge's direction.
But wait. Creating a walk through the duplicate graph for a directed graph allows reverse edges in the walk. That's okay because the subgraph isomorphism algorithm can handle reverse edges in a directed graph. The direction of the edge in the walk is specified by the order the edge's vertex ids are given in the walk. For example, consider edge e1 (v1 > v2). If the edge is given in the walk as (... v1, e1, v2 ...), it's a forward edge. If it is given as (... v2, e1, v1 ...), it's a reverse edge. For directed graphs, the algorithm will only match forward edges in the walk to forward edges in the big graph and will only match reverse edges in the walk to reverse edges in the big graph.
Sinks are another problem that can be solved by the duplicate adapter. If a walk algorithm encounters a sink, it has nowhere to continue going. Undirected graphs have no sinks. You can ensure a directed graph has no sinks by wrapping it with the duplicate adapter.
If your target graph is undirected and not connected or is directed and not at least weakly connected, you will not be able to get a walk through the entire target graph. You can generate a walk through one of the components, but the walk will not touch any vertices or edges outside of that component. In this case, subgraph isomorphism will find a match to the single component instead of the entire target graph.
The basic steps for generating a walk through the target graph are:
 Wrap the target graph in a duplicate adapter, if necessary.
 Define the length of the walk.
 Run the walk algorithm.
 If a duplicate adapter was used, translate the walk edge ids back into terms of the big graph.
For the next example, we will show how to create a random walk through the target graph. This example builds on the previous example, and its complete code is at tutorial/subgraph_isomorphism3.cpp. The only part of the code from the last example that is changed is the part that declares the walk through the target graph. Because our example target graph is a weakly connected directed graph, we will use the duplicate adapter.
To be able to use the random walk algorithm and the duplicate adapter we need to include them as follows:
#include <mtgl/duplicate_adapter.hpp> #include <mtgl/random_walk.hpp>
Here is code that generates a random walk through the target graph.
// Create duplicate of target. duplicate_adapter<Graph> duplicate_target(target); size_type num_walk_edges = 2 * num_edges(duplicate_target); size_type walk_length = 2 * num_walk_edges + 1; size_type* walk_ids = new size_type[walk_length]; // Find random walk through duplicate. random_walk(duplicate_target, get_vertex(0, duplicate_target), walk_length, walk_ids); // Replace the duplicate edge ids with the original ids. for (size_type i = 1; i < walk_length; i += 2) { if (walk_ids[i] >= numEdgesTarget) walk_ids[i] = numEdgesTarget; }
First, the code declares the duplicate of the target. Then, it defines a walk length and the walk ids array. Notice that we are using a walk length that will have twice as many edges as the number of edges in the duplicate. This is because a random walk doesn't guarantee that a walk of length the number of edges in a graph will touch every edge in the graph. We use a longer walk to give us a better chance of touching every edge in the graph. On the computer I tested the code on, it required a walk length of two times the number of edges to touch every edge in the target graph.
Next, the code calls the random walk algorithm on the duplicate. The call to random_walk() takes has four parameters. The first parameter is the graph. The second is the starting vertex. The third is the total number of edges and vertices in the walk, i.e. the length of the walk. The fourth is the result array that needs to be of size walk length. The result is an array of integers that represents the ids of the vertices and edges in the walk. The ids are ordered like this: (v1, e1, v2, e2, v3 ... v(n), e(n), v(n+1)), where n is the number of edges in the walk.
Finally, the code translates the ids in the walk back to their corresponding ids in the original target graph. Vertex ids and forward edge ids in the duplicate graph are the same as in the original graph, so they will remain the same. Reverse edges created by the duplicate adapter have the id of m + i where m is the number of edges in the target graph and i is id of the reverse edge. The code starts at position 1 in the walk ids array and touches every other entry in order to check the edge ids. Notice that the edge id is modified only if it is greater than or equal to the number of edges in the target graph.
To compile and run this example on the XMT, issue the following commands from the tutorial directory:
c++ I.. o subgraph_isomorphism3 subgraph_isomorphism3.cpp lprand mtarun m <num processors> subgraph_isomorphism3
We have to include the prand library because the random walk algorithm uses it.
Running this example in serial on my desktop, I get the following output:
Big graph (6, 7) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] 3: (2, 3) [3, 3, 2] 4: (3, 4) [2, 3, 3] 5: (3, 5) [2, 1, 1] 6: (4, 5) [3, 2, 1] Target graph (3, 3) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] Duplicate target graph (3, 6) 0: (1, 0) [2, 1, 1] 1: (2, 0) [3, 2, 1] 2: (1, 2) [2, 3, 3] 3: (0, 1) [1, 1, 2] 4: (0, 2) [1, 2, 3] 5: (2, 1) [3, 3, 2] Found random walk. 3: (0, 1) 2: (1, 2) 5: (2, 1) 2: (1, 2) 5: (2, 1) 2: (1, 2) 1: (2, 0) 4: (0, 2) 5: (2, 1) 0: (1, 0) 3: (0, 1) 0: (1, 0) Replaced walk ids. 0: (0, 1) 2: (1, 2) 2: (2, 1) 2: (1, 2) 2: (2, 1) 2: (1, 2) 1: (2, 0) 1: (0, 2) 2: (2, 1) 0: (1, 0) 0: (0, 1) 0: (1, 0) Filtered graph (6, 6) 0: (1, 0) [2, 1, 1] 1: (4, 5) [3, 2, 1] 2: (2, 0) [3, 2, 1] 3: (1, 2) [2, 3, 3] 4: (3, 4) [2, 3, 3] 5: (3, 5) [2, 1, 1] Path: 0 2 2 2 2 2 1 1 2 0 0 0
Notice the duplicate of the target graph. The next section lists the random walk edges, which are from the duplicate target graph, in the following format:
<edge id>: (<source vertex id>, <target vertex id)
Notice that the vertex ids are all the same as the ones in the target graph, but the edge ids are sometimes different. The next section lists the edges in the random walk after the edge ids have been replaced with those from the target graph. This shows why we used a walk with twice the number of edges in the duplicate target graph. If we had used the number of edges in the duplicate target graph (6), we wouldn't have touched edge 1.
To use an euler tour, the only differences are to include the file with the euler tour algorithm instead of the file with the random walk algorithm and to call the euler tour algorithm instead of the random walk algorithm. The euler tour algorithm takes the exact same parameters as the random walk algorithm. Unfortunately, both walk algorithms are currently implemented using serial algorithms. This will generally not be an issue because target graphs are usually small.
Random Walk
The random walk algorithm starts at the source vertex passed to it. When it visits a vertex, the algorithm randomly chooses one of the outgoing edges adjacent to that vertex. It then visits the vertex at the other end of the chosen edge. One issue with the random walk algorithm is that the length of the walk often has to be many times the number of edges in the graph to be sure that every vertex and edge is touched.
Euler Tour
The euler tour code utilizes Hierholzer's algorithm to create an euler tour. The code will generate a walk of any length where lengths not multiples of the number of edges in the graph return zero or more euler tours ending with a partial euler tour. The algorithm returns a boolean indicating if the graph was eulerian. If the graph wasn't eulerian, no path is returned by the algorithm.
An undirected graph is eulerian if the graph is connected and every vertex has even degree. A directed graph is eulerian if the graph is strongly connected and the indegree of every vertex is equal to its outdegree. Any connected undirected graph can be made Eulerian by creating a new graph with two copies of each edge in the original graph. Any weakly connected directed graph can be made Eulerian by creating a new graph that contains each edge from the original graph and its reverse. The duplicate_adapter (mtgl/duplicate_adapter.hpp) is an easy way to do this for both undirected and directed graphs.
Custom Comparator Example
The filtering and subgraph isomorphism algorithms take a function object as a parameter that defines what a match means. The only requirement is that the () operator have the following definition:
bool operator()(const edge& g_e, const edge_trg& trg_e, Graph& g, TargetGraph& trg) {}
The filtering and subgraph isomorphism algorithms define default comparators that you can use, or you can define your own. The two algorithms can use the same comparator or different ones. Any properties that you want to use in the comparator should be passed in via the comparator's constructor and stored in the comparator. The filtering and subgraph isomorphism algorithms handle reverse edges internally, so the user doesn't need to handle that in the comparator. The two edges should always be compared in the forward direction.
Let's define a new comparator that we will use for both the filter graph and subgraph isomorphism algorithms. This example will build on the previous example. The complete code for this example is at tutorial/subgraph_isomorphism4.cpp.
First, we add a new edge property, flow, to the graph. The edge flow is shown as the second number in the rectangles by the edges. The first number is the edge type. All other markings remain as before. Here is the big graph.
Here is the target graph.
We match based on types as before, but we also add the requirement that an edge in the big graph must have at least as much flow as the edge in the target graph to match. There is only one match to the target graph as shown below.
Now, let's define the comparator that will use edge flow.
template <typename Graph, typename vertex_property_map, typename edge_property_map, typename TargetGraph = Graph, typename vertex_property_map_trg = vertex_property_map, typename edge_property_map_trg = edge_property_map> class type_flow_comparator { public: typedef typename graph_traits<Graph>::vertex_descriptor vertex; typedef typename graph_traits<Graph>::edge_descriptor edge; typedef typename graph_traits<TargetGraph>::vertex_descriptor vertex_trg; typedef typename graph_traits<TargetGraph>::edge_descriptor edge_trg; type_flow_comparator(vertex_property_map gvt, edge_property_map get, edge_property_map gef, vertex_property_map_trg trgvt, edge_property_map_trg trget, edge_property_map_trg trgef) : g_vtype(gvt), g_etype(get), g_eflow(gef), trg_vtype(trgvt), trg_etype(trget), trg_eflow(trgef) {} bool operator()(const edge& g_e, const edge_trg& trg_e, Graph& g, TargetGraph& trg) { // Get vertices. vertex g_src = source(g_e, g); vertex g_dest = target(g_e, g); vertex_trg trg_src = source(trg_e, trg); vertex_trg trg_dest = target(trg_e, trg); // The edge is considered a match to the target edge if its type triple // (src vertex type, edge type, dest vertex type) matches the target // edge's corresponding triple. return (g_vtype[g_src] == trg_vtype[trg_src] && g_etype[g_e] == trg_etype[trg_e] && g_eflow[g_e] >= trg_eflow[trg_e] && g_vtype[g_dest] == trg_vtype[trg_dest]); } private: vertex_property_map g_vtype; edge_property_map g_etype; edge_property_map g_eflow; vertex_property_map_trg trg_vtype; edge_property_map_trg trg_etype; edge_property_map_trg trg_eflow; };
This comparator has the same six template parameters as the subgraph isomorphism default comparator. The parameters passed to the comparator constructor are the vertex type, edge type, and edge flow property maps for the big graph and the vertex type, edge type, and edge flow property maps for the target graph. The comparator stores the property maps passed to its constructor in member variables.
The comparison happens in the operator() function. The code first gets the source and target vertices for the big graph and target graph edge. It then compares the source and target vertex types, edge types, and edge flows between the big graph edge and the target graph edge. A match is when the types all match and the flow on the big graph edge is at least as big as the flow on the target edge.
We use the same comparator class for both the filtering and the subgraph isomorphism algorithm, but we must use different instantiations. Here we declare the filtering comparator and call the filtering algorithm where eFlowmap is the property map for the big graph edge flows and eFlowmapTarget is the property map for the target graph edge flows.
type_flow_comparator<Graph, vertex_property_map, edge_property_map> tf_comp(vTypemap, eTypemap, eFlowmap, vTypemapTarget, eTypemapTarget, eFlowmapTarget); filter_graph_by_edges(g, target, tf_comp, filteredG);
Compare that to the declaration of the subgraph isomorphism comparator and the call of the subgraph isomorphism algorithm where eFlowmapFiltered is the property map for the filtered graph edge flows.
type_flow_comparator<FiltGraph, vertex_property_map_filt, edge_property_map_filt, Graph, vertex_property_map, edge_property_map> si_tf_comp(vTypemapFiltered, eTypemapFiltered, eFlowmapFiltered, vTypemapTarget, eTypemapTarget, eFlowmapTarget); match_visitor<Graph, FiltGraph> mvisit(g, filteredG); subgraph_isomorphism(filteredG, target, walk_ids, walk_length, si_tf_comp, mvisit);
To compile and run this example on the XMT, issue the following commands from the tutorial directory:
c++ I.. o subgraph_isomorphism4 subgraph_isomorphism4.cpp lprand mtarun m <num processors> subgraph_isomorphism4
Again, we have to include the prand library because the random walk algorithm uses it.
Running this example in serial on my desktop, I get the following output:
Big graph (6, 7) 0: (1, 0) [2, 1, 4, 1] 1: (2, 0) [3, 2, 5, 1] 2: (1, 2) [2, 3, 9, 3] 3: (2, 3) [3, 3, 2, 2] 4: (3, 4) [2, 3, 9, 3] 5: (3, 5) [2, 1, 4, 1] 6: (4, 5) [3, 2, 6, 1] Target graph (3, 3) 0: (1, 0) [2, 1, 4, 1] 1: (2, 0) [3, 2, 6, 1] 2: (1, 2) [2, 3, 8, 3] Duplicate target graph (3, 6) 0: (1, 0) [2, 1, 4, 1] 1: (2, 0) [3, 2, 6, 1] 2: (1, 2) [2, 3, 8, 3] 3: (0, 1) [1, 1, 4, 2] 4: (0, 2) [1, 2, 6, 3] 5: (2, 1) [3, 3, 8, 2] Found random walk. 3: (0, 1) 2: (1, 2) 5: (2, 1) 2: (1, 2) 5: (2, 1) 2: (1, 2) 1: (2, 0) 4: (0, 2) 5: (2, 1) 0: (1, 0) 3: (0, 1) 0: (1, 0) Replaced walk ids. 0: (0, 1) 2: (1, 2) 2: (2, 1) 2: (1, 2) 2: (2, 1) 2: (1, 2) 1: (2, 0) 1: (0, 2) 2: (2, 1) 0: (1, 0) 0: (0, 1) 0: (1, 0) Filtered graph (6, 5) 0: (1, 0) [2, 1, 4, 1] 1: (4, 5) [3, 2, 6, 1] 2: (1, 2) [2, 3, 9, 3] 3: (3, 4) [2, 3, 9, 3] 4: (3, 5) [2, 1, 4, 1] Path: 5 4 4 4 4 4 6 6 4 5 5 5
The graphs are printed with each edge on a line with the following format
<edge id>: (<source id>, <target id>) [<source type>, <edge type>, <edge flow>, <target type>]
We got the results that were expected as there is only one match to the target graph.
Attachments

BigG.jpg
(24.1 KB) 
added by gemacke 9 years ago.
Example big graph

TargetG.jpg
(13.3 KB) 
added by gemacke 9 years ago.
Example target graph

BigGMatch.jpg
(26.1 KB) 
added by gemacke 9 years ago.
Target graph matches in big graph

UndBigGMatch.jpg
(24.5 KB) 
added by gemacke 9 years ago.
Extra match in big graph when undirected
 BigGFlow.jpg (24.7 KB)  added by gemacke 8 years ago.
 TargetGFlow.jpg (13.5 KB)  added by gemacke 8 years ago.
 BigGFlowMatch.jpg (25.5 KB)  added by gemacke 8 years ago.
 SI Bipartite.jpg (33.7 KB)  added by gemacke 8 years ago.
 SI Colored SS.jpg (40.7 KB)  added by gemacke 8 years ago.
 SI Colored.jpg (29.7 KB)  added by gemacke 8 years ago.