Ticket #4245 (new defect)

Opened 2 years ago

Last modified 2 years ago

miniTri nondeterminism

Reported by: jberry Owned by:
Priority: normal Milestone: 1.2
Version: 1.1.1 Severity: normal
Keywords: Cc:

Description

./miniTri er 10 0.5 2

generates nondeterministic results. The largest k-count is 8, but the count itself is either 30 or 31 (./configure --with-openmp, running on my machine).

Change History

comment:1 Changed 2 years ago by jberry

This isn't specific to miniTri; it's an issue with find_triangles itself. The latter has tricky behavior in the presence of multiple edges. We have tolerated this in the past, but I now have a small example showing that perhaps we shouldn't tolerate it anymore. The example is a subgraph that will contain different numbers of triangles depending only on vertex degrees in the original graph. Consider:

p sp 4 6
a 1 2 0
a 2 1 0
a 1 3 0
a 3 1 0
a 2 3 0
a 1 4 0

This graph will have two triangles since vertex 1 has high degree, giving its edges to the buckets of vertices 2 and 3. Our dimacs graph reader (into an undirected graph) interprets 'a 1 2 0' and 'a 2 1 0' as multiple edges. By accident of degree, however, there are only two multiple triangles (multiple triangles are two or more 3-cycles incident on the same three vertices).

Now delete the edge (1,4) and add four other edges (2,4), (2,5), (3,6), and (3,7). The vertices 2 and 3 now have higher degree than vertex 1. The triangles algorithm now finds 4 triangles. Thus, the induced subgraph on vertices (1,2,3) contains two triangles in the first graph, but 4 triangles in the second one.

This goes beyond inconvenience; we should do something. One thing might be the following: by default, don't count multiple triangles. When there are multiple edges between vertices u and v, only put the edge of lowest id into the winner's bucket (noting that edge id's are not consistent from multi-threaded run to multi-threaded run, meaning that we're selecting a random multiple triangle to keep, so we still don't have repeatability in a multigraph). The typical user probably won't need to distinguish multiple triangles. We could document and allow the current behavior for those who wish to see some (but maybe not all - see above) multiple triangles. But I don't like this solution.

Another option would be to disallow this algorithm to run on non-simple graphs. We should think about that because many large, real graphs are non-simple, most of the triangle information would still be useful, and it could be quite painful to require making a simple version of every huge graph. We would need the graph constructors to enable a simple() predicate, and find_triangles would just require that to return 'true'. This would be infuriating to users if it required them to store simple versions of their multigraphs. We would alter the readers to produce simple graphs on demand. That way different runs could build graphs simple graphs or multigraphs from the same input.

Note: See TracTickets for help on using tickets.