Finding Loops In Cave Surveys

By Larry Fish

 

Most cavers understand the concept of survey loops. On the surface, it appears to be a pretty simple idea: you have a loop in a cave survey when a series of shots circle back to the same starting point. However, in spite of this simplicity, finding loops with a computer program is quite difficult.  In this article, I will discuss the problem of finding loops with computer programs and talk about solutions to the problem.

 

Simple Loops. If you are not picky about which loops you choose, finding a loop is pretty easy. The problem becomes difficult if you want to find the “simplest” set of loops in a cave.  For example, look at the following drawing:

    Figure 1.

 

Here we have a simple pattern of shots that produces three possible loops. The first loop consists of the stations A, B, E, and D. The second consists of B, C, F, and E. The final loop consists of stations A, B, C, D, E, and F.

 

Figure 1A.

 

Now the question becomes, which of these loops are most useful to cave surveyors? To satisfy the needs of surveyors, the set of loops that we choose must satisfy three requirements:

 

1. All Shots. The loops we choose must cover all the shots in the cave that are included in a loop. In this example, every shot is part of one or more loops, so we must include every shot. No one loop covers all the shots but if we choose any two of the loops, all shots will be included. That means one of the three loops is superfluous and can be ignored.

 

2. Smallest Size. The loops we choose should be as “small” as possible. Depending on your needs, “small” could mean the fewest number of shots, the smallest perimeter or the smallest enclosed area. For cave surveyors, choosing loops with the smallest number of shots is probably the best choice. This is because having fewer measurements makes it easier to find blunders. Also, if you have to repair a bad loop in the cave, it is a lot easier to resurvey a small number of shots.

 

3. Minimum Overlap. The loops we choose should also have the smallest possible overlap. The reason for this principle is obvious. The less overlap between loops, the easier it is to isolate and locate blunders. Also, if you are closing loops by distributing errors around a loop, having loops with minimal overlap prevents good loops from being distorted by bad loops.

 

Going back to Figure 1 above, to satisfy the requirements, we would choose the loops (ABEF) and (BCDE), and not (ABCDEF). To clarify why this combination is preferable, look at the follow chart:

 

Loop1

Loop2

Total Length

Overlapped Edges

ABEF

BCDE

8

1

ABEF

ABCDEF

10

3

BCDE

ABCDEF

10

3

 Figure 2.

 

The chart shows how various pairs of loops compare. As you can see, the combination of the two smaller loops produces the shortest total length and the smallest number of overlapped edges. This pair of loops would be preferable for cave surveyors.


Finding The Smallest Loops. With the simple example, it is fairly easy to pick out the smallest loops. However, as you add more and more interconnected loops, the problem becomes complex, even for a computer program. It turns out that the problem is “NP-Complete,” [1, 2] which is a mathematical term that means you have to try every possible combination of loops before you can be sure that you have the best set of loops. Since the number of possible loops goes up exponentially with the density of the graph, the time to solve the problem can be exponential. This could make the problem too time-consuming to be practical for cave survey programs. For example, it might take a few seconds for small cave, hours for a medium sized cave and months for larger caves.

 

Dealing With NP-Complete Problems. It is often possible to deal with NP-Complete problems by finding a faster solution that does not give mathematically perfect results. In the case of cave surveys, after several months of experimenting, I was able to find a practical solution that gives nearly perfect results and runs relatively fast. To save other surveyors and programmers some time, I will outline the method I used in detail.

 

Graph Theory. The branch of mathematics that deals cave surveys and loops is called “Graph Theory.” [7] Graph theory is pretty easy to understand as long as you know the language. For example, in Graph Theory, the stations and shots in a cave survey would be called a “graph.” A “Station” is called a “Vertex” and a “Shot” is called an “Edge.” A Loop is called a “Cycle,” and when you find all the Cycles in a Graph it is called the “Cycle Basis” for that graph. Here is a chart that translates the language of cave surveying to graph theory:

 

Cave Survey

Graph Theory

Station

Vertex

Shot

Edge

Survey

Graph

Loop

Cycle

All Loops

Cycle Basis

Smallest Loops

Minimum Cycle Basis

 

Spanning Trees. The first step toward finding loops in a graph or a cave survey is creating something called a “Spanning Tree.”  In a cave survey language, a Spanning Tree would be a series of shots that touch each station in a cave only once. In other words, every station has no more than one shot leading into it. For example, here is a Spanning Tree for Figure 1 above:

Figure 2.

 

As you can see, the Spanning Tree touches every station in the cave without visiting any station twice. The shots A-D and B-E have been removed from the survey because, otherwise, stations D and E would have two shots leading into them.

 

Multiple Spanning Trees. Another important aspect of spanning trees is that every graph can have more than one valid spanning tree. For example, here in another spanning tree for Figure 1:

Figure 3.

 

As you can see, this spanning tree also has one shot leading into each station but it uses a different set of shots to do it.

 

Finding Loops In Spanning Trees. Another way to think about this idea is to realize that a Spanning Trees is just a cave survey where one shot has been removed from each loop, effectively breaking that loop. That means that all you have to find a loop in a Spanning Tree is to put one of the “missing shots’ back. For example, in the Figure 3 above, if you put the shot “D – E” back, you will have found the “ABED” loop. In other words, for every shot that is missing from the Spanning Tree there is one loop. Thus the “missing shot” is the key to each loop and we will use this shot as a kind of marker for the loops in a cave.

 

To actually figure out which shots belong to the loop, you follow the “parents” of the missing shots back until they come to a common station. (The arrows in the example graphs show which stations are the parents.) For example, in Figure 3, you follow E back to B and B back to A. This defines one side of the loop. On the other side, you follow D back to A. You stop at A because it is the common parent of both sides loop.

 

Problems With Spanning Trees. In the language of Graph Theory, the loops generated using this approach are called “Fundamental Cycles.” Unfortunately, Fundamental Cycles do not satisfy the criteria that we have set out for cave surveying. In graph terminology, what we are looking for is called the “Minimal Cycle Basis” of a graph. Although the previous example seems to work perfectly and does find the Minimal Cycle Basis, this is not always the case. For example, here is another example of a spanning tree:

Figure 4.

 

If you look at the loop that is formed when shot A-D is put back into the spanning tree, you will see it consists of stations A, B, C, D, E, and F. This is the one loop from Figure 1A that we decided was defective.

 

Different Kinds Of Spanning Trees. You can now see that some kinds of Spanning Trees work better than other kinds. As a result, one way to improve our loop-finding ability is to use different kinds of Spanning Trees.

 

Finding Spanning Trees. To understand how to improve Spanning Trees, it is important to understand how they are derived. The process is pretty simple and similar to the way we explore a cave. Generally speaking, you simply “explore” the whole cave and put every shot you find in the spanning tree unless you have seen the station before. The key to this process is how you “explore.” There are two basic methods of “exploring”: “Depth First Search” (DFS) and “Breadth First Search” (BFS).

 

Depth First Search. The Depth First Search explores by going to the “back” of the cave first and then working your way back forward. In other words, you don’t linger at each station and you don’t initially explore all branches. Instead, you pick the first connection at each station and follow it deeper. Here is an example of the a DFS spanning tree for Figure 1:

Figure 5.

In this example we start at A and keep going deeper until we run out of new shots at D. Then we backtrack to E and look for any side branches. There is a side branch running from E to B, but we have already visited B, so we ignore it. This process of backtracking continues until we have checked all side branches and worked our way back to A.

 

You will notice that this gives the same spanning tree that we had in Figure 4, which produced the bad, overlapping loops. For the purpose of finding minimum cycles, a Depth First Search produces poorer loops.

 

Breadth First Search. With Breadth First Search, we explore the front of the cave first. We do this by checking all the branches off each station, before we move any deeper into the cave. For example, here is a Breadth First Search spanning tree for Figure 1.

Figure 6.

In this example, we start at station A as before, but we explore the connections to B and C before moving on. When we do move on to B, we explore its connections too and continue until we run out of stations that have not been visited. You will notice that this gives the same spanning tree as Figure 3, which we concluded, gives superior loops. So for our purposes, BSF spanning trees give superior results.

 

BSF Problems. Unfortunately, as the loops get more complicated, the BSF begins to have problems. For example, here is a more complex survey and the BSF spanning tree it produces:

 

Figure 7.

 

As you can see, the spanning tree will create two sets of overlapping loops, which is just what we are trying to avoid. One way to avoid this problem is to change the way we explore. For example, if we preferentially explore the stations with the most connections to them, it will improve the spanning tree. In the example above, we would choose E first, because it has four shots connected to it.  Next, we would explore B, D, F and H because they have three connections. This is the result:

Figure 8.

As you can see, this produces the kind of minimally overlapped loops that we are looking for. (Note: detailed information on implementing these BFS algorithms is available in Manuel Huber’s article [3] in bibliography.)

 

The Final Problem. Unfortunately, improving the spanning tree can only be carried so far. Eventually, if you have enough loops, even an optimized spanning tree fails to produce a Minimal Cycle Basis:

Figure 9.

In this example, we have a 3x3 “grid” of loops. As you can see, even using the optimized spanning tree algorithm, two of the nine loops in the area of F, G, J, K, N and O will be overlapped. Thus, when you have loops stacked three-deep in two directions, even an optimized spanning tree will generate overlapped loops.

 

A New Strategy. At this point, we are forced to adopt a new strategy for improving loop quality. We know that a modified BSF spanning tree will find a set of loops that almost satisfies our criteria, so all we have to do is fix those loops that are sub-optimal. The basic idea is to find a method of finding and fixing sub-optimal loops. During my experiments, I found several methods that could be used to find bad loops and fix them. The fastest method was to look for “short cuts”.

 

Looking For Short Cuts. To find short cuts, you look at each loop to see if there is a shorter alternate route. The process begins by selecting one station from the “missing shots” and then doing a Breadth First Search of all adjacent stations. If during the course of the BFS, you find the other end of the “missing shot,” you have a short cut. Since the BFS only explores one “level” of depth at a time, the depth of the search tells you the length of any potential short cut. If the depth exceeds the length of the loop your are trying to improve, you know there is no short cut.

 

The process is a bit complicated, so here are step-by-step instructions:

 

1.      Choose one loop and note the length of the loop.

2.      Choose one station from the “missing shot” for the loop.

3.      Do a Breadth First Search of all adjacent shots to this station.

4.      Each time you move deeper in the search, keep track of the parent shot.

5.      Each time you move deeper, keep track of the depth.

6.      If the depth is greater than the length of the loop, there is no short cut for the loop, so stop searching.

7.      If you encounter the other station in the “missing shot,” you have found a short cut.

8.      Create a new loop by collecting all the shots in the short cut. This is done by following the parent shots back to the starting station.

9.      Mark the “missing shot” so it is excluded from all futures short-cut searches.

10.  Repeat steps 1 through 9 until all loops have been checked for short cuts. You can ignore any loop with less than four shots.

 

Note: this algorithm is described in detail in Manuel Huber’s article [3] in bibliography.

 

Loop Order. Each time you find a short cut, you eliminate the “missing shot” from the search. This assures that you do not find any overlapping short cuts. However, it can cause the program to miss certain other valid short cuts. For this reason, the order in which loops are processed is important. There are several different ways to order the loops for processing, but for a variety of reasons, I chose to sort all the loops by length and start with the longest loops first. [3] 

 

Problems With Short Cuts. The “Short Cut” technique works properly for 99% of the loops. However, there are still a few cases where the technique finds sub-optimal loops. This is typical of an NP-Complete problem and we would normally be satisfied with a few sub-optimal loops if the algorithm were fast. In this case, I found a way to improve the technique so it generates nearly 100% minimal loops. (At least in my testing of thousands of random loops, I have yet to find a failure.)

 

The “Short Cut” technique fails in certain situations where you have two overlapping loops that have the same number of shots. Here is an example:

Figure 10.

 

On the left, we have a pattern of shots that forms four minimal loops and nine additional, overlapped loops. In some instances, the Spanning Tree will find the two overlapping loops shown on the right. (This depends on the order of the shots and how the loops are connected to the rest of the survey.) Since both loops have the same length, the program may choose either loop to process first when it starts to look for short cuts. If the program chooses to process the middle loop first, shots H-E or E-F may be prematurely eliminated from consideration as a short cut. This may cause the program to select overlapped loops around stations {A,B,C,D,E,F}.

 

Enclosed Area. My solution was to sort the loops by area whenever I found two loops of equal length. This was a bit more complicated to do than it sounds. While it is easy to find the area of an irregular 2D polygon using the “Surveyor’s Rule” or “Determinants”, the surface area of a 3D polygon is difficult to define and calculate. As a result, I chose to find the center of the polygon (centroid), and calculate the average distance to each vertex. This is relatively fast and easy to calculate and gives good results in testing.

 

Minor Issues. When you combine all the steps outlined above, you have an algorithm that finds minimal loops nearly 100% of the time. In spite of this, there are some situations where the algorithm will find loops that “appear” to be sub-optimal. For example, look at the following diagrams:

Figure 11.

 

Here you have two possible loops in a graph. Both loops satisfy the requirements of having the minimum number of shots and overlapped edges. However, most people would probably choose the loop on the left as the smallest loop because it has a smaller area. For the purpose of cave surveying, there is no advantage to either loop and both would work just fine. You could add an extra step to the algorithm to fix these cosmetic defects, but it is probably not worth the loss of speed and the extra programming effort.

 

Running Time. The running time for this algorithm is relatively fast.  On a 2.8 Ghz Pentium 4, the test program will find 1,000 loops in a dense grid in 175 milliseconds and 2,500 loops in about 800 milliseconds. Considering that Lechuguilla Cave has about 1,800 loops, these are very acceptable times. In mathematical terms, the time is low order polynomial and is as good or better than other algorithms in the literature [4, 5, 6]. (Time in milliseconds = m×n×k1+k2, where k1 = 1.29E-4 and k2 = 0.08.)

 

Loop Closures. One point should be clarified. There is a distinction between finding loops and closing loops. You might assume that in order to close loops, you would have to find loops. Actually, that is not the case. Closing loops only requires that you find junctions and traverses, which is a fairly easy task. The primary value of finding minimum cycles is to help locate blunders.

 

Test Software. In order to experiment with different loop finding techniques, I built a test program that is capable of generating and displaying large sets of random and non-random loops. It will find and display minimal loops using this and other algorithms. It will also display various kinds of spanning trees and you can even save graphs for later analysis. For those who are interested in experimenting with graphs, spanning trees and minimal cycles, the program can be down loaded here:

 

http://www.fountainware.com/download/loopfinder.zip

 

There is no installation program, so you will need to copy the files to a folder and create whatever shortcuts you need. There is a brief help file that explains most of the controls and options. Sources to the main algorithms are included.

 

Bibliography:

 

1. Algorithms for generating fundamental cycles in a graph.

Authors: N. Deo, G. M. Prabhu, and M. S. Krishnamoorthy

ACM Trasn. Math. Software, 8:26-42, 1982.

 

2. A faster algorithm for Minimum Cycle Basis of graphs.

Authors: Telikepalli Kavitha, Kurt Mehlhorn, Dimitrios Michail, and Katarzyna Paluch

The 31st International Colloquium on Automata, Languages and Programming

July 12-16, 2004, Turku, Finland 

 

3. Implementation of algorithms for sparse cycle bases of graphs.

Author: Manuel Huber

 

4. Implementing Minimum Cycle Basis algorithms.

Authors: Kurt Mehlhorn and Dimitrios Michail

WEA 2005: 4th International Workshop on

Efficient and Experimental Algorithms

May 10-13, 2005, Santorini Island, Greece

 

5. Application of Shortest Path Methods.

Author: J.C. de Pina. PhD

thesis, University of Amsterdam, Netherlands, 1995

 

6. A polynomial-time algorithm to find a shortest cycle basis of graph.

Author: J. D. Horton.

SIAM Journal of Computer, 16:359-366, 1987.

 

7. Graph Theory

Author: Reinhard Diestel

Springer-Verlag New York 1997, 2000

Electronic Edition Available Here:

http://www.math.uni-hamburg.de/home/diestel/books/graph.theory/index.html