Wednesday, January 4, 2012

An Idea for the Refinement of a TSP Solution

Although I found through experiment that my probabilistic approach to finding a good TSP solution didn't work so well, I did manage to find a polynomial time algorithm that can take an arbitrary solution to the Traveling Salesperson Problem (TSP) and refine it to the optimal solution (with what I have tested so far). I do not know if this polynomial refinement-to-optimality is possible for solutions bigger than 20, as the exponential branch-and-bound solver at my disposal can't do very much given the hardware I have on which to run it.


Now, I think what I have found here MIGHT be awesome. 


At any rate, I can explain what it does.


First, we assume an asymmetric directed graph of cites, wherein not every possible edge exists -- i.e. The number of edges in the graph |E| < |V|^2, where |V| is the number of vertices (cities) in the graph (map of cities). For simplicity, if an edge does not exist from one city to another, the reverse edge also does not exist. It can be quickly checked that this problem is still NP-Complete.


We may take this graph to be a map of roads between cities, and the weights on the edges (roads) to be a score of them based on the distance between the cities and the elevation difference between the cities, i.e. if one city is higher in elevation than another, the edge going from the higher to the lower is going to be scored lower than the edge which goes from the lower to the higher, because it is easier to go downhill. Looking at such a real map of cities and creating a graph from it, by scoring the edges as above, we find a distinct pattern in the optimal solution to the traveling salesman tour of the cities considered:


In some sense, the solution forms a "polygon" in the sense that no loops are created. So, this:


would be a better solution than this:
Provided we are looking at the "actual" city map, this is a fairly good heuristic for "goodness" of a solution, if not a requirement for optimality. Looking at the actual city map is important because the actual euclidean distance between cities is taken into account in the scoring of edges, and so these "twists" show up on the actual map, whereas if we just have the graph information, this intuition would not make sense since placement of the node on paper of the equivalent graph has nothing to do with the edge weight. Again, it should be noted that the graph (city map) upon which we consider is asymetric and incomplete, and so is still worthy of discussion.


To further shed light on this discussion, it will be assumed that the optimal tour through a given number of cities will be represented by a list of cities, with the order that the city appears in the list being the order of its visitation on the TSP tour. I now take from an excerpt of an email I sent to my CS professor Dr. Tony Martinez on the subject.

To refine a given approximate solution, I observed (as explained above) that approximate solutions often have "twists" in them - that is, they were not polygons since their vertices crossed. So I take the solution and try untwisting any arbitrary segment of the solution (that is, reversing that segment - e.g. take the segment # 2, 3, 4, 1, 5 and change it to  # 5, 1, 4, 3, 2) and see if that operation improves the score. If it does, I return the improved route, else I return the route the untwisting method was given. [author's comment: if done properly, this part of the algorithm can be done in O(n^2) time.] 
Now (empirically) I have observed that the approximate solution is now a polygon (no twists). But it may not be the "right" polygon. To try a different polygon, I simply take an arbitrary subsegment of the solution and splice it into a different part of the whole solution. Example: If we have a solution of Cities # 2, 4, 7, 5, 8, 1, we take an arbitrary sub segment of cities # 4, 7, 5 and try the solution where they are inserted in between 8 and 1 to produce # 2, 8, 4, 7, 5, 1. I try all such possible splice combinations, and see if they improve the score, returning any improvements. This search space, though it be search by brute force, is only O(n^3) large (what's more, being a brute force search, this and other improvement spaces given below can be searched in parallel). I find (again, empirically with our CS312 TSP [Framework] code) that this operation changes the "polygon" to a more optimal one.Finally, for sugar, I take this new and improved solution and select two arbitrary cities along its path and swap them, so that a tour which was city # 2, 3, 8, 7, 1, 4 becomes # 2, 1, 8, 7, 3, 4 if we swap the cities 1 and 3, thus changing the order in which they are visited.
I repeat these three steps - untwisting, segment swapping, and city swapping - for as long as they "help", or as long as any one of them return an improved solution. I find that the number of repetitions is usually small - 3 or 4 times for 50 cities (when it is given a best-of-greedy initial approximate solution, which is an algorithm in O(n^2) time).  [author's comment: these were my results when, instead of stopping during untwisting/segment splicing/city swapping at the first improvement found, I tried all available twists/splices/swaps at one time and used the best of those. The results might be higher if the suggested stop-at-first-improvement approach is taken.]
For 20 Cities, I have  verified that it spits out the optimal solution over and over again, and in much faster time than branch and bound. I do not have the resources to try for the optimal solution for more than 20 cities using C# code on my laptop [which is a netbook. Details: Acer AspireOne 532h, 2GB RAM, 1.66 GHz processor, running C# .NET executable under Windows 7 Starter].


Thus, I believe this whole algorithm runs in O(N^4) time, which is certainly not exponential. The inner loop (untwisting, splicing, swapping) can be done in O(N^3) time and O(N) space, and it is entirely possible that in fact this algorithm is in O(n^3*log N) time, more empirical study needs to be done to decide if that is indeed the case. While largely untested for bigger N, in the tests I have run it finds the optimal solution every time for city maps with city counts < 20. I know, it's not a big number, but this may still be a very good refinement of a given solution, if not a way to find the optimal solution. 

I wonder if anyone could try this method of solution refinement and corroborate my results. Thank you all so much for your comments. I hope someone can further this effort by trying this refinement solution out themselves and comparing their results to known optimal solutions of the graphs upon which they are testing this refinement solution.

Acknowledgments: The code for the Branch-and-Bound solver I used is authored by Jimmy Hales. The C# framework out of which I performed the empirical tests was given to me by the CS Department of Brigham Young University as part of a class assignment for CS312, mentioned in the above letter excerpt as the CS312 TSP (Framework) code.