Where do two linked lists mergeThe following question was raised in a usenet group: Is there any efficient way of finding the intersection point of two singly linked lists? That is, given two singly linked lists that meet at some point, find the node where they meet. It turns out that the question as stated conceals two assumptions. The first is that the two lists terminate with a common null link. If they do they will have a single merge point; the two merging lists form a Y structure. However there are two ways a linked list can terminate, either by a null link or by a cycle. When the two lists terminate in a cycle each list can join the cycle at different nodes. Each is a legitimate merge node, i.e., we can follow A until it reaches B’s merge point or we can follow B until it reaches A’s merge point. There is no single “the merge point”. We can remove the ambiguity by adopting some criterion for the “best” merge point, e.g., the one that minimizes the combined lengths to the merge point, although this fails if the two merge points are on exact opposite sides of the cycle. The second assumption is that the two lists have any nodes in common, i.e., that they actually merge. Algorithms that fail if the lists are terminated by cycles are unsafe for cycles. Algorithms that fail if the lists have no nodes in common are unsafe for failure to merge.
I. The simple approach for null terminated listsWhat we want to find is the first node the two lists have in common. Let A.m and B.m be the number of nodes respectively between the start of A and the merge node and B and the merge node. Let m be the difference between A.m and B.m. If we skip the first m nodes of the longer list than we can compare the two lists node by node until we get to a match, the merge node. Thus we can find the merge point if we can find m. Since the two lists have the same nodes in common after the merge point we have m = A.length - B.length.The advantage of this approach is its simplicity. However it does require knowing the list lengths. If this information is not available in advance, the lists must be traversed in their entirety. The simple algorithm is unsafe for cycles (traversing the lists to find end points goes into an infinite loop) but it is safe for failure to merge (the comparison loop reaches the terminating null node.)
II. Finding the entry point into a terminating cycleFinding a cycle in a linked list is a classic algorithm. Let A be the list. Let p1 and p2 be two pointers into A. Initialize each to point to the start of the list. In each iteration of a loop advance p1 by one node and p2 by two nodes; escape from the loop when p1 and p2 both point to the same node. Call the node x; it is a node within the list cycle. The next step is to find the length of the cycle. In each iteration of a second loop advance p1 by one node until again points to x. The number of iterations is the length of the cycle. Let k be the length of the cycle. Now initialize p1 and p2 to the start of the list. Advance p2 by k nodes. In a loop advance p1 and p2 by one node in each iteration; escape from the loop when p1 and p2 point to the same node. Call the node y. Then y is the entry point into the list cycle. If it is not known whether the lists are terminated by a cycle or by a null node, we can add a test in the first loop to see if p2 detects a null node.
III. An algorithm based on cycle detectionWhen there is a terminating cycle there are three possibilities – (a) A and B merge before they reach the cycle, (b) they merge at a common entry point, or (c) they enter at different points in the cycle. To distinguish these cases we need to find the entry points for A and B. If the entry points are the same treat the common entry point as a list terminator and apply the simple algorithm in part I. If they differ we have case (c) – either entry point is a legitimate answer. If the two lists are null terminated the cycle detection algorithm will discover the fact; the basic algorithm of part I can be used. In case (c) a complete implementation checks that the two lists actually share a common cycle. Clearly they don’t if the two cycle lengths are different. If they are the same commonality can be done by checking that the entry point of A can be reached from the entry point of B. This algorithm is O(n) but it is cumbersome and relatively expensive. However,properly implemented, it is safe against cycles and against failure to merge.
IV. A method that doesn’t require finding the terminatorThe basic algorithm doesn’t actually require knowing the list length; what it needs is the distance between two nodes that match. Call that distance m. It turns out that m can be found without searching to the ends of the lists. To see how that this works let us first suppose that we know a bound M for m, and that A is the longer list. We compare the first element of B with the first M elements of A. If we find a match we’ve found m (and the merge point.) If no match is found we go to node M+1 in B and check it against nodes M+1 through 2*M in A; we keep advancing by M nodes until a match is found. A match will be found when the merge point is passed. The difference in positions of the match in A and B yields m. Unfortunately we don’t have a bound M for m nor do we know which list is longer. However we can use M=1 in the first iteration, M=2 in the second iteration, and so on. We solve the problem of not knowing which list is longer by alternating which list is the lead list. With a bit of bookkeeping we get the following algorithm: Traverse A by one step. Traverse B by three steps, checking to see if it there is a match with the last element read from A. Traverse A by five more steps, checking for a match with the last element read from B. Continue, alternating lists and increasing the steps by two each time. When a match is found this way we know the difference of distances to the merge point, m. This gives us an O(m^2 +D) method where D is the common distance to the merge point. If the lists are null terminated it is possible that one of the lists is exhausted before the a match is found. In that case we revert to algorithm I. If the lists are terminated by a common cycle algorithm IV will find one of the entry points. Algorithm IV will fail if the two lists are disjoint and are terminated by separate cycles.
Final remarksThis problem is more a puzzle than a serious programming problem. A ‘correct’ solution depends on the objective and upon context. If the lists are known to be null terminated algorithm III is not needed. In principle algorithm IV is more efficient than algorithm I; however it is more complex and in practice simplicity may trump computational efficiency. If the lists are known to be cycle terminated algorithm I will fail. Algorithm III is guaranteed to work but is cumbersome; algorithm IV will be considerably more efficient, but will fail if the lists are disjoint. Considered as a puzzle, algorithm I is the ‘correct’ solution. It is simple and elegant. This page was last updated February 1, 2008. |