CS-310 Homework #3

A completely different measurement for union-find algorithms is to measure the maximum number of collapsing finds (i.e. find operations that change the parent of a node) on a universe of n elements can occur before the tree is of height 1. Note that this allows the interspersing of union and find operations.

Union-Find is an algorithm for managing disjoint subsets of information. For the set of sets, there are two possible operations:

Find: Determine the largest set that an element is a member of
Union: Join two sets together

A reasonably efficient data structure to use for these operations is a set of trees whose leaves are indexed. The root of each tree is the containing set. Find climbs the tree to the root. Union makes one tree a child of another.

When Unioning two trees together there is a choice as to which tree to be the parent. Since minimizing overall depth is desirable, the "smaller" tree is chosen. There are two common options for evaluating the size: the number of nodes in the tree and the maximum depth of the tree.

A common optimization of tree-based Union-Find is "path compression" wherein while climbing the tree during a find, all intermediate nodes are eliminated by making all the nodes traversed direct children of the root. Because the function of the algorithm is to find the largest containing set, no pertinent information is lost.

Show that if height-based union is used, there is a set of operations which gives Ω(n²) collapsing finds before the tree is of height 1.

Note: The first attempt at answering this failed and after discussion with Jeff Green on the fact there are series of unions and finds such that elements must be searched multiple times to have a tree of height 1, I came up with the following solution.

Consider the following progression:

Each iteration, the collapsed tree (which has a height of one) is made a child of a two node tree (also of height one). The size of the collapsed tree grows by two nodes each time meaning that the iteration will continue $\frac{n}{2}$ times. The number of finds accumulated will be:
$\begin{matrix} 2 \sum_{i = 1}^{\frac{n - 2}{2}} 2 i - 3 & ≊ & 2 \sum_{i = 1}^{\frac{n}{2}} 2 i & = & 4 \frac{(\frac{n}{2} + 1) \frac{n}{2}}{2} \\ ≊ & 2 {(\frac{n}{2})}^{2} & = & \frac{n^{2}}{2} & = & Ω (n^{2}) \end{matrix}$
Show that if size-based union is used, the maximum number of finds before the tree is of height 1 is O(n log n)

For the purpose of this proof, consider a family of iteratively constructed trees of maximum height for a given number of nodes constructed with a size-based union. New trees are only formed as the combination of other trees, so a tree with the minimum number of nodes for given depth must have been formed from two other trees of a minimum number of nodes. Otherwise, there would be another tree with fewer nodes for that depth and the given tree wouldn't be minimal.

The simplest class of tree is a single node, t₀. Two of these trees can be combined into a tree t₁.

Two t₁ trees can be joined into a new class of tree, t₂:

The same process can then be used to join two t₂ into the class t₃:

And so on for t₄, then t₅:

Each tree is composed of two of the previous level. Each iteration adds another level to the tree, so the number of nodes in tree t_i is 2ⁱ and the height of that tree is i. For a given number of nodes n, the largest class of tree that can be constructed is log₂n.

Consider then the effect of path compression on a search on the deepest node in a tree of class t₄:

The compressed tree then will be:

Consider the same transition, but with the edges adjacent to the root numbered by the maximum depth that the branch goes to:

The compressed tree then will be:

Every tree belonging to a class t_i will have a single deepest node and will decompose under path compression from that node into a set of trees {t₀, t₀, t₁, …, t_{i - 1}} at a find cost of i (since the tree was of height i). So the decomposition cost for a tree is:
$\sum_{i = 0}^{{log}_{2} n} 2^{i} ({log}_{2} (n - i)) = O (\log^{2} n)$
When using size-based find each union of size k will join two trees of at worst O(log²n). There can be at most $\frac{n}{2}$ joins.
$\frac{n}{2} {log}^{2} n = O (n \log n)$

Consider an alternative approach to the union-find problem. Keep trees of uniform height 2. Each root has between 1 and $\frac{n}{log n}$ children. Each child of the root has between 1 and $2 log n$ children. A child of the root is called "full" if it has ≥ $log n$ children. Every tree has at most 1 non-full child.

To perform union, take the tree with the fewest full children and move its full children to the other tree. Then for the non-full children of each tree, move the children of the non-full child with the lowest degree to the other non-full child. Then move that non-full child to the tree the full nodes were moved to.

Show that the total number of moves of children in non-full nodes is O(n log log n).

For a node with degree k, the maximum number of nodes that could have been moved to form that node is $\frac{k}{2}$ . The maximum size at which a node will qualify as non-full and be combined with another node is $log n - 1$ . The combination of two of these sets will produce a tree of size $2 log n - 2$ . The number of equal size joins to produce a set of size $2 log n - 2$ is ${log}_{2} (2 log n - 2)$ . The number of moves in combining these maximal sets is:
$\begin{matrix} \frac{n}{2 log n - 2} \sum_{i = 0}^{{log}_{2} (2 log n - 2)} 2^{i} & = & O (n log log n) \end{matrix}$

That is certainly a complicated and kinda questionable method. A better one is to ask, what is the maximum number of times an element can be moved?

Each time an element is moved it has to be into a tree of at least as large as the tree it was coming from, so the number of moves is logn. There are n elements, so the number of moves is O(n log n).
Show that the time taken moving full children is O(n).

This is similar to the previous question except the sets being combined are full sets. Posit there is a scenario for combining full sets when dealing with a maximal number of full sets which has ≥ the cost for any other number of full sets. [ToDo: Prove that.]

The maximum number of full sets is when each full set has the minimum number of nodes to qualify as full (logn). Elements in a set beyond its fullness could be used in creating other full sets.

The maximum number of moves of full sets will occur when the sets being joined are of equal size. This will produce a minimum sized set for a given number of moves. If minimum sized sets are created at each iteration, this will maximize the number of iterations.

Non-full sets will have no effect if when combined they result is still non-full. If the result is full then they will produce a less than minimal set for a given number of nodes. For this reason, the worst case will involve combining $\frac{n}{log n}$ full sets.

The total number of moves for combining these sets will be:
$\begin{matrix} \frac{n}{log n} \sum_{i = 0}^{{log}_{2} n} 2^{i} & = & O (n) \end{matrix}$

CS 310: Design and Analysis of Algorithms

Homework #3

Will Holcomb

Due: 13:10 Mon., 11 February 2008