Instructions:

If anything is ambiguous or unclear:

  1. Discuss possible interpretations with other students, your TA, and instructor
  2. Make assumptions, state them explicitly, and then use them to work out the problem
  3. Send e-mail to your TA first, and to your instructor if an issue is not resolved to your satisfaction.

Remember that after general discussions with others, you are required to work out the problems by yourself. All submitted work must be your own. Please refer back to the Honor code for clarifications.

Important:

  1. Comparing Multiple Search Methods

    Consider the problem of finding a path in the grid shown below from starting square S to goal square G. Possible moves from a square are: move up, move left, move right, and move down exactly one square. No move may be made onto a dark square (i.e., obstacles), or off the edge of the grid.

    G
    S
    1. Depth-First Search

      Copy the grid and then mark the grid squares with the number(s) indicating when that square is expanded during a Depth-First search from S to G. Assume children nodes are visited in the order up, left, right, down. Assume cycle checking is done so that a node is not generated in the search tree if the grid square position associated with the node occurs somewhere on the path from this node back to the root node. Highlight the solution path found, if any, or explain why none is found.

      This search is performed by a C++ program.

      |17|18|19|20|21|22|23|24|
      |16|31|30|29|28|27|26|25|
      |15|32|GG|  |  |  |  |  |
      |14|  |XX|XX|XX|XX|XX|  |
      |13|XX|02|01|03|XX|  |  |
      |12|11|XX|SS|04|05|XX|  |
      |  |10|09|08|07|06|  |  |
      |  |  |  |  |  |  |  |  |

      The path is the cells marked <01, 03-32>. The cell marked 02 is a dead end and the search backtracks to 01.

    2. Greedy Best-First Search

      Copy the grid again and then mark the grid squares with the number(s) indicating when that square is expanded during a Greedy Best-First search from S to G. Use as the heuristic function h(n) = |xn - xg| + |yn - yg| where the grid square associated with node n is at coordinates (xn, yn) in the grid and the goal node G is at coordinates (xg, yg). Assume you do not generate a node if that node's associated grid position has previously been generated. (That is, the third way to deal with repeated states as described on pages 81-82.) In the case of ties in evaluation function values, for siblings expand them in the precedence order up, left, right, down. In the case of ties between non-siblings, use FIFO order to expand first the node that has been in the NODES list the longest. Highlight the solution path found, if any, or explain why none is found.

      This search is performed by the same C++ program.

      |  |  |  |  |  |  |  |  |
      |  |  |  |  |  |  |  |  |
      |12|13|GG|  |  |  |  |  |
      |11|  |XX|XX|XX|XX|XX|  |
      |10|XX|02|01|03|XX|  |  |
      |09|08|XX|SS|04|  |XX|  |
      |  |07|06|05|  |  |  |  |
      |  |  |  |  |  |  |  |  |

      The path is the cells marked <05-13>. The cells 01-04 are attempted and abandonded for cells with a better heuristic ranking.

    3. A* Search

      Copy the grid again and do the same as in (b) except using A* search with the same heuristic function as in (b) and the cost of each move equal to 1. Use the same tie-breaking rules as defined in (b). Highlight the solution path found, if any, or explain why none is found.

      This search is supposed to be performed by the C++ program, but there's a lesson there about attempting to get a project done on a deadline at the same time as attempting to learn a new language. It might seem a good way to kill two birds with one stone, but can make an otherwise simple project intractable.

      |  |  |  |  |  |  |  |  |
      |  |  |  |  |  |  |  |  |
      |  |18|GG|  |  |  |  |  |
      |16|17|XX|XX|XX|XX|XX|  |
      |14|XX|02|01|03|XX|  |  |
      |13|10|XX|SS|04|12|XX|  |
      |  |07|06|05|08|15|  |  |
      |  |  |11|09|  |  |  |  |

      The path is the cells marked <05,06,07,10,13,14,16,17,18>.

    4. Compare the number of nodes generated in each case.

      The depth first search expands the most nodes because it is wandering blindly until it happens across the node. The greedy search expands fewer than A*, but the cost of A*'s optimality is it has to verify that no shorter path exists.

    5. An Infinite Grid

      Suppose the grid is extended infinitely in all directions but S, G and the dark squares remain as before. Which of the following methods will not be able to find a solution to this problem? Breadth-First, Depth-First, Depth-First Iterative Deepening, Greedy Best-First, and A*. Which of these would be the best method, and why? Again, in the case of ties in ordering nodes on the NODES list, use the precedence defined in (b).

      • Breadth-First — Will find a solution. The nodes at each depth are the cells a given distance from the start position. The solution would be found when the search gets to the appropriate depth.
      • Depth-First — Will not find a solution. Once the search comes around the edge of the cells blocking a straight path, it will go up and continue to go up indefinitely.
      • Iterative Deepening Depth-First — Will find a solution. The search order is the same as breadth-first search and a path will be found for the same reason.
      • Greedy Best-First — Will find a solution if the heuristic is always positive. As the costs accumulate with increasing distance from the start, eventually the search will be driven to the goal.
      • A* — Will find a solution for the same reason as Greedy Best-First.
  2. Informed search on 8-puzzle

    Consider the 8-puzzle and the following heuristics.

    1. h1 — Number of misplaced tiles
    2. h2 — Sum of the Manhattan distances of all of the out of places tiles
    3. h3 — h2 + 2 + number of direct tile reversals.
    1. Are all three heuristics admissible? Are they consistent (monotonic)? Either prove your conclusions for the general case or give a counterexample.

      1. h1 — To get each misplaced tile into its correct position will take at least one move, so this will never be more than the minimum distance to the goal and is thus monotonic. A* when performed with a monotonic heuristic is admissible because it explores pathes in order of increasing length.
      2. h2 — The minimum number of moves to get from some position <i1,j1> to some other position <i2,j2> is the Manhattan diistance, so this heuristic is monotonic and A* is admissible.
      3. h3 — There are boards where only a single tile is out of place and only a single move is needed. Since h3 has a constant of 2 added and h2 cannot be negative, its minimum value is at least 2. Since it will overestimate the distance in these cases it is not monotonic and A* would not be admissible.
    2. What can you say about the informedness of each heuristic function?

      1. h1 — The maximum value of this heuristic is 8 if all 8 tiles are misplaced. This offers realtively small granularity and does not take into account the distance the tile is misplaced. This heuristic is not terrible well informed.
      2. h2 — This heuristic has a larger range and takes into account the amount is disorder in the board. It is better informed than h2.
      3. h3 — It is more informed than h2 because it has a greater range and takes into account tile reversals.
    3. Write code to find the solution from an initial configuration to a goal using:

      1. uniform cost search
      2. best-first search
      3. A* search

      Run this code for at least three initial configurations, and generate a table that includes:

      1. the number of nodes that were placed in OPEN
      2. the number of nodes expanded
      3. the path cost from the start to the goal node

      Briefly discuss your results.

    4. Consider the weighted A* heuristic discussed in Hansen, i.e., (see page 269 of the Hansen and Zhou paper). Choose the heuristic function, h2, and implement the Anytime-WA* algorithm on page 271 of the paper. Run your code to see if this algorithm generates the optimal solution for 2 of the three configurations that you ran earlier. Collect statistics (i)-(iii) for each iteration of the anytime algorithm. Discuss your results, and compare them with the results generated for the A* algorithm in part III.

    5. Prove A* is admissible. Your proof should show that:

      1. A* will terminate
      2. During execution, there is always a node on OPEN that lies on an optimal path to the goal
      3. If there is a path to a goal, A* will terminate by finding the optimal path

      A* is constantly expanding the shortest projected path to the goal. It is admissible if the heuristic function always underestimates the distance to the goal because a path p is only completed when the goal node is at the head of the open queue. P can only be at the head of the queue if it has a cost + heuristic less than or equal to any other node. If there were another goal cheaper than p, it would come up first.

      There has to always be an item from p on the open list because the open list has the heads of the lowest cost paths from the start. One of those has to be a subpath of the lowest cost path to the goal.

    6. Constraint Satisfaction

      1. Explain why it is a good heuristic to choose the variable that is most constrained, but the value that is least constrained in a constraint satisfaction problem (CSP).

        The most constrained variable is the variable that is most likely to get constrained to no possibiltiies and force backtracking. The least constrained value is the value that leaves open the maximum number of possibilities in the subtrees (and so the least chance of backtracking).

      2. Consider the problem of constructing (not solving) crossword puzzles: fitting words into a rectangular grid. The grid, which is given as a part of the problem, specifies which squares are blank and which are shaded. Assume that a list of words is provided and the task is to fill in the blank squares using any subset of the list. Formulate the problem in two ways:

        1. As a general search problem. Choose an appropriate search algorithm and specify a heuristic function, if you think one is needed. Is it better to fill in blanks one letter at a time or one word at a time?

          It would depend on the length of the word list. If there are more than 26 * the number of letters words of each length then the branching factor would be lower trying by letters. In particular if certain combinations of letters could be eliminated as impossible. A heuristic might be based on letter frequencies in the given language and also combination frequencies.

        2. As a constraint satisfaction problem. Should the variables be words or letters?

          It seems like words would make more sense because a letter in a particular space doesn't particularly constrain the rest of the word in a very specific manner. Words however would do that to a higher degree.

      3. Solve the following cryptarythmetic problem by hand using:

        1. backtracking

        2. forward checking

        3. the MRV and least-constraining-value heuristics