Question/Max Score I (20) II (20) III (15) IV (25) V (20) Total Actual Score Take Home Examination. This is an open book, open notes examination. The examination is due at 10 am on Monday, October 22, 2007. Submit your exam to the instructor by email or submit to the digital drop box on Oak. In either case, please send a separate email to your instructor as soon as you have submitted your exam.
Late submissions will be penalized at the rate of 5 points off for every minute the submission is late. No excuses will be entertained.
Be sure to answer questions clearly. Explicitly state all your assumptions, make sure you have clearly outlined the problem you are solving, and be sure to explain and justify all of the steps in your solution process. We reserve the right to not grade answers that do not address the question asked in the exam.
Your answers to the exam questions must be your own work. You cannot discuss the exam with other students, colleagues, or faculty at the university or elsewhere. You may only consult the instructor or the TA for clarifications and help on the exam. If you use other sources, such as papers and material on the web to derive your answer, please be sure to acknowledge them in a proper way. Do not attempt to download answers to questions directly from the web. It is very important that the work you submit be based entirely on your own efforts.
The honor pledge below must accompany all examination submissions. Otherwise the examination will not be graded.
I pledge my honor that I have neither given nor received aid on this work.
Do not sign until after you have completed your test.
A language L has a four character vocabulary V = {ε , A, B, C}, where ε is the empty symbol. The probability that character vi will be followed by character vj is given by the following matrix:
Do parts (a), (b), and (c) using the uniform cost algorithm. Please draw the search tree to justify your solution.
Find the most likely five-symbol string that starts with ε.
The definition of this question can be misleading because of the definition of ε as the "empty symbol." For a grammar, ε is used to represent the "empty string" which means that the rule does not add any characters to the generated string. So, "εεεAεBCεεAAε" is a five character string because it is equivalent to "ABCAA". The "emptiness" of the empty symbol, however, is meaningful only on a higher level than this question is operating and so it should be counted.
The search tree for this problem space looks like:
Each node represents a potential string to be transmitted. The start node is an empty string. Each depth adds a new character to the string.
A general search would generate all characters at each level. Given knowledge of the goal, there are certain nodes that should never be considered because they can never be on the path to a goal node. The simplest example would be all children of the root except for the one generating ε, since it is the only one that will generate strings starting with ε. The set of possible nodes, n, is therefore constrained to make the search more effective (and simpler to draw). For the search for the most likely five letter string starting with ε this set, m, is:
str(x) is the symbol string generated starting at the start node and traveling the path to x. Because this is a tree and edges always generate symbols, the path to x will be unique.
An additional modification to the algorithm is also needed. In general, for a path, P, from p1 to pn with edge costs, c(ni, nj):
This is not a good method for combining probabilities. A more effective method would be to make the cost of a node, n, be the likelihood that str(n) is not generated.
In general, if the probability of an event happening is p, the probability of the event not happening is 1 - p. An equation for the cost of a node then would be:
Since the edges represent probabilities:
Since there are no negative edge costs, uniform cost search should find the optimal path by constantly examining the node with the current least cost path from the root.
There are 24 nodes expanded before the goal is found. I won't attempt to draw the search tree because it would be fairly large, but one could be constructed from this output (search_state.ecbae.out). The set of closed nodes and their accompanying probabilities at the end of the search is:
[(1.0), E(0.25), EC(0.0625), EE(0.0625), EB(0.0625), EA(0.0625), EAE(0.03125), EBA(0.03125), ECB(0.03125), ECBA(0.015625), EBAE(0.015625), EBC(0.015625), EAC(0.015625), ECE(0.015625), EAB(0.015625), EEE(0.015625), EEC(0.015625), EEA(0.015625), EEB(0.015625), EEBA(0.0078125), EEAE(0.0078125), EECB(0.0078125), EBCB(0.0078125), EABA(0.0078125)]
This list is ordered by the point that the node was searched and each entry is the produced string at that point and the accompanying probability of that string. The actual goal node that is found is "εCBAε" with a probability of 0.0078125. (Assuming the initial ε has a probability of 0.25.)
The program to find the solution is contained in a couple different files:
Find the most likely five-symbol string that starts and ends with with ε.
The most likely five-symbol string in general starting with ε ends with ε, so the answer is the same as the previous question, "εCBAε."
In transmitting messages from L, some characters may be corrupted by noise and be confused with others. The probability that a transmitted character vj will be interpreted as character vk is given by the confusion matrix:
Find the message that most likely to have been transmitted given that the string εεBCAAεε is received, and knowing that all messages begin and end with ε.
To do this the probability that a given character will be generated needs to be modified. I'm pretty sure a proper application such that would give correct values for the probability would involve an application of Bayes rule, but a workable method is to simply multiply the value. This is the method in SymbolErrorSearch.java which comes up with the solution EEBCBAEE.
Repeat parts (a), (b), and (c) using backtracking search. Does backtracking search generate the same results as uniform cost search? Compare the number of nodes generated and the storage requirement for the two algorithms.
I am assuming that "backtracking search" is simply depth-first search. Since depth-first search has no guarantee of optimality, it would be necessary for it to search nearly the entire constrained search space (using the node constraining criteria mentioned in part a) to know that it had found the optimal goal.
A best goal cost could be kept and be used to cull paths with a cost greater then that cost.
ToDo: write depth-first search.
Can you think of a reasonable admissible heuristic function h that can be used to generate an A* search to solve the problem in part (c)? State the heuristic, use it to generate the search tree and the solution, and compare results with backtracking and uniform cost search.
A possible heuristic to estimate the cost to a goal is to assume that no errors were made. Find the cost for the substring remainder of the received string.
ToDo: write heuristic.
Prove that an A* algorithm guided by a monotone (consistent) heuristic finds optimal paths to all expanded nodes, i.e., .
A* defines the following metrics:
The basic algorithm is an queue of nodes, OPEN, is maintained. The nodes in OPEN are ordered according to their value for f(n) such that each iteration the node with the minimum f-value is:
The optimality of A* in reaching a goal node was the subject of a previous homework assignment. That proof by induction is:
At every step before A* terminates ∃ n ∋:
- n is on the optimal path to a goal
- A* has found the optimal path to n
- f(n) ≥ f(s)
At the beginning of the search, OPEN is initialized to {s}. For each of the assertions:
- The start node must be on every path, including the optimum one
- The optimum path to s is itself which is what A* examines
- f(s) ≤ f(s)
Assume after the mth expansion step the three properties hold for a node, n ∈ OPEN that is on the optimal path to the goal.
At the (m+1)st expansion step, one of two things must happen:
- n is picked for expansion
- some other node from OPEN is picked for expansion because it currently has a lower f-value.
In case (i), n moves from OPEN to CLOSED, but its new successors will be added to OPEN if they haven't been seen previously or have a better cost than previously seen. At least one of these nodes, n' will be on the optimal path to the goal, since the optimal path goes through n, and must continue.
If (ii) holds, n remains ∈ OPEN, and continues to satisfy all three properties.
If we assume there is not an optimal path to n', then another path to the goal from s through n' does not include n which contradicts our assumption that n is on the optimal path. Therefore, it is clear that the optimal path to the goal through n** must also include n. So from the (m+1)st step n' takes on the role of n.
To prove that f(n') ≤ f(s):
- f(n') = g(n') + h(n') by definition
- g(n') + h(n') ≤ g*(n') + h*(n') because:
- h(n') is admissible and ≤ h*(n')
- g(n') = g*(n'), because n' is on the optimal path
- g*(n') + h*(n') = f*(n') by definition
Assume that a node that is expanded from OPEN is not the end of an optimum path. If that node is a goal then that contradicts the previously proven optimality of A*,
Let be the sequence of nodes expanded by A*. Prove that if h is consistent, then for any .
The A* algorithm in general will always expand the node with the minimum f. To have an element nk ∈ CLOSED ∋ f(ni) > f(ni+1) would have required one of two things:
Consider the game called Connect Four, which is a vertical game of tic-tac-toe played by two. The players alternately drop red and black pieces onto the vertical board. The board is a 7 x 6, and there are 21 red and 21 black pieces.
The first player to get four of his pieces lined up in a row in any direction — horizontal, vertical, or diagonal — wins the game.
Discuss representational issues, and analyze the complexity of the state space. Provide a step by step justification for the results that you derive.
A simple representation would be a 6x7 integer array. Use constants to represent red and black. A more space efficient representation night be two bit field. Number the space starting at the top left and go across the rows numbering the cells. One field would have bits flipped for if a spot is occupied and the other would be flipped if the spot was a black piece. The spaces that are occupied could also be represented as seven numbers, one through six representing the height.
That representation gives a reasonably good feel for the size of the space because other than winning configurations that stop the came, the pieces can be put in an almost arbitrary configuration. The number of states is 26 * 7 - unreachable states because a goal is reached.
Propose a heuristic evaluation function for playing this game. Explain your heuristic function.
The number of pieces in a row. Rows have to be built one element at a time and the more there are, the more chances to win.
Consider the following logic puzzle: In five houses, each with a different color, live five persons of different nationalities, each of whom prefer a different brand of cigarette, a different drink, and a different pet. Given the following facts, the question to answer is: "Where does the zebra live, and in which house do they drink water?"
Discuss different representations of the problem as a CSP. Why would one prefer one representation over another?
The variables would be:
The needed representational facets would be:
Ideally the constraints would be expressible in a language designed for the task. Prolog for example.
Is the following well-formed formula (WFF) valid? Justify your answer using a truth table.
.P | Q | R | P∨Q | Q∨R | (P∨Q)∧(Q∨R) | P∨R | (P∨Q)∧(Q∨R)⇒P∨R |
---|---|---|---|---|---|---|---|
F | F | F | F | F | F | F | T |
T | F | F | T | F | T | T | T |
F | T | F | T | T | T | F | F |
T | T | F | T | T | T | T | T |
F | F | T | F | T | T | T | T |
T | F | T | T | T | T | T | T |
F | T | T | T | T | T | T | T |
T | T | T | T | T | T | T | T |
From the truth table, we see that when P is false, Q is true and R is false:
T ⇏ F, so the proposition is incorrect.
Use propositional logic to represent the following sentence: "When Fido is hungry, Fido barks, but Fido barking does not necessarily mean that Fido is hungry.
Prove that is logically equivalent to .
In minesweeper, a well-known computer game, the world is a rectangular grid of N squares with M invisible mines scattered among them. Any square may be probed by the agent; instant death follows if a mine is probed. Minesweeper indicates the presence of mines by revealing, in each probed square, the number of mines that are directly or diagonally adjacent. The goal is to have probed every unmined square.
Let Xi,j be true iff the square [i, j] contains a mine. Write down the assertion that there are exactly two mines adjacent to [1, 1] as a sentence involving some logical combination of Xi,j properties.
two-mines(X1,1) | = |
(X1,2 ∧ X2,2 ∧ ¬X2,1) ∨ (X1,2 ∧ ¬X2,2 ∧ ;X2,1) ∨ (¬X1,2 ∧ X2,2 ∧ X2,1) |
Generalize the assertion from (a) by explaining how to construct a conjunctive normal form (CNF) sentence asserting k of n neighbors contain mines.
Make a (n - k)!xn matrix where each of the n neighbors of the cell are listed in each of the (n - k)! rows. Make one statement out of the matrix by ∧ing all the elements within the rows and ∨ing the rows together. The rows should them be all the possible permutations of ¬ and ¬¬. The resultant statement will be the proper conjunctive normal form.