1. Draw the full decision tree for the parity function of four Boolean attributes, A, B, C, and D. Is it possible to simplify the tree?
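As a starting point for this exercise, the truth table of the parity function can be enumerated programmatically. The sketch below assumes the convention that the class label is 1 when an odd number of attributes are true (the exercise does not fix a convention), and the helper name `flipping_changes_label` is hypothetical. It also checks one property relevant to the simplification question: whether changing a single attribute value always changes the label.

```python
from itertools import product


def parity(bits):
    """Return 1 if an odd number of inputs are 1, else 0 (assumed convention)."""
    return sum(bits) % 2


# Truth table over the four Boolean attributes A, B, C, D (16 rows).
table = {bits: parity(bits) for bits in product((0, 1), repeat=4)}


def flipping_changes_label(table):
    """Return True if flipping any single attribute always changes the label."""
    for bits, label in table.items():
        for i in range(len(bits)):
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if table[flipped] == label:
                return False
    return True


for bits, label in sorted(table.items()):
    print(dict(zip("ABCD", bits)), "->", label)
print("every single-attribute flip changes the label:",
      flipping_changes_label(table))
```

Each row of the printed table corresponds to one root-to-leaf path in a tree that tests all four attributes.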
6. Consider the following set of training examples (see table in the book).
(a) Compute a two-level decision tree using the greedy approach described in this chapter. Use the classification error rate as the criterion for splitting. What is the overall error rate of the induced tree?
(b) Repeat part (a) using X as the first splitting attribute and then choose the best remaining attribute for splitting at each of the two successor nodes. What is the error rate of the induced tree?
(c) Compare the results of parts (a) and (b). Comment on the suitability of the greedy heuristic used for splitting attribute selection.
10. While the .632 bootstrap approach is useful for obtaining a reliable estimate of model accuracy, it has a known limitation [127]. Consider a two-class problem in which the data contain an equal number of positive and negative examples. Suppose the class labels for the examples are generated randomly. The classifier used is an unpruned decision tree (i.e., a perfect memorizer). Determine the accuracy of the classifier using each of the following methods.
(a) The holdout method, where two-thirds of the data are used for training and the remaining one-third are used for testing.
(b) Ten-fold cross-validation.
(c) The .632 bootstrap method.
(d) From the results in parts (a), (b), and (c), which method provides a more reliable evaluation of the classifier’s accuracy?
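For reference when working on the .632 bootstrap method, the estimate combines the accuracy measured on the records left out of each bootstrap sample with the accuracy measured on the full training set. The sketch below assumes the commonly stated form, acc = (1/b) Σ (0.632 × ε_i + 0.368 × acc_s), where ε_i is the accuracy on the left-out records in round i and acc_s is the accuracy on the training data. The function name, argument names, and sample figures are hypothetical and are not the answer to the exercise.

```python
def bootstrap_632_accuracy(round_test_accuracies, resubstitution_accuracy):
    """Average 0.632 * (left-out accuracy of round i) + 0.368 * (training accuracy)."""
    if not round_test_accuracies:
        raise ValueError("need at least one bootstrap round")
    combined = [
        0.632 * eps_i + 0.368 * resubstitution_accuracy
        for eps_i in round_test_accuracies
    ]
    return sum(combined) / len(combined)


# Arbitrary illustrative figures for two bootstrap rounds.
estimate = bootstrap_632_accuracy([0.70, 0.80], resubstitution_accuracy=0.90)
print(round(estimate, 4))
```

To apply it to the exercise, substitute the accuracies that a perfect memorizer would obtain on randomly labeled data.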
12. Let X be a binomial random variable with mean N*p and variance N*p*(1−p). Show that the ratio X/N also has a binomial distribution with mean p and variance p*(1 − p)/N.
6. (a) Suppose the fraction of undergraduate students who smoke is 15% and the fraction of graduate students who smoke is 23%. If one-fifth of the college students are graduate students and the rest are undergraduates, what is the probability that a student who smokes is a graduate student?
(b) Given the information in part (a), is a randomly chosen college student more likely to be a graduate or undergraduate student?
(c) Repeat part (b) assuming that the student is a smoker.
(d) Suppose 30% of the graduate students live in a dorm but only 10% of the undergraduate students live in a dorm. If a student smokes and lives in the dorm, is he or she more likely to be a graduate or undergraduate student? You can assume independence between students who live in a dorm and those who smoke.
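One way to check hand calculations for this exercise is a small Bayes' theorem helper. The sketch below uses the figures given in parts (a) and (d); the function name `posterior` and the dictionary layout are hypothetical. For part (d) it reads the stated independence as conditional independence of smoking and dorm residence within each student group, which is an assumption about the intended meaning.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a finite set of classes.

    priors:      {class: P(class)}
    likelihoods: {class: P(evidence | class)}
    Returns      {class: P(class | evidence)}
    """
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Figures taken from the exercise statement.
priors = {"graduate": 0.20, "undergraduate": 0.80}
smokes = {"graduate": 0.23, "undergraduate": 0.15}
dorm = {"graduate": 0.30, "undergraduate": 0.10}

# Parts (a) and (c): condition on the student being a smoker.
given_smoker = posterior(priors, smokes)

# Part (d): assumed conditional independence, so the likelihoods multiply.
smokes_and_dorm = {c: smokes[c] * dorm[c] for c in priors}
given_smoker_in_dorm = posterior(priors, smokes_and_dorm)

print(given_smoker)
print(given_smoker_in_dorm)
```

Comparing the two posterior values in each dictionary answers the "more likely" questions directly.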
16. (a) Demonstrate how the perceptron model can be used to represent the AND and OR functions between a pair of Boolean variables.
(b) Comment on the disadvantage of using linear functions as activation functions for multilayer neural networks.
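For part (a), a candidate set of weights can be verified by evaluating the perceptron on all four input pairs. The sketch below assumes a 0/1 encoding with a step activation that outputs 1 when the weighted sum plus bias is positive; the weights and biases shown are one possible choice among many, and the names `perceptron`, `and_gate`, and `or_gate` are hypothetical.

```python
def perceptron(weights, bias):
    """Build a step-activation perceptron: output 1 if w . x + bias > 0, else 0."""
    def predict(x):
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if activation > 0 else 0
    return predict


# One possible parameter choice (an assumption, not the only solution).
and_gate = perceptron((1.0, 1.0), bias=-1.5)
or_gate = perceptron((1.0, 1.0), bias=-0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [and_gate(x) for x in inputs]
or_outputs = [or_gate(x) for x in inputs]

for x, a, o in zip(inputs, and_outputs, or_outputs):
    print(x, "AND:", a, "OR:", o)
```

With equal weights, only the bias differs between the two functions: it sets how many inputs must be 1 before the unit fires.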