CS 4804 Homework #5

Date Assigned: November 3, 2006
Date Due: November 13, 2006, in class, before class starts
  1. (60 points) Pick up the car evaluation dataset from the UCI machine learning repository. The car.names file is like a README, and describes what the dataset is about. The car.data file is the actual dataset. Each line of this file contains one example: the properties of the car are listed, separated by commas, followed by the classification at the end of the line. See the car.c45-names file for the legal values of each attribute and of the class. The goal of this problem is to take a set of car features as input and predict the acceptability of the car.

    First, split the dataset into training and test sets so that the distribution of classes is the same in both. E.g., if the training set has x% of its examples from the first class, y% from the second class, z% from the third class, etc., the same percentages must be observed in the test set.

    Implement a decision tree algorithm using information gain (the reduction in entropy) as the tree-growing criterion. Reproduce a curve similar to Fig 18.7 (from your textbook) showing the performance of the tree as the training set size grows. Besides the performance numbers, make qualitative comments about how your decision tree is doing.

  2. (30 points) Exercise 20.11 from your textbook, but change "two inputs" to "three inputs".

  3. (10 points) Exercise 20.13 from your textbook.
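
For problem 1, the class-preserving split and the entropy-based criterion can be sketched roughly as below. This is only an illustrative sketch, not a required design: the function names (stratified_split, entropy, information_gain), the test_fraction and seed parameters, and the representation of each example as a list whose last element is the class label are all assumptions made for this example. Building the tree itself and plotting the learning curve are left out.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(rows, test_fraction=0.25, seed=0):
    """Split rows into (train, test), keeping class proportions roughly equal.

    Assumption: each row is a list whose last element is the class label.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row)

    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]          # copy so the input is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])         # same fraction taken from every class
        train.extend(group[n_test:])
    return train, test


def entropy(rows):
    """Entropy (in bits) of the class labels in rows."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attr_index):
    """Entropy of rows minus the weighted entropy after splitting on one attribute."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder


# Toy usage with made-up rows (not taken from car.data):
toy = [["a", "x", "yes"], ["a", "y", "yes"], ["b", "x", "no"], ["b", "y", "no"]]
best_attr = max(range(2), key=lambda i: information_gain(toy, i))
```

In the toy rows, attribute 0 separates the two classes perfectly while attribute 1 carries no information about the class, so a tree grown with this criterion would split on attribute 0 first. The rows of car.data could be loaded into the same list-of-lists form with Python's csv module.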
