CS 4804 Homework #5
Date Assigned: November 3, 2006
Date Due: November 13, 2006, in class, before class starts
- (60 points) Pick up the car evaluation dataset from the UCI machine
learning repository. The car.names file is like a README, and
describes what the dataset is about. The car.data file is the actual
dataset. Each line of this file contains one example. Various properties
of the car are listed, separated by commas, followed by the classification
at the end of the line. See the car.c45-names for legal values of each
attribute and the class. The goal of this problem is to take a set of
car features as input and predict the acceptability of the car.
First, split the dataset into training and test sets so that the distribution
of classes is the same in both. E.g., if the training set has x% of its
examples from the first class, y% of its examples from the second class,
z% from the third class, etc., these same percentages must be observed
in the test set.
Implement a decision tree algorithm using entropy gain as the tree-growing
criterion. Reproduce a curve similar to Fig. 18.7 (from your textbook)
showing the performance of the tree as the training set size grows. Besides
the performance numbers, make qualitative comments about how your
decision tree is doing.
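
As a reminder of what the entropy gain criterion computes, here is a minimal Python sketch. It assumes each example is a list of attribute values with the class label last, as in car.data. The function names and the toy rows are made up for illustration and are not part of the assignment; this is not a full tree implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attr_index):
    """Entropy of the class labels minus the weighted entropy that
    remains after partitioning rows by the attribute at attr_index."""
    labels = [row[-1] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Toy rows (hypothetical, not taken from car.data): two attributes, then the class.
toy_rows = [
    ["low", "small", "unacc"],
    ["low", "big", "acc"],
    ["high", "small", "unacc"],
    ["high", "big", "acc"],
]

gain_first = information_gain(toy_rows, 0)   # attribute 0 says nothing about the class
gain_second = information_gain(toy_rows, 1)  # attribute 1 determines the class
```

In this toy data the second attribute separates the classes completely, so it has the larger gain and would be chosen as the split at the root.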
- (30 points) Exercise 20.11 from your textbook, but change "two inputs"
to "three inputs".
- (10 points) Exercise 20.13 from your textbook.
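
For the train/test split described in the first problem, one possible way to keep the class distribution the same in both sets is to split each class separately. The sketch below is only an illustration under stated assumptions: the function name, the test_fraction and seed parameters, and the toy rows are hypothetical, and rounding means the percentages match only approximately when a class size is not evenly divisible.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, test_fraction=0.25, seed=0):
    """Split rows into (train, test) so each class contributes about
    test_fraction of its examples to the test set.
    Each row is a list of attribute values with the class label last."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows (hypothetical): 8 examples of one class and 4 of another.
toy_rows = [["x", "unacc"]] * 8 + [["y", "acc"]] * 4

train_rows, test_rows = stratified_split(toy_rows)
train_counts = Counter(row[-1] for row in train_rows)
test_counts = Counter(row[-1] for row in test_rows)
```

Here both sets end up with a 2:1 ratio of the two classes, matching the ratio in the full toy dataset.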