# Experiment 9: Decision Tree
# Purpose
In this experiment, we classify the given wine dataset with a decision tree.
I chose the ID3 algorithm, which selects the attribute with the highest information gain at each split and, combined with pre-pruning, uses that gain to decide whether a node should become a leaf.
# Procedure
## Generate the Training and Testing Sets
We use 10-fold cross-validation to assess the reliability and robustness of the decision tree: the data is split into ten folds, and each fold serves once as the test set while the remaining nine folds are used for training.
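The report does not include the splitting code, so the following is a minimal sketch of how the folds could be produced; the function names and the fixed random seed are illustrative assumptions, not the actual implementation.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validation_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]
```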
## Calculate Entropy and Information Gain
Shannon entropy is defined as:

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k,$$

where $p_k$ is the proportion of samples in the set $D$ that belong to class $k$.
The information gain of an attribute $A$ is the reduction in entropy obtained by splitting $D$ on $A$:

$$\operatorname{Gain}(D, A) = H(D) - \sum_{v \in \operatorname{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v),$$

where $D_v$ is the subset of $D$ whose value of attribute $A$ equals $v$.
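As a sketch, the two definitions above translate directly into Python (assuming class labels are stored in a list and attribute values have already been discretized to a finite set; the names are mine, not from the original code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(D) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, attribute):
    """Gain(D, A): entropy reduction from splitting on one attribute index."""
    n = len(labels)
    subsets = {}
    for sample, label in zip(samples, labels):
        subsets.setdefault(sample[attribute], []).append(label)
    remainder = sum(len(ls) / n * entropy(ls) for ls in subsets.values())
    return entropy(labels) - remainder
```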
## Pruning
We use a pre-pruning strategy: a threshold on information gain, denoted $\epsilon$, is fixed before training, and when the best information gain at a node is less than $\epsilon$, the node is not split further and becomes a leaf.
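Below is a minimal sketch of how this pre-pruning check could sit inside the ID3 recursion, reusing `information_gain` from the sketch above. The `EPSILON` value is a placeholder, since the threshold actually used in the experiment is not preserved in this report.

```python
from collections import Counter

EPSILON = 0.1  # hypothetical threshold; the experiment's actual value is not shown above

def majority_class(labels):
    """Label assigned at a leaf: the most common class in the subset."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(samples, labels, attributes):
    """ID3 with pre-pruning (relies on information_gain from the previous sketch)."""
    if len(set(labels)) == 1 or not attributes:
        return majority_class(labels)
    best = max(attributes, key=lambda a: information_gain(samples, labels, a))
    # Pre-pruning: if even the best split gains less than EPSILON, stop here.
    if information_gain(samples, labels, best) < EPSILON:
        return majority_class(labels)
    children = {}
    for value in set(s[best] for s in samples):
        pairs = [(s, l) for s, l in zip(samples, labels) if s[best] == value]
        sub_samples = [p[0] for p in pairs]
        sub_labels = [p[1] for p in pairs]
        children[value] = build_tree(sub_samples, sub_labels,
                                     [a for a in attributes if a != best])
    return (best, children)  # internal node: (attribute index, children by value)
```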
# Results
Per-fold results of the 10-fold cross-validation (each fold's test set contains 488 samples):

| Fold | Correct | Wrong | Accuracy |
|------|---------|-------|----------|
| 1    | 394     | 94    | 0.8074   |
| 2    | 359     | 129   | 0.7357   |
| 3    | 366     | 122   | 0.7500   |
| 4    | 401     | 87    | 0.8217   |
| 5    | 424     | 64    | 0.8689   |
| 6    | 367     | 121   | 0.7520   |
| 7    | 350     | 138   | 0.7172   |
| 8    | 358     | 130   | 0.7336   |
| 9    | 398     | 90    | 0.8156   |
| 10   | 405     | 83    | 0.8299   |
The mean accuracy is about 0.783 (3822 correct predictions out of 4880 in total), which is close to the experimental expectation.
# About Visualizing the Tree
Unfortunately, the trained tree contains more than 100 nodes, each of which carries information that would also need to be displayed, and I did not find a practical way to render a tree of that size, so I omitted the visualization.