# Experiment 10: Random Forest

This is the report of Experiment 10: Random Forest.

# Purpose

This is a relatively light experiment. Following the experimental instructions, we complete the required tasks, which center on implementing a random forest.

# Procedure and Results

We use make_blobs from sklearn to generate our dataset, a labeled dataset for multi-class classification. The decision tree is based on sklearn.tree.

The dataset distribution is shown in the figure below:
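As a minimal sketch, a dataset of this kind could be generated and plotted with make_blobs as follows; the specific parameters (n_samples, centers, cluster_std, random_state) are assumed values, not necessarily the ones used in the experiment.

```python
# Sketch: generate and visualize a make_blobs dataset.
# NOTE: n_samples, centers, cluster_std and random_state are assumed values.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap="rainbow")
plt.title("make_blobs dataset")
plt.show()
```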

# Creating a decision tree

With a plain decision tree, we get the following classification result:
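As a sketch of this step, a single decision tree can be fit on the dataset generated above and its decision regions drawn on a grid; the plot_boundary helper and the grid resolution are illustrative choices of this write-up, not part of the original experiment code.

```python
# Sketch: fit one decision tree on (X, y) from the previous snippet and
# visualize its decision boundary. plot_boundary is a hypothetical helper.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

def plot_boundary(model, X, y, steps=200):
    # Evaluate the classifier on a grid covering the data and color the regions.
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap="rainbow")
    plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap="rainbow")
    plt.show()

tree = DecisionTreeClassifier().fit(X, y)
plot_boundary(tree, X, y)
```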

# Decision trees and overfitting

The result is not very good. At the center of the figure, the decision boundary is not smooth, which suggests overfitting. In fact, we can control this behavior by adjusting the depth of the decision tree. The following two figures show the cases where the depth of the decision tree is 1 and 5, respectively.
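A sketch of this depth comparison, reusing the plot_boundary helper from the previous snippet; the depths 1 and 5 correspond to the two figures described above.

```python
# Sketch: compare a very shallow tree (depth 1) with a deeper one (depth 5).
for depth in (1, 5):
    tree = DecisionTreeClassifier(max_depth=depth).fit(X, y)
    plot_boundary(tree, X, y)
```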

# Ensembles of estimators: random forests

Although we can avoid overfitting by controlling the depth, the loss of information that comes with a shallow tree cannot be ignored. In that case, we use an ensemble learning method: random forests.

sklearn gives us a convenient API for bagging classifiers. Here is the result:
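As a sketch of the ensemble step: since the report mentions the bagging API, both a BaggingClassifier over decision trees and a RandomForestClassifier are shown below; which of the two was actually used is not stated, and the parameter values (n_estimators, max_samples, random_state) are assumptions.

```python
# Sketch: bagging of decision trees and a random forest on the same data.
# NOTE: n_estimators, max_samples and random_state are assumed values.
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        max_samples=0.8, random_state=1).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

plot_boundary(bag, X, y)      # bagged decision trees
plot_boundary(forest, X, y)   # random forest
```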

It is easy to see that the random forest performs much better than a single decision tree: it produces a robust decision boundary without overfitting.

# Thoughts

The crux of ensemble learning is how to combine many weak classifiers, although this experiment does not focus on that. I think the keys to ensemble learning are proper feature selection and how the different predictions are aggregated.

Another important point, I think, is that the idea of ensemble learning can be generalized to many classifiers, not only decision trees.