Empirical Risk Minimization
Now, let’s describe a simple learning paradigm for the preceding learning framework.
-
Since the learner does not know both
and
, the true error
is not available to the learner
-
An alternative is the training error, the error the classifier incurs over the training sample, defined as:
, where
-
The training error is also called empericial error or empericial risk
-
Since the training sample
is also sampled according to
, by minimizing the empirical risk
, we hope we also minimize the true error
.
-
This learning paradigm – coming up with a predictor
that minimizes
– is called Empirical Risk Minimization or ERM for short.
Something May Go Wrong – Overfitting
Remember the Pigeon Superstition Experiment in chapter 1? A bunch of hungry pigeons were placed in the cage and are served with food at regular intervals. Each pigeon was engaged in some activity when the food was first delivered. The pigeons then repeated their actions, thinking that their actions led to the food delivery. That, in turn, makes their specific activity coincide with the food delivery more frequently. In the end, the pigeons continue to perform the same actions diligently because they think they found the prediction rule: "doing certain behavior leads to food delivery".
The pigeons are actually performing ERM, which minimizes the training error (though not defined clearly here). They found a predictor whose performance on the training set is excellent, yet its performance in the real world is very poor. This phenomenon is called overfitting. Intuitively, overfitting occurs when our hypothesis fits the training data “too well”. A more formal example is described in section 2.2.1 in the book.