PAC Learning
In the last chapter, we showed that for a finite hypothesis class $\mathcal{H}$, as long as we have a large enough sample ($m \ge \frac{\log(|\mathcal{H}|/\delta)}{\epsilon}$), applying ERM will output a hypothesis that is probably (with probability at least $1 - \delta$) approximately correct (general loss $L_{(\mathcal{D},f)}(h_S) \le \epsilon$).
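As a quick recap of why this sample size suffices (assuming the union-bound argument used in the previous chapter): any fixed hypothesis with general loss above $\epsilon$ is consistent with $m$ i.i.d. examples with probability at most $(1-\epsilon)^m$, so

$$
\Pr\big[L_{(\mathcal{D},f)}(h_S) > \epsilon\big] \;\le\; |\mathcal{H}|\,(1-\epsilon)^m \;\le\; |\mathcal{H}|\,e^{-\epsilon m} \;\le\; \delta \quad \text{whenever } m \ge \tfrac{\log(|\mathcal{H}|/\delta)}{\epsilon}.
$$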
More generally, we define Probably Approximately Correct (PAC) learning as:
DEFINITION 3.1 (PAC learnability) A hypothesis class $\mathcal{H}$ is PAC learnable if there exists a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learning algorithm with the following property: for every $\epsilon, \delta \in (0,1)$, for every distribution $\mathcal{D}$ over $\mathcal{X}$, and for every labeling function $f : \mathcal{X} \to \{0,1\}$, if the realizable assumption holds with respect to $\mathcal{H}, \mathcal{D}, f$, then running the algorithm on $m \ge m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$ and labeled by $f$ yields a hypothesis $h$ such that, with probability of at least $1 - \delta$ (over the choice of the $m$-tuple of examples), $L_{(\mathcal{D},f)}(h) \le \epsilon$ (the general loss is bounded).
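To make the definition concrete, here is a minimal simulation sketch of PAC learning in the realizable setting. All concrete choices below (a finite class of threshold classifiers on $\{0, \dots, 99\}$, a uniform distribution $\mathcal{D}$, and the labeling function $f = h_{37}$) are illustrative assumptions, not something fixed by the text.

```python
import math
import random

# Finite hypothesis class: threshold classifiers h_t(x) = 1 iff x >= t, over X = {0, ..., 99}.
THRESHOLDS = list(range(101))   # |H| = 101
TRUE_T = 37                     # hypothetical labeling function f = h_37, so realizability holds

def h(t, x):
    return 1 if x >= t else 0

def true_loss(t):
    # General loss L_{(D,f)}(h_t) under the uniform distribution D on {0, ..., 99}.
    return sum(h(t, x) != h(TRUE_T, x) for x in range(100)) / 100

def erm(sample):
    # ERM in the realizable case: return any hypothesis in H consistent with the sample.
    for t in THRESHOLDS:
        if all(h(t, x) == y for x, y in sample):
            return t
    raise RuntimeError("no consistent hypothesis; realizability violated")

def trial(m):
    xs = [random.randrange(100) for _ in range(m)]
    sample = [(x, h(TRUE_T, x)) for x in xs]
    return true_loss(erm(sample))

if __name__ == "__main__":
    eps, delta = 0.1, 0.05
    m = math.ceil(math.log(len(THRESHOLDS) / delta) / eps)   # finite-class sample bound
    losses = [trial(m) for _ in range(2000)]
    frac_ok = sum(loss <= eps for loss in losses) / len(losses)
    print(f"m = {m}; fraction of trials with loss <= {eps}: {frac_ok:.3f} (guarantee: >= {1 - delta:.2f})")
```

The printed fraction should come out at least $1 - \delta$; since the finite-class bound is loose, it is typically much higher.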
- In short, if the hypothesis class you choose is PAC learnable, then increasing the sample size guarantees a reliably bounded general loss.
- Quantitatively, the sample complexity (the number of samples needed) is given by $m_{\mathcal{H}}(\epsilon, \delta)$, which is controlled by two parameters, $\epsilon$ and $\delta$.
- The accuracy parameter $\epsilon$ measures how far the output classifier can be from the optimal one (this corresponds to the "approximately correct" part of "PAC").
- The confidence parameter $\delta$ indicates how likely the classifier is to meet that accuracy requirement (this corresponds to the "probably" part of "PAC").
- The use of $\delta$ is inevitable since, for a finite sample, there is always a small chance that the sample data is non-representative of the underlying distribution $\mathcal{D}$ (e.g., the sample contains only one data point).
- In addition to $\epsilon$ and $\delta$, the sample complexity $m_{\mathcal{H}}$ also depends on properties of the hypothesis class $\mathcal{H}$.
- All finite hypothesis classes are PAC learnable, with sample complexity $m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\log(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil$ (see the sketch after this list).
- There are also PAC learnable infinite hypothesis classes, so PAC learnability is not determined by finiteness.
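As a small illustration of how the finite-class bound above scales, the following sketch evaluates $\lceil \log(|\mathcal{H}|/\delta)/\epsilon \rceil$ for a few hypothetical values of $|\mathcal{H}|$, $\epsilon$, and $\delta$.

```python
import math

def finite_class_bound(h_size, eps, delta):
    """Upper bound on m_H(eps, delta) for a finite class: ceil(log(|H| / delta) / eps)."""
    return math.ceil(math.log(h_size / delta) / eps)

for h_size in (10, 1_000, 1_000_000):
    for eps, delta in ((0.1, 0.1), (0.01, 0.01)):
        m = finite_class_bound(h_size, eps, delta)
        print(f"|H| = {h_size:>9}, eps = {eps}, delta = {delta}: m <= {m}")
```

Note that the bound grows only logarithmically in $|\mathcal{H}|$ and $1/\delta$, but linearly in $1/\epsilon$.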