Finite Classes Are Agnostic PAC Learnable
By Corollary 4.4, to prove that every finite hypothesis class is agnostically PAC learnable, it suffices to show that every finite class has the uniform convergence property.
The proof is similar to how we proved that every finite hypothesis class is PAC learnable:
- We write down the probability of a "violation" of uniform convergence and set up an inequality that upper-bounds it
- By solving the inequality, we derive the number of samples required to ensure uniform convergence at a given confidence level
Derivation of The Inequality
- By definition, if uniform convergence holds for $\mathcal{H}$, then
$$\mathcal{D}^m\left(\{S : \forall h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| \le \epsilon\}\right) \ge 1 - \delta$$
- Equivalently, we want to bound the probability that uniform convergence does not hold by $\delta$:
$$\mathcal{D}^m\left(\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\right) < \delta$$
- Writing
$$\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\} = \bigcup_{h \in \mathcal{H}} \{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}$$
- Applying the union bound, we obtain:
$$\mathcal{D}^m\left(\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\right) \le \sum_{h \in \mathcal{H}} \mathcal{D}^m\left(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\right)$$
- We will then argue that each term in the sum is small, as long as $m$ is larger than a certain amount (determined by $\epsilon$ and $\delta$).
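The union bound step above can be checked numerically. Below is a minimal Monte Carlo sketch with an assumed toy setup (uniform domain on $[0, 1]$, threshold classifiers, 0-1 loss; none of these specifics come from the text). Per draw of $S$, the indicator of "some $h$ deviates" never exceeds the sum of per-hypothesis indicators, so the two estimated probabilities obey the bound trial by trial.

```python
import random

# Toy finite class (an illustrative assumption): threshold classifiers
# h_t(z) = 1[z >= t] on z ~ Uniform[0, 1], true labels f(z) = 1[z >= 0.5].
# Under 0-1 loss, L_D(h_t) = |t - 0.5|.
random.seed(0)

thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]            # the finite class H
true_risk = {t: abs(t - 0.5) for t in thresholds}  # L_D(h_t)
m, eps, trials = 50, 0.15, 2000

violations = 0      # estimates D^m({S : exists h, |L_S(h) - L_D(h)| > eps})
sum_indicators = 0  # estimates sum_h D^m({S : |L_S(h) - L_D(h)| > eps})

for _ in range(trials):
    S = [random.random() for _ in range(m)]
    # empirical 0-1 loss L_S(h_t) and its deviation from L_D(h_t), per hypothesis
    deviated = [
        abs(sum(((z >= t) != (z >= 0.5)) for z in S) / m - true_risk[t]) > eps
        for t in thresholds
    ]
    violations += any(deviated)      # indicator of "exists a deviating h"
    sum_indicators += sum(deviated)  # sum of per-hypothesis indicators

p_exists = violations / trials
p_sum = sum_indicators / trials
print(f"P[exists h with deviation > eps] ~ {p_exists:.3f}")
print(f"sum over h of P[deviation > eps] ~ {p_sum:.3f}")
```

Since `any(deviated) <= sum(deviated)` holds on every draw, the estimate of the left-hand side can never exceed the estimate of the right-hand side, mirroring the union bound.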
Before the derivation, let’s recall and compare what we did when proving that every finite hypothesis class is PAC learnable:
- The probability of overfitting is written as $\mathcal{D}^m\left(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\right)$.
- We also bound the overfitting probability by $\delta$:
$$\mathcal{D}^m\left(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\right) < \delta$$
- We then argue that $\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\} \subseteq \bigcup_{h \in \mathcal{H}_B} \{S|_x : L_S(h) = 0\}$, where $\mathcal{H}_B = \{h \in \mathcal{H} : L_{(\mathcal{D},f)}(h) > \epsilon\}$ is the set of "bad" hypotheses.
- Applying the union bound, we obtain:
$$\mathcal{D}^m\left(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\right) \le \sum_{h \in \mathcal{H}_B} \mathcal{D}^m\left(\{S|_x : L_S(h) = 0\}\right)$$
- Note that we considered a stricter set when deriving the bound for PAC learnability: $\{S|_x : L_S(h) = 0\}$, the samples on which a bad hypothesis looks perfect, rather than the deviation event $\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}$.
- However, we should remember that the PAC bound was derived under the realizability assumption, which does not hold in the agnostic setting.
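To get a rough sense of the price of dropping realizability, one can compare Chapter 2's realizable bound $m \ge \log(|\mathcal{H}|/\delta)/\epsilon$ with the agnostic bound of Corollary 4.6, $m \ge 2\log(2|\mathcal{H}|/\delta)/\epsilon^2$. A sketch with illustrative numbers (the values of $|\mathcal{H}|$, $\epsilon$, $\delta$ are assumptions, not from the text):

```python
import math

# Illustrative comparison: realizable (Ch. 2) vs agnostic (Cor. 4.6) bounds.
H_size, eps, delta = 1000, 0.1, 0.05

# realizable: m >= log(|H|/delta) / eps
m_realizable = math.ceil(math.log(H_size / delta) / eps)
# agnostic:   m >= 2 log(2|H|/delta) / eps^2
m_agnostic = math.ceil(2 * math.log(2 * H_size / delta) / eps ** 2)

print(m_realizable, m_agnostic)  # agnostic learning needs roughly 1/eps times more
```

The dependence on $\epsilon$ degrades from $1/\epsilon$ to $1/\epsilon^2$ once realizability is dropped.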
Derivation of the Result
We now show that the probability of "violation" of uniform convergence is small when the sample size is large enough.
- We want to upper-bound, for each fixed $h \in \mathcal{H}$, the probability $\mathcal{D}^m\left(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\right)$.
- Recall that $L_{\mathcal{D}}(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$ and $L_S(h) = \frac{1}{m}\sum_{i=1}^m \ell(h, z_i)$.
- Since each $z_i$ is sampled i.i.d. from $\mathcal{D}$, by the linearity of expectation it follows that $L_{\mathcal{D}}(h)$ is the expectation of $L_S(h)$.
- Hence, $|L_S(h) - L_{\mathcal{D}}(h)|$ is the deviation of the random variable $L_S(h)$ from its expectation.
- Since $L_S(h)$ is the empirical average over $m$ i.i.d. random variables, by the law of large numbers, when $m$ goes to infinity, $L_S(h)$ will converge to its expectation $L_{\mathcal{D}}(h)$.
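The law-of-large-numbers claim can be illustrated with a minimal sketch in which the losses $\theta_i = \ell(h, z_i)$ are assumed to be Bernoulli with mean $L_{\mathcal{D}}(h)$ (an illustrative choice, not from the text):

```python
import random

# L_S(h) is the average of m i.i.d. losses theta_i with mean L_D(h);
# here the losses are assumed Bernoulli(true_loss) for illustration.
random.seed(0)
true_loss = 0.3  # assumed L_D(h)

def empirical_loss(m: int) -> float:
    # average of m i.i.d. Bernoulli(true_loss) losses, i.e. L_S(h)
    return sum(random.random() < true_loss for _ in range(m)) / m

for m in (10, 1000, 100_000):
    print(m, abs(empirical_loss(m) - true_loss))
# the deviation shrinks as m grows, but the LLN alone gives no finite-m rate
```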
- Hence, asymptotically, the probability of violating uniform convergence is small when $m$ approaches infinity. However, this does not quantify how large $m$ should be given $\epsilon$ and $\delta$.
- To describe the relation between $m$, $\epsilon$, and $\delta$, we adopt Hoeffding’s inequality to get a simpler upper bound.
- LEMMA 4.5 (Hoeffding’s inequality): Let $\theta_1, \dots, \theta_m$ be a sequence of i.i.d. random variables and assume that for all $i$, $\mathbb{E}[\theta_i] = \mu$ and $\mathbb{P}[a \le \theta_i \le b] = 1$. Then, for any $\epsilon > 0$,
$$\mathbb{P}\left[\left|\frac{1}{m}\sum_{i=1}^m \theta_i - \mu\right| > \epsilon\right] \le 2\exp\left(-\frac{2m\epsilon^2}{(b - a)^2}\right)$$
- Assume that the range of $\ell$ is $[0, 1]$ and write $\theta_i = \ell(h, z_i)$ and $\mu = L_{\mathcal{D}}(h)$. Then we have:
$$\mathcal{D}^m\left(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\right) = \mathbb{P}\left[\left|\frac{1}{m}\sum_{i=1}^m \theta_i - \mu\right| > \epsilon\right] \le 2\exp(-2m\epsilon^2)$$
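As a sanity check of this bound, here is a Monte Carlo sketch with assumed Bernoulli($\mu$) losses, which lie in $[0, 1]$ so $(b - a)^2 = 1$ (the parameters are illustrative, not from the text):

```python
import math
import random

# Empirically estimate P[|mean - mu| > eps] and compare to Hoeffding's bound.
random.seed(0)
mu, m, eps, trials = 0.5, 50, 0.15, 5000

exceed = 0
for _ in range(trials):
    mean = sum(random.random() < mu for _ in range(m)) / m  # empirical average
    exceed += abs(mean - mu) > eps

estimate = exceed / trials
bound = 2 * math.exp(-2 * m * eps ** 2)  # 2 exp(-2 m eps^2), since (b-a)^2 = 1
print(f"P[|mean - mu| > eps] ~ {estimate:.4f}  vs  Hoeffding bound {bound:.4f}")
```

The empirical frequency sits well below the bound, as expected (Hoeffding is not tight, but it is finite-sample, unlike the LLN).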
- Putting it all together, the total upper bound is:
$$\mathcal{D}^m\left(\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\right) \le \sum_{h \in \mathcal{H}} 2\exp(-2m\epsilon^2) = 2|\mathcal{H}|\exp(-2m\epsilon^2)$$
- Solving the inequality $2|\mathcal{H}|\exp(-2m\epsilon^2) \le \delta$ for $m$, we have:
$$m \ge \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^2}$$
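The algebra can be verified by plugging the solved value of $m$ back into the bound (the values of $|\mathcal{H}|$, $\epsilon$, $\delta$ below are illustrative assumptions):

```python
import math

# Solving 2|H| exp(-2 m eps^2) <= delta for m gives
# m >= log(2|H|/delta) / (2 eps^2); at equality the bound equals delta.
H_size, eps, delta = 1000, 0.1, 0.05
m = math.log(2 * H_size / delta) / (2 * eps ** 2)
bound = 2 * H_size * math.exp(-2 * m * eps ** 2)
print(m, bound)  # at this m the bound recovers delta
```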
COROLLARY 4.6
Let $\mathcal{H}$ be a finite hypothesis class. Let $Z$ be a domain and let $\ell : \mathcal{H} \times Z \to [0, 1]$ be a loss function. Then $\mathcal{H}$ enjoys the uniform convergence property with sample complexity
$$m^{UC}_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^2} \right\rceil$$
Furthermore, $\mathcal{H}$ is agnostically PAC learnable using the ERM algorithm with sample complexity
$$m_{\mathcal{H}}(\epsilon, \delta) \le m^{UC}_{\mathcal{H}}(\epsilon/2, \delta) \le \left\lceil \frac{2\log(2|\mathcal{H}|/\delta)}{\epsilon^2} \right\rceil$$
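Corollary 4.6 can be turned into a small sample-complexity calculator; a sketch follows (the function names and example numbers are our own, not from the text):

```python
import math

def m_uc(h_size: int, eps: float, delta: float) -> int:
    """Uniform-convergence sample complexity: ceil(log(2|H|/delta) / (2 eps^2))."""
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

def m_agnostic_pac(h_size: int, eps: float, delta: float) -> int:
    """Agnostic PAC bound via Corollary 4.4: m_H(eps, delta) <= m_uc(eps/2, delta)."""
    return m_uc(h_size, eps / 2, delta)

print(m_uc(1000, 0.1, 0.05))           # samples for uniform convergence
print(m_agnostic_pac(1000, 0.1, 0.05))  # halving eps quadruples the 1/eps^2 factor
```

Note that the agnostic PAC bound is just the uniform-convergence bound evaluated at $\epsilon/2$, which is where the extra factor of 4 (absorbed into $2\log(\cdot)/\epsilon^2$) comes from.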