Finite Classes Are Agnostic PAC Learnable

By Corollary 4.4, to prove that every finite hypothesis class is agnostically PAC learnable, it suffices to show that every finite hypothesis class enjoys the uniform convergence property.

The proof is similar to how we proved that every finite hypothesis class is PAC learnable:

  • We write down the probability that uniform convergence is violated and derive an upper bound on it.

  • By solving the resulting inequality, we derive the number of samples required to ensure uniform convergence at a given confidence level.

Derivation of the Inequality

  • By definition, if the uniform convergence property holds for $\mathcal{H}$ with parameters $(\epsilon, \delta)$, then

    $$\mathcal{D}^m\big(\{S : \forall h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| \le \epsilon\}\big) \ge 1 - \delta$$

  • Equivalently, we want to bound the probability that uniform convergence does not hold by $\delta$:

    $$\mathcal{D}^m\big(\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\big) < \delta$$

  • Writing

    $$\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\} = \bigcup_{h \in \mathcal{H}} \{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}$$

  • and applying the union bound, we obtain:

    $$\mathcal{D}^m\big(\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\big) \le \sum_{h \in \mathcal{H}} \mathcal{D}^m\big(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\big)$$

  • We will then argue that each summand is small,

    as long as $m$ is larger than an amount determined by $\epsilon$ and $\delta$.
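To make the event being bounded concrete, here is a minimal Monte Carlo sketch. The setup is entirely made up for illustration (a small class of threshold classifiers on $[0,1]$ with 10% label noise and 0-1 loss; none of these choices come from the text): it estimates the probability that some hypothesis in the class deviates from its true loss by more than $\epsilon$, and compares it with the union bound $2|\mathcal{H}|e^{-2m\epsilon^2}$ derived below.

```python
import math
import random

random.seed(0)

# Toy setup (illustrative, not from the text): domain X = [0, 1],
# labels y = 1[x > 0.5] flipped with probability 0.1,
# finite class H of thresholds h_t(x) = 1[x > t], 0-1 loss.
thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]

def true_loss(t):
    # In the disagreement region |t - 0.5| the error rate is 0.9,
    # elsewhere it is the noise level 0.1, so L_D(h_t) = 0.1 + 0.8|t - 0.5|.
    return 0.1 + 0.8 * abs(t - 0.5)

def empirical_loss(t, sample):
    # L_S(h_t): fraction of sample points the threshold misclassifies.
    return sum((x > t) != y for x, y in sample) / len(sample)

def violation_rate(m, eps, trials=300):
    """Monte Carlo estimate of P(exists h in H : |L_S(h) - L_D(h)| > eps)."""
    bad = 0
    for _ in range(trials):
        sample = []
        for _ in range(m):
            x = random.random()
            y = (x > 0.5) != (random.random() < 0.1)  # noisy label
            sample.append((x, y))
        if any(abs(empirical_loss(t, sample) - true_loss(t)) > eps
               for t in thresholds):
            bad += 1
    return bad / trials

eps = 0.1
for m in (50, 200, 800):
    hoeffding_union = min(1.0, 2 * len(thresholds) * math.exp(-2 * m * eps ** 2))
    print(m, violation_rate(m, eps), hoeffding_union)
```

As $m$ grows, the empirical violation rate drops toward zero, and stays below the union bound once the bound becomes non-vacuous.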

Before the derivation, let’s recall and compare what we did when proving that every finite hypothesis class is PAC learnable:

  • The probability of overfitting is written as $\mathcal{D}^m\big(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\big)$.

  • We also bound the overfitting probability by $\delta$:

    $$\mathcal{D}^m\big(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\big) < \delta$$

  • We then argue that:

    $\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\} \subseteq \{S|_x : \exists h \in \mathcal{H}_B,\ L_S(h) = 0\}$, where $\mathcal{H}_B = \{h \in \mathcal{H} : L_{(\mathcal{D},f)}(h) > \epsilon\}$ is the set of "bad" hypotheses.

  • Applying the union bound, we obtain:

    $$\mathcal{D}^m\big(\{S|_x : \exists h \in \mathcal{H}_B,\ L_S(h) = 0\}\big) \le \sum_{h \in \mathcal{H}_B} \mathcal{D}^m\big(\{S|_x : L_S(h) = 0\}\big)$$

  • Note that we considered a stricter set when deriving the bound of PAC learnability: the event $\{S|_x : L_S(h) = 0\}$, which for every $h \in \mathcal{H}_B$ is contained in the event $\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}$ considered here.

  • However, we should remember that PAC learnability was derived under the realizability assumption, whereas no such assumption is made here.

Derivation of the Result

We now show that the probability that uniform convergence is violated is small once the sample size $m$ is large enough.

  • It suffices to bound each term of the upper bound:

    $$\mathcal{D}^m\big(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\big)$$

  • Recall that

    $$L_S(h) = \frac{1}{m}\sum_{i=1}^m \ell(h, z_i)$$

    and

    $$L_{\mathcal{D}}(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$$

  • By the linearity of expectation, it follows that $L_{\mathcal{D}}(h)$ is also the expectation of $L_S(h)$.

  • Hence, $|L_S(h) - L_{\mathcal{D}}(h)|$ is the deviation of the random variable $L_S(h)$ from its expectation.

  • Since $L_S(h)$ is the empirical average of the $m$ i.i.d. random variables $\ell(h, z_1), \dots, \ell(h, z_m)$, by the law of large numbers, as $m$ goes to infinity, $L_S(h)$ converges to its expectation $L_{\mathcal{D}}(h)$.

  • Hence, asymptotically, the probability of violating uniform convergence is small as $m$ approaches infinity. However, this does not quantify how large $m$ should be for given $\epsilon$ and $\delta$.

  • To describe the relation between $m$, $\epsilon$, and $\delta$, we adopt Hoeffding’s inequality to get a simpler, non-asymptotic upper bound.

  • LEMMA 4.5 (Hoeffding’s inequality): Let $\theta_1, \dots, \theta_m$ be a sequence of i.i.d. random variables and assume that for all $i$, $\mathbb{E}[\theta_i] = \mu$ and $\mathbb{P}[a \le \theta_i \le b] = 1$. Then, for any $\epsilon > 0$,

    $$\mathbb{P}\left[\left|\frac{1}{m}\sum_{i=1}^m \theta_i - \mu\right| > \epsilon\right] \le 2\exp\left(-\frac{2m\epsilon^2}{(b-a)^2}\right)$$

  • Assume that the range of $\ell$ is $[0, 1]$ and write $\theta_i = \ell(h, z_i)$ and $\mu = L_{\mathcal{D}}(h)$; we have:

    $$\mathcal{D}^m\big(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\big) = \mathbb{P}\left[\left|\frac{1}{m}\sum_{i=1}^m \theta_i - \mu\right| > \epsilon\right] \le 2\exp(-2m\epsilon^2)$$

  • And the total upper bound is:

    $$\mathcal{D}^m\big(\{S : \exists h \in \mathcal{H},\ |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}\big) \le \sum_{h \in \mathcal{H}} 2\exp(-2m\epsilon^2) = 2|\mathcal{H}|\exp(-2m\epsilon^2)$$

  • Solving the inequality $2|\mathcal{H}|\exp(-2m\epsilon^2) \le \delta$, we have:

    $$m \ge \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^2}$$
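The sample-size condition obtained by solving this inequality is easy to evaluate numerically. A minimal sketch (the values of $|\mathcal{H}|$, $\epsilon$, and $\delta$ below are made-up examples, not from the text):

```python
import math

def uc_sample_complexity(h_size, eps, delta):
    """Smallest integer m with 2*|H|*exp(-2*m*eps^2) <= delta,
    i.e. m >= log(2*|H|/delta) / (2*eps^2)."""
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

# Example: |H| = 1000 hypotheses, eps = 0.1, delta = 0.05.
print(uc_sample_complexity(1000, 0.1, 0.05))  # → 530
```

Note the mild dependence on the class size: $m$ grows only logarithmically in $|\mathcal{H}|$, but quadratically in $1/\epsilon$.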

COROLLARY 4.6

Let $\mathcal{H}$ be a finite hypothesis class, let $Z$ be a domain, and let $\ell : \mathcal{H} \times Z \to [0, 1]$ be a loss function. Then $\mathcal{H}$ enjoys the uniform convergence property with sample complexity

$$m^{\mathrm{UC}}_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^2} \right\rceil$$

Furthermore, $\mathcal{H}$ is agnostically PAC learnable using the ERM algorithm with sample complexity

$$m_{\mathcal{H}}(\epsilon, \delta) \le m^{\mathrm{UC}}_{\mathcal{H}}(\epsilon/2, \delta) \le \left\lceil \frac{2\log(2|\mathcal{H}|/\delta)}{\epsilon^2} \right\rceil$$
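The factor of 4 between the two bounds comes from running the uniform convergence guarantee with accuracy $\epsilon/2$ (so that an $(\epsilon/2)$-representative sample makes ERM $\epsilon$-optimal). A sketch of both formulas, with made-up example parameters:

```python
import math

def uc_m(h_size, eps, delta):
    # Uniform convergence sample complexity: ceil(log(2|H|/delta) / (2 eps^2)).
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

def agnostic_pac_m(h_size, eps, delta):
    # ERM needs an (eps/2)-representative sample, hence eps -> eps/2,
    # which multiplies the uniform convergence bound by 4.
    return uc_m(h_size, eps / 2, delta)

print(uc_m(1000, 0.1, 0.05))           # → 530
print(agnostic_pac_m(1000, 0.1, 0.05)) # → 2120
```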
