Uniform Convergence Is Sufficient for Learnability
Recall that, given a hypothesis class $\mathcal{H}$ and upon receiving a training sample $S$, the ERM learner selects a hypothesis $h_S \in \operatorname{argmin}_{h \in \mathcal{H}} L_S(h)$ that minimizes the empirical risk. We hope that $h_S$ also minimizes the true risk. For this, it suffices to ensure that the empirical risk is close to the true risk uniformly over all hypotheses in $\mathcal{H}$.
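As a concrete illustration, here is a minimal numerical sketch; every detail in it is hypothetical (a finite class of threshold hypotheses over $[0,1]$, the 0-1 loss, and a noisy-threshold distribution), not part of the formal development. It contrasts empirical risks on a sample $S$ with Monte Carlo estimates of the true risks, and its last line reports the uniform deviation that the next definition makes precise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x ~ Uniform[0, 1], label y = 1[x >= 0.4] with 10% label noise.
def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.4).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

# Finite hypothesis class: thresholds h_t(x) = 1[x >= t].
thresholds = np.linspace(0, 1, 101)

def risks(x, y):
    # 0-1 loss of every threshold hypothesis on the sample (x, y).
    preds = (x[None, :] >= thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

x_train, y_train = sample(50)        # training sample S
x_big, y_big = sample(200_000)       # large sample standing in for D

emp = risks(x_train, y_train)        # empirical risks L_S(h)
true = risks(x_big, y_big)           # estimated true risks L_D(h)

h_erm = thresholds[emp.argmin()]     # ERM output h_S
print(f"ERM threshold: {h_erm:.2f}")
print(f"L_S(h_S) = {emp.min():.3f}, L_D(h_S) ~ {true[emp.argmin()]:.3f}")
print(f"max_h |L_S(h) - L_D(h)| ~ {np.abs(emp - true).max():.3f}")
```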
Accordingly, we have the following definition:
Definition 4.1 ($\epsilon$-representative) A training set $S$ is called $\epsilon$-representative (w.r.t. domain $Z$, hypothesis class $\mathcal{H}$, loss function $\ell$, and distribution $\mathcal{D}$) if
$$\forall h \in \mathcal{H}, \quad |L_S(h) - L_{\mathcal{D}}(h)| \le \epsilon.$$
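Continuing the hypothetical threshold-class sketch from above, the following estimates how often a sample of a given size is $\epsilon$-representative in the sense of Definition 4.1, with the true risks approximated by one large held-out sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical setup as before: estimate the probability (over draws of S)
# that a sample of size m is eps-representative for the threshold class.
thresholds = np.linspace(0, 1, 101)

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.4).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

def risks(x, y):
    preds = (x[None, :] >= thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

true = risks(*sample(500_000))    # large sample standing in for L_D
eps, m, trials = 0.1, 200, 500
hits = sum(np.abs(risks(*sample(m)) - true).max() <= eps for _ in range(trials))
print(f"estimated P(S is eps-representative): {hits / trials:.2f}")
```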
The following lemma links $\epsilon$-representativeness to agnostic PAC learnability:
LEMMA 4.2 For an $(\epsilon/2)$-representative training set $S$ (w.r.t. domain $Z$, hypothesis class $\mathcal{H}$, loss function $\ell$, and distribution $\mathcal{D}$), any output of $\mathrm{ERM}_{\mathcal{H}}(S)$ satisfies
$$L_{\mathcal{D}}(h_S) \le \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \epsilon, \quad \text{where } h_S \in \operatorname{argmin}_{h \in \mathcal{H}} L_S(h).$$
Proof: For every $h \in \mathcal{H}$,
$$L_{\mathcal{D}}(h_S) \le L_S(h_S) + \frac{\epsilon}{2} \le L_S(h) + \frac{\epsilon}{2} \le L_{\mathcal{D}}(h) + \frac{\epsilon}{2} + \frac{\epsilon}{2} = L_{\mathcal{D}}(h) + \epsilon,$$
where the first and third inequalities use the $(\epsilon/2)$-representativeness of $S$, and the second holds because $h_S$ minimizes the empirical risk. Since this holds for every $h \in \mathcal{H}$, it holds in particular for a minimizer of the true risk. $\square$
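The lemma's guarantee can also be checked numerically. The sketch below reuses the hypothetical threshold class and asserts that on every $(\epsilon/2)$-representative sample it encounters, the ERM output is $\epsilon$-optimal in true risk:

```python
import numpy as np

rng = np.random.default_rng(2)

# Checking Lemma 4.2 on the hypothetical threshold class: whenever S is
# (eps/2)-representative, the ERM output must be eps-optimal in true risk.
thresholds = np.linspace(0, 1, 101)

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.4).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

def risks(x, y):
    preds = (x[None, :] >= thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

true = risks(*sample(500_000))    # large sample standing in for L_D
eps = 0.1

for _ in range(200):
    emp = risks(*sample(500))                          # L_S on a fresh sample S
    if np.abs(emp - true).max() <= eps / 2:            # S is (eps/2)-representative
        assert true[emp.argmin()] <= true.min() + eps  # lemma's guarantee
print("Lemma 4.2 held on every (eps/2)-representative sample encountered")
```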
Below, we formally state that if a hypothesis class has the uniform convergence property, then it is agnostically PAC learnable.
Definition 4.3 (Uniform Convergence) A hypothesis class $\mathcal{H}$ has the uniform convergence property (w.r.t. a domain $Z$ and a loss function $\ell$) if there exists a function $m_{\mathcal{H}}^{UC} : (0,1)^2 \to \mathbb{N}$ such that for every $\epsilon, \delta \in (0,1)$ and for every distribution $\mathcal{D}$ over $Z$, if $S$ is a sample of $m \ge m_{\mathcal{H}}^{UC}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, then, with probability of at least $1 - \delta$, $S$ is $\epsilon$-representative.
The term uniform here refers to having a fixed sample size that works for all members of $\mathcal{H}$ and over all possible probability distributions over the domain.
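For intuition about where such a function $m_{\mathcal{H}}^{UC}$ can come from: for a finite class with loss in $[0,1]$, Hoeffding's inequality plus a union bound over $\mathcal{H}$ yields uniform convergence (a standard result, stated here without proof). A minimal sketch of the resulting sample-size formula:

```python
import math

# Standard bound for a finite class H with loss in [0, 1]:
#   m_UC(eps, delta) = ceil( log(2 |H| / delta) / (2 eps^2) )
# samples suffice for S to be eps-representative with probability >= 1 - delta.
def m_uc_finite(h_size, eps, delta):
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

# E.g., for the hypothetical 101-threshold class used above:
print(m_uc_finite(h_size=101, eps=0.1, delta=0.05))
```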
The following corollary follows directly from Lemma 4.2 and the definition of uniform convergence:
COROLLARY 4.4 If a hypothesis class $\mathcal{H}$ has the uniform convergence property with a function $m_{\mathcal{H}}^{UC}$, then the class is agnostically PAC learnable with sample complexity $m_{\mathcal{H}}(\epsilon, \delta) \le m_{\mathcal{H}}^{UC}(\epsilon/2, \delta)$. Furthermore, in that case, $\mathrm{ERM}_{\mathcal{H}}$ is a successful agnostic PAC learner for $\mathcal{H}$.
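Instantiating the corollary with the finite-class bound above (numbers are for the running hypothetical example), the agnostic PAC sample complexity is obtained by invoking $m_{\mathcal{H}}^{UC}$ at accuracy $\epsilon/2$:

```python
import math

# m_H(eps, delta) <= m_UC(eps/2, delta) for the hypothetical finite class above.
eps, delta, h_size = 0.1, 0.05, 101
m_needed = math.ceil(math.log(2 * h_size / delta) / (2 * (eps / 2) ** 2))
print(m_needed)  # four times the eps-level requirement, since eps enters squared
```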