Uniform Convergence Is Sufficient for Learnability
Recall that, given a hypothesis class $\mathcal{H}$ and upon receiving a training sample $S$, the ERM learner selects a hypothesis $h_S \in \operatorname{argmin}_{h \in \mathcal{H}} L_S(h)$ that minimizes the empirical risk. We hope that $h_S$ also minimizes the true risk. For this, it suffices to ensure that the empirical risk is close to the true risk uniformly over all hypotheses in $\mathcal{H}$.
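As a concrete illustration, here is a minimal numerical sketch; every detail in it is hypothetical (a finite class of threshold hypotheses over $[0,1]$, the 0-1 loss, and a noisy-threshold distribution), not part of the formal development. It contrasts empirical risks on a sample $S$ with Monte Carlo estimates of the true risks, and its last line reports the uniform deviation that the next definition makes precise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x ~ Uniform[0, 1], label y = 1[x >= 0.4] with 10% label noise.
def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.4).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

# Finite hypothesis class: thresholds h_t(x) = 1[x >= t].
thresholds = np.linspace(0, 1, 101)

def risks(x, y):
    # 0-1 loss of every threshold hypothesis on the sample (x, y).
    preds = (x[None, :] >= thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

x_train, y_train = sample(50)        # training sample S
x_big, y_big = sample(200_000)       # large sample standing in for D

emp = risks(x_train, y_train)        # empirical risks L_S(h)
true = risks(x_big, y_big)           # estimated true risks L_D(h)

h_erm = thresholds[emp.argmin()]     # ERM output h_S
print(f"ERM threshold: {h_erm:.2f}")
print(f"L_S(h_S) = {emp.min():.3f}, L_D(h_S) ~ {true[emp.argmin()]:.3f}")
print(f"max_h |L_S(h) - L_D(h)| ~ {np.abs(emp - true).max():.3f}")
```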
Accordingly, we have the following definition:
Definition 4.1 ($\epsilon$-representative) A training set $S$ is called $\epsilon$-representative (w.r.t. domain $Z$, hypothesis class $\mathcal{H}$, loss function $\ell$, and distribution $\mathcal{D}$) if
$$\forall h \in \mathcal{H}, \quad |L_S(h) - L_{\mathcal{D}}(h)| \le \epsilon.$$
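Continuing the hypothetical threshold-class sketch from above, the following estimates how often a sample of a given size is $\epsilon$-representative in the sense of Definition 4.1, with the true risks approximated by one large held-out sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical setup as before: estimate the probability (over draws of S)
# that a sample of size m is eps-representative for the threshold class.
thresholds = np.linspace(0, 1, 101)

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.4).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

def risks(x, y):
    preds = (x[None, :] >= thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

true = risks(*sample(500_000))    # large sample standing in for L_D
eps, m, trials = 0.1, 200, 500
hits = sum(np.abs(risks(*sample(m)) - true).max() <= eps for _ in range(trials))
print(f"estimated P(S is eps-representative): {hits / trials:.2f}")
```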
The following lemma links $\epsilon$-representativeness to agnostic PAC learnability:
LEMMA 4.2 For an $(\epsilon/2)$-representative training set $S$ (w.r.t. domain $Z$, hypothesis class $\mathcal{H}$, loss function $\ell$, and distribution $\mathcal{D}$), any output of $\mathrm{ERM}_{\mathcal{H}}(S)$ satisfies
$$L_{\mathcal{D}}(h_S) \le \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \epsilon, \quad \text{where } h_S \in \operatorname{argmin}_{h \in \mathcal{H}} L_S(h).$$
Proof: For every $h \in \mathcal{H}$,
$$L_{\mathcal{D}}(h_S) \le L_S(h_S) + \frac{\epsilon}{2} \le L_S(h) + \frac{\epsilon}{2} \le L_{\mathcal{D}}(h) + \frac{\epsilon}{2} + \frac{\epsilon}{2} = L_{\mathcal{D}}(h) + \epsilon,$$
where the first and third inequalities use the $(\epsilon/2)$-representativeness of $S$, and the second holds because $h_S$ minimizes the empirical risk. Since this holds for every $h \in \mathcal{H}$, it holds in particular for a minimizer of the true risk. $\square$
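The lemma's guarantee can also be checked numerically. The sketch below reuses the hypothetical threshold class and asserts that on every $(\epsilon/2)$-representative sample it encounters, the ERM output is $\epsilon$-optimal in true risk:

```python
import numpy as np

rng = np.random.default_rng(2)

# Checking Lemma 4.2 on the hypothetical threshold class: whenever S is
# (eps/2)-representative, the ERM output must be eps-optimal in true risk.
thresholds = np.linspace(0, 1, 101)

def sample(m):
    x = rng.uniform(0, 1, m)
    y = (x >= 0.4).astype(int)
    return x, np.where(rng.uniform(0, 1, m) < 0.1, 1 - y, y)

def risks(x, y):
    preds = (x[None, :] >= thresholds[:, None]).astype(int)
    return (preds != y[None, :]).mean(axis=1)

true = risks(*sample(500_000))    # large sample standing in for L_D
eps = 0.1

for _ in range(200):
    emp = risks(*sample(500))                          # L_S on a fresh sample S
    if np.abs(emp - true).max() <= eps / 2:            # S is (eps/2)-representative
        assert true[emp.argmin()] <= true.min() + eps  # lemma's guarantee
print("Lemma 4.2 held on every (eps/2)-representative sample encountered")
```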
Below, we formally state that if a hypothesis class has the uniform convergence property, then it is agnostically PAC learnable.
Definition 4.3 (Uniform Convergence) A hypothesis class $\mathcal{H}$ has the uniform convergence property (w.r.t. a domain $Z$ and a loss function $\ell$) if there exists a function $m_{\mathcal{H}}^{UC} : (0,1)^2 \to \mathbb{N}$ such that for every $\epsilon, \delta \in (0,1)$ and for every distribution $\mathcal{D}$ over $Z$, if $S$ is a sample of $m \ge m_{\mathcal{H}}^{UC}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, then, with probability of at least $1 - \delta$, $S$ is $\epsilon$-representative.
The term uniform here refers to having a fixed sample size that works for all members of $\mathcal{H}$ and over all possible probability distributions over the domain.
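For intuition about where such a function $m_{\mathcal{H}}^{UC}$ can come from: for a finite class with loss in $[0,1]$, Hoeffding's inequality plus a union bound over $\mathcal{H}$ yields uniform convergence (a standard result, stated here without proof). A minimal sketch of the resulting sample-size formula:

```python
import math

# Standard bound for a finite class H with loss in [0, 1]:
#   m_UC(eps, delta) = ceil( log(2 |H| / delta) / (2 eps^2) )
# samples suffice for S to be eps-representative with probability >= 1 - delta.
def m_uc_finite(h_size, eps, delta):
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

# E.g., for the hypothetical 101-threshold class used above:
print(m_uc_finite(h_size=101, eps=0.1, delta=0.05))
```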
The following corollary follows directly from Lemma 4.2 and the definition of uniform convergence:
COROLLARY 4.4 If a hypothesis class $\mathcal{H}$ has the uniform convergence property with a function $m_{\mathcal{H}}^{UC}$, then the class is agnostically PAC learnable with sample complexity $m_{\mathcal{H}}(\epsilon, \delta) \le m_{\mathcal{H}}^{UC}(\epsilon/2, \delta)$. Furthermore, in that case, $\mathrm{ERM}_{\mathcal{H}}$ is a successful agnostic PAC learner for $\mathcal{H}$.
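Instantiating the corollary with the finite-class bound above (numbers are for the running hypothetical example), the agnostic PAC sample complexity is obtained by invoking $m_{\mathcal{H}}^{UC}$ at accuracy $\epsilon/2$:

```python
import math

# m_H(eps, delta) <= m_UC(eps/2, delta) for the hypothetical finite class above.
eps, delta, h_size = 0.1, 0.05, 101
m_needed = math.ceil(math.log(2 * h_size / delta) / (2 * (eps / 2) ** 2))
print(m_needed)  # four times the eps-level requirement, since eps enters squared
```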