PAC Learning
In the last chapter, we showed that for a finite hypothesis class $\mathcal{H}$, as long as we have a large enough sample ($m \ge \frac{\log(|\mathcal{H}|/\delta)}{\epsilon}$), applying ERM will output a hypothesis that is probably (with probability at least $1 - \delta$) approximately correct (general loss $L_{(\mathcal{D},f)}(h_S) \le \epsilon$).
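As a quick recap of why this sample size suffices (assuming the union-bound argument used in the previous chapter): any fixed hypothesis with general loss above $\epsilon$ is consistent with $m$ i.i.d. examples with probability at most $(1-\epsilon)^m$, so

$$
\Pr\big[L_{(\mathcal{D},f)}(h_S) > \epsilon\big] \;\le\; |\mathcal{H}|\,(1-\epsilon)^m \;\le\; |\mathcal{H}|\,e^{-\epsilon m} \;\le\; \delta \quad \text{whenever } m \ge \tfrac{\log(|\mathcal{H}|/\delta)}{\epsilon}.
$$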
More generally, we define Probably Approximately Correct (PAC) learning as:
DEFINITION 3.1 (PAC learnability) A hypothesis class $\mathcal{H}$ is PAC learnable if there exists a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learning algorithm with the following property: for every $\epsilon, \delta \in (0,1)$, for every distribution $\mathcal{D}$ over $\mathcal{X}$, and for every labeling function $f : \mathcal{X} \to \{0,1\}$, if the realizable assumption holds with respect to $\mathcal{H}, \mathcal{D}, f$, then running the algorithm on $m \ge m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$ and labeled by $f$ yields a hypothesis $h$ such that, with probability of at least $1 - \delta$ (over the choice of the $m$-tuple of examples), $L_{(\mathcal{D},f)}(h) \le \epsilon$ (the general loss is bounded).
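To make the definition concrete, here is a minimal simulation sketch of PAC learning in the realizable setting. All concrete choices below (a finite class of threshold classifiers on $\{0, \dots, 99\}$, a uniform distribution $\mathcal{D}$, and the labeling function $f = h_{37}$) are illustrative assumptions, not something fixed by the text.

```python
import math
import random

# Finite hypothesis class: threshold classifiers h_t(x) = 1 iff x >= t, over X = {0, ..., 99}.
THRESHOLDS = list(range(101))   # |H| = 101
TRUE_T = 37                     # hypothetical labeling function f = h_37, so realizability holds

def h(t, x):
    return 1 if x >= t else 0

def true_loss(t):
    # General loss L_{(D,f)}(h_t) under the uniform distribution D on {0, ..., 99}.
    return sum(h(t, x) != h(TRUE_T, x) for x in range(100)) / 100

def erm(sample):
    # ERM in the realizable case: return any hypothesis in H consistent with the sample.
    for t in THRESHOLDS:
        if all(h(t, x) == y for x, y in sample):
            return t
    raise RuntimeError("no consistent hypothesis; realizability violated")

def trial(m):
    xs = [random.randrange(100) for _ in range(m)]
    sample = [(x, h(TRUE_T, x)) for x in xs]
    return true_loss(erm(sample))

if __name__ == "__main__":
    eps, delta = 0.1, 0.05
    m = math.ceil(math.log(len(THRESHOLDS) / delta) / eps)   # finite-class sample bound
    losses = [trial(m) for _ in range(2000)]
    frac_ok = sum(loss <= eps for loss in losses) / len(losses)
    print(f"m = {m}; fraction of trials with loss <= {eps}: {frac_ok:.3f} (guarantee: >= {1 - delta:.2f})")
```

The printed fraction should come out at least $1 - \delta$; since the finite-class bound is loose, it is typically much higher.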
- In short, if the hypothesis class you choose is PAC learnable, then increasing the sample size guarantees a reliably bounded general loss.
- Quantitatively, the sample complexity (the number of samples needed) is given by $m_{\mathcal{H}}(\epsilon, \delta)$, which is controlled by two parameters, $\epsilon$ and $\delta$.
- The accuracy parameter $\epsilon$ measures how far the output classifier can be from the optimal one (this corresponds to the "approximately correct" part of "PAC").
- The confidence parameter $\delta$ indicates how likely the classifier is to meet that accuracy requirement (this corresponds to the "probably" part of "PAC").
- The use of $\delta$ is inevitable since, for a finite sample, there is always a small chance that the sample data is non-representative of the underlying distribution $\mathcal{D}$ (e.g., the sample contains only one data point).
- In addition to $\epsilon$ and $\delta$, the sample complexity $m_{\mathcal{H}}$ also depends on properties of the hypothesis class $\mathcal{H}$.
- All finite hypothesis classes are PAC learnable, with sample complexity $m_{\mathcal{H}}(\epsilon, \delta) \le \left\lceil \frac{\log(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil$ (see the sketch after this list).
- There are also PAC learnable infinite hypothesis classes, so PAC learnability is not determined by finiteness.
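As a small illustration of how the finite-class bound above scales, the following sketch evaluates $\lceil \log(|\mathcal{H}|/\delta)/\epsilon \rceil$ for a few hypothetical values of $|\mathcal{H}|$, $\epsilon$, and $\delta$.

```python
import math

def finite_class_bound(h_size, eps, delta):
    """Upper bound on m_H(eps, delta) for a finite class: ceil(log(|H| / delta) / eps)."""
    return math.ceil(math.log(h_size / delta) / eps)

for h_size in (10, 1_000, 1_000_000):
    for eps, delta in ((0.1, 0.1), (0.01, 0.01)):
        m = finite_class_bound(h_size, eps, delta)
        print(f"|H| = {h_size:>9}, eps = {eps}, delta = {delta}: m <= {m}")
```

Note that the bound grows only logarithmically in $|\mathcal{H}|$ and $1/\delta$, but linearly in $1/\epsilon$.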