The Bias-Complexity Tradeoff
Chapters 2, 3, and 4 mainly discussed how to avoid overfitting by restricting the search space to some hypothesis space $\mathcal{H}$. Such a restriction is based on our prior knowledge. For example, in our papaya tasting problem, on the basis of our previous experience with other fruits, we may assume that some rectangle in the color-hardness plane predicts (at least approximately) the papaya's tastiness.
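To make this restriction concrete, here is a minimal Python sketch of the axis-aligned-rectangle hypothesis class for the papaya example. The feature encoding, function names, and the tightest-fit learning rule are illustrative assumptions, not a prescription:

```python
# Sketch (assumed setup): each papaya is a point (color, hardness) in [0,1]^2,
# and the hypothesis class is the set of axis-aligned rectangles.

def rectangle_hypothesis(a1, b1, a2, b2):
    """Return h with h(x) = 1 iff color in [a1, b1] and hardness in [a2, b2]."""
    def h(color, hardness):
        return 1 if a1 <= color <= b1 and a2 <= hardness <= b2 else 0
    return h

def fit_tightest_rectangle(samples):
    """An ERM-style rule: the smallest rectangle enclosing all tasty examples.

    samples: list of ((color, hardness), label) pairs with label in {0, 1}.
    """
    positives = [x for x, y in samples if y == 1]
    if not positives:  # no tasty papayas seen: predict all-negative
        return lambda color, hardness: 0
    colors = [c for c, _ in positives]
    hards = [h for _, h in positives]
    return rectangle_hypothesis(min(colors), max(colors), min(hards), max(hards))

# Usage: train on a few labeled papayas, then predict a new one.
S = [((0.4, 0.6), 1), ((0.5, 0.5), 1), ((0.9, 0.1), 0)]
h = fit_tightest_rectangle(S)
print(h(0.45, 0.55))  # inside the learned rectangle -> 1
```

Because the class is so restricted, a rectangle fit this way cannot memorize arbitrary label noise, which is precisely how the restriction guards against overfitting.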
While restricting the hypothesis space to $\mathcal{H}$ eases the problem of overfitting, it is possible that no hypothesis in $\mathcal{H}$ achieves a small enough training or test error, i.e., our prior knowledge leads us to underfitting. Hence, a natural question arises: can we design a universal learner that has no prior knowledge about any particular task (and hence no risk of underfitting) and is ready to be challenged by any task (without encountering overfitting)? The No-Free-Lunch theorem states that no such universal learner exists. Every learner has tasks on which it fails (overfits the training data) while some other learners succeed on those tasks.
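For reference, one standard formalization of the theorem, for binary classification with the 0-1 loss, reads as follows:

```latex
% One standard formalization of the No-Free-Lunch theorem
% (cf. Shalev-Shwartz & Ben-David, Understanding Machine Learning, Thm. 5.1).
\begin{theorem}[No-Free-Lunch]
  Let $A$ be any learning algorithm for binary classification with respect
  to the 0-1 loss over a domain $\mathcal{X}$, and let $m < |\mathcal{X}|/2$
  be a training-set size. Then there exists a distribution $\mathcal{D}$
  over $\mathcal{X} \times \{0,1\}$ such that:
  \begin{enumerate}
    \item there exists a function $f : \mathcal{X} \to \{0,1\}$ with
          $L_{\mathcal{D}}(f) = 0$; yet
    \item with probability at least $1/7$ over the choice of
          $S \sim \mathcal{D}^m$, we have $L_{\mathcal{D}}(A(S)) \ge 1/8$.
  \end{enumerate}
\end{theorem}
```

In words: for any learner there is a learnable task (some $f$ fits it perfectly) on which that learner is nevertheless likely to fail badly.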
What we learn from the No-Free-Lunch theorem is that when approaching a particular learning problem, defined by some distribution $\mathcal{D}$, we should have some prior knowledge on $\mathcal{D}$. One type of such prior knowledge is that $\mathcal{D}$ comes from some specific parametric family of distributions, as will be discussed in Chapter 24. Another type of prior knowledge, which we assumed when defining the PAC learning model, is that there exists $h^\star$ in some predefined hypothesis class $\mathcal{H}$ such that $L_{\mathcal{D}}(h^\star) = 0$ (or $\min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)$ is small for the agnostic PAC learning model). In a sense, this assumption (prior knowledge) on $\mathcal{D}$ is the prerequisite for using the (agnostic) PAC model.
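Here and throughout, $L_{\mathcal{D}}(h)$ denotes the true error (risk) of a hypothesis $h$ under $\mathcal{D}$; for binary classification with the 0-1 loss this is

```latex
% True error (risk) of h under D, for binary classification with 0-1 loss.
L_{\mathcal{D}}(h) \;=\; \Pr_{(x,y) \sim \mathcal{D}}\bigl[\, h(x) \neq y \,\bigr]
```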
In the second part of this chapter, we study the benefits and pitfalls of using a hypothesis class as a means of formalizing prior knowledge. This part formally discusses the bias-complexity tradeoff: restricting the hypothesis space eases the problem of overfitting but might lead to underfitting.
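As a preview of that discussion, the tradeoff is commonly made precise by decomposing the true error of a learned hypothesis $h_S$ (standard notation; $\epsilon_{\mathrm{app}}$ is the approximation error, or bias, and $\epsilon_{\mathrm{est}}$ the estimation error):

```latex
% Error decomposition: bias (approximation error) + complexity (estimation error).
L_{\mathcal{D}}(h_S)
  \;=\; \underbrace{\min_{h \in \mathcal{H}} L_{\mathcal{D}}(h)}_{\epsilon_{\mathrm{app}}}
  \;+\; \underbrace{\Bigl( L_{\mathcal{D}}(h_S) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \Bigr)}_{\epsilon_{\mathrm{est}}}
```

Enlarging $\mathcal{H}$ can only decrease $\epsilon_{\mathrm{app}}$ but tends to increase $\epsilon_{\mathrm{est}}$, while shrinking $\mathcal{H}$ has the opposite effect; this is exactly the tradeoff at issue.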