The Statistical Learning Framework

Let’s first describe a formal model capturing statistical learning tasks.

The Learner’s Input

  • Domain set $\mathcal{X}$: The set of objects that we wish to label. Usually, each domain point is represented by a vector of features (e.g., a papaya represented by its color and softness).

  • Label set $\mathcal{Y}$: For the current discussion, $\mathcal{Y}$ is restricted to a two-element set, $\{0, 1\}$ or $\{-1, +1\}$ (e.g., whether the papaya is tasty or not).

  • Training data $S = ((x_1, y_1), \ldots, (x_m, y_m))$: A finite sequence of pairs in $\mathcal{X} \times \mathcal{Y}$; that is, a sequence of domain points and their labels.

How the Training Data is Generated

  • We assume that each instance $x_i$ is sampled according to a probability distribution $\mathcal{D}$ over $\mathcal{X}$.

  • For the current discussion, assume that there is a "correct" labeling function $f : \mathcal{X} \to \mathcal{Y}$ such that $y_i = f(x_i)$ for all $i$.
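The generation process above can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the distribution (uniform on the unit square of color and softness), the labeling rule, and the names `sample_instance`, `true_label`, and `generate_training_data` are all hypothetical and do not come from the text.

```python
import random

def sample_instance(rng):
    """Draw one domain point x ~ D.

    Assumption: D is uniform on [0, 1]^2, read as (color, softness).
    """
    return (rng.random(), rng.random())

def true_label(x):
    """Hypothetical 'correct' labeling function f.

    Assumption: a papaya is tasty (1) iff both features lie in [0.2, 0.8].
    """
    color, softness = x
    return 1 if 0.2 <= color <= 0.8 and 0.2 <= softness <= 0.8 else 0

def generate_training_data(m, seed=0):
    """Build S = ((x_1, y_1), ..., (x_m, y_m)) with y_i = f(x_i)."""
    rng = random.Random(seed)
    xs = [sample_instance(rng) for _ in range(m)]
    return [(x, true_label(x)) for x in xs]

S = generate_training_data(5)
for x, y in S:
    print(x, y)
```

Note that the learner only ever receives `S`; the functions `sample_instance` and `true_label` stand in for the parts of the world it cannot see.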

The Learner’s Output

  • The learner does not know anything about $\mathcal{D}$ and is required to learn $f$ from the training data alone.

  • The learner should output a prediction rule (function) $h : \mathcal{X} \to \mathcal{Y}$. The function $h$ is also called a predictor, a hypothesis, or a classifier.
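In code, a learner can be pictured as a function that takes training data and returns another function, the predictor. The sketch below uses a deliberately naive rule (always predict the most common training label) purely to show the shape of the input and output; the name `majority_learner` and the toy data are hypothetical, and the sketch assumes the training data is non-empty.

```python
from collections import Counter

def majority_learner(S):
    """Hypothetical learner: maps training data S to a predictor h.

    Assumes S is a non-empty sequence of (x, y) pairs.
    """
    counts = Counter(y for _, y in S)
    majority_label = counts.most_common(1)[0][0]

    def h(x):
        # This naive predictor ignores x and always returns the majority label.
        return majority_label

    return h

# Toy training data: (color, softness) pairs with binary labels.
S = [((0.5, 0.5), 1), ((0.1, 0.9), 0), ((0.6, 0.4), 1)]
h = majority_learner(S)
print(h((0.3, 0.7)))  # prints 1
```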

Measures of Success

  • The error of a classifier is defined as the probability that it does not predict the correct label of a data point sampled at random according to the distribution $\mathcal{D}$.

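  • Formally, the error of the classifier $h$ is defined as $L_{\mathcal{D},f}(h) = \mathbb{P}_{x \sim \mathcal{D}}[h(x) \neq f(x)] = \mathcal{D}(\{x : h(x) \neq f(x)\})$.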

  • The notation $\mathcal{D}(A)$ assigns a probability determining how likely it is to observe a point $x \in A$, where $A \subseteq \mathcal{X}$ is an event.

  • In other words, with respect to the distribution $\mathcal{D}$ and the correct labeling function $f$, the error of $h$ is the probability that a randomly sampled example belongs to the set $\{x : h(x) \neq f(x)\}$.

  • $L_{\mathcal{D},f}(h)$ is also called the generalization error, the risk, or the true error of $h$.

  • The letter $L$ is used because we view the error as a loss of the learner.
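To make the definition of the true error concrete, here is a small simulation under assumed, illustrative choices that are not from the text: the distribution is uniform on $[0, 1]$, the correct labeling function labels a point 1 iff $x \geq 0.5$, and the predictor labels a point 1 iff $x \geq 0.6$. The two disagree exactly on $[0.5, 0.6)$, a set of probability 0.1, so the true error is 0.1. The function `estimate_true_error` is a hypothetical helper that approximates this probability by sampling.

```python
import random

def f(x):
    """Assumed correct labeling function: 1 iff x >= 0.5."""
    return 1 if x >= 0.5 else 0

def h(x):
    """Assumed predictor: 1 iff x >= 0.6, so it errs exactly on [0.5, 0.6)."""
    return 1 if x >= 0.6 else 0

def estimate_true_error(h, f, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P_{x ~ D}[h(x) != f(x)].

    Assumption: D is the uniform distribution on [0, 1).
    """
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n_samples):
        x = rng.random()          # sample x ~ D
        if h(x) != f(x):          # x falls in the disagreement set
            mistakes += 1
    return mistakes / n_samples

estimate = estimate_true_error(h, f)
print(estimate)  # should be close to the exact value 0.1
```

A real learner cannot run this computation, since it knows neither the distribution nor the correct labeling function; the simulation only illustrates what the quantity measures.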
