1. Error
There are two types of error:
- true error: $\operatorname{error}_{\mathcal{D}}(h) \equiv \Pr_{x \in \mathcal{D}}[f(x) \neq h(x)]$
  - $\mathcal{D}$ is the probability distribution from which instances $x$ are drawn
- sample error: $\operatorname{error}_S(h) \equiv \frac{1}{n} \sum_{x \in S} \delta(f(x) \neq h(x))$, where $S$ is a sample of $n$ instances
  - $\delta(f(x) \neq h(x)) = 1$ if $f(x) \neq h(x)$, and $0$ otherwise
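The two definitions can be made concrete with a small sketch. It assumes a finite instance space, so that $\operatorname{error}_{\mathcal{D}}(h)$ can be computed exactly as the probability mass of the instances on which $h$ and $f$ disagree. The names `true_error`, `sample_error`, `dist`, `f`, and `h` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Hashable, Mapping, Sequence

Fn = Callable[[Hashable], Hashable]


def true_error(f: Fn, h: Fn, dist: Mapping[Hashable, float]) -> float:
    """error_D(h): total probability of the instances where h disagrees with f."""
    return sum(p for x, p in dist.items() if f(x) != h(x))


def sample_error(f: Fn, h: Fn, S: Sequence[Hashable]) -> float:
    """error_S(h): fraction of the n instances in S that h misclassifies."""
    return sum(1 for x in S if f(x) != h(x)) / len(S)


# Hypothetical setup: 4 equally likely instances.
dist = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}


def f(x):  # assumed target concept
    return x >= 2


def h(x):  # assumed hypothesis under evaluation
    return x >= 1


print(true_error(f, h, dist))            # 0.25: h is wrong only on x = 1
print(sample_error(f, h, [0, 1, 1, 3]))  # 0.5: x = 1 happens to be drawn twice
```

The sample here over-represents the one instance $h$ gets wrong, so $\operatorname{error}_S(h)$ overshoots $\operatorname{error}_{\mathcal{D}}(h)$; this is exactly the gap the next question is about.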
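How well does the sample error estimate the true error? Two properties of the estimator answer this:
- Bias: does $\operatorname{error}_S(h)$ equal $\operatorname{error}_{\mathcal{D}}(h)$ on average?
- Variance: how much does $\operatorname{error}_S(h)$ fluctuate from one sample to another?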
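A short worked derivation of both quantities, assuming $S$ consists of $n$ instances drawn independently from $\mathcal{D}$ (and independently of $h$), and writing $p$ as shorthand for $\operatorname{error}_{\mathcal{D}}(h)$. Under that assumption each $\delta(f(x) \neq h(x))$ is a Bernoulli($p$) variable, so the number of mistakes is binomial:

$$
\begin{aligned}
\operatorname{bias} &\equiv E[\operatorname{error}_S(h)] - \operatorname{error}_{\mathcal{D}}(h) \\
n \cdot \operatorname{error}_S(h) &\sim \operatorname{Binomial}(n, p), \qquad p = \operatorname{error}_{\mathcal{D}}(h) \\
E[\operatorname{error}_S(h)] &= \frac{np}{n} = p \quad\Rightarrow\quad \operatorname{bias} = 0 \\
\operatorname{Var}[\operatorname{error}_S(h)] &= \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}
\end{aligned}
$$

The bias vanishes only because $S$ is independent of $h$; if $S$ were the training data used to choose $h$, $\operatorname{error}_S(h)$ would typically be optimistically biased. The variance shrinks as $1/n$, so larger samples give tighter estimates.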
2. Estimators
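- Draw a sample $S$ of $n$ instances independently according to $\mathcal{D}$ (and independently of $h$)
- Measure $\operatorname{error}_S(h)$
- $\to$ $\operatorname{error}_S(h)$ is an unbiased estimator of $\operatorname{error}_{\mathcal{D}}(h)$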
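Unbiasedness can also be checked empirically with a small simulation. This sketch simplifies the sampling step: instead of drawing $x$ from $\mathcal{D}$ and comparing $f(x)$ with $h(x)$, it treats each test instance as misclassified independently with probability $p = \operatorname{error}_{\mathcal{D}}(h)$, which has the same distribution. The function name `simulate_sample_errors` and the values `p = 0.3`, `n = 40` are assumptions for illustration.

```python
import random
import statistics


def simulate_sample_errors(p: float, n: int, trials: int, seed: int = 0) -> list[float]:
    """Draw `trials` independent samples of size n. In each, every instance is
    misclassified with probability p = error_D(h). Returns each sample's error_S(h)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]


p, n = 0.3, 40  # assumed true error and sample size
errs = simulate_sample_errors(p, n, trials=20_000)

# Averaged over many samples, error_S(h) should land close to p (zero bias),
# and its spread should be close to p * (1 - p) / n = 0.00525.
print(statistics.mean(errs))
print(statistics.variance(errs))
```

Any single `errs` entry can be far from 0.3; only the average over samples is pinned to the true error, which is why a confidence interval is needed for a single test set.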
For example, for sufficiently large $n$ (a common rule of thumb is $n \ge 30$), the true error lies, with approximately 95% probability, in the interval
$$
\operatorname{error}_S(h) \pm 1.96\sqrt{\frac{\operatorname{error}_S(h)\left(1 - \operatorname{error}_S(h)\right)}{n}}
$$
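The interval formula translates directly into code. This is a minimal sketch assuming the normal approximation to the binomial; the function name `error_confidence_interval` and the parameter `z` (1.96 for roughly 95% confidence) are illustrative choices, not part of the notes.

```python
import math


def error_confidence_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate two-sided interval for error_D(h), given `errors` mistakes
    on n test instances, via the normal approximation to the binomial."""
    e = errors / n                             # error_S(h)
    half_width = z * math.sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width


# Hypothetical test result: 12 mistakes on 40 instances, so error_S(h) = 0.3.
lo, hi = error_confidence_interval(12, 40)
print(round(lo, 2), round(hi, 2))  # 0.16 0.44
```

With 0 mistakes the sketch returns the zero-width interval `(0.0, 0.0)`, which illustrates why the normal approximation is unreliable when $\operatorname{error}_S(h)$ is close to 0 or 1.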