2024-09-09
Last updated
Summary
keywords
TO-DO
Homework
Exercise*
Next time
Variance occurs because we fit the function using only a sampled dataset, so the fit changes from sample to sample. Bias occurs because we restrict the function to be simple. (Generally, a flexible model has small bias but large variance.)
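This trade-off can be written out explicitly: the expected test MSE at a point $x_0$ decomposes into the variance of the fit, the squared bias, and the irreducible noise,

$$
E\left[(y_0 - \hat f(x_0))^2\right] = \mathrm{Var}(\hat f(x_0)) + \left[\mathrm{Bias}(\hat f(x_0))\right]^2 + \mathrm{Var}(\varepsilon).
$$

The last term, $\mathrm{Var}(\varepsilon)$, is the irreducible error mentioned below.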
The irreducible error is shown as a dotted horizontal line; no model can achieve a test error below it.
How are we going to classify data in the grey zone, where the classes overlap?
KNN stands for k-nearest neighbours: find the k training points nearest to the query point and assign the class that appears most often among them.
How can we decide on a good value of $k$?
Save some portion of the data as test data.
Use the remaining portion as training data, and pick the $k$ that performs best on the test data.
Assume a linear model $y = \beta_0 + \beta_1 x$; we want to find the intercept $\beta_0$ and the slope $\beta_1$.
We introduce the residual $e_i = y_i - \hat y_i$, the difference between the observed value and the model's prediction.
Let's minimise the residual sum of squares: $RSS = \sum_i e_i^2 = \sum_i (y_i - \beta_0 - \beta_1 x_i)^2$.
By setting the partial derivatives of $RSS$ with respect to $\beta_0$ and $\beta_1$ to zero, we find the point where $RSS$ is smallest.
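Solving those two partial-derivative equations gives the standard closed-form estimates, which a short sketch can verify on hypothetical noise-free data (the arrays below are made up for illustration):

```python
import numpy as np

# Setting dRSS/dbeta0 = 0 and dRSS/dbeta1 = 0 and solving yields:
#   beta1_hat = sum((x - xbar) * (y - ybar)) / sum((x - xbar)**2)
#   beta0_hat = ybar - beta1_hat * xbar
def least_squares(x, y):
    xbar, ybar = x.mean(), y.mean()
    beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# Toy data lying exactly on y = 2 + 3x, so the fit should recover 2 and 3.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * x
b0, b1 = least_squares(x, y)
```

With noisy data the estimates would recover the true coefficients only approximately, which is what the confidence intervals below quantify.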
95% confidence interval for a coefficient: roughly $\hat\beta_1 \pm 2\,\mathrm{SE}(\hat\beta_1)$, built from the estimate's standard error.
KNN: $K = 1$ is too flexible; it has high variance and is prone to outliers. $K = 100$ is too rigid; it has larger bias.
The proportion of data held out for testing will also affect which $k$ gets chosen.