2024-09-11
Introduction
Summary
keywords
TO-DO
Homework
Exercise*
Next time
RECAP
standard errors
Hypothesis testing
Null hypothesis : If you want to prove the relationship between $X$ and $Y$
For a linear model, it corresponds to the slope $\beta_0$ .
If alternate hypothesis is equal to zero, it is a constant function, so it means $X$ and $Y$ is irrelevent.
t-statistic.
$t=2$ is is the irrelevance threshold. Usually, we take . If $t>2$, it is unlikely for two parameter is irrelevant.
p -value.
in $(1-p)$ probability, your hypothesis is true.
RSE, Residual Standard Error
The $n-2$ stands that we are using 2 parameter. It is called the freedom of ...?
TSS, Total sum of squares
$R^2$ Value.
$R^2$ shows how much accurate the model explains the data. It is usually very high if the model explains well. The threshold varies by the data, or accuracy you expect for your model.
Things to think about when choosing predictor var.
Avoid correlation between predictors.
When there's negative slopes in more-than-2 predictor model, it is highly likely that some variables are correlated.
F- value : How do we know if $X_i$ choose was good?
Numerator means : how much variance the predictors reduced. Denominator means : how much variance there is left.
적어도 하나의 변수가 예측하는데 도움이 된다면, F가 크게 나옴.
How do we know Which $X_i$ is relevant?
Forward selection : One by one, see if it results the lowest $RSS$. Then add it. Backward selection : One by one, see if $p$ changes as we remove one variable. the variable with the largest p-value is removed.
Other : Non-quantitative variables
categorial predictors
make it a s a 0-1 variable.
The level with no dummy variable is called baseline. It could be anything out of all choices(If unrelevant, It will just make shifts).
Interaction effect
sometimes called the synergy effect.
We can identify this when the data is generally higher than the model's estimation.
introduce $X_i \cdot X_j$ as another parameter.
Last updated