RA ch6 Multiple Regression 1
Ch 6: Multiple Regression 1
Outline
- General regression models
- types of general regression models
- simple regression model
- normal error regression model
- Fitted values and residuals
- ANOVA approach
- ANOVA results
- F-test
- coefficient of multiple determination
- Inference
- inference on parameters
- mean response of \(Y_h\)
- a new observation \(Y_{h,new}\)
1. General regression models
types of general regression models
- \(p-1\) predictor variables
ex) \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i\) - qualitative predictor variables \(\rightarrow\) Ch 8
ex) Male/Female - polynomial regression
ex) \(Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \epsilon_i\) - transformed variables
ex) \(Y_i^\ast = log Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\) - interaction effects
ex) \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i}X_{2i} + \epsilon_i\)
From now on, we increase the number of predictor variables to \(p-1\). To rewrite linear regression model with \(p-1\) predictor variables into matrix form, we need to define several terms such as:
- \(Y = (Y_1, Y_2, \cdots, Y_n)^t\), size = \(n \times 1\)
- \(X = \newline(1, X_{11}, X_{12}, \cdots, X_{1,p-1} ; 1, X_{21}, X_{22}, \cdots, X_{2,p-1} ; \cdots ; 1, X_{n1}, X_{n2}, \cdots, X_{n,p-1})\), size = \(n \times p\)
- \(\beta = (\beta_0, \beta_1, \beta_2, \cdots, \beta_{p-1})^t\), size = \(p \times 1\)
- \(\epsilon = (\epsilon_1, \epsilon_2, \cdots, \epsilon_n)^t\), size = \(n \times 1\)
simple regression model
\(Y = X\beta + \epsilon\), \(E(\epsilon) = 0\), \(\sigma^2(\epsilon) = \sigma^2 I \in \mathbb{R}^n\) LSE: minimizer of \(Q = (Y-X\beta)^t(Y-X\beta) = \epsilon^t \epsilon\) \(b = (X^tX)^{-1}X^tY\)
normal error regression model
Add an assumption that \(Y \sim N(X\beta, \sigma^2 I)\). MLE: maximizer of \(L(\beta, \sigma^2) = \frac{1}{(2\pi \sigma^2)^{\frac{n}{2}}} exp \left(-\frac{1}{2\sigma^2} (Y-X\beta)^t(Y-X\beta)\right)\) \(\hat{\beta} = (X^tX)^{-1}X^tY\)
Fitted values and residuals
\(\hat{Y} = (\hat{Y_1}, \hat{Y_2}, \cdots, \hat{Y_n})^t = Xb = X(X^tX)^{-1}X^tY = HY\), where \(H = X(X^tX)^{-1}X^t\) (Hat matrix).
- \[E(\hat{Y}) = E(Y)\]
- \(\sigma^2(\hat{Y}) = \sigma^2 H\)
\(e = Y-\hat{Y} = Y - Xb = Y - HY = (I-H)Y\) - \[E(e) = 0\]
- \[\sigma^2(e) = \sigma^2(I-H)\]
2. ANOVA approach
ANOVA results
name | formula | df | mean square |
---|---|---|---|
\(SSTO\) | \(Y^tY - \frac{1}{n}Y^tJY\) | \(n-1\) | |
\(SSE\) | \(Y^tY - Y^tHY\) | \(n-p\) | \(MSE = \frac{SSE}{n-p}\) |
\(SSR\) | \(Y^tHY - \frac{1}{n}Y^tJY\) | \(p-1\) | \(MSR = \frac{SSR}{p-1}\) |
\(E(MSE) = \sigma^2\) \(E(MSR) = \sigma^2 + \alpha\), \(\alpha \geq 0\)
F-test
\(H_0 : \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0\) test statistics: \(F^\ast = \frac{MSR}{MSE} \sim F(p-1, n-p)\) under \(H_0\) by Cochran’s theorem
coefficient of multiple determination
\(R^2 = \frac{SSR}{SSTO} = 1- \frac{SSE}{SSTO}\), \(0 \leq R \leq 1\) If it is near 1, there is a strong linear relationship between \(X\) and \(Y\). If it is near 0, all \(b_k=0, \forall k=1,2,\cdots, p-1\)
- Adjusted \(R^2\)
- \(R_a^2 = 1- \frac{SSE/(n-p)}{SSTO/(n-1)}\)
- to avoid deciding the most complex model with largest number of variables
3. Inference
inference on parameters
\(E(b) = \beta\) (unbiased) \(\sigma^2(b) = \sigma^2 (X^tX)^{-1}\), size: \(p \times p\) Needs an assumption that \(b \sim MVN(\beta, \sigma^2 (X^tX)^{-1})\) \(MVN\): multi-variate normal distribution
\(s^2(b) = MSE(X^tX)^{-1}\) For the normal error regression model, \(\frac{b_k -\beta_k}{s(b_k)} \sim t(n-p), k=0,1, \cdots, p-1\)
\(1-\alpha\) confidential limits for single \(\beta_k\) \(b_k \pm t(1-\alpha/2, n-p)s(b_k)\)
T-test \(H_0 : \beta_k = 0\), test statistic: \(t^\ast = \frac{b_k}{s(b_k)}\)
mean response of \(Y_h\)
\(X_h = (1, X_{h1}, X_{h2}, \cdots, X_{hn})^t\) \(\hat{Y_h} = X_h^tb\) \(\sigma^2(\hat{Y_h}) = \sigma^2(X_hb) = X_h^t \sigma^2(X^tX)^{-1} X_h = X_h^t \sigma^2(b) X_h\) \(s^2(\hat{Y_h}) = s^2(X_hb) = MSE X_h^t (X^tX)^{-1} X_h = X_h^t s^2(b) X_h\)
\(1-\alpha\) confidential limits for \(E(Y_h)\) \(\hat{Y_h} \pm t(1-\alpha/2, n-p)s(\hat{Y_h})\)
a new observation \(Y_{h,new}\)
\[X_h = (1, X_{h1}, X_{h2}, \cdots, X_{hn})^t\]\(1-\alpha\) confidential limits for \(Y_{h,new}\) \(\hat{Y_h} \pm t(1-\alpha/2, n-p)s(pred)\) \(s^2(pred) = MSE + s^2(\hat{Y_{h,new}}) = MSE (1 + X_h^t (X^tX)^{-1} X_h)\)
We’ve generalized linear regression model with one predictor variable to multi-variable. Many things were analogous to one variable case, but we could notice that tests for the coefficients were restricted. Next time, we introduce extra sum of squares to ease this restriction.
Here is the jupyter notebook script to run several practice codes using R.
Enjoy Reading This Article?
Here are some more articles you might like to read next: