Ch 7: Multiple Regression 2

In Ch 6, we saw the linear regression model with multiple predictor variables. To address its limitations in testing regression coefficients, we introduce a new concept: the extra sum of squares.

Outline

  1. Extra sum of squares
    • definition
    • application 1 : Tests
    • application 2 : coefficient of partial determination
  2. Standardized multiple regression model
    • correlation transformation and standardized regression model
    • multicollinearity issue

1. Extra sum of squares

definition

Recall

  • SSTO can be divided into SSE and SSR.
  • When we add predictor variables to the regression model, SSE decreases while SSR increases.

Using the above, define the extra sum of squares

$$SSR(X_2 \mid X_1) = SSE(X_1) - SSE(X_1, X_2),$$

the marginal explanatory ability gained by adding $X_2$ to the model already containing $X_1$. Equivalently, $SSR(X_2 \mid X_1) = SSR(X_1, X_2) - SSR(X_1)$ gives the same value.
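As a quick numerical illustration (a minimal numpy sketch on simulated data; the `sse` helper and variable names are mine, not from the notes), both expressions give the same extra sum of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 + 0.5 * X2 + rng.normal(size=n)

def sse(y, *predictors):
    """SSE from an OLS fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ssto = np.sum((Y - Y.mean()) ** 2)
# The two expressions for the extra sum of squares agree:
print(sse(Y, X1) - sse(Y, X1, X2))                    # SSE(X1) - SSE(X1, X2)
print((ssto - sse(Y, X1, X2)) - (ssto - sse(Y, X1)))  # SSR(X1, X2) - SSR(X1)
```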

application 1 : Tests

Test whether a single $\beta_k = 0$

$H_0: \beta_3 = 0$

Full model) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i$

Reduced model) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$

test statistic:
$$F^* = \frac{\bigl(SSE(R) - SSE(F)\bigr)/(df_R - df_F)}{SSE(F)/df_F}$$

$$= \frac{\bigl(SSE(X_1, X_2) - SSE(X_1, X_2, X_3)\bigr)/\bigl((n-3)-(n-4)\bigr)}{SSE(X_1, X_2, X_3)/(n-4)} = \frac{SSR(X_3 \mid X_1, X_2)/1}{SSE(X_1, X_2, X_3)/(n-4)} = \frac{MSR(X_3 \mid X_1, X_2)}{MSE(X_1, X_2, X_3)} \sim F(1, n-4)$$
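The test can be scripted directly from this formula. Below is a minimal sketch (assuming numpy and scipy; `general_linear_F` is a hypothetical helper name, and both design matrices are assumed to already contain the intercept column):

```python
import numpy as np
from scipy import stats

def general_linear_F(y, X_reduced, X_full):
    """General linear test statistic:
    F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F]."""
    def sse_df(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid, len(y) - X.shape[1]

    sse_r, df_r = sse_df(X_reduced)
    sse_f, df_f = sse_df(X_full)
    F = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
    return F, stats.f.sf(F, df_r - df_f, df_f)  # statistic and p-value

# H0: beta_3 = 0, on simulated data where X3 truly has no effect
rng = np.random.default_rng(1)
n = 60
X1, X2, X3 = rng.normal(size=(3, n))
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + rng.normal(size=n)
ones = np.ones(n)
F, p = general_linear_F(Y, np.column_stack([ones, X1, X2]),
                        np.column_stack([ones, X1, X2, X3]))
print(F, p)  # under H0, F* ~ F(1, n - 4)
```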

Test whether several $\beta_k = 0$

$H_0: \beta_2 = \beta_3 = 0$

Full model) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i$

Reduced model) $Y_i = \beta_0 + \beta_1 X_{i1} + \epsilon_i$

test statistic:
$$F^* = \frac{\bigl(SSE(X_1) - SSE(X_1, X_2, X_3)\bigr)/\bigl((n-2)-(n-4)\bigr)}{SSE(X_1, X_2, X_3)/(n-4)}$$

$$= \frac{SSR(X_2, X_3 \mid X_1)/2}{SSE(X_1, X_2, X_3)/(n-4)} = \frac{MSR(X_2, X_3 \mid X_1)}{MSE(X_1, X_2, X_3)} \sim F(2, n-4)$$
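Reusing the hypothetical `general_linear_F` sketch and the simulated data from above, the joint test just drops both columns from the reduced design matrix:

```python
# H0: beta_2 = beta_3 = 0 -> the reduced model keeps only the intercept and X1
F, p = general_linear_F(Y, np.column_stack([ones, X1]),
                        np.column_stack([ones, X1, X2, X3]))
# numerator df = (n - 2) - (n - 4) = 2, so under H0 F* ~ F(2, n - 4)
```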

Test whether the slopes of $X_k$ and $X_l$ are the same

$H_0: \beta_1 = \beta_2\ (= \beta_c)$

Full model) $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \epsilon_i$

Reduced model) $Y_i = \beta_0 + \beta_c (X_{i1} + X_{i2}) + \beta_3 X_{i3} + \epsilon_i$, where $X_{i1} + X_{i2}$ is treated as a single new variable.

test statistic:
$$F^* = \frac{\bigl(SSE(X_1 + X_2,\, X_3) - SSE(X_1, X_2, X_3)\bigr)/\bigl((n-3)-(n-4)\bigr)}{SSE(X_1, X_2, X_3)/(n-4)} \sim F(1, n-4)$$
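Continuing the same hypothetical sketch, the reduced design matrix simply replaces the two columns by their sum:

```python
# H0: beta_1 = beta_2 -> use X1 + X2 as one predictor in the reduced model
F, p = general_linear_F(Y, np.column_stack([ones, X1 + X2, X3]),
                        np.column_stack([ones, X1, X2, X3]))
# df_R - df_F = (n - 3) - (n - 4) = 1, so under H0 F* ~ F(1, n - 4)
```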

application 2 : coefficient of partial determination

We can also calculate an $R^2$ that accounts for the marginal contribution of $X_2$ given that $X_1$ is already in the model, the coefficient of partial determination:

$$R^2_{Y2 \mid 1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}.$$

It satisfies $0 \le R^2_{Y2 \mid 1} \le 1$, and a large value implies a large marginal contribution.
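Using the hypothetical `sse` helper and simulated data from the first sketch above, this is a one-liner:

```python
# coefficient of partial determination of X2, given X1 is already in the model
R2_Y2_given_1 = (sse(Y, X1) - sse(Y, X1, X2)) / sse(Y, X1)  # = SSR(X2 | X1) / SSE(X1)
```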

2. Standardized multiple regression model

When the predictor variables are not scaled properly, $\det(X^t X)$ can be close to 0, the same symptom produced by multicollinearity among the predictor variables. There is a cure for this numerical problem: the standardized regression model below.

correlation transformation and standardized regression model

correlation transformation ($n$: number of observations). We use the standardized variables $\dfrac{Y_i - \bar{Y}}{s_Y}$ and $\dfrac{X_{ik} - \bar{X}_k}{s_{X_k}}$.

The transformed variables $Y_i^*$ and $X_{ik}^*$ are simple functions of these standardized variables:

$$Y_i^* = \frac{1}{\sqrt{n-1}} \cdot \frac{Y_i - \bar{Y}}{s_Y} = \frac{Y_i - \bar{Y}}{\sqrt{\sum_i (Y_i - \bar{Y})^2}}, \quad \text{where } s_Y^2 = \frac{\sum_i (Y_i - \bar{Y})^2}{n-1},$$

$$X_{ik}^* = \frac{1}{\sqrt{n-1}} \cdot \frac{X_{ik} - \bar{X}_k}{s_{X_k}} = \frac{X_{ik} - \bar{X}_k}{\sqrt{\sum_i (X_{ik} - \bar{X}_k)^2}}, \quad \text{where } s_{X_k}^2 = \frac{\sum_i (X_{ik} - \bar{X}_k)^2}{n-1}.$$

standardized regression model: $Y_i^* = \beta_1^* X_{i1}^* + \cdots + \beta_{p-1}^* X_{i,p-1}^* + \epsilon_i^*$

Note

  • $\beta_0^* = 0$
  • $\beta_0 = \bar{Y} - \beta_1 \bar{X}_1 - \cdots - \beta_{p-1} \bar{X}_{p-1}$
  • $\beta_k = \dfrac{s_Y}{s_{X_k}} \, \beta_k^*, \quad k = 1, 2, \ldots, p-1$

Property of the correlation matrix of the X variables. Let $\mathrm{corr}(X_i, X_j) = r_{ij}$. Then

$$(X^*)^t X^* = r_{XX}, \qquad (X^*)^t Y^* = r_{YX},$$

so the normal equations become $r_{XX}\, b^* = r_{YX}$, giving $b^* = r_{XX}^{-1}\, r_{YX}$.
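As a sanity check, here is a minimal numpy sketch of the correlation transformation on simulated data; the helper `corr_transform` and all variable names are mine, not part of the notes. It verifies that $(X^*)^t X^*$ is the correlation matrix, that $b^* = r_{XX}^{-1} r_{YX}$, and that back-transforming $b^*$ recovers the slopes on the original scale.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = rng.normal(size=(n, 2))                         # two predictors X1, X2
Y = 3.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

def corr_transform(v):
    """Center each column and scale it to unit length: (v - mean) / sqrt(sum of squares)."""
    c = v - v.mean(axis=0)
    return c / np.sqrt((c ** 2).sum(axis=0))

X_star, Y_star = corr_transform(X), corr_transform(Y)
r_XX = X_star.T @ X_star                            # correlation matrix of the predictors
r_YX = X_star.T @ Y_star                            # correlations between Y and each predictor
b_star = np.linalg.solve(r_XX, r_YX)                # standardized coefficients b*
b = b_star * Y.std(ddof=1) / X.std(axis=0, ddof=1)  # back-transform: beta_k = (s_Y / s_Xk) beta_k*
print(b)                                            # matches the OLS slopes on the original scale
```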

multicollinearity issue

Why do we need to avoid multicollinearity among variables?

When we solve the normal equations $X^t X b = X^t Y$, the condition $\det(X^t X) \approx 0$ makes the system nearly singular. This leads to large round-off errors during computation, resulting in severe errors in $b$.

How can we tell whether the predictor variables are correlated among themselves?

  1. uncorrelated $X_1$ and $X_2$ ($r_{12}^2 = 0$): $SSR(X_1 \mid X_2) = SSR(X_1)$ and $SSR(X_2 \mid X_1) = SSR(X_2)$
  2. perfectly correlated $X_1$ and $X_2$ ($r_{12}^2 = 1$): $(X^t X)^{-1}$ does not exist, so there are infinitely many least-squares regression fits
  3. general effects of multicollinearity ($0 < r_{12}^2 < 1$): as $r_{12}^2 \to 1$, the increase in explanatory ability from adding the second predictor is not significant
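A small simulated illustration of the symptom (hypothetical data, plain numpy): when $X_2$ is nearly a copy of $X_1$, $\det(X^t X)$ collapses, the condition number explodes, and the fitted coefficients become unreliable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X1 = rng.normal(size=n)
X2 = X1 + 1e-4 * rng.normal(size=n)   # X2 is nearly an exact copy of X1
Y = 1.0 + X1 + X2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), X1, X2])
XtX = X.T @ X
print(np.linalg.det(XtX))             # tiny compared with a well-conditioned design
print(np.linalg.cond(XtX))            # huge condition number -> severe round-off risk
b = np.linalg.solve(XtX, X.T @ Y)
print(b)                              # individual slope estimates are highly unstable;
                                      # only their sum (about 2) is well determined
```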


