
Ch 1: Linear Regression with One Predictor Variable

Outline

  1. Simple Regression Model

    • Formulation by Least Squares Estimation
    • Gauss-Markov theorem
    • Properties of the linear regression model
    • Estimation of $\sigma^2 = \mathrm{Var}(\epsilon_i)$
  2. Normal Error Regression Model

    • Formulation by Maximum Likelihood Estimation
    • Advantages of Normal error regression model

1. Simple Regression Model

Formulation by Least Squares Estimation

Let $Y$ be the dependent (response) variable and $X$ the independent (explanatory, predictor) variable. Given $n$ paired data points $(X_i, Y_i)$, $i = 1, \dots, n$, we want to find a linear function $f(x) = \beta_0 + \beta_1 x$ such that

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad i = 1, \dots, n$$

where $\epsilon_i$ stands for the error between the function value $f(X_i)$ and the response $Y_i$, with $E(\epsilon_i) = 0$, $\mathrm{Var}(\epsilon_i) = \sigma^2$, and $\epsilon_i \perp \epsilon_j$ (uncorrelated) for $i \ne j$.
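As a quick illustration (not part of the original notes), here is a minimal R sketch that simulates data from this model with assumed values $\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$; the later snippets reuse the same kind of simulated data.

```r
# A minimal sketch (illustration only): simulate data from the model
# Y_i = beta0 + beta1 * X_i + eps_i with made-up values beta0 = 2,
# beta1 = 0.5, sigma = 1, and n = 50 observations.
set.seed(1)
n     <- 50
beta0 <- 2
beta1 <- 0.5
sigma <- 1

X   <- runif(n, 0, 10)                 # predictor values (known constants)
eps <- rnorm(n, mean = 0, sd = sigma)  # errors: mean 0, variance sigma^2
Y   <- beta0 + beta1 * X + eps         # responses

plot(X, Y, main = "Simulated simple regression data")
abline(a = beta0, b = beta1, lty = 2)  # the true (in practice unknown) line
```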

There are many ways to draw a line through the data points $(X_i, Y_i)$, $i = 1, \dots, n$. Among them, it is reasonable to select the one with the least sum of squared errors $\sum_i \epsilon_i^2$.

Define a function Q parametrized by β0,β1,

$$Q = Q(\beta_0, \beta_1) = \sum_i (Y_i - \beta_0 - \beta_1 X_i)^2 = \sum_i \epsilon_i^2$$

Note

  • $Q$ is convex.
  • $Q$ is differentiable in $\beta_0, \beta_1$.
  • For a differentiable convex function, every critical point is a global minimizer.

Differentiating $Q$ with respect to $\beta_0$ and $\beta_1$, we find the critical point $(b_0, b_1)$ by setting each partial derivative to $0$:

$$\frac{\partial Q}{\partial \beta_0} = -2\sum_i (Y_i - \beta_0 - \beta_1 X_i) = 0, \qquad \frac{\partial Q}{\partial \beta_1} = -2\sum_i X_i (Y_i - \beta_0 - \beta_1 X_i) = 0$$

The solution of these simultaneous equations (the normal equations) is

$$b_0 = \bar{Y} - b_1 \bar{X}, \qquad b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\widehat{\mathrm{Cov}}(X, Y)}{\widehat{\mathrm{Var}}(X)}$$

Proof

  1. The first equation gives $n\beta_0 = \sum_i (Y_i - \beta_1 X_i)$, hence $b_0 = \bar{Y} - b_1 \bar{X}$.
  2. For the second equation, since $\sum_i (Y_i - \beta_0 - \beta_1 X_i) = 0$ by the first equation,
     $$\sum_i X_i (Y_i - \beta_0 - \beta_1 X_i) = \sum_i (X_i - \bar{X})(Y_i - \beta_0 - \beta_1 X_i) = \sum_i (X_i - \bar{X})(Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i)$$
     where the last step substitutes $\beta_0 = \bar{Y} - \beta_1 \bar{X}$. Setting this to $0$ gives $b_1 = \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$.
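As a sanity check (an added sketch, using the same kind of simulated data as above with assumed true values), the following R code computes $b_0$ and $b_1$ from the closed-form solution and compares them with the coefficients returned by R's built-in `lm()`.

```r
# A minimal sketch: compute the least squares estimates from the closed-form
# solution above and compare with lm(). Data are simulated with assumed
# true values beta0 = 2, beta1 = 0.5, sigma = 1.
set.seed(1)
n <- 50
X <- runif(n, 0, 10)
Y <- 2 + 0.5 * X + rnorm(n, sd = 1)

b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0 <- mean(Y) - b1 * mean(X)

fit <- lm(Y ~ X)   # least squares fit via R's linear model routine
c(b0 = b0, b1 = b1)
coef(fit)          # agrees with (b0, b1) up to rounding
```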

Remark

  1. Once a dataset $(X_i, Y_i)$, $i = 1, \dots, n$ is given, the $X_i$'s are known constants. Likewise, $E(Y_i) = \beta_0 + \beta_1 X_i$ are treated as constants.
  2. The error term $\epsilon_i$ is a random variable.
  3. $Y_i$ is the sum of the constant $\beta_0 + \beta_1 X_i$ and the random error $\epsilon_i$; therefore $Y_i$ is also a random variable.
  4. Since $E(\epsilon_i) = 0$, $E(Y_i) = E(\beta_0 + \beta_1 X_i + \epsilon_i) = \beta_0 + \beta_1 X_i$ and $\mathrm{Var}(Y_i) = \mathrm{Var}(\epsilon_i) = \sigma^2$.

Gauss-Markov theorem

Statement Under the regression model, the least squares estimators $b_0, b_1$ of the regression coefficients are

  1. unbiased (i.e. $E(b_0) = \beta_0$, $E(b_1) = \beta_1$), and
  2. of minimum variance among all linear unbiased estimators (i.e. $\mathrm{Var}(b_i) \le \mathrm{Var}(\tilde{b}_i)$, $i = 0, 1$, for any linear unbiased estimator $\tilde{b}_i$).

In short, it is said that "$b_0, b_1$ are the BLUE (best linear unbiased estimators) of $\beta_0, \beta_1$, respectively."

Proof: see the linked post.
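The unbiasedness part of the theorem can be illustrated (not proved) by a small Monte Carlo experiment; the sketch below uses assumed parameter values and simulates many datasets from the model, checking that the least squares estimates average out to the true coefficients.

```r
# Monte Carlo illustration of E(b0) = beta0 and E(b1) = beta1 (assumed values
# beta0 = 2, beta1 = 0.5, sigma = 1; X is held fixed across replications).
set.seed(1)
n    <- 50
X    <- runif(n, 0, 10)
reps <- 5000

est <- replicate(reps, {
  Y  <- 2 + 0.5 * X + rnorm(n, sd = 1)
  b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
  b0 <- mean(Y) - b1 * mean(X)
  c(b0 = b0, b1 = b1)
})

rowMeans(est)   # should be close to c(2, 0.5)
```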

Properties of linear regression model

Notations

  • Observation: $Y = \beta_0 + \beta_1 X + \epsilon$
  • True line: $E(Y) = \beta_0 + \beta_1 X$ (unknown)
  • Fitted line: $\hat{Y} = b_0 + b_1 X$ (known)
  • Residual: $e_i = Y_i - \hat{Y}_i$ (known)
  • Error: $\epsilon_i = Y_i - E(Y_i)$ (unknown)

  1. $\sum_i e_i = 0$

    $\sum_i (Y_i - \hat{Y}_i) = \sum_i (Y_i - b_0 - b_1 X_i) = 0$ by the choice of $b_0, b_1$ (the first normal equation)

  2. $\sum_i e_i^2$ is the minimum
  3. $\sum_i Y_i = \sum_i \hat{Y}_i$
  4. $\sum_i X_i e_i = 0$ (the residual $e$ is orthogonal to $X$)

    $\sum_i X_i e_i = \sum_i (X_i - \bar{X}) e_i$ (using $\sum_i e_i = 0$) $= \sum_i (X_i - \bar{X})\big(Y_i - (\bar{Y} + b_1 (X_i - \bar{X}))\big)$
    $= \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) - b_1 \sum_i (X_i - \bar{X})^2$
    $= \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) - \dfrac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}\sum_i (X_i - \bar{X})^2 = 0$

  5. $\sum_i \hat{Y}_i e_i = 0$ (the residual $e$ is orthogonal to the fitted values $\hat{Y}$)
  6. $\hat{Y}_i = b_0 + b_1 X_i = \bar{Y} + b_1 (X_i - \bar{X})$ (the regression line passes through $(\bar{X}, \bar{Y})$)
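These properties can also be checked numerically. The sketch below (an added illustration, simulated data with assumed parameters) verifies them for a fit produced by `lm()`; each quantity is zero up to floating-point error.

```r
# Numerical check of the residual properties on simulated data
# (assumed beta0 = 2, beta1 = 0.5, sigma = 1).
set.seed(1)
n <- 50
X <- runif(n, 0, 10)
Y <- 2 + 0.5 * X + rnorm(n, sd = 1)

fit  <- lm(Y ~ X)
e    <- resid(fit)    # residuals e_i = Y_i - Yhat_i
Yhat <- fitted(fit)   # fitted values Yhat_i = b0 + b1 * X_i

c(sum_e      = sum(e),              # property 1
  sum_Y_diff = sum(Y) - sum(Yhat),  # property 3
  sum_Xe     = sum(X * e),          # property 4
  sum_Yhat_e = sum(Yhat * e))       # property 5: all ~ 0
```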

Estimation of $\sigma^2 = \mathrm{Var}(\epsilon_i)$

Let $e_i = Y_i - \hat{Y}_i$. Define

$$\mathrm{SSE}\ (\text{sum of squared errors}) = \sum_i (Y_i - \hat{Y}_i)^2 = \sum_i e_i^2$$

$$s^2 = \mathrm{MSE}\ (\text{mean squared error}) = \frac{\mathrm{SSE}}{n-2} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum_i e_i^2}{n-2}$$

where $n - 2$ is the degrees of freedom of the model (two degrees of freedom are used to estimate $\beta_0$ and $\beta_1$).

Observation

  • $E(\mathrm{MSE}) = \sigma^2$, i.e. $s^2$ is an unbiased estimator of $\sigma^2$.
  • If the true errors $\epsilon_i$ were observable, $\mathrm{Var}(\epsilon_i) = \sigma^2$ could be estimated by $\frac{1}{n}\sum_i \epsilon_i^2$ (since $E(\epsilon_i) = 0$); replacing the unobservable $\epsilon_i$ by the residuals $e_i$ costs two degrees of freedom, hence the divisor $n - 2$.
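A short R check (an added illustration, simulated data with true $\sigma^2 = 1$): compute SSE and $s^2 = \mathrm{SSE}/(n-2)$ by hand and compare with the residual standard error reported by `summary(lm(...))`.

```r
# Compute SSE and MSE = SSE/(n-2) by hand and compare with summary(fit)$sigma^2.
set.seed(1)
n <- 50
X <- runif(n, 0, 10)
Y <- 2 + 0.5 * X + rnorm(n, sd = 1)   # true sigma^2 = 1

fit <- lm(Y ~ X)
SSE <- sum(resid(fit)^2)
MSE <- SSE / (n - 2)                  # s^2, an unbiased estimate of sigma^2

c(MSE = MSE, from_summary = summary(fit)$sigma^2)   # the two agree
```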

2. Normal Error Regression Model

Formulation by Maximum Likelihood Estimation

Assuming a normal distribution for the error terms $\epsilon_i$, we can estimate $\mathrm{Var}(\epsilon_i) = \sigma^2$ as well as $\beta_0, \beta_1$.

Now the formulation is

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \quad \epsilon_i \perp \epsilon_j \text{ for } i \ne j,$$

so the pdf of $\epsilon_i$ is $\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\dfrac{\epsilon_i^2}{2\sigma^2}\right)$.

We estimate $\beta_0, \beta_1$ and $\sigma^2$ by maximum likelihood estimation. Since $E(Y_i) = \beta_0 + \beta_1 X_i$ and $\mathrm{Var}(Y_i) = \sigma^2$, the pdf of $Y_i$ is given as

$$f_i = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(Y_i - \beta_0 - \beta_1 X_i)^2}{2\sigma^2}\right).$$

Using the independence of the $Y_i$'s, the likelihood function $L$ is given as

$$L(\beta_0, \beta_1, \sigma^2) = \prod_i f_i = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\exp\!\left(-\frac{\sum_i (Y_i - \beta_0 - \beta_1 X_i)^2}{2\sigma^2}\right)$$

We maximize $L$, or equivalently $\log L$, by calculating the critical point. The estimators obtained by maximizing the likelihood are

$$\hat{\beta_0} = b_0 = \bar{Y} - b_1 \bar{X}, \qquad \hat{\beta_1} = b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$

(same as LSE)

$$\hat{\sigma^2} = \frac{\sum_i (Y_i - \hat{\beta_0} - \hat{\beta_1} X_i)^2}{n} = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n}$$

cf) $\mathrm{MSE} = s^2 = \dfrac{\sum_i (Y_i - \hat{Y}_i)^2}{n-2}$ is an unbiased estimator of $\sigma^2$; $\hat{\sigma^2}$ (with divisor $n$) is not an unbiased estimator of $\sigma^2$.
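As an added illustration (simulated data with assumed parameters, not part of the original notes), the sketch below maximizes the normal-error log-likelihood numerically with `optim()` and compares the result with the least squares fit; note that the MLE of $\sigma^2$ divides by $n$ while MSE divides by $n - 2$.

```r
# Numerical MLE for (beta0, beta1, sigma) under the normal error model,
# compared with the least squares fit (assumed simulated data).
set.seed(1)
n <- 50
X <- runif(n, 0, 10)
Y <- 2 + 0.5 * X + rnorm(n, sd = 1)

negloglik <- function(par) {
  beta0 <- par[1]; beta1 <- par[2]; sigma <- exp(par[3])  # log-sigma keeps sigma > 0
  -sum(dnorm(Y, mean = beta0 + beta1 * X, sd = sigma, log = TRUE))
}

mle <- optim(c(0, 0, 0), negloglik)   # minimize the negative log-likelihood
fit <- lm(Y ~ X)

c(beta0_hat = mle$par[1], beta1_hat = mle$par[2],
  sigma2_hat = exp(mle$par[3])^2)                 # sigma2_hat ~ SSE / n
c(coef(fit), MSE = sum(resid(fit)^2) / (n - 2))   # b0, b1 agree; MSE uses n - 2
```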

Advantages of Normal error regression model

By introducing a normal distribution on the errors $\epsilon_i$, we obtain a maximum likelihood estimator of $\sigma^2$. Moreover, in the following chapter we will see that this normality condition makes it possible to compute confidence intervals for several quantities in the linear regression model.


Here is the Jupyter notebook with several practice code examples in R.



