# CH 2: Review of Mathematical Statistics

## Outline

1. Probability distributions
• special distributions
• normal distribution
• T distribution, F distribution
2. Statistical inference
• statistical estimation
• statistical hypothesis test
• inference for population mean
• inference for population variance
3. Correlation analysis
• correlation
• statistical inference on $$\rho$$

## 1. Probability distributions

### special distributions

| Discrete $$P(X = k)$$ | Continuous $$P(a \leq X \leq b)$$ |
| --- | --- |
| binomial $$B(n, p)$$ | gamma $$\Gamma(\alpha, \beta)$$ |
| Poisson $$\mathcal{P}(\lambda)$$ | normal $$N(\mu, \sigma^2)$$ |
| negative binomial $$Nb(\alpha, p)$$ | exponential $$Exp(\lambda)$$ |
| geometric $$G(p)$$ | chi-square $$\chi^2(n)$$ |
| | beta $$Beta(\alpha, \beta)$$ |
| | T $$t(n)$$ |
| | F $$F(m, n)$$ |

Note Rather than memorizing each pdf, it is more important to know whether each distribution is discrete or continuous, what its parameters are and what they mean, and the main properties of each distribution.

### normal distribution

For the normal distribution, however, the pdf is worth memorizing.

Standard normal distribution: $$X \sim N(0, 1)$$, pdf: $$f(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}, x \in \mathbb{R}$$, where 0 is the mean and 1 is the variance of the distribution.

General normal distribution: if $$X \sim N(0, 1)$$ and $$Y = \mu + \sigma X$$ with $$\sigma > 0$$, then $$Y \sim N(\mu, \sigma^2)$$ with pdf $$g(y) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y-\mu)^2}{2\sigma^2}}, y \in \mathbb{R}$$, where $$\mu$$ is the mean and $$\sigma^2$$ is the variance of the distribution.

properties of the normal distribution

1. If $$X_i \sim N(\mu_i, \sigma_i^2)$$ are independent for $$i=1,2,\cdots, n$$, then

   (a) $$Y = \sum_{i=1}^n a_i X_i \sim N(\sum_i a_i \mu_i, \sum_i a_i^2 \sigma_i^2)$$

   cf) $$X \sim N(\mu, \sigma^2) \Rightarrow aX + b \sim N(a\mu + b, a^2\sigma^2)$$ for constants $$a, b$$

   (b) $$V = \sum_{i=1}^n \left( \frac{X_i - \mu_i}{\sigma_i} \right)^2 = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)$$, where $$Z_i = \frac{X_i - \mu_i}{\sigma_i} \sim N(0, 1)$$
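
As a quick sanity check on both properties, here is a minimal simulation sketch (assuming numpy and scipy; the parameters $$\mu_i$$, $$\sigma_i$$, coefficients $$a_i$$, and replication count are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000
mu = np.array([0.0, 1.0, -2.0, 0.5, 3.0])      # mu_i (arbitrary)
sigma = np.array([1.0, 2.0, 0.5, 1.5, 1.0])    # sigma_i (arbitrary)
a = np.array([1.0, -0.5, 2.0, 0.3, 1.0])       # coefficients a_i (arbitrary)

# reps independent draws of (X_1, ..., X_n) with X_i ~ N(mu_i, sigma_i^2)
X = rng.normal(mu, sigma, size=(reps, n))

# (a) Y = sum a_i X_i should have mean a.mu and variance sum a_i^2 sigma_i^2
Y = X @ a
print(Y.mean(), a @ mu)                        # close to each other
print(Y.var(), (a**2 * sigma**2).sum())        # close to each other

# (b) V = sum of squared z-scores should follow chi^2(n)
V = (((X - mu) / sigma) ** 2).sum(axis=1)
print(stats.kstest(V, "chi2", args=(n,)).pvalue)   # not tiny if consistent with chi^2(n)
```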

### T distribution, F distribution

T distribution Let $$Z \sim N(0, 1)$$ and $$V \sim \chi^2(n)$$ be independent. Define $$T = \frac{Z}{\sqrt{V/n}} \sim t(n)$$.

F distribution Let $$U \sim \chi^2(m)$$ and $$V \sim \chi^2(n)$$ be independent. Define $$W = \frac{U/m}{V/n} \sim F(m, n)$$.

properties

1. $$T^2 = \frac{Z^2/1}{V/n} \sim F(1, n)$$
2. $$W \sim F(n, m) \Rightarrow 1/W \sim F(m, n)$$
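
The definitions and property 1 can be checked with a minimal simulation sketch (assuming numpy/scipy; the degrees of freedom and replication count are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n, reps = 4, 7, 100_000        # degrees of freedom and replication count (arbitrary)

Z = rng.standard_normal(reps)     # Z ~ N(0, 1)
U = rng.chisquare(m, reps)        # U ~ chi^2(m)
V = rng.chisquare(n, reps)        # V ~ chi^2(n), independent of Z and U

T = Z / np.sqrt(V / n)            # by definition, T ~ t(n)
W = (U / m) / (V / n)             # by definition, W ~ F(m, n)

print(stats.kstest(T, "t", args=(n,)).pvalue)       # consistent with t(n)
print(stats.kstest(W, "f", args=(m, n)).pvalue)     # consistent with F(m, n)
print(stats.kstest(T**2, "f", args=(1, n)).pvalue)  # property 1: T^2 ~ F(1, n)
```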

Theorem (Student) $$X_1, \cdots, X_n \sim N(\mu, \sigma^2)$$ i.i.d.

Define sample mean $$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$$, and sample variance $$s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$$.

(a) $$\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$$

(b) $$(n-1) s^2/\sigma^2 = \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$$

(c) $$\bar{X}$$ and $$s^2$$ are independent

(d) $$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$$

proof (b) $$\sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{\sigma^2} + \sum_{i=1}^n \frac{(\bar{X} - \mu)^2}{\sigma^2}$$, where $$\sum_{i=1}^n \frac{(\bar{X} - \mu)^2}{\sigma^2} = \frac{(\bar{X} - \mu)^2}{(\sigma/\sqrt{n})^2}$$. The left-hand side follows $$\chi^2(n)$$ and the last term follows $$\chi^2(1)$$. Since the two terms on the right-hand side are independent (by (c)), the additivity of the chi-square distribution gives (b).

(d) $$T = \frac{\bar{X}-\mu}{s/\sqrt{n}} = \frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}} = \frac{Z}{\sqrt{U/(n-1)}} \sim t(n-1)$$, where $$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$ and $$U = (n-1)s^2/\sigma^2 \sim \chi^2(n-1)$$ are independent by (a)-(c).

Note As $$n$$ grows, the t distribution approaches the standard normal distribution. In general, the t distribution has heavier tails than the standard normal distribution.
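
A short simulation sketch of parts (b) and (d) of the theorem (assuming numpy/scipy; $$\mu$$, $$\sigma$$, $$n$$, and the replication count are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 10.0, 3.0, 8, 100_000   # arbitrary parameters

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s = X.std(axis=1, ddof=1)                    # sample standard deviation (divisor n-1)

# (d): T = (xbar - mu) / (s / sqrt(n)) should follow t(n-1)
T = (xbar - mu) / (s / np.sqrt(n))
print(stats.kstest(T, "t", args=(n - 1,)).pvalue)

# (b): (n-1) s^2 / sigma^2 should follow chi^2(n-1)
V = (n - 1) * s**2 / sigma**2
print(stats.kstest(V, "chi2", args=(n - 1,)).pvalue)
```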

## 2. Statistical inference

Suppose $$X$$ follows some distribution. Define the upper $$\alpha$$ point of that distribution (its $$100(1-\alpha)$$th percentile) as the value $$x_\alpha$$ satisfying $$P(X > x_\alpha) = \alpha$$, i.e. $$P(X \leq x_\alpha) = 1 - \alpha$$.

### statistical estimation

Statistical estimation provides either a single guessed value for the parameter (point estimation) or an interval with a margin of error (interval estimation).

ex) (point estimation) $$E(\bar{X}) = \mu$$, $$Var(\bar{X}) = \frac{\sigma^2}{n}$$, $$E(s^2) = \sigma^2$$

ex) (interval estimation) Interval estimation of the mean $$\mu$$ when $$X_i \sim N(\mu, \sigma^2)$$ and the population variance $$\sigma^2$$ is known:

$$z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$, so $$P(-z_{\alpha/2} \leq \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \leq z_{\alpha/2}) = 1-\alpha$$.

Therefore, the $$100(1-\alpha)\%$$ confidence interval for $$\mu$$ is $$(\bar{X}-z_{\alpha/2} \cdot \sigma/\sqrt{n},\ \bar{X}+z_{\alpha/2} \cdot \sigma/\sqrt{n})$$; the margin of error is $$z_{\alpha/2} \cdot \sigma/\sqrt{n}$$ and the confidence level is $$100(1-\alpha)\%$$.
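
A minimal numerical sketch of this interval (assuming numpy/scipy; the sample values and the known $$\sigma$$ are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical measurements; sigma is assumed known here.
x = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7])
sigma, alpha = 0.4, 0.05

n, xbar = len(x), x.mean()
z = stats.norm.ppf(1 - alpha / 2)       # z_{alpha/2}: upper alpha/2 point of N(0, 1)
margin = z * sigma / np.sqrt(n)         # margin of error

print(f"{100 * (1 - alpha):.0f}% CI for mu: ({xbar - margin:.3f}, {xbar + margin:.3f})")
```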

### statistical hypothesis test

hypothesis

• Null hypothesis, $$H_0$$ : the claim that there is no effect, no difference, or no distinction
• Alternative hypothesis, $$H_1$$ : the claim that there is an effect, a difference, or a distinction
• 2-sided hypothesis : $$\theta \neq \theta_0$$
• 1-sided hypothesis : $$\theta > \theta_0$$ or $$\theta < \theta_0$$

Test of statistical hypothesis Set up the null and alternative hypotheses and, based on the observed data, decide which hypothesis is more plausible.

Test of statistical significance Compute the probability that data more extreme than the observed data would be obtained, and use it to judge the validity of the given hypothesis.

p-value (significance probability) Under the assumption that $$H_0$$ is true, the probability of obtaining data as extreme as or more extreme than the observed data, i.e. the probability of the type I error we would commit if we rejected $$H_0$$ based on data this extreme. A large p-value $$\Longleftrightarrow$$ under the assumption that $$H_0$$ is true, data like the observed are quite likely to occur $$\Longleftrightarrow$$ the data provide little evidence against $$H_0$$.

Significance level, $$\alpha$$ The threshold probability for rejecting the null hypothesis (the criterion for deciding whether the p-value is large or small). p-value $$> \alpha \Rightarrow$$ do not reject $$H_0$$; p-value $$< \alpha \Rightarrow$$ reject $$H_0$$.

Rejection region The set of observed values of the test statistic for which $$H_0$$ is rejected; e.g. for the one-sided alternative $$\mu > \mu_0$$, the values $$x$$ with $$P(\bar{X} \geq x \mid \mu = \mu_0) \leq \alpha$$.

Statistical significance If the p-value is less than or equal to the significance level $$\alpha$$, the test result is said to be statistically significant at level $$\alpha$$.

Steps of a significance test

1. State the hypotheses ex) $$H_0 : \mu = \mu_0$$, $$H_1 : \mu > \mu_0$$
2. Choose the significance level ex) $$\alpha = 0.05$$
3. Choose the test statistic ex) $$z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$$ under $$H_0$$
4. Compute the p-value ex) $$z_0 = \frac{\bar{x}-\mu_0}{\sigma/ \sqrt{n}} \Rightarrow p = P(z > z_0 \mid H_0)$$
5. Compare with the significance level ex) $$p < \alpha$$
6. Judge significance ex) reject $$H_0$$

Meaning: the evidence against the null hypothesis is stronger the farther the estimate of the parameter lies, under $$H_0$$, from the null value in the direction of the alternative hypothesis.
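
Putting the six steps together for the one-sided example above, a minimal sketch (assuming numpy/scipy; the data, $$\mu_0$$, and the known $$\sigma$$ are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data for H0: mu = 50 vs H1: mu > 50, with sigma assumed known.
x = np.array([52.1, 49.8, 53.4, 51.0, 50.7, 52.9, 51.5, 50.2, 52.0, 51.8])
mu0, sigma, alpha = 50.0, 2.0, 0.05

z0 = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))   # steps 3-4: observed test statistic
p_value = stats.norm.sf(z0)                          # p = P(Z > z0) under H0 (one-sided)

print(z0, p_value)
print("reject H0" if p_value < alpha else "do not reject H0")   # steps 5-6
```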

### inference for population mean

$$X_i, i=1, \cdots, n$$ are $$n$$ observations from a population.

1) Assuming that $$X_i \sim N(\mu, \sigma^2)$$:
   1. when $$\sigma^2$$ is known : $$z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
   2. when $$\sigma^2$$ is unknown : $$t = \frac{\bar{X}-\mu}{s/\sqrt{n}} \sim t(n-1)$$

2) Assuming only that $$X_i \sim$$ i.i.d. $$(\mu, \sigma^2)$$:
   1. if $$n$$ is large enough
      * when $$\sigma^2$$ is known : $$z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \approx N(0,1)$$ by the central limit theorem
      * when $$\sigma^2$$ is unknown : $$s^2 \rightarrow \sigma^2$$, so $$z = \frac{\bar{X}-\mu}{s/\sqrt{n}} \approx N(0,1)$$
   2. if $$n$$ is not large enough and the distribution is far from normal : use a non-parametric method
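
For case 1-2 (normal population, $$\sigma^2$$ unknown), a minimal sketch of the t-based interval and the corresponding scipy test (assuming numpy/scipy; the sample and the null value 10 are hypothetical):

```python
import numpy as np
from scipy import stats

# Normal population, sigma^2 unknown: use the t(n-1) pivot.
x = np.array([9.6, 10.4, 10.1, 9.8, 10.7, 10.2, 9.9])   # hypothetical sample
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)            # t_{alpha/2}(n-1)
half = t_crit * s / np.sqrt(n)
print(f"95% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")

# Equivalent built-in two-sided test of H0: mu = 10 using scipy.
print(stats.ttest_1samp(x, popmean=10.0))
```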

### inference for population variance

Case 1) Estimating the population variance, where $$X_i \sim N(\mu, \sigma^2)$$ i.i.d., $$i=1, \cdots, n$$:

$$\frac{(n-1) s^2}{\sigma^2} \sim \chi^2(n-1)$$ where $$s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$$

point estimate of $$\sigma^2$$ : $$\hat{\sigma}^2 = s^2$$

interval estimate of $$\sigma^2$$ : $$\left(\frac{(n-1) s^2}{\chi_{\alpha/2}^2(n-1)}, \frac{(n-1) s^2}{\chi_{1-\alpha/2}^2(n-1)}\right)$$
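
A minimal sketch of this interval (assuming numpy/scipy; the sample is hypothetical), keeping in mind that $$\chi_{\alpha/2}^2(n-1)$$ here denotes the upper $$\alpha/2$$ point, i.e. `ppf(1 - alpha/2)` in scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical normal sample; interval for sigma^2 from the chi^2(n-1) pivot.
x = np.array([12.3, 11.8, 13.1, 12.6, 12.0, 13.4, 12.9, 11.5])
alpha = 0.05
n, s2 = len(x), x.var(ddof=1)

lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)   # divide by chi^2_{alpha/2}(n-1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)       # divide by chi^2_{1-alpha/2}(n-1)
print(f"point estimate: {s2:.3f}, 95% CI for sigma^2: ({lower:.3f}, {upper:.3f})")
```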

Case 2) Comparing two population variances, where $$X_i \sim N(\mu_1, \sigma_1^2)$$ i.i.d., $$i=1, \cdots, n_1$$, independent of $$Y_i \sim N(\mu_2, \sigma_2^2)$$ i.i.d., $$i=1, \cdots, n_2$$:

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F(n_1-1, n_2-1)$$

interval estimate of $$\frac{\sigma_1^2}{\sigma_2^2}$$ : $$\left(\frac{s_1^2}{s_2^2\, F_{\alpha/2}(n_1-1, n_2-1)}, \frac{s_1^2}{s_2^2\, F_{1-\alpha/2}(n_1-1, n_2-1)}\right) = \left(\frac{s_1^2}{s_2^2} F_{1-\alpha/2}(n_2-1, n_1-1), \frac{s_1^2}{s_2^2} F_{\alpha/2}(n_2-1, n_1-1)\right)$$
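
A minimal sketch of the variance-ratio interval (assuming numpy/scipy; both samples are hypothetical):

```python
import numpy as np
from scipy import stats

# Two hypothetical independent normal samples; interval for sigma_1^2 / sigma_2^2.
x = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.4])
y = np.array([6.2, 5.5, 6.8, 6.0, 5.9, 6.4, 6.1, 5.7])
alpha = 0.05
n1, n2 = len(x), len(y)
s1sq, s2sq = x.var(ddof=1), y.var(ddof=1)

f_upper = stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)   # F_{alpha/2}(n1-1, n2-1)
f_lower = stats.f.ppf(alpha / 2, dfn=n1 - 1, dfd=n2 - 1)       # F_{1-alpha/2}(n1-1, n2-1)

ratio = s1sq / s2sq
print(f"95% CI for the variance ratio: ({ratio / f_upper:.3f}, {ratio / f_lower:.3f})")
```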

## 3. Correlation analysis

### correlation

The correlation between two random variables $$X$$ and $$Y$$ is defined as $$\rho = corr(X, Y) = \frac{cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$$

properties

1. $-1 \leq \rho \leq 1$
2. measures the strength of the linear relationship between $$X$$ and $$Y$$
3. $$\mid \rho \mid \approx 1 \Rightarrow$$ high linear relationship

### statistical inference on $$\rho$$

Let $$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$, $$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$, and $$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$.

Statistical test on $$\rho$$ : $$H_0 : \rho = 0$$

test statistic : $$T = \sqrt{n-2} \frac{r}{\sqrt{1-r^2}} \sim t(n-2)$$ under $$H_0$$, where $$r^2 = \frac{SSR}{SSTO} = \frac{S_{xy}^2/S_{xx}}{S_{yy}} = \left(\frac{S_{xy}}{\sqrt{S_{xx}} \sqrt{S_{yy}}}\right)^2$$

proof Note that $$SSR = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_1^2 \sum_{i=1}^n (X_i - \bar{X})^2$$, where $$\hat{\beta}_1$$, $$SSR$$, $$SSE$$, and $$MSE$$ come from the simple linear regression of $$Y$$ on $$X$$. Then $$T = \sqrt{n-2} \frac{r}{\sqrt{1-r^2}} = \frac{\sqrt{SSR}}{\sqrt{MSE}} \sim t(n-2)$$, since $$T^2 = \frac{SSR/1}{SSE/(n-2)} \sim F(1, n-2)$$.

If $$H_0$$ is rejected, we say that there is a linear relationship between $$X$$ and $$Y$$ at significance level $$\alpha$$.

Note $$H_0$$ is more likely to be rejected when $$n$$ is large or $$\mid r\mid$$ is large.
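
A minimal sketch of the test (assuming numpy/scipy; the paired data are hypothetical); `scipy.stats.pearsonr` applies the same $$t(n-2)$$ statistic:

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations; test H0: rho = 0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 4.9, 6.3, 6.8, 8.1])
n = len(x)

r = np.corrcoef(x, y)[0, 1]                   # sample correlation S_xy / sqrt(S_xx S_yy)
T = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)    # test statistic, ~ t(n-2) under H0
p_value = 2 * stats.t.sf(abs(T), df=n - 2)    # two-sided p-value

print(r, T, p_value)
print(stats.pearsonr(x, y))                   # built-in test based on the same statistic
```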