Student
Hello, this question is about the last two exercises from the previous class on hypothesis testing.
(The last 2 slides)
Exercise 1:
Quine dataset from Lab 2: is the mean number of days absent greater than 15?
Pivotal statistic T:
T=\frac{\widehat{\mu} - \mu_{0}}{\widehat{se}(\widehat{\mu})}
With standard error:
\widehat{se} = \widehat{\mu} / \sqrt{n} = \overline{X} / \sqrt{n}
Exercise 2:
proportion of voters supporting candidate A = 0.52
Pivotal statistic T:
T=\frac{\widehat{p_{A}} - 0.5}{se}
With standard error:
se=\sqrt{p_{A} * (1-p_{A}) / n}
In exercise 1 (days absent), we used the sample mean (16.46, instead of 15) to compute the standard error.
In exercise 2 (proportion of voters), we used the hypothesized value (0.5, instead of 0.52) to compute the standard error.
Would you mind explaining the reason behind this choice?
Thank you for your attention.
Phan Anh VU
Teacher
Thank you for your question, Phan Anh. You're right: in exercise 1 it is better to use the value of \mu under H_0 in the standard error, since we know it. It is also more consistent with exercise 2.
Both values, the estimated value and the value under H_0, are valid for performing the test.
On the other hand, to compute a CI, the only option is to estimate the parameter in the s.e. formula.
Is it OK now?
Thank you for giving me the chance to clarify this point.
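To make the two choices concrete, here is a minimal sketch for exercise 2 that computes T with the standard error evaluated under H_0 and with the standard error evaluated at the estimate, plus a CI that can only use the estimate. The sample size `n = 1000` and the function name `proportion_test` are assumptions made for illustration; they do not come from the exercise.

```python
from math import sqrt
from statistics import NormalDist


def proportion_test(p_hat, p0, n, level=0.05):
    """Return T with s.e. under H0, T with estimated s.e., and a CI."""
    se_null = sqrt(p0 * (1 - p0) / n)        # s.e. evaluated at the H0 value
    se_hat = sqrt(p_hat * (1 - p_hat) / n)   # s.e. evaluated at the estimate
    t_null = (p_hat - p0) / se_null
    t_hat = (p_hat - p0) / se_hat
    # The CI has no hypothesized value to plug in, so it uses se_hat.
    z = NormalDist().inv_cdf(1 - level / 2)
    ci = (p_hat - z * se_hat, p_hat + z * se_hat)
    return t_null, t_hat, ci


# n = 1000 is a hypothetical sample size, not taken from the exercise.
t_null, t_hat, ci = proportion_test(p_hat=0.52, p0=0.5, n=1000)
print(f"T with s.e. under H0:   {t_null:.4f}")
print(f"T with estimated s.e.:  {t_hat:.4f}")
print(f"95% CI for p_A:         ({ci[0]:.4f}, {ci[1]:.4f})")
```

With these illustrative numbers the two versions of T are almost identical, which is consistent with both being acceptable for the test.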
Student
Thank you for clarifying, Mme Poursat.
When using the empirical value to estimate the standard error, could you tell me when we should use the unbiased estimator of the standard deviation and when the biased one?
Regarding normal population:
For the mean, we use the unbiased estimator: \frac{\overline{X} - \mu}{s/\sqrt n}
s^2 = \frac{1}{n-1}\sum (X_{i} - \overline{X})^2
For the variance, we use the MLE, which is the biased estimator: (n-1)*\widehat{\sigma}^2/\sigma^2
\widehat{\sigma}^2 = \frac{1}{n}\sum (X_{i}-\overline{X})^2;
Besides, when we know nothing about the distribution of the data, we need to estimate the population variance by the sample variance. In that case, should we divide the sum of squared deviations \sum (X_{i}-\overline{X})^2 by n or by (n-1)?
I am confused about when to use the biased vs. the unbiased estimator.
Phan Anh VU
Teacher
In general settings, when the results are based on the approximate normal law (CLT for the sample mean or MLEs), using the biased or the unbiased estimator does not matter, since both are consistent estimators of the variance.
In Gaussian models (and only in Gaussian models), we have the following results:
\frac{\overline{X}-\mu}{s/\sqrt{n}}\sim t(n-1), \quad n\widehat{\sigma}^2/\sigma^2=(n-1)s^2/\sigma^2=\sum_i(X_i-\overline{X})^2/\sigma^2\sim \chi^2(n-1)
The first assertion is not true if you replace s by \widehat\sigma.
Is it clearer?
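As a quick numerical check of the identity n\widehat{\sigma}^2 = (n-1)s^2 = \sum_i (X_i - \overline{X})^2, here is a small sketch using only the Python standard library. The data values are made up for illustration; `pvariance` divides by n and `variance` divides by n-1.

```python
from math import isclose
from statistics import mean, pvariance, variance

# Hypothetical sample, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
n = len(x)
x_bar = mean(x)

sum_sq = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations
sigma_hat2 = pvariance(x)                    # biased estimator (divides by n)
s2 = variance(x)                             # unbiased estimator (divides by n-1)

print(f"sum of squares:       {sum_sq:.6f}")
print(f"n * sigma_hat^2:      {n * sigma_hat2:.6f}")
print(f"(n-1) * s^2:          {(n - 1) * s2:.6f}")
# The two estimators differ only by the factor (n-1)/n, which tends to 1.
print(f"sigma_hat^2 / s^2:    {sigma_hat2 / s2:.6f}")
print(f"(n-1)/n:              {(n - 1) / n:.6f}")
```

Since the ratio of the two estimators is (n-1)/n, the choice has a vanishing effect for large n, while for small Gaussian samples the t(n-1) result is stated with s.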
Student
Yes, thank you very much. It is clearer now that you have made me realize that:
n * \widehat{\sigma}^2 = (n-1)s^2 = \sum (X_{i}-\overline{X})^2
I guess this also explains Question 3 of the Practice Test 3:
Pθ=N(θ,1).
H0: θ=0 vs H1: θ= 1.
rejection region R = {X > c}
Show that the critical value c for a test of level 5% is c = 0.41, and that the corresponding type II error is 0.9%.
First, I tried with \frac{\overline{X} - \mu}{s/\sqrt n} \sim t(n-1).
This gives a different result.
Then, using \frac{\overline{X} - \mu}{\sigma/\sqrt n} \sim Normal(0,1), I get the correct result.
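The arithmetic with the normal law can be checked with a short sketch. It assumes the rejection region is on the sample mean of n = 16 observations. The sample size is not stated in the question as quoted here; n = 16 is used only because it reproduces c = 0.41, so treat it as an assumption.

```python
from math import sqrt
from statistics import NormalDist

n = 16                       # ASSUMPTION: sample size inferred, not given above
sigma = 1.0                  # known standard deviation, from P_theta = N(theta, 1)
theta0, theta1 = 0.0, 1.0    # H0 and H1 values
alpha = 0.05                 # test level

se = sigma / sqrt(n)         # exact standard error, no estimation needed

# Critical value: P(X_bar > c | theta = theta0) = alpha
c = theta0 + NormalDist().inv_cdf(1 - alpha) * se

# Type II error: P(X_bar <= c | theta = theta1)
beta = NormalDist(mu=theta1, sigma=se).cdf(c)

print(f"critical value c = {c:.2f}")
print(f"type II error    = {100 * beta:.1f}%")
```

Because \sigma = 1 is known, the statistic is exactly normal and no t quantile is involved, which is why the t(n-1) attempt gives a different number.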
If I understand correctly:
Only for a Gaussian population with unknown population variance \sigma^2 do we use the Student t distribution and the unbiased estimator s:
\frac{\overline{X} - \mu}{s/\sqrt n} \sim t(n-1)
In all other cases, i.e. a non-normal population, or a normal population with \sigma^2 given, we should use the normal law (approximate in the first case, exact in the second) for the sample mean:
\frac{\overline{X} - \mu}{\sigma/\sqrt n} \sim Normal(0,1)
Could you confirm, or correct me if I am wrong?
Thank you again for answering my several long questions.