Confidence Interval

Hypothesis testing is simply the idea of testing a sample estimator of a population parameter: the mean, a proportion, or the variance.

A confidence interval gives a range of plausible values for the parameter being estimated.

A confidence interval can be calculated for several different cases:

1- Confidence interval for a population mean, when population variance is known

\begin{equation} \bar{x} \pm z_{a/2}\times \frac{\sigma}{\sqrt{n}}\end{equation}

\begin{equation} \bar{x} =\textrm{sample mean} \end{equation}

\begin{equation} z_{a/2} =\textrm{critical value} \end{equation}

\begin{equation} a =\textrm{significance level (1-confidence level)} \end{equation}

\begin{equation} \sigma =\textrm{population standard deviation} \end{equation}

\begin{equation} n =\textrm{sample size} \end{equation}

For example, suppose a company wants to estimate its mean production. It draws a sample of size 10 with a sample mean of 200, the population standard deviation is known to be 5, and the confidence level is 90%. Then, with $z_{0.05}=1.6449$:

\begin{equation}\textrm{Upper Confidence Limit}=200+z_{a/2}\times \frac{5}{\sqrt{10}}=202.6007 \end{equation}

\begin{equation}\textrm{Lower Confidence Limit}=200-z_{a/2}\times \frac{5}{\sqrt{10}}=197.3993 \end{equation}

Thus, we can say with 90% confidence that the population mean is between 197.3993 and 202.6007:

\begin{equation} 197.3993<\mu<202.6007 \end{equation}
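The calculation above can be reproduced with a short script. This is only a minimal sketch: the function name `z_interval` and its parameters are illustrative, and it treats 5 as the population standard deviation, which is what the computed limits imply.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population std. dev. is known."""
    alpha = 1 - confidence
    # Critical value z_{a/2}: the (1 - a/2) quantile of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_interval(sample_mean=200, sigma=5, n=10, confidence=0.90)
print(f"{lower:.4f} < mu < {upper:.4f}")
```

Only the standard library is used here; `NormalDist().inv_cdf` supplies the critical value that would otherwise be read from a z table.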

2-Confidence interval for a population mean, when population variance is unknown

In this case, rather than using the standard normal distribution, we have to use the t-distribution, with the sample standard deviation s in place of the population standard deviation.

\begin{equation}\bar{x}\pm t_{df,a/2}\times\frac{s}{\sqrt{n}}\end{equation}

\begin{equation} df=\textrm{Degrees of Freedom (n-1)}\end{equation}

For example, assume that in a sample of 100 students the sample mean cost of being a student is $2000 per month, with a sample standard deviation of 14. Let's find the confidence interval for the population mean cost of being a student at the 95% confidence level.

\begin{equation}t_{100-1,0.025}=1.984217 \end{equation}

\begin{equation} \textrm{Upper Confidence Limit}=2000+t_{100-1,0.025}\times\frac{14}{\sqrt{100}}=2002.78 \end{equation}

\begin{equation} \textrm{Lower Confidence Limit}=2000-t_{100-1,0.025}\times\frac{14}{\sqrt{100}}=1997.22 \end{equation}

\begin{equation} 1997.22<\mu<2002.78\end{equation}
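As a rough check of this case, the sketch below computes the interval for a sample of 100, which gives 100 - 1 = 99 degrees of freedom. The Python standard library has no t quantile function, so the critical value is passed in by hand. The value 1.984217 for $t_{99,0.025}$ is taken from standard t tables and is an assumption of this sketch, as is the helper name `t_interval`.

```python
from math import sqrt


def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval for a mean when the population variance is unknown.

    t_crit is the critical value t_{n-1, a/2}, supplied by the caller
    (for example from a t table).
    """
    margin = t_crit * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Assumed table value for 99 degrees of freedom and a/2 = 0.025.
T_99_0025 = 1.984217

t_lower, t_upper = t_interval(sample_mean=2000, s=14, n=100, t_crit=T_99_0025)
print(f"{t_lower:.2f} < mu < {t_upper:.2f}")
```

If SciPy happens to be available, the critical value could instead be obtained from its t distribution quantile function rather than a table.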

3-Confidence interval for a population proportion.

We know that the number of successes x in a sample of size n drawn from the population is binomially distributed. So, assuming x is a discrete random variable with a binomial distribution, we can find the variance of the sample proportion as follows:

\begin{equation} \textrm{Population Proportion}=p \end{equation}

\begin{equation} VAR(x)=n\times p\times(1-p) \end{equation}

\begin{equation} \textrm{Sample Proportion}=\hat{p}=\frac{x}{n} \end{equation}

\begin{equation} VAR(\hat{p})=VAR(\frac{x}{n})=VAR(x)\times \frac{1}{n^2}= \frac {p\times (1-p)}{n} \end{equation}
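The variance result can be checked numerically. The sketch below computes the variance of x/n directly from the binomial probabilities and compares it with p(1-p)/n. The values n = 50 and p = 0.3 are arbitrary illustrative choices, not taken from the text.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(x = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def variance_of_sample_proportion(n, p):
    """Exact variance of p_hat = x / n, summed over all outcomes of x."""
    values = [k / n for k in range(n + 1)]
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(v * w for v, w in zip(values, probs))
    return sum(w * (v - mean) ** 2 for v, w in zip(values, probs))


n_trials, p_true = 50, 0.3  # illustrative values only
exact_var = variance_of_sample_proportion(n_trials, p_true)
formula_var = p_true * (1 - p_true) / n_trials
print(exact_var, formula_var)
```

Both numbers should agree up to floating-point error, which is the identity derived above.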

So we can find the confidence interval for population proportion as:

\begin{equation}\sqrt{\frac{p\times(1-p)}{n}}\approx \sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\end{equation}

\begin{equation} \hat{p}\pm z_{a/2}\times S_{\hat{p}} \end{equation}

\begin{equation} \hat{p}\pm z_{a/2}\times \sqrt{\frac{\hat{p}\times(1-\hat{p})}{n}}\end{equation}

For instance, a simple random sample of 200 out of a total of 1,380 colleges in a country showed that 18 colleges use the text Statistics Made Difficult and Boring. Find a 95% confidence interval for the proportion of all colleges using this text. (The question is taken from the book Statistics for Business and Economics: Newbold, Carlson, & Thorne, 2022.)

In this case, because the ratio of sample size to population size is bigger than 0.05, we should apply the finite population correction factor. The factor multiplies the variance of the sample proportion, so the standard deviation is multiplied by its square root.

\begin{equation} \textrm{Finite Population Correction Factor}=\frac{N-n}{N-1} \end{equation}

\begin{equation} n>0.05\times N \rightarrow S_{\hat{p}}\times\sqrt{\frac{N-n}{N-1}} \end{equation}

\begin{equation} \textrm{Confidence Interval}=\hat{p}\pm z_{a/2}\times S_{\hat{p}}\times\sqrt{\frac{N-n}{N-1}} \end{equation}

\begin{equation} n=\textrm{sample size}=200 \end{equation}

\begin{equation} N=\textrm{Population Size}=1380 \end{equation}

\begin{equation} \hat{p}=\textrm{Sample Proportion}=\frac {18}{200}=0.09 \end{equation}

\begin{equation} S_{\hat{p}}=\sqrt{\frac{0.09\times 0.91}{200}}=0.0202 \end{equation}

\begin{equation} z_{0.05/2}=1.96 \end{equation}

\begin{equation} \textrm{Finite Population Correction Factor}=\frac{1380-200}{1380-1}=0.8557 \end{equation}

\begin{equation} \textrm{Upper Confidence Limit}=0.09+1.96\times 0.0202\times\sqrt{0.8557}= 0.127 \end{equation}

\begin{equation} \textrm{Lower Confidence Limit}=0.09-1.96\times 0.0202\times\sqrt{0.8557}= 0.053 \end{equation}

\begin{equation} 0.053<p<0.127 \end{equation}
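The college example can be sketched in code as well. This version applies the finite population correction to the variance, that is, it multiplies the standard error by the square root of (N-n)/(N-1). The function name `proportion_interval_fpc` and the choice to apply the correction only when n exceeds 5% of N are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval_fpc(successes, n, population_size, confidence):
    """Confidence interval for a proportion, with finite population correction."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    # Apply the correction only when the sample is more than 5% of the population.
    if n > 0.05 * population_size:
        fpc = (population_size - n) / (population_size - 1)
        std_error *= sqrt(fpc)
    margin = z * std_error
    return p_hat - margin, p_hat + margin


p_lower, p_upper = proportion_interval_fpc(
    successes=18, n=200, population_size=1380, confidence=0.95
)
print(f"{p_lower:.3f} < p < {p_upper:.3f}")
```

Because the script uses the unrounded critical value and standard error, its limits may differ in the last digit from a hand calculation with rounded inputs.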

4-Confidence interval for population variance

In this case, we have to use the chi-square distribution with n-1 degrees of freedom, where

\begin{equation} \chi^2_{n-1}=\frac{S^2\times(n-1)}{\sigma^2}\end{equation}

I will not go into where this equation comes from, but note that it plays the same role as the corresponding statistics for the standard normal and Student's t distributions.

For a given upper-tail probability, we write the critical value as follows (here n=11 and the probability is 0.1):

\begin{equation} \chi^2_{n-1,a}\rightarrow \chi^2_{10,0.1}=15.98718 \end{equation}

So the area under the curve to the right of this value is:

\begin{equation} p(\chi^2>\chi^2_{10,0.1})=p(\chi^2>15.98718)=0.1\end{equation}
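This tail area can be verified without statistical tables. For an even number of degrees of freedom 2m, the chi-square upper-tail probability has the closed form $e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$. The sketch below uses that identity. It does not apply to odd degrees of freedom, and the function name is illustrative.

```python
from math import exp, factorial


def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x), valid only when df is a positive even integer."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2
    m = df // 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(m))


tail_prob = chi2_upper_tail_even_df(15.98718, df=10)
print(round(tail_prob, 4))
```

The result should be very close to 0.1, matching the critical value quoted above.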

So, we can derive the probability statement for the confidence interval. Since the second subscript denotes the upper-tail probability, the critical value with subscript 1-a/2 is the smaller one:

\begin{equation} p(\chi^2_{n-1,1-a/2}<\chi^2_{n-1}<\chi^2_{n-1,a/2})=1-a\end{equation}

\begin{equation} p(\chi^2_{n-1,1-a/2}<\frac{S^2\times (n-1)}{\sigma^2}<\chi^2_{n-1,a/2})=1-a \end{equation}

\begin{equation} p(\frac{S^2\times(n-1)}{\chi^2_{n-1,1-a/2}}>\sigma^2>\frac{S^2\times(n-1)}{\chi^2_{n-1,a/2}})=1-a \end{equation}

So:

\begin{equation}LCL=\frac{S^2\times(n-1)}{\chi^2_{n-1,a/2}}\;\;\;UCL=\frac{S^2\times (n-1)}{\chi^2_{n-1,1-a/2}} \end{equation}

Assume that 26 bags containing balls of different colors are sampled to estimate the variance of the colors. The sample variance is found to be 6.62. Find a 90% confidence interval for the population variance.

\begin{equation} UCL=\frac{6.62\times 25}{\chi^2_{25,0.95}}=\frac{6.62\times 25}{14.61141}=11.32677\end{equation}

\begin{equation} LCL=\frac{6.62\times 25}{\chi^2_{25,0.05}}=\frac{6.62\times 25}{37.65248}=4.39546\end{equation}

\begin{equation} 11.32677>\sigma^2>4.39546\end{equation}

This means that the population variance is between 4.39546 and 11.32677 with 90% confidence.
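The same limits can be reproduced with a few lines of code. The standard library has no chi-square quantile function, so this sketch takes the two critical values quoted above as inputs. The function name and argument names are illustrative.

```python
def variance_interval(sample_var, n, chi2_upper_crit, chi2_lower_crit):
    """Confidence interval for a population variance.

    chi2_upper_crit is the critical value with upper-tail area a/2 (the larger one);
    chi2_lower_crit is the critical value with upper-tail area 1 - a/2 (the smaller one).
    """
    scaled = sample_var * (n - 1)
    lcl = scaled / chi2_upper_crit  # dividing by the larger value gives the lower limit
    ucl = scaled / chi2_lower_crit  # dividing by the smaller value gives the upper limit
    return lcl, ucl


# Critical values for 25 degrees of freedom, as quoted in the example above.
var_lcl, var_ucl = variance_interval(
    sample_var=6.62, n=26, chi2_upper_crit=37.65248, chi2_lower_crit=14.61141
)
print(f"{var_lcl:.5f} < sigma^2 < {var_ucl:.5f}")
```

Note that the interval is not symmetric around the sample variance, because the chi-square distribution is skewed.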
