Sampling Distribution and Central Limit Theorem
The central limit theorem states that the sum (and therefore the mean) of n independent random variables drawn from the same distribution with finite variance is approximately normally distributed when n is large enough, whatever the shape of that distribution. This matters in practice because the sample size determines how efficient and reliable a sample drawn from a population is. Most of the time we sample because observing the entire population is impossible.
A sampling distribution arises when we repeatedly draw samples of a fixed size from a population and compute a statistic for each one. Here the statistic is the sample mean, so we are interested in the distribution of those sample means. Its variance and standard deviation, which control how widely the sample means are spread, are:
\begin{equation}\sigma^2_{\bar{x}}=\frac{\sigma^2}{n}\;\;\;\;\;\;\;\;\;\;\;\;\; \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}\end{equation}
\begin{equation}\sigma^2=\textrm{Population Variance}\;\;\;\;\sigma^2_{\bar{x}}=\textrm{Variance of Sampling Distribution}\end{equation}
As the formula shows, the standard deviation of the sampling distribution shrinks as the sample size increases. In other words, the sample means cluster more tightly around the population mean as the samples get larger, and we will also see that a larger sample size makes the sampling distribution approximately normal.
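The relationship between sample size and the spread of the sample means can be checked with a small simulation. The sketch below is only an illustration: the population is a hypothetical, deliberately skewed set of exponential draws (not the exam data used later), and the helper name `sd_of_sample_means` and the chosen sample sizes are assumptions made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 draws from a skewed (exponential) distribution.
population = [random.expovariate(1.0) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation


def sd_of_sample_means(n, repeats=2_000):
    """Empirical standard deviation of the means of `repeats` samples of size n."""
    # random.choices samples with replacement, so the draws are independent.
    means = [statistics.fmean(random.choices(population, k=n)) for _ in range(repeats)]
    return statistics.stdev(means)


results = {}
for n in (10, 40, 160):
    empirical = sd_of_sample_means(n)
    theoretical = sigma / n ** 0.5  # sigma / sqrt(n)
    results[n] = (empirical, theoretical)
    print(f"n={n:4d}  empirical={empirical:.4f}  sigma/sqrt(n)={theoretical:.4f}")
```

Each fourfold increase in n should roughly halve both columns, since the standard deviation of the sample means scales with 1/sqrt(n).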
To illustrate the central limit theorem, I took the results of one exam as the population. I wrote a loop that fills each empty slot with the mean of a random sample drawn from this exam population. The random numbers are pseudo-random rather than truly random, but that has a negligible effect here.
In one run of the loop, each sample has 50 observations. The distribution of the sample means looks like this:
The sample means do not look normally distributed (the line shows a normal curve for comparison with the histogram), even though the sample size is already fairly large.
But when we increase the sample size to 500, the histogram follows the normal curve much more closely:
The histogram now fits the normal curve, which is consistent with the central limit theorem. Note also that the sampling distribution for samples of size 50 is wider than the one for samples of size 500. This agrees with the formula for the standard deviation of the sampling distribution given earlier.
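A loop of this kind might look roughly like the following sketch. It is not the original script: the exam data is not reproduced here, so `scores` is a hypothetical stand-in population (skewed scores on a 0 to 100 scale), and the helper names, the number of repetitions and the use of skewness as a rough symmetry check are all assumptions for the illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the exam results: 5,000 scores on a 0-100 scale,
# skewed toward high marks. The real data set is not available here.
scores = [100 * random.betavariate(5, 2) for _ in range(5_000)]


def sample_means(n, repeats=1_000):
    """Draw `repeats` samples of size n (with replacement) and return their means."""
    return [statistics.fmean(random.choices(scores, k=n)) for _ in range(repeats)]


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


summary = {}
for n in (50, 500):
    means = sample_means(n)
    summary[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    mean_of_means, spread, skew = summary[n]
    print(f"n={n:3d}  mean={mean_of_means:.2f}  sd={spread:.3f}  skewness={skew:+.3f}")
```

With samples ten times larger, the spread of the means should shrink by a factor of about sqrt(10), which corresponds to the narrower curve seen for n = 500. Plotting a histogram of `means` for each n would give pictures comparable to the ones described above.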
With this information, we can calculate the probability of obtaining a sample with a certain mean when the population mean, the population standard deviation and the sample size are given. The same procedure applies not only to the sample mean but also to sample proportions and variances.
As an example for a sampling distribution of sample means, assume that the mean height of students at a university is 1.8 with a standard deviation of 0.2, and that you select 20 students. What is the probability that this sample has a mean of less than 1.7?
\begin{equation}\mu=1.8\;\;\;\;\;\sigma=0.2\;\;\;\;\;\bar{x}=1.7\;\;\;\;\;n=20\end{equation}
\begin{equation}P(\bar{X}<1.7)=\;?\end{equation}
\begin{equation}Z=\frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}=\frac{1.7-1.8}{\frac{0.2}{\sqrt{20}}}=-2.236068\end{equation}
\begin{equation}P\left(\frac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}}<\frac{1.7-1.8}{\frac{0.2}{\sqrt{20}}}\right)=P(Z<-2.236068)=0.01267366\end{equation}
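The same calculation can be reproduced with Python's standard library. This is a minimal sketch: the variable names are chosen for the example, and it assumes, as the calculation above does, that the sample mean is approximately normally distributed for n = 20.

```python
from math import sqrt
from statistics import NormalDist

mu = 1.8         # population mean height
sigma = 0.2      # population standard deviation
n = 20           # sample size
threshold = 1.7  # sample mean we want to fall below

standard_error = sigma / sqrt(n)       # standard deviation of the sample mean
z = (threshold - mu) / standard_error  # standardized threshold
p = NormalDist().cdf(z)                # P(Z < z) for a standard normal Z

# Equivalent route without standardizing by hand.
p_direct = NormalDist(mu, standard_error).cdf(threshold)

print(f"z = {z:.6f}")
print(f"P(sample mean < {threshold}) = {p:.8f}")
```

Both routes give the same probability, about 1.27%, so a sample of 20 students with a mean height below 1.7 would be fairly unusual under these assumptions.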