Before looking at what a confidence level is, it helps to understand
what a confidence interval is. A confidence interval is an estimate computed
from sample statistics: a range of plausible values for an unknown parameter
(e.g. the mean or standard deviation of a distribution).
In general, it is an interval for an unknown population parameter based on
the sampling distribution of the estimator.
Read the next few lines before we get to the calculation.
A confidence interval expresses how much uncertainty surrounds a particular statistic. It is usually reported as an estimate plus or minus a margin of error. It tells you how confident you can be that the results from a poll or survey reflect what you would find if you could survey the entire population.
Confidence intervals are intrinsically tied to confidence levels.
Confidence Level and Confidence Interval
A confidence level is expressed as a percentage (for example, a 90%
confidence level). It means that if you repeated an experiment or survey over
and over again, about 90 percent of the intervals you computed would contain
the true population value.
A confidence interval is the result itself, usually a pair of numbers.
For example, you survey a group of students to see how many hours they study in a week. You compute a confidence interval at the 99 percent confidence level and get (8, 14). That means you estimate that the average student studies between 8 and 14 hours a week. A 99% level is very high, so you can be quite confident that the procedure that produced this interval is reliable.
The confidence level is tied to alpha (ɑ), the significance level, which you are free to choose. With an ɑ of 10%, the corresponding confidence level is (1 - ɑ), i.e. 90%.
Let us take a practical example here:
Suppose a cold drink machine is adjusted to fill bottles with exactly 1 L.
In practice the machine cannot put exactly 1 L in each bottle, so the fill
amount X varies. This variation is assumed to be normally distributed around
an average of 1 L with a standard deviation of 25 ml. To check whether the
machine is correctly calibrated, we take a sample of 25 bottles.
μ = 1 L (1000 ml)
σ = 25 ml
Assuming x̅ (sample mean) = 0.997 L (997 ml)
n = 25
If we take more samples, their means will differ from one sample to the next; they could come out around 1.02, 0.99 or 0.98 L.
Further, in our case we can determine the confidence interval by noting that
the mean x̅ of a normally distributed sample is itself normally distributed,
with the same expectation μ but with a standard error of:
σ/√n = 25/√25 = 25/5 = 5 ml
Using the formula:
Z = (x̅ - μ) / (σ/√n)
Z follows a standard normal distribution. For a two-tailed test with ɑ = 0.05,
we need the value z* such that
P(Z ≤ z*) = 1 - ɑ/2 = 0.975
which gives a Z-statistic of z* = 1.96.
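For readers wondering where 0.975 and 1.96 come from, here is a short derivation sketch. It writes Φ for the cumulative distribution function of the standard normal distribution, a symbol assumed here for convenience and not used elsewhere in this post.

```latex
\begin{align*}
P(-z^* \le Z \le z^*) &= 1 - \alpha \\
\Phi(z^*) - \Phi(-z^*) &= 1 - \alpha \\
2\,\Phi(z^*) - 1 &= 1 - \alpha
  \qquad \text{(by symmetry, } \Phi(-z^*) = 1 - \Phi(z^*)\text{)} \\
\Phi(z^*) &= 1 - \frac{\alpha}{2} = 0.975 \quad \text{for } \alpha = 0.05 \\
z^* &= \Phi^{-1}(0.975) \approx 1.96
\end{align*}
```

In words: the two tails share ɑ equally, so each tail holds ɑ/2 = 0.025 and the area to the left of z* is 0.975.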
Now the lower point:
LP = x̅ - 1.96*(σ/√n) = 997 - 1.96*5 = 987.2 ml
and the upper point:
UP = x̅ + 1.96*(σ/√n) = 997 + 1.96*5 = 1006.8 ml
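The interval arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library. The function name `z_confidence_interval` and its parameters are made up for illustration, and it assumes the population standard deviation is known, as in the bottle example.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean when the population sigma is known."""
    alpha = 1 - confidence
    # Critical value z*: the point with area 1 - alpha/2 to its left.
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    margin = z_star * standard_error
    return sample_mean - margin, sample_mean + margin


# Values from the bottle example, all in millilitres.
lower, upper = z_confidence_interval(sample_mean=997, sigma=25, n=25)
print(f"95% CI: ({lower:.1f} ml, {upper:.1f} ml)")  # 95% CI: (987.2 ml, 1006.8 ml)
```

Passing `confidence=0.99` instead would widen the interval, because a higher confidence level needs a larger z*.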
This means that every time the measurements are repeated, the sample mean x̅
will take a different value, and so will the endpoints calculated from it. In
95% of the cases μ will lie between those endpoints, but in 5% of the cases it
will not.
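One way to see this repeated-sampling behaviour is a small simulation. The sketch below is illustrative only: it assumes the machine really fills with μ = 1000 ml and σ = 25 ml, draws many samples of 25 bottles, and counts how often the 95% interval contains μ. The function name `coverage`, the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=1000.0, sigma=25.0, n=25, confidence=0.95,
             trials=10_000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n bottles and compute its mean.
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # Check whether this sample's interval covers the true mean.
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials


print(f"Fraction of intervals covering mu: {coverage():.3f}")
```

The printed fraction should come out close to 0.95. Each individual interval either covers μ or misses it; the 95% describes the long-run share of intervals that cover it.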
A few very commonly used Z-statistic (z*) values:
Confidence level | z*
90% | 1.645
95% | 1.96
98% | 2.326
99% | 2.576
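These critical values can be checked with the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-tailed critical value z* for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} | {z_critical(level):.3f}")
```

Rounded to three decimals, the values agree with the list above (1.96 prints as 1.960).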
Below is a plot of a standard normal distribution showing the proportion of the distribution that lies between each +/- z-score. How to calculate the Z-score for a standard normal distribution, and how we get the above percentages, will be covered in upcoming posts.
I also recommend reading about two-tailed and one-tailed tests.
There is a common misconception regarding the confidence interval and the confidence level.
A 90% confidence level does not mean that for a given realized interval there
is a 90% probability that the population parameter lies within the interval
(i.e. a 90% probability that the interval covers the population parameter).
Once an interval is calculated, it either covers the parameter value or it
does not; it is no longer a matter of probability. The 90% probability relates
to the reliability of the estimation procedure, not to a calculated interval.
In short, for 90% of the samples the interval estimate will contain the true
population parameter.