Confidence interval
In statistics, a confidence interval, abbreviated as CI,[1] is a tool that people use when they collect data in order to estimate a certain parameter, such as the mean (average) of a population.[2] A confidence interval gives a range of values that is likely to contain the parameter. The width of the range shows how precise the estimate is: a narrow interval means a precise estimate.
Each confidence interval depends on certain properties of the sample(s) that we use to make it. For example, all else being equal, a confidence interval made from a sample of 5000 people will be much narrower, and so more precise, than one made from a sample of only 5 people.
Each confidence interval is given together with a percentage, such as 95%. This percentage is called the confidence level.
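The effect of sample size and confidence level on the width of an interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is taken to be normal with a known standard deviation, the value `sigma = 10.0` is hypothetical, and `half_width` is a helper name made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of a z-based confidence interval for a mean.

    Assumes a normal population with known standard deviation `sigma`.
    """
    # Two-sided critical value, for example about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)


sigma = 10.0  # hypothetical population standard deviation

# A larger sample gives a narrower interval at the same confidence level.
for n in (5, 5000):
    print(f"n = {n:4d}: estimate +/- {half_width(sigma, n):.2f}")

# A higher confidence level gives a wider interval for the same sample size.
for level in (0.90, 0.95, 0.99):
    print(f"level = {level:.0%}: estimate +/- {half_width(sigma, 25, level):.2f}")
```

Under these assumptions the interval from 5000 observations is far narrower than the one from 5 observations, and asking for more confidence widens the interval.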
Meaning of the term "confidence"
The term confidence has a similar, but not identical, meaning in statistics and in common use. In common use, a claim to 95% confidence in something is normally taken to mean near certainty. In statistics, a claim to 95% confidence means that the researcher has seen one interval out of a large number of possible ones, of which about 19 out of 20 contain the true value of the parameter.
Practical example
![A factory assembly line fills margarine cups to a desired 250g +/- 5g](https://upload.wikimedia.org/wikipedia/commons/7/7a/Margarinefilling.png)
A machine fills cups with margarine. It is adjusted so that the content of the cups is 250g of margarine. As the machine cannot fill every cup with exactly 250g, the content added to individual cups shows some variation, and is considered a random variable X.
This variation is assumed to be normally distributed around the desired average of 250g, with a standard deviation of 2.5g. To check if the machine is adequately calibrated, a sample of n = 25 cups of margarine is chosen at random, and the cups are weighed. The weights of margarine are X1, ..., X25, a random sample from X.
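To get an impression of the expectation $\mu$ of X, it is enough to give an estimate. A suitable estimator is the sample mean:
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
The sample shows actual weights x1, ..., x25, with mean:
$$\bar{x} = \frac{1}{25}\sum_{i=1}^{25} x_i = 250.2\ \text{grams}$$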
If we take another sample of 25 cups, we could easily expect to find mean values such as 250.4 or 251.1 grams. A sample mean of 280 grams, however, would be extremely rare if the mean content of the cups is in fact close to 250g.
There is a whole interval around the observed sample mean of 250.2 grams such that, if the population mean actually lies in this range, the observed data would not be considered particularly unusual. Such an interval is called a confidence interval for the parameter $\mu$.
To calculate such an interval, the endpoints of the interval have to be calculated from the sample, so they are statistics, functions of the sample X1, ..., X25, and hence are random variables themselves.
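In our case, we may determine the endpoints by considering that the sample mean $\bar{X}$ from a normally distributed sample is also normally distributed, with the same expectation $\mu$, but with a standard error of:
$$\frac{\sigma}{\sqrt{n}} = \frac{2.5\ \text{g}}{\sqrt{25}} = 0.5\ \text{g}$$
By standardizing, we get a random variable:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{0.5}$$
which depends on the parameter $\mu$ to be estimated, but has a standard normal distribution that does not depend on $\mu$. Hence it is possible to find numbers $-z$ and $z$, independent of $\mu$, between which $Z$ lies with probability $1 - \alpha$, a measure of how confident we want to be. We take $1 - \alpha = 0.95$, so:
$$P(-z \le Z \le z) = 1 - \alpha = 0.95$$
The number $z$ follows from the cumulative distribution function $\Phi$ of the standard normal distribution:
$$\Phi(z) = P(Z \le z) = 1 - \frac{\alpha}{2} = 0.975, \qquad z = \Phi^{-1}(0.975) \approx 1.96$$
and we get:
$$0.95 = P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = P\left(\bar{X} - 0.98 \le \mu \le \bar{X} + 0.98\right)$$
This might be interpreted as: with probability 0.95, we will find a confidence interval in which we will meet the parameter $\mu$ between the random endpoints
$$\bar{X} - 0.98$$
and
$$\bar{X} + 0.98$$
This does not mean that there is a 0.95 probability of meeting the parameter $\mu$ in the interval calculated from one particular sample. Every time the measurements are repeated, the sample mean takes a different value. In about 95% of cases $\mu$ lies between the endpoints calculated from it, and in about 5% of cases it does not. Entering the observed sample mean of 250.2 into the formula gives the 0.95 confidence interval:
$$(\bar{x} - 0.98,\ \bar{x} + 0.98) = (250.2 - 0.98,\ 250.2 + 0.98) = (249.22,\ 251.18)$$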
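The number z and the margin around the sample mean can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are chosen for this illustration, and the values of the standard deviation and sample size come from the margarine example.

```python
from math import sqrt
from statistics import NormalDist

sigma = 2.5   # standard deviation of the fill weight, in grams
n = 25        # number of cups in the sample
alpha = 0.05  # 1 - alpha = 0.95 is the confidence level

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# z is the point where the cumulative distribution function equals 1 - alpha/2.
z = std_normal.inv_cdf(1 - alpha / 2)

# Check: probability that a standard normal variable lies between -z and z.
coverage = std_normal.cdf(z) - std_normal.cdf(-z)

standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
margin = z * standard_error       # distance from the sample mean to each endpoint

print(f"z = {z:.2f}")                          # z = 1.96
print(f"P(-z <= Z <= z) = {coverage:.2f}")     # P(-z <= Z <= z) = 0.95
print(f"standard error = {standard_error} g")  # standard error = 0.5 g
print(f"margin = {margin:.2f} g")              # margin = 0.98 g
```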
![](https://upload.wikimedia.org/wikipedia/commons/thumb/8/8f/NYW-confidence-interval.svg/300px-NYW-confidence-interval.svg.png)
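As the desired value 250 of $\mu$ lies within the calculated confidence interval, there is no reason to believe that the machine is wrongly calibrated.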
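The calibration check can be written out as a short calculation. This is a minimal sketch that assumes, as in the example, a normal distribution with known standard deviation 2.5 g, a sample of 25 cups and an observed sample mean of 250.2 g. The function name `mean_confidence_interval` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (lower, upper) for a z-based interval with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = mean_confidence_interval(sample_mean=250.2, sigma=2.5, n=25)
target = 250.0  # desired fill weight in grams

print(f"95% confidence interval: ({lower:.2f} g, {upper:.2f} g)")
print("target inside interval:", lower <= target <= upper)
```

With these inputs the interval runs from about 249.22 g to 251.18 g, so the target of 250 g falls inside it.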
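The calculated interval has fixed endpoints, and $\mu$ either lies between them or does not. This event therefore has probability either 0 or 1. We cannot say that $\mu$ lies in this particular interval with probability 0.95. We only know that, if the procedure is repeated many times, $\mu$ lies in the calculated interval in about 95% of cases. That is why we speak of a confidence level of 95% rather than a probability.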
The figure on the right shows 50 realisations of a confidence interval for a given population mean $\mu$. At a 95% confidence level, about 95% of such intervals are expected to contain $\mu$.
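Repeated realisations of this kind can be imitated with a small simulation. The sketch below is an illustration under assumptions, not a reproduction of the figure: it draws samples of 25 values from a normal distribution with mean 250 and standard deviation 2.5, builds a 95% interval from each sample, and counts how often the interval contains the true mean. The function name `simulate_coverage` and the seed are arbitrary choices for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU = 250.0    # true population mean (known here only because we simulate)
SIGMA = 2.5   # known population standard deviation
N = 25        # sample size
LEVEL = 0.95  # confidence level


def simulate_coverage(repetitions, seed=0):
    """Fraction of simulated confidence intervals that contain MU."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)
    margin = z * SIGMA / sqrt(N)
    hits = 0
    for _ in range(repetitions):
        # Draw one sample and compute its mean.
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        x_bar = fmean(sample)
        # Count the interval if it contains the true mean.
        if x_bar - margin <= MU <= x_bar + margin:
            hits += 1
    return hits / repetitions


print("50 intervals:    ", simulate_coverage(50))
print("10000 intervals: ", simulate_coverage(10_000))
```

With only 50 intervals the observed fraction can differ noticeably from 0.95. With many more repetitions it should settle close to 0.95.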
References
- [1] "List of Probability and Statistics Symbols". Math Vault. 2020-04-26. Retrieved 2020-10-14.
- [2] "Confidence Intervals". www.stat.yale.edu. Retrieved 2020-10-14.