Simple Interactive Statistical Analysis
Hypergeometric
Input.
Input for the mean or average ('blue' box) can be any positive decimal value. Input for the number ('green' box) must be an integer: a whole, positive number without decimals. Similarly, the sample size and the population size must be integers, whole positive numbers without decimals.
A Windows version of the hypergeometric procedure is available here.
Explanation.
The hypergeometric distribution is used for calculating probabilities for samples drawn from relatively small populations and without replacement. An item that has been drawn is not returned to the population, so each remaining item's chance of being selected increases on each trial. The hypergeometric distribution is often used in biology to study small animal or plant populations. Your hypothesis was that you would find a proportion of 'u' (fill in the population proportion in the top box) occurrences of a phenomenon in a sample of size 'n' (third box). In fact what you found was 'x' occurrences of the phenomenon (second box). Lastly you give the size of the population, 'N', in the bottom (fourth) box. The program gives you the probability for a number of values of 'x'.
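The probabilities described above can be sketched in a few lines of Python. This is a minimal illustration using the textbook hypergeometric formula, not the program's own code. The function name hypergeom_pmf, the variable K (the number of positives in the population, taken here as u times N) and the example values are assumptions made for the illustration.

```python
from math import comb

def hypergeom_pmf(x, n, K, N):
    """P(X = x): exactly x positives in a sample of n drawn without
    replacement from a population of N that contains K positives."""
    # Outside the feasible range the probability is zero.
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Hypothetical input: population N = 50, expected proportion u = 0.2
# (so K = 10 positives in the population), sample size n = 10.
N, n, u = 50, 10, 0.2
K = round(u * N)

# Probability for every possible value of x in the sample.
probs = [hypergeom_pmf(x, n, K, N) for x in range(n + 1)]
for x, p in enumerate(probs):
    print(f"x = {x:2d}  P(X = x) = {p:.5f}")
```

Exact integer binomial coefficients are used, so the result does not suffer from rounding in the factorials; the price is that the integers grow quickly as the population size increases.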
For sampling with replacement (i.e. sampling whereby an individual or item which is 'drawn' and studied is then put back into the population or 'pool' so that it has an equal chance of being 'drawn' and studied again), the Binomial should be used. If the population size is relatively large the binomial should also be used; calculating the Hypergeometric distribution with a large population is computationally very intensive and may not produce satisfactory results. If the population is infinitely large, there is no difference between the hypergeometric distribution and the binomial.
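How quickly the two distributions approach each other can be checked numerically. The sketch below keeps the sample size and the expected proportion fixed and lets the population grow; the helper names and the chosen values (n = 10, p = 0.2, x = 3) are assumptions for the illustration only.

```python
from math import comb

def hypergeom_pmf(x, n, K, N):
    # Sampling without replacement from N items, K of them positive.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    # Sampling with replacement, success probability p on each draw.
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p, x = 10, 0.2, 3          # hypothetical sample size, proportion, count
b = binom_pmf(x, n, p)

gaps = []
for N in (50, 500, 5000, 50000):
    K = round(p * N)          # positives in the population
    h = hypergeom_pmf(x, n, K, N)
    gaps.append(abs(h - b))
    print(f"N = {N:6d}  hypergeometric = {h:.6f}  "
          f"binomial = {b:.6f}  gap = {gaps[-1]:.6f}")
```

With these values the gap shrinks as N grows, which is why the binomial is an adequate substitute when the sample is only a small fraction of the population.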
A number of items based on the normal approximation of the hypergeometric distribution have been added. With these it is possible to construct a confidence interval around the difference in proportions, or numbers, which is tested. If the confidence interval is narrow, the difference has been estimated precisely; if the confidence interval is wide, the estimate is uncertain. If the value zero lies between the lower and the upper limits of the confidence interval, the difference between the two numbers is not statistically significant.
The normal deviate, the standardized difference, can be used to estimate p-values as a direct alternative to using a hypergeometric procedure. It is not really relevant in our case because we already have the superior hypergeometric estimate itself. The standard error in the estimates provided is based on the main hypergeometric assumption, i.e. that there was a sampling procedure without replacement in a relatively small population. If sampling was with replacement or if the population was large, use the binomial, as explained above.
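The normal approximation can be sketched as follows. The page does not state the formulas the program uses, so this is an assumption: the standard error is the usual hypergeometric one with the finite population correction (N - n)/(N - 1), and the confidence interval is placed around the difference between the observed and the expected number. The function name normal_approx and the example input are hypothetical.

```python
from math import sqrt, erfc

def normal_approx(x, n, u, N, z_crit=1.96):
    """Normal approximation for x observed positives in a sample of n,
    with expected proportion u and population size N (assumed formulas)."""
    expected = n * u
    # Standard error of the count, with finite population correction.
    se = sqrt(n * u * (1 - u) * (N - n) / (N - 1))
    diff = x - expected                   # difference in numbers
    z = diff / se                         # normal deviate
    p_two_sided = erfc(abs(z) / sqrt(2))  # two-sided p-value
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, z, p_two_sided, ci

# Hypothetical input: 5 positives found in a sample of 10, expected
# proportion 0.2, population of 50.
diff, se, z, p, ci = normal_approx(x=5, n=10, u=0.2, N=50)
print(f"difference = {diff:.2f}, SE = {se:.4f}, z = {z:.3f}, p = {p:.4f}")
print(f"95% CI for the difference: {ci[0]:.2f} to {ci[1]:.2f}")
```

In this example zero lies outside the interval, so the difference would be called statistically significant at the 5% level. With samples this small the exact hypergeometric probability remains the better guide.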
Note. The hypergeometric procedure is a full integer procedure: the input is considered to consist of the number positive in the population, the number of positive observations in the sample, the sample size and the population size. All these numbers should be integer values, whole numbers without decimals. This version of the hypergeometric procedure, however, does not require that the expectation, when multiplied by the population size, produces an integer number of positives in the population. Calculations can therefore be based on a decimal number positive in the population, which is a theoretical problem, as it is impossible to have, for example, 25.8 black swans flying around in the real world. If the expectation in the input box does imply a decimal number of positives in the population, the program might produce illogical results. This is particularly a problem with a small population size. The program now suggests a reasonable alternative expectation you can use if your input is logically incorrect. An alternative might be to adjust your population size a little, or to combine an adjustment of the expectation with an adjustment of the population size. Simulations showed that it is usually better to adjust illogical data in this way.
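The integer check described in this note can be illustrated with a short sketch. How the program derives its own suggestion is not stated, so rounding to the nearest whole number of positives is an assumption, and the function name check_expectation is hypothetical.

```python
def check_expectation(u, N, tol=1e-9):
    """Check whether expectation u implies a whole number of positives
    in a population of size N, and propose the nearest valid expectation."""
    implied = u * N                   # implied number of positives
    nearest = round(implied)          # nearest whole number (assumed rule)
    is_integer = abs(implied - nearest) < tol
    suggested_u = nearest / N
    return is_integer, nearest, suggested_u

# Hypothetical input: u = 0.43 in a population of 60 implies
# 25.8 positives, which cannot exist.
ok, K, u_alt = check_expectation(0.43, 60)
if not ok:
    print(f"Expectation implies a decimal number of positives; "
          f"try u = {u_alt:.4f} ({K} positives).")
```

A small tolerance is used in the comparison because decimal proportions such as 0.43 are not stored exactly in floating point.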
Please study the hypergeometric distribution further by using the Hypergeometric spreadsheet.