Manual for the Distributions statistical computer program.

Distributions

Download
All Check Box
Approximation Check Box
Binomial
Catch Recatch
Confidence Interval
Descriptives
Edit Menu
Exact Check Box
Expectation Value
Fieller
Help
Hypergeometric
Input box
License
Limitations
Negative Binomial (marginal binomial)
Negative Binomial (rare events)
Negative Binomial/CRC Check box
Observed Number
Options
Output Field
Poisson
Population Size
Rate Ratio
Sample Size
Variance Covariance check box

Catch Recatch

Catch-Recatch is used to estimate the size of a population in the wild. First, a number of animals or individuals are caught and marked. They are then released back into the population and given the chance to mix in again. Next, a random sample is drawn from the population and the number of marked individuals in it is counted. By comparing the number of marked individuals in the sample (the positive observations) with the sample size and with the total number of marked individuals in the population, the size of the population can be estimated.
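In its simplest form, the comparison described above is the Lincoln-Petersen estimate N = M*n/x, where M is the number marked, n the sample size and x the number of marked individuals recaptured. The Python sketch below computes that estimate and also searches for the population size that maximizes the hypergeometric likelihood, which works out to the integer part of M*n/x. The function names are hypothetical; this illustrates the idea and is not the program's own calculation.

```python
from math import comb


def lincoln_petersen(marked: int, sample_size: int, recaptured: int) -> float:
    """Classic capture-recapture point estimate N = M * n / x."""
    if recaptured <= 0:
        raise ValueError("at least one marked individual must be recaptured")
    return marked * sample_size / recaptured


def hypergeometric_likelihood(pop: int, marked: int, sample_size: int, recaptured: int) -> float:
    """P(x marked in a sample of n) for a population of size pop with M marked."""
    return (comb(marked, recaptured) * comb(pop - marked, sample_size - recaptured)
            / comb(pop, sample_size))


def most_likely_population(marked: int, sample_size: int, recaptured: int,
                           max_pop: int = 1_000_000) -> int:
    """Population size with the highest hypergeometric likelihood (brute-force search)."""
    if recaptured <= 0:
        raise ValueError("the likelihood keeps rising with N when nothing is recaptured")
    # The population must contain all marked animals plus the unmarked ones caught.
    smallest = marked + sample_size - recaptured
    best_pop = smallest
    best_lik = hypergeometric_likelihood(smallest, marked, sample_size, recaptured)
    for pop in range(smallest + 1, max_pop + 1):
        lik = hypergeometric_likelihood(pop, marked, sample_size, recaptured)
        if lik <= best_lik:
            break  # the likelihood is unimodal in N, so the first drop marks the peak
        best_pop, best_lik = pop, lik
    return best_pop


print(lincoln_petersen(50, 40, 11))        # about 181.8
print(most_likely_population(50, 40, 11))  # 181
```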

Enter the number of marked individuals in the top box (expectations), the number of positive observations in the second box (observations) and the sample size in the third box (sample size). The number in the expectations box should not be smaller than the number in the observations box. If you check the Negative Binomial/CRC box and the CI box, the program will calculate the most likely population size and the exact hypergeometric confidence interval around the population size. Note that this procedure is based on inverting the hypergeometric distribution, a complex and very computing-intensive process that may not always succeed. Therefore, if you check the Negative Binomial/CRC box and the approximation box instead, the program will calculate the most likely population size, its variance and an approximate hypergeometric confidence interval around the population size. This procedure works more reliably but less precisely.
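Inverting the hypergeometric for the population size can be sketched as a search over candidate sizes N: a candidate stays in the interval as long as neither tail probability of the observed number of recaptures falls below alpha/2. The Python sketch below assumes an equal-tailed interval; the function names and the `max_pop` search cap are hypothetical, and it is not the program's own algorithm. It also shows why the exact procedure is computing-intensive: every candidate N needs a full set of tail sums.

```python
from math import comb


def hyper_pmf(k: int, pop: int, marked: int, sample_size: int) -> float:
    """P(k marked animals in the sample), sampling without replacement."""
    return comb(marked, k) * comb(pop - marked, sample_size - k) / comb(pop, sample_size)


def population_ci(marked: int, sample_size: int, recaptured: int,
                  alpha: float = 0.05, max_pop: int = 100_000):
    """Equal-tailed exact interval for the population size N (hypothetical helper)."""
    smallest = marked + sample_size - recaptured  # smallest population consistent with the data
    top = min(marked, sample_size)                # largest possible number of recaptures
    lower = upper = None
    for pop in range(smallest, max_pop + 1):
        # P(X <= x) is tiny when N is too small to produce so few recaptures.
        p_le = sum(hyper_pmf(k, pop, marked, sample_size) for k in range(recaptured + 1))
        # P(X >= x) is tiny when N is too large to produce so many recaptures.
        p_ge = sum(hyper_pmf(k, pop, marked, sample_size) for k in range(recaptured, top + 1))
        if lower is None and p_le > alpha / 2:
            lower = pop
        if p_ge > alpha / 2:
            upper = pop
        elif lower is not None:
            break  # P(X >= x) only falls further as N grows
    # If nothing was recaptured, `upper` runs to max_pop: the interval is unbounded above.
    return lower, upper


print(population_ci(50, 40, 10))
```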

Fieller

Fieller confidence intervals around a rate ratio are produced if an expected value and an observed value are given to the program and the approximation box is checked. Both the expected value and the observed value can be positive real numbers (with decimals).

The Fieller Confidence Interval has two applications:

1) The Fieller method estimates an approximate confidence interval around a rate ratio when there is error in both the observed and the expected value and the number of cases is relatively large. For very large numbers, the Fieller method gives a good approximation of the results of the exact Binomial/Incomplete Beta method, which is also presented in the module. The Incomplete Beta method is the more suitable exact method for estimating the confidence interval around a rate ratio with error in both the observed and the expected values, but it may not work well for a very large number of observations.

2) The Fieller method makes it possible to take covariance between the observation and the expectation into account. This is relevant if the observations are a subset of the data from which the expectation was calculated. For example, the Standardized Mortality Ratio is often calculated by applying national rates to a local population. Mortality in Hampshire could be compared with national mortality by using an expectation calculated by applying the national age-specific death rates (the standard) to the population of Hampshire (the index). A major crash on the motorway in Hampshire would then show up both in the observed number of deaths in Hampshire and in the national death rates used to calculate the expectation. For such a case Silcocks (1994) proposes the following covariance parameter 'q' for the Fieller method: q = Sum_i(d(i) * n(i) / N(i)) / d-tot, where d(i) is the number of deaths in the index population in the i-th age band, n(i) the number of individuals in the i-th age band of the index population who are part of the standard population, N(i) the number of individuals in the i-th age band of the standard population, and d-tot the total number of deaths in the index population. The q covariance parameter is entered in the bottom right (Co-)Variance box. Its default value is zero.
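Silcocks' covariance parameter is a simple weighted sum over age bands. The Python sketch below computes it from three per-band lists; the function name and the example figures are hypothetical and only show the arithmetic. When no member of the index population belongs to the standard population, every n(i) is zero and q equals the default of zero.

```python
def silcocks_q(deaths, in_standard, standard_size):
    """q = Sum_i(d(i) * n(i) / N(i)) / d-tot, summed over age bands i.

    deaths        -- d(i), deaths in the index population per age band
    in_standard   -- n(i), index-population members who are also in the standard
    standard_size -- N(i), size of the standard population per age band
    """
    if not (len(deaths) == len(in_standard) == len(standard_size)):
        raise ValueError("all three lists need one entry per age band")
    d_tot = sum(deaths)
    weighted = sum(d * n / big_n for d, n, big_n in zip(deaths, in_standard, standard_size))
    return weighted / d_tot


# Two hypothetical age bands: the index population is 1% and 2% of the standard.
print(silcocks_q([10, 30], [1_000, 2_000], [100_000, 100_000]))  # about 0.0175
```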

The Fieller method gives an estimate only if the expected value is larger than one. Thus, it works with counts of real cases, or a mean of real cases, and not with a proportion.
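Fieller's theorem gives the interval for a ratio R = O/E as the set of R for which (O - R*E)^2 <= z^2 * Var(O - R*E); the boundary is a quadratic in R. The Python sketch below assumes, as a simplification, that O and E are Poisson counts (each with variance equal to itself) and accepts an optional covariance `cov_oe` between them. The names are hypothetical, and how the program maps Silcocks' q onto the covariance term is not described here, so treat this as an illustration of the theorem rather than SISA's implementation. In this parametrization a bounded interval exists only when E > z^2, and when O equals E the two limits are reciprocals of each other.

```python
import math


def fieller_ratio_ci(observed: float, expected: float, cov_oe: float = 0.0,
                     z: float = 1.959964):
    """Fieller interval for R = O / E, assuming Var(O) = O, Var(E) = E, Cov(O, E) = cov_oe.

    Solves (O - R*E)^2 = z^2 * (O - 2*R*cov_oe + R^2 * E), which rearranges to
    a*R^2 - 2*b*R + c = 0 with the coefficients below.
    Returns None when the solution set is not a finite interval.
    """
    z2 = z * z
    a = expected * expected - z2 * expected
    b = observed * expected - z2 * cov_oe
    c = observed * observed - z2 * observed
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a


print(fieller_ratio_ci(100, 100))             # an interval around 1
print(fieller_ratio_ci(100, 100, cov_oe=10))  # narrower when O and E covary positively
```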

Silcocks P. Estimating confidence limits on a standardized mortality ratio when the expected number is not error free. Journal of Epidemiology and Community Health 1994;48:313-317.

Negative Binomial (marginal binomial)

The Negative Binomial distribution as the marginal of the binomial is produced if the user gives an expected value, an observed value and a number of cases, and checks the Negative Binomial and the exact boxes. The expected value must be a proportion between zero and one. The observed value can be a real number. The number of cases can also be a real number, but it will then be truncated to the largest integer not greater than the given value.

The distribution is also known as the Pascal distribution or the Polya distribution. The Geometric distribution is a special case of the Negative Binomial in which the number of positives, in the second (blue) box, is 1 (one).

The Negative Binomial does basically the same thing as the Binomial, except that now we ask about the probability of a particular sample size, given that we have found 'x' results to be positive (or white, or car crashes), whereas we had expected a proportion 'u' of the results to be positive. Input is the same as for the Binomial, but the output now varies the number of cases and not, as in the Binomial, the number found positive. You cannot enter a number in the first (expected value) box; it has to be a proportion. The reason is that the bottom box (the number of cases) is not fixed, so there is no fixed relationship between the expected proportion and the expected number (for a constant expected proportion, the expected number increases with the sample size).

If you are used to other discrete distributions, the output may come as a surprise. Usually there is a probability for a zero value: a phenomenon not being observed. Here the counting does not start at '0' (zero) but at the first meaningful value, the one at which the sample size equals the observed number of positive responses. This is because the sample size must be at least as large as the number of positive responses observed.
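The table described above can be sketched in a few lines of Python, assuming the standard form of the negative binomial for the number of trials n needed to obtain x positives with per-trial probability p: P(n) = C(n-1, x-1) * p^x * (1-p)^(n-x) for n >= x. The function names and the `lines` parameter (mirroring the default of 20 output lines) are illustrative only. The table starts at n = x, the first meaningful value, and with x = 1 it reduces to the Geometric distribution.

```python
from math import comb


def sample_size_pmf(n: int, positives: int, p: float) -> float:
    """Probability that the `positives`-th positive result occurs on trial n."""
    if n < positives:
        return 0.0  # the sample cannot be smaller than the number of positives found
    return comb(n - 1, positives - 1) * p ** positives * (1 - p) ** (n - positives)


def sample_size_table(positives: int, p: float, lines: int = 20):
    """Rows of (sample size, point probability, cumulative probability)."""
    rows, cumulative = [], 0.0
    for n in range(positives, positives + lines):
        prob = sample_size_pmf(n, positives, p)
        cumulative += prob
        rows.append((n, prob, cumulative))
    return rows


for n, prob, cum in sample_size_table(3, 0.4, lines=5):
    print(f"{n:3d}  {prob:.4f}  {cum:.4f}")
```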

An exact Negative Binomial confidence interval for the sample size (the value in the third box), based on the number of positive responses observed, is given if the C.I. box is checked. There is no point in giving an expected value for this exact procedure; only the observed value and the sample size are taken into consideration. The usual estimate of the confidence interval for the sample size is produced if the Negative Binomial and the estimate box are checked. Because this confidence interval is not very good, SISA's own estimate is also given. This estimate is based on inverting the well-known normal approximation of the Binomial. For the estimation procedures you can give an expected proportion for the likelihood of a positive observation on each trial; this proportion would normally be estimated from the data, but an external expectation can also be used. Beware, however, that this is not a tried and tested procedure.
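For the estimation procedures, a simple normal approximation uses the mean and variance of the negative binomial sample size: mean x/p and variance x(1-p)/p^2, where p is the expected proportion of positives per trial. The Python sketch below is only that textbook approximation, with a hypothetical function name; it is neither the program's "usual estimate" nor SISA's own inverted-binomial estimate, whose exact formulas are not given here. The lower limit is clipped at x, since the sample size cannot be smaller than the number of positives.

```python
import math


def sample_size_normal_interval(positives: int, p: float, z: float = 1.959964):
    """Normal approximation to the negative binomial sample size.

    Mean x/p, standard deviation sqrt(x * (1 - p)) / p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be a proportion in (0, 1]")
    mean = positives / p
    sd = math.sqrt(positives * (1 - p)) / p
    lower = max(float(positives), mean - z * sd)  # never below the number of positives
    return lower, mean + z * sd


print(sample_size_normal_interval(10, 0.5))  # centred on 20 trials
```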

Negative Binomial (For rare events)

The Negative Binomial distribution for rare events is produced if the program detects Poisson-type input, that is, if the user gives only an expected value and an observed value. Both can be any positive value. A variance must also be given in the (co-)variance box at the bottom right; this value will usually be larger than the expectation. If the mean and variance are exactly equal, the program gives the Poisson distribution.

This version of the Negative Binomial is a generalization of the Poisson, used to study the distribution of accidents and events at the individual level. The Poisson assumes that the chance of having an accident or a disease is randomly distributed: all individuals have an equal chance of having one, two or more accidents. However, this assumption may be incorrect; you may observe that relatively more people have a higher number of accidents than the Poisson predicts. The assumption can be checked by comparing the variance of the observed distribution with its mean: under the Poisson they are equal. If the variance is larger than the mean, the assumption is incorrect.

The Negative Binomial does not assume randomness: it allows for proneness, i.e. certain groups of individuals in the population have a higher chance of having accidents or diseases than others. A third parameter is now included: the variance of the distribution. The variance can be interpreted as expressing the level of proneness: the larger the variance relative to the mean, the higher the level of proneness in the population. Note that the variance is an expectation value; it relates to the expected value. The variance is the square of the standard deviation. Lastly, the value to which the variance is related is in this case always a true mean, even if its value lies between zero and one. There is therefore no fixed mathematical relationship between the variance and the mean, unlike with proportions, where the variance is determined by the proportion itself.
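One common way to parametrize this negative binomial by its mean m and variance v (v > m) is p = m/v and r = m^2/(v - m), giving P(k) = Gamma(k + r) / (Gamma(r) * k!) * p^r * (1 - p)^k, which has mean m and variance v. The Python sketch below uses that parametrization, which is our assumption rather than the program's documented formula, and falls back to the Poisson when v equals m, as the program does. The function name is hypothetical.

```python
import math


def neg_binomial_pmf(k: int, mean: float, variance: float) -> float:
    """P(k events) for a negative binomial with the given mean and variance."""
    if mean <= 0 or variance < mean:
        raise ValueError("need mean > 0 and variance >= mean")
    if variance == mean:
        # No extra variation: the Poisson distribution.
        return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))
    p = mean / variance                  # success probability
    r = mean * mean / (variance - mean)  # shape parameter; small r means strong proneness
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


# Mean 2 with variance 4 versus the Poisson (variance 2): more people with no
# accidents, and a longer tail of people with many, than the Poisson predicts.
for k in range(8):
    print(k, round(neg_binomial_pmf(k, 2, 4), 4), round(neg_binomial_pmf(k, 2, 2), 4))
```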

One of the problems with the Negative Binomial is that there does not seem to be a clear, easily given meaning for the variance (Arbous AG, Kerrich JE, 1951). Thus, after fitting the empirical distribution to the Negative Binomial and estimating the variance as a measure of accident or disease proneness in the population, you are confronted with the question of what this measure means for individuals or groups in the population, or for policy and decision-making.

An exact Negative Binomial confidence interval for the observed value cannot be given.

Arbous AG, Kerrich JE. Accident Statistics and the Concept of Accident-Proneness. Part I: A Critical Evaluation. Part II. The Mathematical Background. Biometrics 1951;341-433.

Rate Ratio

The rate ratio analysis is produced if the input consists of an expected and an observed value and the approximation box is checked. Both values can be positive real numbers (with decimals).

This module was inspired by two papers, one by Liddell and one by Silcocks. It concerns estimating the confidence interval for a rate ratio; in practice this will often be the Standardized Mortality Ratio. It deals with counts of events in a fixed domain where the denominator is very large, often phenomena in demography, epidemiology or biology. For those who work with events in smaller samples this module is less interesting, and the risk ratio or odds ratio will probably be more appropriate. The risk ratio and the odds ratio are implemented on the SISA website in the t-test and two by two table procedures, and in the SISA-tables program.

In making comparisons our instinct is to use subtraction. For example, in comparing the number of breeding pairs of rare birds between years, we say that the number this year is 20 more than last year: it is a good year. The advantage of doing it this way is that it is easy to understand; you can easily imagine the breeding birds. The disadvantage is that it ignores scale: 20 more breeding pairs make a big difference if the usual number is 50, but much less if it is 500. That is why we prefer ratios, i.e. division. We take the number of birds we observe this year and compare it with our expectation based on previous experience. Thus, if we expect 500 breeding pairs and observe 520, we have an increase of 4% (520/500 = 1.04). However, if we expect 50 breeding pairs and observe 70, the increase is 40% (70/50 = 1.40).

The next step is the generalizing statistical approach: this year, the number of breeding pairs of rare birds is 40% larger than the average in usual years, within a certain margin of confidence. The confidence interval given in the output gives you an impression of the precision of the estimate. If the value one is not included in the confidence interval, the result is said to be statistically significant: the difference between this year's number and the expected number is not solely due to chance fluctuation, i.e. it is not explained by the number of breeding pairs going up and down a bit between years.

The module presents several methods for the confidence interval of the rate ratio. First, the exact confidence interval obtained by way of a Chi-square transformation of the Poisson is given; Liddell discusses this method. As implemented here, it gives the exact Poisson confidence interval with about four significant digits of precision. If the number of events exceeds about 80, the precision decreases rapidly and it may be better to use the Poisson process approximation, also discussed by Liddell. The program issues a warning when that is required.
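The exact Poisson limits for an observed count O solve P(X >= O; mu_L) = alpha/2 and P(X <= O; mu_U) = alpha/2; dividing by the expectation E turns them into limits for the rate ratio. These are the same limits as the Chi-square formulas mu_L = chi2(alpha/2; 2O)/2 and mu_U = chi2(1 - alpha/2; 2O + 2)/2. The Python sketch below, with hypothetical names, solves the Poisson tail equations by bisection so that it needs only the standard library; it is an illustration, not SISA's code.

```python
import math


def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x) for X ~ Poisson(mu), with terms computed in log space."""
    if mu == 0:
        return 1.0
    return sum(math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(x + 1))


def solve_poisson_limit(func, target: float, tol: float = 1e-10) -> float:
    """Bisection for mu >= 0 with func(mu) == target, for func decreasing in mu."""
    lo, hi = 0.0, 1.0
    while func(hi) > target:  # widen until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_smr_ci(observed: int, expected: float, conf: float = 0.95):
    """Exact Poisson confidence interval for the SMR = observed / expected."""
    alpha = 1 - conf
    if observed == 0:
        mu_lower = 0.0
    else:
        # P(X >= O) = alpha/2  is the same as  P(X <= O - 1) = 1 - alpha/2
        mu_lower = solve_poisson_limit(lambda mu: poisson_cdf(observed - 1, mu), 1 - alpha / 2)
    mu_upper = solve_poisson_limit(lambda mu: poisson_cdf(observed, mu), alpha / 2)
    return mu_lower / expected, mu_upper / expected


print(exact_smr_ci(10, 8.0))
```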

Silcocks pointed out that an important assumption, namely that the expectation is theoretical and error free, is not met when calculating the SMR in epidemiology and demography. The age-specific rates in the standard population on which the expectation is based are empirical observations, which show random fluctuation. Likewise, in the breeding bird example, it is correct to use the Poisson to compare this year's breeding birds with the usual number, a theoretical concept. However, it is not correct to make the same comparison between two years, using last year's observation as the expectation. Silcocks proposes an exact procedure based on the Binomial/Incomplete Beta to estimate the confidence interval taking error in the expectation into account. Silcocks' ideas are implemented here, although technically the implementation works slightly differently. The procedure provides a binomial confidence interval around the rate ratio over a very large range of numbers and with about ten-digit precision. The Fieller procedure can be used if there is correlation between the expected and the observed numbers. Neither procedure works if the expectation is less than one.
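The idea behind the binomial approach can be sketched as follows, assuming the observed number O and the expectation E are independent Poisson counts (integers). Conditional on the total T = O + E, O is binomial with probability pi = R / (1 + R), so an exact (Clopper-Pearson) interval for pi, which is what the incomplete beta function delivers, converts to an interval for the ratio R = pi / (1 - pi). The Python sketch below finds the Clopper-Pearson limits by bisection on the binomial tail instead of through the incomplete beta function; the names are hypothetical, and the manual notes that SISA's own implementation works slightly differently.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_norm = math.lgamma(n + 1)
    total = 0.0
    for k in range(x + 1):
        total += math.exp(log_norm - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                          + k * math.log(p) + (n - k) * math.log1p(-p))
    return min(total, 1.0)


def solve_on_unit_interval(func, target: float, tol: float = 1e-12) -> float:
    """Bisection on [0, 1] for func(p) == target, with func decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def conditional_binomial_ratio_ci(observed: int, expected: int, conf: float = 0.95):
    """Exact interval for R = O / E, treating both as Poisson counts."""
    alpha = 1 - conf
    total = observed + expected

    def to_ratio(pi: float) -> float:
        return math.inf if pi >= 1.0 else pi / (1 - pi)

    # Clopper-Pearson limits for pi = O / (O + E).
    pi_lower = 0.0 if observed == 0 else solve_on_unit_interval(
        lambda p: binom_cdf(observed - 1, total, p), 1 - alpha / 2)
    pi_upper = 1.0 if observed == total else solve_on_unit_interval(
        lambda p: binom_cdf(observed, total, p), alpha / 2)
    return to_ratio(pi_lower), to_ratio(pi_upper)


print(conditional_binomial_ratio_ci(30, 20))
```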

SISA's usual Chi-square distribution is used for the exact Poisson confidence interval. SMR-Exact is a demanding procedure: precision is limited to four digits for expectations under about 80, and it should not be used above that number. SISA's usual Binomial procedure is used for the exact Binomial confidence interval. Here the choice is between precision and speed; the precision can be set using the ExactCI option. Both procedures require the observed number to be an integer. If it is not, the procedures report the values for the largest integer below the observed value. You can use real-valued observations if you want; most of the approximations handle them. Real-valued expectations are treated as such, with high precision, in all modules, exact and approximate.

Lastly, Wald's approximation is discussed by Rothman and Greenland; the square-root, normal-deviate and Poisson-process approximations are discussed by Liddell.
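Of these, the Wald-type interval is the simplest to write down. In the form commonly used for an SMR, the standard error of ln(SMR) is taken as 1/sqrt(O), and the interval is SMR * exp(+/- z / sqrt(O)). The Python sketch below implements that form with a hypothetical function name; the notation in Rothman and Greenland may differ, and the other approximations mentioned above are not reproduced here.

```python
import math


def wald_smr_ci(observed: float, expected: float, z: float = 1.959964):
    """Wald-type interval on the log scale: SMR * exp(+/- z / sqrt(O))."""
    if observed <= 0:
        raise ValueError("the log-scale Wald interval needs at least one observed event")
    smr = observed / expected
    half_width = z / math.sqrt(observed)
    return smr * math.exp(-half_width), smr * math.exp(half_width)


print(wald_smr_ci(30, 20))
```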

Liddell FDK. Simple exact analysis of the standardized mortality ratio. Journal of Epidemiology and Community Health 1984;38:85-88.

Rothman KJ, Greenland S. Modern Epidemiology. Philadelphia: Lippincott-Raven, 1998.

Silcocks P. Estimating confidence limits on a standardized mortality ratio when the expected number is not error free. Journal of Epidemiology and Community Health 1994;48:313-317.

Hypergeometric

User input is recognized as a request for a hypergeometric analysis if all four input boxes are used, with the largest number, the population size, in the bottom box. Input in the top expectation box should be a proportion between zero and one. If a larger number is given, it will be divided by the number of cases in the sample, as given in the third box. Preferably, the result of this division is an integer value; if it is not, the program will suggest a correction.

The hypergeometric distribution is used to calculate probabilities for samples drawn without replacement from relatively small populations. This means that each remaining item's chance of being selected increases with every draw. This distribution is often used in zoology to study small animal or plant populations. Your hypothesis was that you would find a proportion 'u' (fill in the population proportion in the top box) of occurrences of a phenomenon in a sample of size 'n' (third box). In fact, you found 'x' occurrences of the phenomenon (second box). Lastly, you give the size of the population, 'N', in the bottom (fourth) box. The program gives you the probability for a range of values of 'x'.
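The probabilities the program tabulates follow the hypergeometric formula P(x) = C(K, x) * C(N - K, n - x) / C(N, n), where K is the number of units with the phenomenon in a population of size N. The Python sketch below, with hypothetical names, takes K as the expected proportion u times N, rounded; that mapping from proportion to count is our assumption.

```python
from math import comb


def hypergeom_pmf(x: int, sample: int, successes: int, pop: int) -> float:
    """P(x units with the phenomenon in a sample drawn without replacement)."""
    return comb(successes, x) * comb(pop - successes, sample - x) / comb(pop, sample)


def hypergeom_table(u: float, sample: int, pop: int):
    """Rows of (x, point probability, cumulative probability)."""
    successes = round(u * pop)  # assumed mapping from proportion to count
    rows, cumulative = [], 0.0
    for x in range(min(sample, successes) + 1):
        prob = hypergeom_pmf(x, sample, successes, pop)
        cumulative += prob
        rows.append((x, prob, cumulative))
    return rows


for x, prob, cum in hypergeom_table(0.3, 10, 50):
    print(f"{x:2d}  {prob:.4f}  {cum:.4f}")
```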

For sampling with replacement (i.e. sampling in which an individual or item that is 'drawn' and studied is put back into the population or 'pool', so that it has an equal chance of being 'drawn' and studied again), the Binomial should be used. If the population is relatively large the Binomial should also be used; calculating the hypergeometric for a large population is very computing-intensive and may not produce satisfactory results. As the population size tends to infinity, the hypergeometric distribution becomes identical to the binomial.

A number of items based on the normal approximation of the hypergeometric are produced if the user checks the approximation box. It is then possible to construct a confidence interval around the difference in proportions, or numbers, being tested. A narrow confidence interval means the difference is estimated precisely; a wide one means it is not. If the value zero lies between the upper and the lower bounds of the confidence interval, the difference between the two numbers is not statistically significant.

The normal deviate, the standardized difference, can be used to estimate p-values as a direct alternative to a hypergeometric procedure. It is not really relevant here, because we already have the superior hypergeometric estimate itself. The standard error in the estimates provided is based on the main hypergeometric assumption, i.e. that sampling was done without replacement in a relatively small population. If sampling was with replacement or the population was large, use the Binomial, as explained above.
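The normal approximation replaces the hypergeometric with a normal distribution of mean n*K/N and variance n*p*(1 - p)*(N - n)/(N - 1), where p = K/N; the last factor is the finite population correction that makes this variance smaller than the binomial one. The Python sketch below, with hypothetical names, computes the difference between observed and expected numbers, its standardized value z and a confidence interval for it. It assumes the standard error is taken under the expected proportion, which may not match the program's choice.

```python
import math


def hypergeom_normal(x: int, sample: int, successes: int, pop: int,
                     z_crit: float = 1.959964):
    """Difference x - E[x], its standard error, z, and a normal-approximation CI."""
    if not 0 < sample < pop:
        raise ValueError("the sample must be smaller than the population")
    p = successes / pop
    expected = sample * p
    fpc = (pop - sample) / (pop - 1)  # finite population correction
    se = math.sqrt(sample * p * (1 - p) * fpc)
    diff = x - expected
    return diff, se, diff / se, (diff - z_crit * se, diff + z_crit * se)


diff, se, z, ci = hypergeom_normal(12, 40, 250, 1000)
print(diff, round(se, 3), round(z, 3), ci)
```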

An exact hypergeometric confidence interval for the observed number (the value in the second box) is given if the C.I. box is checked. There is no point in giving an expected value for this exact procedure; only the observed value and the sample and population sizes are used in the calculations. An exact hypergeometric confidence interval for the population size is given if the C.I. and the CRC (Catch Recatch) boxes are checked. See the discussion under Catch Recatch.

Poisson

Poisson analysis is done if the user gives an expected value and an observed value and checks the exact box. Both values can be positive real numbers (with decimals).

The Poisson distribution has two applications:

1) It can be used as an alternative to the Binomial distribution for very large samples. Your hypothesis was that you would find 'x' (the top cell) occurrences of a phenomenon, whereas in fact you found 'n' (the second input cell). The phenomenon might be the number of cases of a rare disease, the number of accidents at a busy junction, the number of stoppages on a production line, or the number of ethnic children at a football match. You want to test the assumption that environmental and other factors influencing the phenomenon were constant between observation periods. The program reports the point probability and the probability of 'n' or more occurrences of the phenomenon given your expectation of 'x' occurrences. For the Poisson you do not need to give a sample size; if the sample size is known, it is generally preferable to use the Binomial.

The main difference between the Poisson and the Binomial distribution is that in the Binomial all eligible units are studied, whereas in the Poisson only the cases with a particular outcome are studied. For example, in the Binomial all cars are studied to see whether they have had an accident or not, whereas in the Poisson only the cars that have had an accident are studied.

2) The Poisson can also be used to study how 'accidents' or 'malfunctions', or winning the lottery never, once or more than once, are distributed at the level of a population. It gives you the chance of an individual person or object meeting with one, two or more accidents, or winning the lottery once, twice or more often. People may have one, two, three or more accidents during a certain period of time. The Poisson distribution tells you how these chances are distributed if having one 'accident' has no influence on the chance of having another, i.e. the victim is put back into the population immediately after an 'event'. The mean, or incidence, is the number of accidents divided by the size of the population, and is entered in the top expectation box. Note that although your calculation may result in a value between zero and one, this value is not a proportion but a true mean. You would get a true proportion if you divided the number of people who had an accident by the number of people (for a discussion of the relationship between these numbers see Uitenbroek 1995). The second (observed) box takes the highest number of accidents you want to study. If you enter 10 in the observed box, the output gives you the proportion of the population who had '0' (zero) accidents, the proportion who had '1' (one) accident, the proportion who had '2' (two) accidents, etc., up to 10. The cumulative distribution tells you the proportion that had '1' or more accidents, '2' or more, etc.

One assumption in this application of the Poisson is that the chance of having an accident is randomly distributed: every individual has an equal chance. Mathematically this is expressed by the equality of the variance and the mean of the Poisson. A good way to check whether the assumption that individuals have an equal chance of having the trait is correct is to compare the variance of an (accident) distribution with its mean. If the variance is larger, the assumption was not correct. The Negative Binomial has been implemented to provide an alternative to the Poisson in the case of a non-random distribution.
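Both applications rest on the same Poisson probabilities. The Python sketch below, with hypothetical names and made-up example data, computes the point probability and the probability of n or more events for a given mean, tabulates the population distribution up to a chosen number of events, and applies the variance-versus-mean check described above to a frequency table of how many individuals had 0, 1, 2, ... events.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(exactly k events) for a Poisson distribution with the given mean (> 0)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def point_and_upper(observed: int, mean: float):
    """P(X = observed) and P(X >= observed)."""
    below = sum(poisson_pmf(k, mean) for k in range(observed))
    # 1 - P(X < n) loses relative accuracy when the upper tail is tiny.
    return poisson_pmf(observed, mean), 1.0 - below


def population_table(mean: float, max_events: int):
    """Rows of (k, proportion with exactly k events, proportion with k or more)."""
    rows, below = [], 0.0
    for k in range(max_events + 1):
        prob = poisson_pmf(k, mean)
        rows.append((k, prob, 1.0 - below))
        below += prob
    return rows


def dispersion(freq):
    """Mean and sample variance from freq[k] = number of individuals with k events."""
    n = sum(freq)
    mean = sum(k * f for k, f in enumerate(freq)) / n
    var = sum(f * (k - mean) ** 2 for k, f in enumerate(freq)) / (n - 1)
    return mean, var


print(point_and_upper(5, 2.0))
for k, prob, at_least in population_table(0.3, 4):
    print(k, round(prob, 4), round(at_least, 4))
mean, var = dispersion([60, 20, 10, 10])  # hypothetical: people with 0, 1, 2, 3 accidents
print(mean, var, "variance > mean: Poisson assumption doubtful" if var > mean else "")
```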

Exact and normal-approximation Poisson confidence intervals for the observed number are given. The exact Poisson confidence interval for the observed number (the value in the second box) is given if the C.I. box is checked. There is no point in giving an expected value for this exact procedure; only the observed value is taken into consideration. The normal approximation of the confidence interval is given if the approximation box is checked. If you divide the observed number by the expected number, you get the Rate Ratio. A full analysis of the rate ratio is also provided if a Poisson approximation is requested.

Uitenbroek DG. The mathematical relationship between the number of events in which people are injured and the number of people injured. British Journal of Sports Medicine 1995: 126-128.

Binomial

A table of binomial values is produced if the program detects binomial input, i.e. if the user gives an expected value, an observed value and a number of cases studied, and checks the exact box. The expected value will usually be a proportion between zero and one; the number of positive observations and the number of cases (i.e. the sample size) should be whole positive numbers. They can be real numbers, in which case they will be rounded to the nearest integer. If the expected value is above one, it will be divided by the sample size.

The binomial is probably the best known of the discrete distributions, and a discussion of its properties can be found in most introductory statistics books (Blalock HM, 1960; Wonnacott TH, Wonnacott RJ, 1977). The distribution gives you the likelihood of finding 'x' failures (or white or female or large or tail or accidents; only your imagination limits you), as opposed to successes (or black or male or small or heads or cars coming by which didn't have an accident). Your findings are the result of having done 'n' experiments, having made 'n' observations, or having studied a sample of size 'n'. You expected a proportion 'u' of your sample to be failures, for example white or large. 'x', the number observed as positive, is what varies in the output box; 'u', the expected proportion of occurrences, is given in the top box; and 'n', the sample size or number of experiments, is given in the third box.
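The table of binomial values is based on P(x) = C(n, x) * u^x * (1 - u)^(n - x). The minimal Python sketch below (hypothetical names) lists the point probability, the cumulative probability P(X <= x) and the upper tail P(X >= x) for every possible x.

```python
from math import comb


def binomial_table(u: float, n: int):
    """Rows of (x, P(X = x), P(X <= x), P(X >= x)) for X ~ Binomial(n, u)."""
    probs = [comb(n, x) * u ** x * (1 - u) ** (n - x) for x in range(n + 1)]
    rows, cumulative = [], 0.0
    for x, prob in enumerate(probs):
        upper = 1.0 - cumulative  # P(X >= x) = 1 - P(X <= x - 1)
        cumulative += prob
        rows.append((x, prob, cumulative, upper))
    return rows


for row in binomial_table(0.2, 6):
    print("%d  %.4f  %.4f  %.4f" % row)
```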

The normal distribution, or z-distribution, is often used to approximate the binomial, and data for this statistic are presented if you check the approximation box. However, if the sample size is very large, the Poisson distribution is a philosophically more correct alternative to the Binomial than the normal distribution. One of the main differences between the Poisson distribution and the binomial distribution is that in the binomial all eligible units are studied, whereas in the Poisson only the cases with a particular outcome are studied. For example, in the binomial all cars are studied to see whether they have had an accident or not, whereas in the Poisson only the cars that have had an accident are studied.

If you know the number of outcomes and the number of expected outcomes, and you would like to determine how likely it is that a particular sample size, or a particular number of experiments, would have produced this result, you can use the Negative Binomial. For samples drawn without replacement from small populations the Hypergeometric distribution should be used. The binomial test assumes that the expectation is error free, i.e. that it is a value known with certainty. Often it will be a theoretical or a population value. If the expected value is not error free, it is better to construct a two by two table and do an exact Fisher test, or use a Chi-square or t-test as an approximation. These tests are implemented on the SISA website and in the SISA-Table module for Windows.

An exact binomial confidence interval for the observed number (the value in the second box) is given if the C.I. box is checked. There is no point in giving an expected value for this exact procedure; only the observed value and the sample size are taken into consideration and only these should be given.

A number of items based on the normal approximation of the binomial can be obtained by checking the approximation box. It is then possible to construct a confidence interval around the difference in proportions, or numbers, being tested. A narrow confidence interval means the difference is estimated precisely; a wide one means it is not. If the value zero lies between the upper and the lower bounds of the confidence interval, the difference between the two numbers is not statistically significant.
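A minimal Python sketch of the normal approximation follows, assuming the standard error is computed from the expected proportion u (the program may instead use the observed proportion for the confidence interval); the function name is hypothetical. It returns the difference between observed and expected numbers, a confidence interval for that difference and the standardized difference z.

```python
import math


def binomial_normal(x: int, n: int, u: float, z_crit: float = 1.959964):
    """Observed minus expected count, its normal-approximation CI, and z."""
    expected = n * u
    se = math.sqrt(n * u * (1 - u))  # binomial standard error of the count
    diff = x - expected
    return diff, (diff - z_crit * se, diff + z_crit * se), diff / se


diff, ci, z = binomial_normal(60, 100, 0.5)
print(diff, ci, z)  # difference 10.0, an interval excluding zero, z = 2.0
```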

The normal deviate, the standardized difference, can be used to estimate p-values as a direct alternative to a binomial procedure. However, remember that exact tests are superior to normal approximations.

Blalock HM. Social Statistics. New York: McGraw-Hill, 1960.

Wonnacott TH, Wonnacott RJ. Introductory Statistics. New York: John Wiley, 1977.

Exact Confidence Interval

Exact confidence intervals for the Poisson, Binomial, Hypergeometric and Negative Binomial are given if the CI box is checked. They are obtained by inverting the exact procedures used to estimate p-values, which yields the upper and lower confidence limits for the observed number in the case of the Poisson, Binomial and Hypergeometric distributions, or for the sample size in the case of the Negative Binomial. It is also possible to calculate a confidence interval for a population size on the basis of the hypergeometric distribution. The calculations are very computing-intensive, particularly for the hypergeometric distribution. The algorithm iterates to a solution with a precision set by default to 0.000001 for both the upper and the lower bound. For more precision, set this to a lower value (for example 0.00000001) using the Exact option in the Options menu; for more speed, set it to a higher value (such as 0.001).

Input box

The input boxes are the four fill-in boxes on the left of the form. They accept real-type data, which should always be positive. The boxes should only contain numbers and the associated notation: decimal points and the letter E for an exponent are accepted, but no other characters.

Using extended real-type numbers allows numbers up to about 1.0E+4932 to be entered. However, if you enter such a large number, you should not expect meaningful output; see the discussion under Limitations. The input will not always be treated as a real type. The expectation is used as a real in all procedures. The observed number, sample size and population size will often, but not always, be treated as integers, usually the largest integer not greater than the real number given. To be safe, enter the observed number, sample size and population size as integers.

The input boxes support editing operations such as Ctrl-C (Copy), Ctrl-V (Paste) and Ctrl-X (Cut). Beware of leading or trailing spaces and other characters when pasting data into an input box. The continental European decimal comma is not accepted; the decimal separator must be a dot.

Options

The following options can be set.

Change the Confidence Interval for all appropriate statistical procedures by choosing CI in the options menu and then ticking the level you want to select, or use the fill in box under the other options. The default setting of the confidence interval is 95%. Confidence intervals smaller than one percent or greater than 99.9998 percent are not allowed.

Change the size of the letters in the output field by selecting Output and Font Size. A box will pop up and ask you to give the size you want in points. 99 points is the maximum. The default font size is 10.

You can change the amount of data output by the exact procedures by setting the number of lines: set it higher to see more lines of data, lower for fewer. The default is 20 lines for the various exact procedures.

The precision of the exact confidence interval can be set. The lower the number, the higher the precision. There seems to be a maximum precision of about 1E-15. You can try values below that number, but this may cause your computer to stall. According to the Delphi specification a much better precision should be possible, but we haven't yet sorted out why it doesn't work.

For descriptives: tick observed if you want the observed number to be the default; untick observed if you want the expected value to be the default. Tick simple for a few descriptives; untick it for more.

This program does not have an INI file and settings are not remembered. You will have to set the options each time you use the program.

TOP of page
Help

Help is provided under the Manual button in the task bar. A webpage will download.

In most cases, activating a field, button or checkbox and using the F1 key provides context-sensitive help.

TOP of page
License

This is free software. It can be freely distributed and installed. This program is provided to you by Quantitative Skills Research and Statistical Consultancy.

Although this program has been tested extensively, no program is ever bug or error free, and you should always check your results carefully. This software and the accompanying files are provided "as is" and without warranties as to performance or merchantability or fitness for a particular purpose. The entire risk arising out of use or performance of the software remains with you.

Copyright: Quantitative Skills and Daan Uitenbroek PhD, 2006.

TOP of page
Limitations

This program has very few built-in limitations. In the free version the number of positive observations (in the second box) is limited to 100; there are no other limitations in the free version. This does not mean that you can do calculations on a table with 1,000,000 cases (or so) and be sure of a valid result. Complex formulas are used which can only function within the limits of computers, of this program and of the compiler's (Delphi) capabilities. The behavior of the program in extreme situations is hard to predict.

The program has been thoroughly tested using examples from statistics books and by comparing its results with those of other statistical programs, and it was found to function correctly. These tests were done with 'normal' examples, of the type encountered in 99.9% of research situations. Various experiments have also been done using very large numbers, and the behavior of the built-in procedures in such situations was studied in detail. Generally the program performed well, responding to impossible situations with floating point or variable out-of-range errors and crashing, or by producing either no result or a "NaN" result. Encountering such errors means that what you want to do is not possible. We found that these errors occur particularly often in cases in which you do not really need a statistics program to tell you that the p-value is very close to zero or one.

Furthermore, sometimes many millions of calculations are required to produce the results of the exact procedures. In particular, procedures using an exact hypergeometric solution may take a long time.

Although this program has been tested extensively, no program is ever bug or error free, and you should always check your results carefully. You can check results by comparing different statistics: exact results with normal approximations, Hypergeometric with Binomial and Binomial with Poisson Distributions. However, even then there may be problems (though it is hard to imagine what they might be!) and no statistic can ever replace a healthy mind and a critical attitude in research. If you have a data set to analyze and you do not get the expected result, please report this to SISA. Valid bug reports are rewarded with a fixed version of the program and a good book. Good luck!

TOP of page
Confidence Interval

Most researchers and statisticians set the Confidence Interval for their project at 95%. This means that if a study such as yours were repeated many times, about 95% of the intervals calculated in this way would contain the true population value (for example the population mean). The alpha error is then 5%: the probability that, by chance alone, the interval does not contain the true value. In hypothesis testing, 5% is the chance of a type I error: declaring that a difference exists, that a medicine is effective or a change profitable, while in fact the apparent difference is due to chance, because the sample happens to be unrepresentative.
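The repeated-sampling meaning of a 95% interval can be checked with a small simulation. The Python sketch below draws many samples from a normal population with a known mean, computes a 95% interval for the mean each time (assuming, for simplicity, that the population standard deviation is known), and counts how often the interval contains the true mean. All parameter values are arbitrary illustrations.

```python
import random
from statistics import NormalDist


def coverage(level=0.95, true_mean=10.0, sigma=2.0, n=25, repeats=2000, seed=1):
    """Fraction of simulated studies whose interval contains the true mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5          # sigma treated as known
    rng = random.Random(seed)                  # fixed seed: reproducible run
    hits = 0
    for _ in range(repeats):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats


print(coverage())  # close to 0.95; the remaining intervals miss by chance
```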

The Confidence Interval can be changed using the submenu CI under the Options menu. Fixed options are 80%, 90%, 95% and 99%. You can define any other confidence interval by using the other option. Confidence intervals smaller than 1% or greater than 99.9998% are not accepted. The default setting for the confidence interval is 95%.
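For the normal-approximation procedures, a confidence level is typically turned into a critical value z, the number of standard errors on either side of the estimate. The Python sketch below shows that conversion together with the range check described above; the function name is hypothetical and the program may compute its critical values differently.

```python
from statistics import NormalDist


def z_for_level(level_percent):
    """Two-sided critical value z for a confidence level given in percent."""
    if not 1.0 <= level_percent <= 99.9998:
        raise ValueError("confidence level must lie between 1% and 99.9998%")
    alpha = 1 - level_percent / 100             # e.g. 0.05 for a 95% interval
    return NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)


for level in (80, 90, 95, 99):
    print(level, round(z_for_level(level), 3))
# 80 1.282
# 90 1.645
# 95 1.96
# 99 2.576
```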

TOP of page
Edit Menu

The Edit menu is operational for the output field and for the input boxes. The usual short cuts to the clipboard can be used: Ctrl-C=Copy to clipboard; Ctrl-V=Paste from clipboard and Ctrl-X=Cut and copy to clipboard. Ctrl-Z=Undo only works for the output field.

TOP of page
Exact Check Box

Checking the exact box will produce a table of exact probabilities for the Poisson, Binomial or Hypergeometric distribution. The program will determine which distribution to use on the basis of the input provided. The Poisson distribution will be chosen if only the top two boxes contain values, the Binomial if the top three boxes contain values and the Hypergeometric if all four boxes contain values. If the negative binomial box is also checked the Negative Binomial is given instead of the Poisson or the Binomial. Which one is used depends on the input. If the all box is checked, data is shown for all possible distributions given the input.
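The selection rule can be written out as a short Python sketch. It only mirrors the rule as stated in this manual (which boxes are filled, plus the negative binomial box); the function name and the use of `None` for an empty box are assumptions, and the Catch-Recatch case of the Negative Binomial/CRC box is left out.

```python
def choose_distribution(expectation, observed, sample_size, population_size,
                        negative_binomial=False):
    """Pick a distribution from the four input boxes (top to bottom); None = empty box.

    `expectation` is accepted so the arguments match the boxes; the rule itself
    does not depend on its value.
    """
    if observed is None:
        raise ValueError("the observed number (second box) is required")
    if population_size is not None:
        return "Hypergeometric"          # all four boxes filled
    if sample_size is not None:          # top three boxes filled
        return "Negative Binomial" if negative_binomial else "Binomial"
    # only the expectation and the observed number
    return "Negative Binomial" if negative_binomial else "Poisson"


print(choose_distribution(2.0, 5, None, None))  # Poisson
print(choose_distribution(0.3, 5, 20, None))    # Binomial
print(choose_distribution(0.3, 5, 20, 200))     # Hypergeometric
```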

TOP of page
Approximation Check Box

Checking the approximation box will produce approximate confidence intervals, probability values based on normal approximations, and various other approximate statistics for the Poisson, Binomial or Hypergeometric distribution. The program will determine which distribution to use on the basis of the input provided. The Poisson distribution will be chosen if only the top two boxes contain values, the Binomial if the top three boxes contain values and the Hypergeometric if all four boxes contain values. If the negative binomial box is also checked the Negative Binomial is given instead of the Poisson or the Binomial. Which one is used depends on the input. If the all box is checked, data is shown for all possible distributions given the input.
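As an illustration of what a normal approximation means, the Python sketch below computes the familiar Wald interval for a binomial proportion and a z-test of the observed number against the expected proportion. These are textbook formulas chosen for illustration; the manual does not say which approximations the program uses (it may, for instance, apply continuity corrections), so its numbers can differ.

```python
from statistics import NormalDist


def wald_interval(observed, sample_size, level=0.95):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p = observed / sample_size
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = (p * (1 - p) / sample_size) ** 0.5  # standard error of the observed proportion
    return p - z * se, p + z * se


def binomial_z_test(observed, sample_size, expected_proportion):
    """z statistic and two-sided p-value, observed vs. expected, no continuity correction."""
    mean = sample_size * expected_proportion
    sd = (sample_size * expected_proportion * (1 - expected_proportion)) ** 0.5
    z = (observed - mean) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


print(wald_interval(20, 100))         # about (0.1216, 0.2784)
print(binomial_z_test(20, 100, 0.1))  # z = 3.33, p-value about 0.0009
```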

TOP of page
Expectation Value

The expectation value for the Binomial, Hypergeometric or Negative Binomial/Polya Distribution will in most cases be a proportion, a value between zero and one. This value gives the proportion of individuals in the population who have a certain characteristic, e.g. being a woman, a love of opera, being above a certain age etc. The expectation value is a theoretical, historical or philosophical value; you know this value for certain and without any doubt on the basis of experience or a theoretical idea. For the Hypergeometric or Binomial distribution an average higher than one can also be given, i.e. the expected number. The program will convert this value into a proportion by dividing it by the sample size. For the Hypergeometric there is a preference for the result of this division to be an integer value. If this is not the case, the program will suggest a correction.
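The conversion of an expected number into a proportion can be sketched in Python as follows. The function name is hypothetical, and treating values of one or below as proportions, as well as rejecting an expected number larger than the sample size, are assumptions made for this illustration rather than documented behavior.

```python
def expectation_as_proportion(expectation, sample_size):
    """Binomial/Hypergeometric: turn the expectation box into a proportion."""
    if expectation < 0:
        raise ValueError("the expectation cannot be negative")
    if expectation <= 1:
        return expectation                  # already a proportion
    proportion = expectation / sample_size  # an expected number: divide by the sample size
    if proportion > 1:
        raise ValueError("the expected number is larger than the sample size")
    return proportion


print(expectation_as_proportion(0.3, 50))  # 0.3
print(expectation_as_proportion(15, 50))   # 0.3
```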

For the Poisson or the Negative Binomial for rare events the situation is different. Here an average is given: the average number of events you expect to happen within a certain period of time and given a certain domain. This average is always treated as an average, even if it is between zero and one in value.

Lastly, if you estimate an exact confidence interval you do not have to give an expectation value. There is one exception: if you estimate a hypergeometrically distributed population size for a Catch-Recatch study, you have to give the expected number in the population. This expected number would always have a relatively high value, at least as large as the observed number in the second box. You do not have to give a population size in this case.

TOP of page
Observed Number

The observed number is the only value that you must give; the program will not work without it. The observed number is the number of individuals in your sample who have a certain characteristic, e.g. being a woman, an opera lover, being above a certain age etc. In most procedures the observed value is compared with the expectation value. In the exact confidence interval procedures, this comparison between expectation and observation does not take place. If you give only an observed number and no sample size you can do things which have to do with the Poisson Distribution. To do a Binomial analysis you also have to give a sample size. For the exact procedures the observed number will usually be treated as an integer, non-decimal, value: decimal numbers are truncated to their integer part. For the approximate procedures, the observed number is treated as a real number if such a number is given.
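For a Poisson analysis only the expected average and the observed number are needed. The Python sketch below computes the exact point probability and the two one-sided cumulative probabilities that an exact procedure would typically report. It is a generic illustration with hypothetical function names, not the program's output format.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def poisson_probabilities(expected, observed):
    """Exact Poisson probabilities for an observed count, given the expected average."""
    k = math.trunc(observed)  # integer part, as in the exact procedures
    p_eq = poisson_pmf(k, expected)
    p_le = sum(poisson_pmf(i, expected) for i in range(k + 1))
    return {
        "P(X = k)": p_eq,
        "P(X <= k)": p_le,
        "P(X >= k)": 1 - p_le + p_eq,  # upper tail includes k itself
    }


print(poisson_probabilities(2.0, 1))
```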

TOP of page
Sample Size

The sample size is the number of observations made, within which the observed number was found. If you give an observed number and a sample size you can do things which have to do with the Binomial Distribution. For the exact procedures the sample size will usually be treated as an integer, non-decimal, value: decimal numbers are truncated to their integer part. For the approximate procedures, the sample size is treated as a real number if such a number is given.
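With a sample size added, the exact probabilities come from the Binomial distribution. The Python sketch below is a generic illustration with hypothetical function names; it truncates the observed number and sample size to integers, as the exact procedures usually do, and takes the expectation as a proportion.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_probabilities(expected_proportion, observed, sample_size):
    """Exact Binomial probabilities for an observed number in a sample."""
    k, n = math.trunc(observed), math.trunc(sample_size)  # integer parts
    p_eq = binomial_pmf(k, n, expected_proportion)
    p_le = sum(binomial_pmf(i, n, expected_proportion) for i in range(k + 1))
    p_ge = sum(binomial_pmf(i, n, expected_proportion) for i in range(k, n + 1))
    return {"P(X = k)": p_eq, "P(X <= k)": p_le, "P(X >= k)": p_ge}


print(binomial_probabilities(0.5, 2, 4))  # 0.375, 0.6875, 0.6875
```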

TOP of page
Population Size

The population size is only relevant if you want to do a Hypergeometric analysis. For the exact procedures the population size will usually be treated as an integer, non-decimal, value: decimal numbers are truncated to their integer part. For the approximate procedures the population size is treated as a real number if such a number is given.
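A Hypergeometric analysis also needs the number of individuals with the characteristic in the whole population. In the Python sketch below that number is passed in directly as `successes` (an integer); how the program derives it from the expectation box, presumably the expected proportion times the population size, is an assumption. The function names are hypothetical.

```python
import math


def hypergeom_pmf(k, population, successes, sample):
    """P(X = k): k individuals with the characteristic in a sample drawn without replacement."""
    return (math.comb(successes, k) * math.comb(population - successes, sample - k)
            / math.comb(population, sample))


def hypergeometric_probabilities(successes, observed, sample_size, population_size):
    """Exact Hypergeometric probabilities; inputs truncated to integers."""
    k, n, big_n = math.trunc(observed), math.trunc(sample_size), math.trunc(population_size)
    # math.comb returns 0 for impossible counts, so the full range 0..n is safe to sum.
    pmf = [hypergeom_pmf(i, big_n, successes, n) for i in range(n + 1)]
    return {"P(X = k)": pmf[k], "P(X <= k)": sum(pmf[: k + 1]), "P(X >= k)": sum(pmf[k:])}


print(hypergeometric_probabilities(4, 1, 3, 10))
```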

You do not have to give a population size if you want to estimate a hypergeometrically distributed population size for a Catch-Recatch study.

TOP of page
Variance Covariance check box

Give a variance in this box for the Negative Binomial for rare events. Give a covariance in this box for the Fieller analysis. The Fieller analysis is part of the Rate-Ratio analysis, which is done if the approximation box for a Poisson analysis is checked.

The content of this box is ignored for all other analyses.

TOP of page
All Check Box

All analyses are performed that can be done given the data and the other check boxes that have been checked.

TOP of page
Descriptives

Checking the descriptives box will produce descriptive statistics for the Poisson, Binomial or Hypergeometric distribution. The program will determine which distribution to use on the basis of the input provided. The Poisson distribution will be chosen if only the top two boxes contain values, the Binomial if the top three boxes contain values and the Hypergeometric if all four boxes contain values. If the negative binomial box is also checked, the Negative Binomial is given instead of the Binomial. If the all box is checked, data is shown for all possible distributions given the input.

Descriptives presented depend on the selected distribution. Possible descriptives are the mean, the variance, the skewness and the kurtosis of distributions. The statistics are given by default for the observed distribution, in which case you do not have to give an expectation value. Under the options menu you can select using the expected instead of the observed distribution.
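The standard formulas for these descriptives can be collected in a short Python sketch. Kurtosis is given here as excess kurtosis (kurtosis minus 3); whether the program reports excess or raw kurtosis is not stated in this manual. For the observed distribution one would presumably plug in the observed proportion, e.g. observed / sample size for the Binomial, which is an assumption about what the program does. Function names are hypothetical.

```python
import math


def poisson_descriptives(mean):
    return {"mean": mean, "variance": mean,
            "skewness": 1 / math.sqrt(mean), "excess kurtosis": 1 / mean}


def binomial_descriptives(n, p):
    var = n * p * (1 - p)
    return {"mean": n * p, "variance": var,
            "skewness": (1 - 2 * p) / math.sqrt(var),
            "excess kurtosis": (1 - 6 * p * (1 - p)) / var}


def hypergeometric_descriptives(population, successes, sample):
    """Needs population > 3 and 0 < successes, sample < population (non-degenerate)."""
    N, K, n = population, successes, sample
    mean = n * K / N
    var = n * (K / N) * ((N - K) / N) * ((N - n) / (N - 1))
    skew = ((N - 2 * K) * math.sqrt(N - 1) * (N - 2 * n)
            / (math.sqrt(n * K * (N - K) * (N - n)) * (N - 2)))
    kurt = (((N - 1) * N ** 2 * (N * (N + 1) - 6 * K * (N - K) - 6 * n * (N - n))
             + 6 * n * K * (N - K) * (N - n) * (5 * N - 6))
            / (n * K * (N - K) * (N - n) * (N - 2) * (N - 3)))
    return {"mean": mean, "variance": var, "skewness": skew, "excess kurtosis": kurt}


# Observed distribution for 6 positives in a sample of 20 (p estimated as 6 / 20).
print(binomial_descriptives(20, 6 / 20))
```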

TOP of page
Negative Binomial/CRC Check box

Checking this box results in:

* A negative binomial analysis for rare events if there is a Poisson type input. An expectation value and an observation value are given.

* A negative binomial/Polya analysis for the likelihood of a particular sample size given an observed number and an expected proportion, if there is a Binomial type input. A proportion expected, number observed and a sample size are given. For an exact Negative Binomial confidence interval for the sample size (the value in the third box), based on the number of positive responses observed, also check the CI box. An approximate estimate of the confidence interval for the sample size is produced if the approximation box is also checked. For the estimation procedures you can give an expected proportion for the likelihood of a positive observation on each trial; this proportion would normally be estimated from the data, but an external expectation can also be considered.

* A Catch-Recatch analysis for the hypergeometrically distributed population size. A number expected, number observed and a sample size are given. The sample size and the number expected are both larger than the number observed. You need to also check the CI box for an exact estimate or the approximation box for an approximate estimate.
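The analyses in the list above can be illustrated with textbook formulas. The Python sketch below gives the negative binomial probability that the k-th positive observation occurs at exactly trial n, the classical Lincoln-Petersen and Chapman catch-recatch estimates (with Chapman's usual variance estimate), and a brute-force search for the population size that maximizes the hypergeometric likelihood. In the catch-recatch functions, `marked` corresponds to the first box (expectation), `recaptured` to the second (observations) and `sample_size` to the third. These are standard estimators chosen for illustration; the manual does not state which formulas the program uses, and the function names and search limit are hypothetical.

```python
import math


def negative_binomial_pmf(sample_size, positives, p):
    """P(the positives-th positive result occurs exactly at trial sample_size)."""
    n, k = sample_size, positives
    return math.comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k)


def lincoln_petersen(marked, recaptured, sample_size):
    """Classical population estimate M * n / m."""
    return marked * sample_size / recaptured


def chapman(marked, recaptured, sample_size):
    """Chapman's bias-corrected estimate and its usual variance estimate."""
    M, m, n = marked, recaptured, sample_size
    estimate = (M + 1) * (n + 1) / (m + 1) - 1
    variance = (M + 1) * (n + 1) * (M - m) * (n - m) / ((m + 1) ** 2 * (m + 2))
    return estimate, variance


def most_likely_population_size(marked, recaptured, sample_size, search_limit=None):
    """Brute-force maximum of the hypergeometric likelihood over the population size N."""
    M, m, n = marked, recaptured, sample_size
    if m == 0:
        raise ValueError("no marked individuals recaptured: the likelihood has no finite maximum")
    if search_limit is None:
        search_limit = 10 * M * n // m  # hypothetical upper end for the search
    best_n, best_likelihood = None, -1.0
    for N in range(M + n - m, search_limit + 1):  # N cannot be smaller than M + n - m
        likelihood = math.comb(M, m) * math.comb(N - M, n - m) / math.comb(N, n)
        if likelihood > best_likelihood:
            best_n, best_likelihood = N, likelihood
    return best_n


print(lincoln_petersen(100, 12, 50))             # about 416.67
print(chapman(100, 12, 50))                      # estimate about 395.23
print(most_likely_population_size(100, 12, 50))  # 416
```

For 100 marked animals and 12 marked among 50 recaptured, the likelihood is largest at N = 416, the integer part of 100 × 50 / 12. The exact confidence interval described above additionally requires inverting the hypergeometric distribution, which this sketch does not attempt.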

TOP of page
Output Field

The output of the analysis is presented in the output field. The statistics in the output relate to the nearest "current data" listing above them. Each time you change between procedures or change the input, new current data is printed in the output field.

The Output Field is fully editable. The Edit menu in the task bar is operational for this field and the usual short cuts to the clipboard can be used: Ctrl-C=Copy to clipboard; Ctrl-V=Paste from clipboard; Ctrl-X=copy & cut, Ctrl-Z=Undo.

The content of the output field can be printed and/or saved as a text file using the file menu in the task bar.

The size of the lettering in the output field can be set under Options.

TOP of page
Copyright

Copyright: Quantitative Skills and Daan Uitenbroek PhD, 2008.
TOP of page
Warranty
Although this program has been tested extensively, no program is ever bug or error free, and you should always check your results carefully. This software is provided "as is" and without warranties as to performance or merchantability or fitness for a particular purpose. The entire risk arising out of use or performance of the software remains with you.
TOP of page
Download
Download the program here by double clicking this link and saving the program to a directory of your choice.
TOP of page
