
Simple Interactive Statistical Analysis




This procedure calculates the sample size required for estimating a prevalence, an incidence or another proportion-like estimator. It can be used for simple random designs and also to estimate the sample size required for a two-stage cluster design.

For estimating the confidence interval around a prevalence use the One Mean procedure.

For comparing two prevalences use the T-test procedure.


Place values in the boxes: "Expected Proportion" and "Width of CI" should be proportions, values between zero and one; "Number of Clusters" and "Cases per Cluster" should be integer numbers; "Design Effect" should be a real number which is one or larger and no larger than the total number of cases divided by the number of clusters. Give one of "Width of CI", "Number of Clusters" or "Cases per Cluster" the value zero, and a solution will be given for that zero parameter using the values in the other four boxes.
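The input rules above can be sketched as a small check for the two-stage case. This is a hypothetical helper and not the code used by SISA; the function name and argument names are assumptions chosen to mirror the box labels, and the upper limit on the design effect is only checked when "Cases per Cluster" is known.

```python
def box_to_solve(p, width, clusters, cases_per_cluster, design_effect):
    """Validate the five boxes and return the label of the box to solve for.

    Hypothetical helper: it mirrors the input rules in the text and is
    not SISA's own implementation.
    """
    if not 0 < p < 1:
        raise ValueError("Expected Proportion must lie between 0 and 1")
    if not 0 <= width < 1:
        raise ValueError("Width of CI must be 0 (to solve) or between 0 and 1")
    for label, value in (("Number of Clusters", clusters),
                         ("Cases per Cluster", cases_per_cluster)):
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{label} must be a non-negative integer")
    if design_effect < 1:
        raise ValueError("Design Effect must be one or larger")
    # Total cases / number of clusters equals the cases per cluster,
    # so the upper limit can only be checked when that box is filled in.
    if cases_per_cluster > 0 and design_effect > cases_per_cluster:
        raise ValueError("Design Effect must not exceed the cases per cluster")

    # Exactly one of the three solvable boxes should hold zero.
    zeros = [label for label, value in (("Width of CI", width),
                                        ("Number of Clusters", clusters),
                                        ("Cases per Cluster", cases_per_cluster))
             if value == 0]
    if len(zeros) != 1:
        raise ValueError("Exactly one of the three solvable boxes must be zero")
    return zeros[0]


print(box_to_solve(0.035, 0.02, 0, 20, 1.5))
```

With these example values the width and the cases per cluster are given, so the function reports "Number of Clusters" as the box to be solved.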


To estimate the confidence interval for a binomial proportion without a two-stage design, give a value for "Expected Proportion" and "Width of CI" (both should be proportions, values between zero and one) and leave all the other boxes at the value zero; the result is the number of cases required to construct a confidence interval with that width. Alternatively, give the expected prevalence/proportion and enter the number of cases in the "Cases per Cluster" box to obtain the resulting width of the confidence interval.


For the simple binomial case give the expected proportion in the top box and the width of the confidence interval you want in the bottom box. For example, give 0.035 in the top box and 0.01 in the bottom box if you want the resulting confidence interval in a study to run from 0.03 to 0.04. The procedure does not require a power, or 1-beta, parameter. The result means that, if you use the advised number of cases, the estimate will have a confidence interval (say, a 95% interval) that runs from the estimate minus 0.5*width to the estimate plus 0.5*width, with the estimate in the middle.
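As a rough illustration of the simple binomial case, the sketch below uses the usual normal approximation, in which the full width of the interval is 2*z*sqrt(p*(1-p)/n). That SISA uses exactly this approximation is an assumption, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence=0.95):
    """Two-sided standard normal quantile, about 1.96 for a 95% interval."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def cases_for_width(p, width, confidence=0.95):
    """Cases needed so that the full interval width is at most `width`.

    Assumes the normal approximation: width = 2 * z * sqrt(p * (1 - p) / n).
    """
    z = z_value(confidence)
    return ceil(4 * z ** 2 * p * (1 - p) / width ** 2)


def width_for_cases(p, n, confidence=0.95):
    """Full interval width obtained with n cases (same approximation)."""
    z = z_value(confidence)
    return 2 * z * sqrt(p * (1 - p) / n)


n = cases_for_width(0.035, 0.01)
print(n, round(width_for_cases(0.035, n), 4))
```

For the example in the text (expected proportion 0.035, width 0.01) this approximation gives roughly 5,190 cases. Other interval methods, such as exact binomial intervals, would give somewhat different numbers.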

In the two-stage design you first sample clusters, for example a number of schools selected at random, and then sample "units" within each cluster, for example a number of pupils in each school. You should consider the design effect, which is related to the intra-correlation coefficient. Intra-correlation exists when the unit characteristics depend to a certain extent on cluster characteristics. This would be the case if, for example, some schools are particularly good, and others particularly bad, at the trait measured with the dependent variable at the level of the pupil. The design effect is defined as the variance under the two-stage design divided by the variance of a simple random sample of the same size. For already existing data the design effect can be estimated using a bootstrap resampling procedure or an intra-correlation & variance correction estimation procedure. A further discussion of the issues for these techniques can be found on an article page. The design effect parameter for already existing data can also be estimated using the intra-correlation spreadsheet for the single cluster.
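Two common ways of putting a number on the design effect are sketched below. The first uses the well-known relation between the design effect and the intra-correlation coefficient for clusters of equal size; the second applies the variance-ratio definition to the number of positive cases observed in each cluster. Both functions are illustrative assumptions with hypothetical names, and they are not the bootstrap or spreadsheet procedures mentioned in the text.

```python
from statistics import mean, variance


def design_effect_from_icc(cases_per_cluster, icc):
    """Design effect = 1 + (m - 1) * icc for clusters of equal size m."""
    return 1 + (cases_per_cluster - 1) * icc


def design_effect_from_counts(positives, cases_per_cluster):
    """Variance-ratio estimate from positive counts in equal-sized clusters.

    Cluster-design variance of the overall proportion: s_b^2 / c, where
    s_b^2 is the sample variance of the cluster proportions.
    Simple-random-sample variance: p * (1 - p) / (c * m).
    Needs at least two clusters and an overall proportion other than 0 or 1.
    """
    proportions = [x / cases_per_cluster for x in positives]
    c = len(proportions)
    p = mean(proportions)
    var_cluster = variance(proportions) / c
    var_srs = p * (1 - p) / (c * cases_per_cluster)
    return var_cluster / var_srs


print(design_effect_from_icc(20, 0.05))
print(design_effect_from_counts([2, 4, 6, 8], 20))
```

With the first formula an intra-correlation of zero gives a design effect of one, and an intra-correlation of one gives a design effect equal to the number of cases per cluster.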

When estimating the number of cases or the number of clusters for a sample size calculation, parameters such as the prevalence (the expected proportion) and the design effect will have to be estimated on the basis of previous experience. Please note that the design effect parameter should be one or larger (it equals one if the number of cases in each cluster is one) and no larger than the total number of cases divided by the number of clusters. The number of cases in each cluster should be equal; if that is not the case, select a number which is quite close to the minimum number you expect in the clusters (thus: maximum design effect = N/c, whereby N = mean(n)*c, with n the number of cases in a cluster and c the number of clusters). Usually one would expect the design effect parameter to lie between the limits 1 and N/c, relatively close to one.

For the two-stage design the program will solve for either the number of cases per cluster, or the number of clusters, or the width of the resulting confidence interval, using the values in the other four boxes. The parameter you want to solve for should be given the value zero, "00.00", in the input boxes.
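The "solve for the zero box" logic can be sketched as follows. The sketch assumes that the simple-random-sample size from the normal approximation is multiplied by the design effect and then spread over the clusters; whether SISA computes it in exactly this way is an assumption, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def solve_two_stage(p, width, clusters, cases_per_cluster, design_effect,
                    confidence=0.95):
    """Solve for whichever of width, clusters or cases_per_cluster is zero.

    Assumed model: total cases = design_effect * 4 * z^2 * p * (1 - p) / width^2.
    The design effect is treated as a fixed input, as in the text.
    """
    solvable = (width, clusters, cases_per_cluster)
    if sum(1 for value in solvable if value == 0) != 1:
        raise ValueError("Exactly one of the three solvable boxes must be zero")

    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    if width == 0:
        total_cases = clusters * cases_per_cluster
        return 2 * z * sqrt(design_effect * p * (1 - p) / total_cases)

    total_cases = design_effect * 4 * z ** 2 * p * (1 - p) / width ** 2
    if clusters == 0:
        return ceil(total_cases / cases_per_cluster)
    return ceil(total_cases / clusters)


# Expected proportion 0.035, width 0.02, 20 cases per cluster, design effect 1.5
print(solve_two_stage(0.035, 0.02, 0, 20, 1.5))
```

Under these assumptions the example asks for 98 clusters of 20 cases. Entering 98 clusters and 20 cases with a width of zero returns a width just under 0.02, because the number of clusters was rounded up.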

Further Reading.

Machin D, Campbell M, Fayers P, Pinol A. Sample size tables for clinical studies, 2nd Edition. London, Edinburgh, Malden and Carlton: Blackwell Science 1997.

Kumar R, Indrayan A. A nomogram for single-stage cluster-sample surveys in a community for estimation of a prevalence rate. Int J Epidemiol 2002 Apr;31(2):463-7.

Limburg H, Kumar R, Indrayan A, Sundaram KR. Rapid assessment of prevalence of cataract blindness at district level. Int J Epidemiol 1997;26(5):1049-54.

Please note that this procedure in SISA follows the notation of Machin and Campbell and not that of Limburg et al. and/or Kumar and Indrayan. This is particularly important for the so-called "precision" parameter L, which in SISA is defined as the full width of the resulting confidence interval.


All software and text copyright by SISA