SISA Research Paper

Design, data weighting and design effects in Dutch regional health surveys

Previously published in Dutch as: Uitenbroek DG. Design, wegen en het designeffect in GGD gezondheidsenquêtes. Tijdschrift voor Gezondheidswetenschappen (TSG). 2009(2): 64-8.

Health surveys carried out by regional public health authorities in the Netherlands frequently use a stratified design (Ten Brinke and Verhagen, 2003; GGD Hollands Midden, 2006; Heemskerk and Poort, 2007; Acker, 2005). The population in the health authority's working area is divided into groups, and in each group a predetermined number of individuals is surveyed (Cochran, 1977). One reason for this design is that health authorities often want to compare the local authority areas within their working area, which calls for about equal numbers of cases in each local authority area. Stratified sampling designs developed with the combined aim of providing reliable statistics at both the local level and the overall regional level are common internationally. Other kinds of stratified design are also found in health surveys. The Amsterdam health survey (Uitenbroek, 2006), for example, was stratified by age and ethnicity to enable the study of health service needs among older minority groups compared with the needs of the majority ethnic Dutch population, while also providing information on health in the city of Amsterdam in general.

To provide statistics for the full health authority area the data have to be weighted, to account for differences in population size and sampling fraction between the strata used in the design. Given a similar number of cases, statistics and estimates produced at the health authority level are less reliable with weighted data than with unweighted data, which translates into wider confidence intervals and differences between groups reaching significance less easily (Kish, 1995). Although weighting reduces the reliability of statistics, not weighting is often not an option, as unweighted statistics from data collected with a stratified design can be seriously biased compared with those from a simple random sample (Kish, 1957).

In analyzing weighted data the decrease in reliability due to the weighting must therefore be considered. Basic statistical computer packages mostly do not do this, although more complex methods are nowadays available in a number of dedicated computer applications. This paper discusses the cell weighting procedure, with specific attention to the design effect caused by this procedure. The design effect is the statistic that measures the change in reliability caused (among other things) by data weighting. A number of simple formulas are introduced and the extent of the design effect in Dutch regional health surveys is studied.


For this article a secondary analysis was done of health authority reports about local health surveys. The health authorities were asked for additional information when required. The weighting in this paper follows the cell weighting procedure described by Kalton and Flores-Cervantes (2003); the formulas by Kish (1992) are used to estimate the design effect for the mean or average.


The purpose of most weighting is to restore in the sample the distribution observed in the population on a number of (socio-demographic) variables. There are several methods to do this (Kalton & Flores-Cervantes, 2003); this paper uses two of them, both based on the cell weighting procedure. Both procedures produce the same result in the weighted sample, but the weights they produce are interpreted differently. In the first method the weight (Wi) for each of the k strata is the reciprocal of the sampling fraction. The sampling fraction is the number of sampled individuals ni in stratum i divided by the number of people Ni in the same stratum in the population. Thus:

\[ W_i = \frac{1}{\text{sampling fraction}} = \frac{1}{n_i / N_i} = \frac{N_i}{n_i} \qquad (1) \]

After weighting according to this method the sum of all individual weights in the sample is equal to the total population size: Σ (Wi*ni) = N+. The weight Wi can be interpreted as the number of individuals in the population that each respondent in the i-th stratum represents. A practical advantage of these weights is that many statistical programmes for complex designs use them as the basis of calculations.

A second method is to divide the proportion of each stratum in the population (Pi) by the proportion of the same stratum observed in the sample (pi), thus:

\[ w_i = \frac{P_i}{p_i} \qquad (2) \]

With these weights the sum of all individual weights is equal to the sample size: Σ (wi*ni) = n+. The weights give the multiplication factor by which groups in the sample become more or less important because of the weighting. They also give an impression of the effect of weighting on the design effect: weights that are (much) larger than one will particularly increase the design effect. For this reason weights above a certain value are often trimmed (Potter, 1990), which introduces some bias in the process.
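As a minimal sketch of the two cell weighting methods, the Python fragment below computes both sets of weights for three strata and checks the two sums stated above. The population and sample sizes are made-up illustrative values, not figures from any of the surveys discussed, and the variable names are assumptions of this sketch.

```python
# Hypothetical strata (illustrative values only): population sizes N_i
# and sample sizes n_i.
N = [20000, 50000, 130000]
n = [750, 750, 1500]

N_total = sum(N)  # N+ in the text
n_total = sum(n)  # n+ in the text

# Method 1 (formula 1): reciprocal of the sampling fraction.
W = [Ni / ni for Ni, ni in zip(N, n)]

# Method 2 (formula 2): population proportion divided by sample proportion.
w = [(Ni / N_total) / (ni / n_total) for Ni, ni in zip(N, n)]

# Sum of the individual weights over all respondents.
sum_W = sum(Wi * ni for Wi, ni in zip(W, n))  # should equal N_total
sum_w = sum(wi * ni for wi, ni in zip(w, n))  # should equal n_total

print(W, sum_W)
print(w, sum_w)
```

The two sets of weights differ only by the constant factor n+/N+, which is why both lead to the same weighted estimates.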


Because of the weighting the reliability of the sample decreases: the real reliability is lower than the number of cases collected suggests. The design effect (DEFF) measures this and is defined as the factor by which the variance calculated under the assumption of a simple random sample (SRS) changes:

\[ \hat{v} = v \times \mathrm{DEFF} \qquad (3) \]

where \(\hat{v}\) is the variance after weighting and \(v\) is the variance under SRS.

The design effect is sometimes also defined as the factor by which the observed number of cases changes because of the design:

\[ \hat{n} = \frac{n}{\mathrm{DEFF}} \qquad (4) \]

where \(\hat{n}\) is the effective and \(n\) the observed number of cases.

The DEFF in formula 3 is equivalent to the DEFF in formula 4. As the design effect is almost always larger than one, the formulas imply that the sample variance increases and the effective number of cases decreases because of the design.

In the case of a data file in which the weights are included as a separate variable with one weight for each respondent, formula 5 can be used to calculate the design effect for a sample mean:

\[ \mathrm{DEFF} = \frac{n \sum_{j=1}^{n} w_j^2}{\left( \sum_{j=1}^{n} w_j \right)^2} \qquad (5) \]

where \(w_j\) is the weight of the j-th respondent out of n respondents and n is the sample size.
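Formula 5 can be sketched in a few lines of Python. The function name and the example weights are assumptions made for illustration. Any list with one weight per respondent will do, and the weights may come from either formula 1 or formula 2 because the result does not depend on their scale.

```python
def deff_from_weights(weights):
    """Design effect for a sample mean from one weight per respondent (formula 5)."""
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return n * sum_w2 / (sum_w * sum_w)

# Hypothetical weights: half of the respondents count three times as much
# as the other half.
example_weights = [1.0, 1.0, 3.0, 3.0]
print(deff_from_weights(example_weights))

# Equal weights give a design effect of exactly one.
print(deff_from_weights([2.5] * 10))
```

The expression equals one plus the squared coefficient of variation of the weights, so the design effect grows as the weights become more unequal.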

If totals for the strata in the sample and population are available, the following formula can be used to calculate the design effect for the sample mean. This way of calculating the design effect is particularly practical in the design phase of a study, before data collection, for example to take design effects into account when calculating sample sizes.

\[ \mathrm{DEFF} = \sum_i \frac{N_i^2}{n_i} \cdot \frac{n}{N^2} \qquad (6) \]

where \(N_i\) and \(n_i\) are the totals for each of the i strata in the population and sample respectively, and \(N\) and \(n\) are the corresponding overall totals.
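A possible implementation of formula 6, for use before any data are collected, is sketched below. The function name and the stratum sizes are hypothetical. The second call shows that a design sampling the same fraction in every stratum has a design effect of one.

```python
def deff_from_strata(pop_sizes, sample_sizes):
    """Design effect for a sample mean from stratum totals (formula 6)."""
    N_total = sum(pop_sizes)
    n_total = sum(sample_sizes)
    weighted_sum = sum(Ni * Ni / ni for Ni, ni in zip(pop_sizes, sample_sizes))
    return weighted_sum * n_total / (N_total * N_total)

# Hypothetical design: fixed sample sizes in strata of very different size.
unequal = deff_from_strata([20000, 50000, 130000], [750, 750, 1500])
print(round(unequal, 3))

# Hypothetical self-weighting design: 1% sampled in every stratum.
proportional = deff_from_strata([1000, 3000], [10, 30])
print(proportional)
```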

Besides the design effect there is the design factor (DEFT), defined as the square root of the design effect:

\[ \mathrm{DEFT} = \sqrt{\mathrm{DEFF}} \qquad (7) \]

The design factor is the factor by which the standard error of estimates changes due to the sample design. It also gives the multiplication factor by which the confidence interval around an estimate changes due to the sample design.
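The relation between formulas 4 and 7 can be illustrated with a short sketch. The helper names, the standard deviation and the sample size below are assumptions chosen for illustration. Multiplying the simple random sample standard error by DEFT gives the same result as computing it from the effective number of cases.

```python
import math

def design_factor(deff):
    """DEFT, the square root of the design effect (formula 7)."""
    return math.sqrt(deff)

def effective_n(n_observed, deff):
    """Effective number of cases (formula 4)."""
    return n_observed / deff

def standard_error_mean(sd, n_observed, deff):
    """Standard error of a mean: the SRS standard error multiplied by DEFT."""
    return design_factor(deff) * sd / math.sqrt(n_observed)

# Hypothetical values: standard deviation 10, 400 respondents, DEFF 1.44.
se_via_deft = standard_error_mean(10.0, 400, 1.44)
se_via_effective_n = 10.0 / math.sqrt(effective_n(400, 1.44))
print(se_via_deft, se_via_effective_n)
```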

The design effect can be used to correct the t-test and the F-test for weighted data (Hahs-Vaughn, 2005).

Example and results.

Table 1 gives an overview of the health survey done by the Amstelland de Meerlanden regional public health authority (Ten Brinke and Verhagen, 2003 & 2004). The design of this survey was to take a sample of 750 individuals from each of the smaller local authorities and a sample of 1500 from each of the larger authorities. In total this resulted in a sample of 5250 persons.

Table 1. Design for the health survey of the Amstelland de Meerlanden public health authority, 2002. Calculation of sample weights, design effect, and the effect of weighting and design on the estimated percentage of citizens reporting noise disturbance due to overflying airplanes.

Columns: local authority; size of design (ni); population size (Ni); Ni * Ni / ni; mi; wi $; number in sample experiencing airplane noise; estimated number in population who experience noise.

[Table rows not preserved; the only surviving row label is Ouder-Amstel.]

Design effect:

\[ \mathrm{DEFF} = \sum_i \frac{N_i^2}{n_i} \cdot \frac{n}{N^2} = \frac{8074030 \times 5250}{187367 \times 187367} = 1.21 \]

95% CI unweighted:

\[ 16.2 \pm 1.96 \sqrt{\frac{p(1-p)}{m}} \times 100 = 16.2 \pm 1.96 \sqrt{\frac{0.162\,(1-0.162)}{3264}} \times 100 = 16.2 \pm 1.26 \]

95% CI weighted:

\[ 15.4 \pm 1.96 \sqrt{\frac{p(1-p)}{m} \times \mathrm{DEFF}} \times 100 = 15.4 \pm 1.96 \sqrt{\frac{0.154\,(1-0.154)}{3264} \times 1.21} \times 100 = 15.4 \pm 1.36 \]

$ According to formula 1.
This table is based on table 2.1 from Ten Brinke JM, Verhagen CE. Hoe gezond is de regio? health peiling 2002; and table 5.3 from: Hoe gezond is de regio? Supplement. health peiling 2002. Both: Amstelveen: GGD Amstelland de Meerlanden (Ten Brinke and Verhagen, 2003 & 2004).

The design effect of the Amstelland de Meerlanden public health authority survey is estimated to be 1.21. The effective n for calculating the variance and confidence interval around a mean for this design is 5250/1.21 ≈ 4339. In 2002 the health authority collected data from 3264 respondents, which results in an effective n for analysis of 3264/1.21 ≈ 2698 respondents. The design factor is the square root of the design effect, thus √1.21 = 1.1. After weighting, the confidence interval around a mean is therefore about 10% wider than a simple random sample confidence interval.

The health authority area is near Schiphol-Amsterdam international airport and airplane noise is an important problem in the area. Data on airplane noise are used here to demonstrate the calculation of the design effect of the study. The table shows in the fifth column the number of respondents reporting disturbance due to airplane noise. On the basis of the unweighted data about 16.2% (528/3264*100) of citizens in the area are disturbed by airplane noise, with a 95% confidence interval calculated according to a basic method (Blalock, 1960) ranging from 14.9 to 17.4%. Using the sample weights in the 6th column, the numbers of respondents are converted into the estimated number of people disturbed by noise in the population. The result can be used to estimate the weighted percentage of people in the overall health authority area who are disturbed by airplane noise; this is calculated to be 15.4% (28982/187367*100), with a confidence interval ranging from 14.0 to 16.8%. The weighted percentage is lower than the unweighted percentage because in the larger local authority areas, which have larger weights, there is generally less airplane noise than in the smaller areas.
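The two confidence intervals in this example can be reproduced with the sketch below. It uses only figures quoted in the text (the rounded proportions 0.162 and 0.154, 3264 respondents and a design effect of 1.21). The function name is an assumption of this sketch and the normal approximation is the basic method shown under Table 1.

```python
import math

def proportion_ci(p, m, deff=1.0, z=1.96):
    """Normal-approximation CI for a proportion, variance inflated by DEFF.

    Returns (lower, upper, half_width) as proportions.
    """
    half_width = z * math.sqrt(p * (1.0 - p) / m * deff)
    return p - half_width, p + half_width, half_width

# Unweighted estimate: no design effect applied.
unweighted = proportion_ci(0.162, 3264)

# Weighted estimate: variance multiplied by the design effect of 1.21.
weighted = proportion_ci(0.154, 3264, deff=1.21)

for label, (low, high, half) in (("unweighted", unweighted), ("weighted", weighted)):
    print(label, round(low * 100, 1), round(high * 100, 1), round(half * 100, 2))
```

The half-widths come out at about 1.26 and 1.36 percentage points, matching the values under Table 1.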

Table 2. Examples of health survey designs as done by Dutch regional health authorities.

Columns: survey; approximate design; Wi, range *; reference. Only the survey, design and reference cells are preserved.

Health monitor Zuid-Holland Zuid, 2006 (Terpstra, 2006)
Design: age 19+, 4% from 14 municipalities

Health profile Groningen, 2006 (Broer and Spijkers, 2006)
Design: age 20+, 2% from 25 municipalities

Health profile Groningen, 2002 (Broer and Spijkers, 2002)
Design: in the age group 20-64, 1% from 21 authorities and 2% from 4 authorities; in the age group 65+, 2% from 22 authorities, 4% from 2 authorities and 5% from 1 authority

Amstelland de Meerlanden, health monitor, 2002 (Ten Brinke and Verhagen, 2003)
Design: see table 1

Noord Kennemerland, health monitor, 2006 (Heemskerk and Poort, 2007)
Design: about 480 respondents each from 8 local authorities, age 19-65 years

Gooi and Vechtstreek, health monitor, 2004 (Acker, 2005)
Design: about 1500 respondents each from 9 local authorities, age 19+

Health survey GGD Hollands Midden, 2005 (GGD Hollands Midden, 2006)
Design: about 500 respondents each from 13 local authorities, age 19-65 years

Amsterdamse health monitor, 2004 (Uitenbroek, 2006)
Design: about 200 each from 5 age and 4 ethnicity groups, 20 groups in total, age 18+

* According to formula 2

Table 2 gives the design effect in the health surveys as published by a number of health authorities in the Netherlands. In two health surveys (Terpstra, 2006; Broer and Spijkers, 2006) there is no design effect, as these are self-weighting designs in which a fixed proportion is sampled from each stratum. Weighting is not required and there is therefore no design effect due to weighting. In the health surveys from Groningen in 2002 (Broer & Spijkers, 2002) and Amstelland de Meerlanden, also in 2002 (Ten Brinke & Verhagen, 2003), higher numbers of cases were selected from the larger local authorities. The design effect for these studies is 1.14 and 1.21 respectively. The design effect is larger in those surveys where a fixed number of cases is taken from strata that differ in population size. In these surveys the design effect ranges from 1.71 in the Noord Kennemerland health survey of 2006 (Heemskerk & Poort, 2007) to 1.85 in the Amsterdam health survey of 2004 (Uitenbroek, 2006).


In this article attention is given to the design, data weighting and design effects in health surveys done by regional public health authorities in the Netherlands. Mostly stratified designs are used, whereby the population in the region is divided into groups and from each group predetermined numbers of cases are sampled. A number of the designs are self-weighting; in these cases there is no design effect because of weighting, as the data do not have to be weighted. The largest design effects were observed where fixed numbers of cases were taken from strata that differed in population size. In many health surveys the design effect will not cause serious problems, as these surveys tend to be very large. However, when surveys are smaller there can be a problem, and the design effect also needs to be considered in subgroup analyses where weighting is used to make the sampled subgroup representative of the same subgroup in the population.

Careful planning of the design is therefore important. Changing the number of individuals sampled in the different groups may prevent overly strong design effects. Attention must be given to any intention to study particular groups, if these groups have to be made representative of the same groups in the population. Combining design effect calculations as suggested in this paper with sample size calculations seems necessary.
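One simple way to combine the two calculations is to rearrange formula 4: the number of cases to collect is the number required under simple random sampling multiplied by the expected design effect. The sketch below assumes a hypothetical target of 1000 effective cases and applies design effects of the size reported for the surveys in Table 2. The function name is an assumption of this sketch.

```python
import math

def required_sample_size(target_effective_n, deff):
    """Cases to collect so that n / DEFF reaches the target (formula 4 rearranged)."""
    # Small tolerance so floating-point noise does not push an exact
    # result up by one case.
    return math.ceil(target_effective_n * deff - 1e-9)

# Hypothetical target of 1000 effective cases under different design effects.
for deff in (1.0, 1.14, 1.21, 1.71, 1.85):
    print(deff, required_sample_size(1000, deff))
```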

In this paper a relatively basic method is used to calculate the design effect. The method is valid for estimating the design effect due to weighting for the variance of a simple mean or average. Design effects for the variance of other estimators, and for measures of correlation between two variables, need to be calculated in other ways. In these cases it seems best to use one of the dedicated packages, such as SPSS Complex Samples, Epi Info complex samples, the “survey” module in R, WesVar or AM. Some of these packages are freely available. They are particularly useful for estimating the critical p-values for statistical tests and can be used to do a weighted multivariate analysis correctly. Given that these packages are available, there is no excuse for simply using the “weight” command in a general statistical package without considering the design effect. The confidence interval around a mean or average calculated by the method suggested in this paper is mostly very similar to the confidence interval calculated using one of the statistical packages mentioned above.


Acker MB. Health peiling 2004: van de inwoners van 19 jaar and ouder uit de regio Gooi and Vechtstreek. Bussum: GGD Gooi and Vechtstreek, 2005.

Blalock HM. Social Statistics. New York: McGraw-Hill, 1960.

Broer J, Spijkers E. Local health profile Groningen 2002. Groningen: GGD Groningen, 2002.

Broer J, Spijkers E. Local health profile Groningen 2006. Groningen: GGD Groningen, 2006.

Cochran WG. Sampling Techniques, 3rd Edition. John Wiley, 1977.

GGD Hollands Midden. Health enquête 19-65 jaar 2005: Factsheet 1 Onderzoeksopzet and achtergrond kenmerken. Leiden: GGD Hollands Midden, 2006.

Hahs-Vaughn DL. A primer for using and understanding weights with national datasets. The Journal of Experimental Education 2005;73:221-48.

Heemskerk M, Poort E. Health peiling Volwassenen 2006. Schagen: GGD Hollands Noorden, 2007.

Kalton G, Flores-Cervantes I. Weighting Methods. J Off Stat 2003;19:81-97.

Kish L. Methods for Design Effects. J Off Stat 1995;11:55-77.

Kish L. Weighting for Unequal Pi. J Off Stat 1992;8:183-200.

Kish L. Confidence intervals for clustered samples. Amer Soc Rev 1957;22:154-165.

Potter F. A study of procedures to identify and trim extreme sample weights. Proceedings of the Survey Research Methods Section, Am Stat Assoc 1990; 225-230.

Sturgis P. Analysing Complex Survey Data: Clustering, Stratification and Weights. 2004.

Ten Brinke JM, Verhagen CE. Hoe gezond is de regio? health peiling 2002. Amstelveen: GGD Amstelland de Meerlanden, 2003.

Ten Brinke JM, Verhagen CE. Hoe gezond is de regio? Supplement. health peiling 2002. Amstelveen: GGD Amstelland de Meerlanden, 2004.

Terpstra JS, Sanavro FL, Leeuwenburg J. Health monitor 2006. Dordrecht: GGD Zuid-Holland Zuid, 2006.

Uitenbroek DG, Ujcic-Voortman J, Janssen A, Tichelman P, Verhoeff AP (Red). Gezond Zijn and Gezond Leven in Amsterdam: Amsterdamse health monitor 2004. Amsterdam: Afdeling Epidemiologie, Documentatie and health bevordering, GG&GD, 2006.
