SISA Research Paper

Is it correct that the IPW weights (Wi=Ni/ni) are better than the EPW weights (wi=Pi/pi)?

The IPW (inverse probability weight) weights are the inverse of the probability that a respondent is included in the sample. The EPW (equal proportion weight) weights make the sample proportions match the population proportions for groups of respondents defined by age, gender, ethnicity, et cetera. When applied to the data, both sets of weights produce similar results in terms of weighted proportions, weighted means and other basic statistics. However, the two sorts of weights tell a different story. The IPW weights are mostly easier to understand and, if you have some experience, also more intuitive. The IPW weights tell you how many persons in a particular group in the population a single respondent from that group represents. This number is always the same; therefore, if you compare different towns with each other, or different areas within the towns, or look at larger regions, you always use the same weights. The EPW weights give the importance of a weighted respondent compared with an unweighted respondent; sometimes a respondent becomes less important after weighting, other times more important. If the weight is larger than one, the respondents are more important after weighting; if the weight is smaller than one, they are less important. An important disadvantage of the EPW weights is that they change if you change the level of analysis. For an analysis on the regional level you therefore need other weights than for an analysis on the city level, and for areas within the city you need yet other weights. Also, if you want to do the analysis for men or women separately instead of for both groups together, or for different age groups, you need a new set of weights every time. Finally, SPSS Complex Samples, EpiInfo Complex Samples, Survey in "R" and other complex samples programs mostly presume that you use the IPW weights.
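The two definitions can be compared in a minimal Python sketch. The strata, population counts, respondent counts and numbers of smokers below are made up for illustration only; the function names are hypothetical and are not part of any SISA tool.

```python
# Hypothetical strata with population counts N_i and respondent counts n_i.
population = {"men_18_44": 4000, "men_45plus": 3000,
              "women_18_44": 4200, "women_45plus": 3800}
sample = {"men_18_44": 50, "men_45plus": 90,
          "women_18_44": 80, "women_45plus": 130}
# Hypothetical number of respondents per stratum who smoke.
smokers = {"men_18_44": 20, "men_45plus": 27,
           "women_18_44": 24, "women_45plus": 26}


def ipw_weights(population, sample):
    """W_i = N_i / n_i: persons in the population represented by one respondent."""
    return {g: population[g] / sample[g] for g in sample}


def epw_weights(population, sample):
    """w_i = P_i / p_i: population proportion divided by sample proportion."""
    N = sum(population.values())
    n = sum(sample.values())
    return {g: (population[g] / N) / (sample[g] / n) for g in sample}


def weighted_proportion(weights, sample, positives):
    """Weighted share of respondents with the characteristic of interest."""
    numerator = sum(weights[g] * positives[g] for g in sample)
    denominator = sum(weights[g] * sample[g] for g in sample)
    return numerator / denominator


W = ipw_weights(population, sample)
w = epw_weights(population, sample)
p_ipw = weighted_proportion(W, sample, smokers)
p_epw = weighted_proportion(w, sample, smokers)

print(W["men_18_44"])   # one respondent stands for 80 persons
print(round(w["men_18_44"], 3))
print(round(p_ipw, 4), round(p_epw, 4))
```

In this sketch the two weights differ only by the constant factor N/n, which is why the weighted proportions agree; the weights themselves tell the different stories described above.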

We have calculated the Wi and that does not seem to go well: we have a lot of non-response and many cells have only a few or zero cases.

You will have to combine age groups or regions. You can do that rather arbitrarily: where you have fewer than, say, 5 respondents in a stratum, you combine two age groups or two (neighbouring) regions. Whether you combine age groups or regions depends on the effects you expect in the outcome variables. If the expected difference in the outcome variable between the age groups is large, then you combine regions; if the difference between the regions is large, then you combine age groups. But do remember that the more categories you use in the weighting, the smaller the potential bias in your outcome variables; therefore do not combine too many categories. Alternatively, other methods of making weights can be considered, such as raked weights or weights made using regression methods (Kalton & Flores-Cervantes, 2003).
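Finding the sparse cells and merging them can be sketched as follows. This is only an illustration: the cell labels and counts are invented, the threshold of 5 is the rule of thumb mentioned above, and the helper names are hypothetical.

```python
MIN_RESPONDENTS = 5  # rule-of-thumb threshold, adjust as needed

# Hypothetical respondent counts per (region, age group) cell.
counts = {
    ("north", "18-34"): 2, ("north", "35-64"): 14, ("north", "65+"): 9,
    ("south", "18-34"): 0, ("south", "35-64"): 21, ("south", "65+"): 12,
}


def sparse_cells(counts, minimum=MIN_RESPONDENTS):
    """Return the cells that have fewer respondents than the minimum."""
    return sorted(cell for cell, n in counts.items() if n < minimum)


def collapse(counts, mapping):
    """Merge cells: every cell listed in mapping is added to its new cell."""
    merged = {}
    for cell, n in counts.items():
        key = mapping.get(cell, cell)
        merged[key] = merged.get(key, 0) + n
    return merged


# Here the two youngest age groups are combined within each region.
mapping = {
    ("north", "18-34"): ("north", "18-64"),
    ("north", "35-64"): ("north", "18-64"),
    ("south", "18-34"): ("south", "18-64"),
    ("south", "35-64"): ("south", "18-64"),
}

print(sparse_cells(counts))
merged = collapse(counts, mapping)
print(merged)
print(sparse_cells(merged))
```

The population counts have to be collapsed with the same mapping before the weights are recalculated, otherwise the numerator and denominator of the weight refer to different cells.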

When do we trim weights?

The literature mentions various methods and cut-off points at which to trim weights. Theoretically, a weight should be trimmed at the point where the loss of precision due to a large weight is larger than the bias introduced by trimming the weight (Potter, 1990). A fixed cut-off point is sometimes used, and various values, each with its own rationale, are suggested. According to a directive by the Dutch RIVM, you trim EPW weights which are larger than 5 and raise weights which are smaller than 0.2. Others suggest a procedure where all weights which are larger than 3 to 6 times the average weight, or the median weight (as the average weight is highly influenced by outlier weights), are trimmed. Yet others suggest trimming the 3 or 5 largest weights to the value of the 4th or 6th largest weight respectively. It is more important to trim large weights than to raise small weights. After trimming the weights you can see what effect this has in terms of reduced design effects, and compare the (partly) weighted with the unweighted proportions, means and other estimates.
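The three rules of thumb mentioned above can be written down in a few lines. The following is a minimal sketch with hypothetical function names and made-up EPW weights; the cut-off values are parameters and the defaults are only examples taken from the ranges given in the text.

```python
from statistics import median


def trim_fixed(weights, lower=0.2, upper=5.0):
    """Raise weights below `lower` and trim weights above `upper`."""
    return [min(max(w, lower), upper) for w in weights]


def trim_relative(weights, factor=4.0):
    """Trim weights larger than `factor` times the median weight."""
    cap = factor * median(weights)
    return [min(w, cap) for w in weights]


def trim_top_k(weights, k=3):
    """Set the k largest weights to the value of the (k+1)th largest weight."""
    if len(weights) <= k:
        return list(weights)
    cap = sorted(weights, reverse=True)[k]
    return [min(w, cap) for w in weights]


# Made-up EPW weights, one per respondent.
epw = [0.1, 0.5, 0.8, 1.0, 1.2, 2.0, 7.5]

print(trim_fixed(epw))        # [0.2, 0.5, 0.8, 1.0, 1.2, 2.0, 5.0]
print(trim_relative(epw))     # median is 1.0, so the cap is 4.0
print(trim_top_k(epw, k=3))   # the 4th largest weight is 1.0
```

Which rule is appropriate remains a judgement call; comparing the estimates before and after trimming, as described above, shows how much each rule changes the results.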

If you use the EPW weights as the basis for trimming what happens to the IPW weights?

If you use the SISA weight application, the IPW weights are adapted to follow the trimmed EPW weights. A disadvantage of this is that the IPW weights are no longer invariant with respect to the level of analysis; you therefore cannot apply them in different subgroups without making some kind of correction. However, the bias introduced by this loss of invariance will generally be small.

In one of our local councils we have a larger sample in two districts: do you treat these two districts as independent samples, and split the local council concerned as if it were two local councils? And then compare the population of the oversampled district with the population of the rest of the council (minus the districts)?

Yes, in this case it is best to treat the local council as two local councils. You could perhaps weight for all districts of the council; that might be worthwhile, particularly if you expect the response to differ by district. However, this depends somewhat on your sample size. You must have sufficient numbers of cases to fill the cells of the weighting table.

And if in the above case you want to say something about the whole local council, can you take the two populations together and do the analyses using the same weights which you have made earlier, or do you have to make new weights?

If you used the IPW weights you do not have to make new weights. In case of the EPW weights, however, you have to make new weights by district for the whole council.
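Why the IPW weights can be reused while the EPW weights cannot is easy to see in a small numerical sketch. The district names, population counts and respondent counts are invented, and the helper functions are hypothetical.

```python
# Hypothetical strata: (district, sex). District "A" is oversampled.
population = {("A", "men"): 900, ("A", "women"): 1100,
              ("rest", "men"): 4100, ("rest", "women"): 3900}
sample = {("A", "men"): 80, ("A", "women"): 120,
          ("rest", "men"): 90, ("rest", "women"): 110}


def ipw(population, sample, strata):
    """W_i = N_i / n_i; depends only on the stratum itself."""
    return {s: population[s] / sample[s] for s in strata}


def epw(population, sample, strata):
    """w_i = P_i / p_i; proportions are taken within the analysed strata."""
    N = sum(population[s] for s in strata)
    n = sum(sample[s] for s in strata)
    return {s: (population[s] / N) / (sample[s] / n) for s in strata}


district_a = [s for s in population if s[0] == "A"]
whole_council = list(population)

key = ("A", "men")
print(ipw(population, sample, district_a)[key],
      ipw(population, sample, whole_council)[key])   # identical
print(epw(population, sample, district_a)[key],
      epw(population, sample, whole_council)[key])   # different
```

In this sketch the IPW weight for men in district A is 11.25 at both levels of analysis, while the EPW weight is about 1.125 within the district and about 0.45 for the whole council.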

We collected data with both a mail survey and by means of an internet survey. Can you add these data sets?

How you collected the data does not matter much, as long as both are random selections from the population, or one selection where people answer either the mail survey or the internet survey. After you have added the data of the two surveys, you compare the profile of your sample with the profile of the population and then calculate the weights. What you must consider is the possibility that some respondents had a double chance of entering the sample, because they could have been selected both for the mail survey and for the internet survey, while other respondents did not have this double chance. Although it is possible to take account of this in developing the weights, it is better to prevent this from happening.
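One way to take a double chance into account can be sketched as follows. This is an assumption for illustration, not a description of what any particular weighting program does: it presumes that the two selections are independent and that the selection probabilities for both surveys are known. The function names and probabilities are hypothetical.

```python
def inclusion_probability(p_mail, p_internet):
    """Probability of being selected for at least one of two independent samples."""
    return p_mail + p_internet - p_mail * p_internet


def inverse_probability_weight(p_mail, p_internet):
    """IPW weight: the inverse of the overall inclusion probability."""
    return 1.0 / inclusion_probability(p_mail, p_internet)


# A person who could only be selected for the mail survey.
print(inverse_probability_weight(0.01, 0.0))
# A person who could be selected for both surveys gets a smaller weight.
print(inverse_probability_weight(0.01, 0.02))
```

Respondents with a double chance are overrepresented in the combined file, so they receive a smaller weight than respondents who could enter through only one of the surveys.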

Is the value of the design effect important? Generally this value is between 1 and 3. Sometimes, however, we see a higher value. Are there guidelines for the range within which the design effect should fall? Are the results of the analysis reliable if the design effect is large?

As the design effect or design factor increases, the data become less reliable: the variance becomes larger, statistical tests become significant less easily and confidence intervals become wider. Complex samples analysis programs, such as those in SPSS, EpiInfo and "R", make all these corrections automatically. There is no upper limit to the design effect; if the design effect becomes very large, the data become very unreliable and there is very little you can say on the basis of them. A problem is that the probability of a type II error increases; this is the error of not declaring a difference statistically significant when in reality there is one. More effective treatments in medicine, or better methods in education or manufacturing, might be missed due to unreliable data. If a large variance becomes a problem because of design issues, you can try to reduce the design effect by weighting differently, for example by combining groups, using different indicators for the weighting, or trimming large weights.
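A rough impression of how unequal weights inflate the variance can be obtained with Kish's approximation, deff = n * sum(w^2) / (sum(w))^2. This is an assumption made here for illustration: it accounts only for unequal weights, not for clustering or stratification, and it is not necessarily the design effect that complex samples software reports. The weights below are made up.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weighting."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def effective_sample_size(weights):
    """Number of respondents divided by the approximate design effect."""
    return len(weights) / kish_design_effect(weights)


equal = [1.0, 1.0, 1.0, 1.0]
unequal = [0.5, 0.5, 1.5, 1.5]
extreme = [1.0] * 9 + [9.0]
trimmed = [min(w, 3.0) for w in extreme]   # cap the one large weight at 3

print(kish_design_effect(equal))            # 1.0
print(kish_design_effect(unequal))          # 1.25
print(effective_sample_size(unequal))       # 3.2
print(round(kish_design_effect(extreme), 2), kish_design_effect(trimmed))
```

In the last line a single large weight pushes the approximate design effect to about 2.78, and trimming that one weight brings it back to 1.25, which illustrates why trimming large weights is one of the remedies mentioned above.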

In our area we find a difference between men and women and between three different age groups. Can we use Chi-square tests on the weighted file to see if the difference is significant, or do we have to use special complex samples software for this?

No, you cannot use ordinary crosstabs with Chi-square tests after weighting without taking the design effect into account. Your statistical tests would become significant too easily. You have to use specialist complex samples software. Once you have set up this software you will see that it works quite easily and is an improvement on basic crosstabs software.

We have used the IPW weights and now get very large numbers in the tables and the standard errors and variance are also very large. Is that correct?

Yes, that is correct. The IPW weights relate to the size of the complete population and not, like the EPW weights, to the sample size. In the tables you get an estimate of the total number of smokers, or of some other relevant indicator, in the population. In the margins of the table you see the (sub)population size. In the case of the EPW weights you see the weighted number in the (sub)sample. Variance and standard error for the IPW weights also relate to the population, so the result is the number of smokers plus or minus a couple of thousand, or a few tens of thousands of people in the case of a local authority, or even a couple of million if the data concern a country.
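The difference in scale between the two kinds of tables can be illustrated with a short sketch. The population counts, respondent counts and numbers of smokers are made up, and the variable names are hypothetical.

```python
# Hypothetical strata with population counts, respondents and smokers.
population = {"men": 7000, "women": 8000}
respondents = {"men": 140, "women": 210}
smokers = {"men": 42, "women": 42}

N = sum(population.values())    # population size
n = sum(respondents.values())   # sample size

ipw = {g: population[g] / respondents[g] for g in population}
epw = {g: (population[g] / N) / (respondents[g] / n) for g in population}

# Table cell: weighted number of smokers.
smokers_ipw = sum(ipw[g] * smokers[g] for g in population)
smokers_epw = sum(epw[g] * smokers[g] for g in population)

# Table margin: weighted number of respondents.
margin_ipw = sum(ipw[g] * respondents[g] for g in population)
margin_epw = sum(epw[g] * respondents[g] for g in population)

print(round(smokers_ipw), round(margin_ipw))     # population scale
print(round(smokers_epw, 1), round(margin_epw))  # sample scale
```

With the IPW weights the table shows an estimated 3700 smokers in a population of 15000; with the EPW weights it shows about 86.3 weighted smokers among 350 respondents. The proportion of smokers is the same in both tables.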

Kalton G, Flores-Cervantes I. Weighting Methods. J Off Stat 2003;19:81-97.

Potter F. A study of procedures to identify and trim extreme sample weights. Proceedings of the Survey Research Methods Section, Am Stat Assoc 1990; 225-230.



Survey data weighing questions and answers