Simple Interactive Statistical Analysis

Raking

Input.

Enter a table of observed counts in the boxes. These must be integers. In the blue-bordered marginal boxes, enter the population distribution to which the table is to be raked. Any set of numbers, percentages, or fractions may be used, as long as their proportions match the population distribution. If you want the weighted data to add up to a specific total, enter that total in the green-bordered box. The default is the sample total.
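Because only the proportions of the marginal numbers matter, margins given as counts, percentages, or fractions can all be rescaled to the same target total. The following is a minimal sketch of that rescaling, not the code used by this page; the function name `scale_margin` and the example numbers are hypothetical.

```python
def scale_margin(margin, total):
    """Rescale non-negative numbers so that they sum to `total`.

    Hypothetical helper: only the proportions of `margin` are used,
    so counts, percentages, and fractions give the same result.
    """
    margin_sum = sum(margin)
    if margin_sum <= 0:
        raise ValueError("margin must contain at least one positive number")
    return [total * m / margin_sum for m in margin]


# Illustrative 2x2 table of observed counts (made-up numbers).
table = [[20, 30],
         [25, 25]]
sample_total = sum(sum(row) for row in table)

# Row margin given as percentages, column margin given as fractions.
row_targets = scale_margin([40, 60], sample_total)
col_targets = scale_margin([0.5, 0.5], sample_total)

print(row_targets, col_targets)
```

Passing a different value than `sample_total` corresponds to entering a specific total in the green-bordered box.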

Explanation.

Weighting is a technique whereby the structure of the data is made similar to the structure of the population, in order to obtain estimates that are unbiased and representative of the population about which one wants to make inferences. The sample is usually made comparable to the population only on demographic indicators for which detailed population data are available, such as age, gender, ethnicity, and income. Other indicators can also be used if available. Usually cell weighting is used: all possible combinations of the weighting variables are considered in developing the weights. For example, age in five-year categories is considered separately for both sexes, and this is done again separately for a number of social classes. This is a relatively straightforward procedure which generally produces good results. However, it requires a relatively large sample, because there must be a reasonable number of people in the sample in each sub-cell, and it also requires that population data be available for each cell. For example, if there is one dataset with the population age distribution and another dataset with the distribution of the population over a number of social categories, cell weighting usually cannot be used, because there is no information on, for example, how the social classes are distributed in the youngest or the oldest age group.
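As an illustration of cell weighting, each cell's weight can be taken as the population share of that cell divided by its sample share. The sketch below assumes this common definition; the function name `cell_weights`, the cell labels, and all numbers are hypothetical and not taken from this page.

```python
def cell_weights(sample_counts, population_shares):
    """Cell weight = population share of the cell / sample share of the cell.

    Assumes every cell has at least one respondent and that a
    population share is known for every cell.
    """
    n = sum(sample_counts.values())
    weights = {}
    for cell, count in sample_counts.items():
        if count == 0:
            raise ValueError(f"empty sample cell: {cell}")
        weights[cell] = population_shares[cell] / (count / n)
    return weights


# Made-up sample counts for an age-by-sex cross-classification.
sample_counts = {
    ("young", "F"): 10, ("young", "M"): 30,
    ("old", "F"): 40, ("old", "M"): 20,
}
# Made-up population shares for the same cells (they sum to 1).
population_shares = {
    ("young", "F"): 0.25, ("young", "M"): 0.25,
    ("old", "F"): 0.25, ("old", "M"): 0.25,
}

weights = cell_weights(sample_counts, population_shares)
# With these weights, the weighted cell counts add up to the sample total.
weighted_total = sum(sample_counts[c] * weights[c] for c in sample_counts)
print(weights, weighted_total)
```

The check for empty cells reflects the requirement stated above: without respondents in a cell, or without a population share for it, no cell weight can be formed.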

A general disadvantage of weighting is that it reduces the precision of estimates, that is, it increases their variance. Confidence intervals will be wider after weighting, and differences will have to be larger before statistical tests are significant. This “design effect” caused by weighting can be considerable. Mathematically, the design effect is the factor by which the variance is increased due to weighting, thus: weighted variance = observed unweighted variance * design effect. The design factor is the factor by which the standard error, and thereby the confidence interval (CI), is increased due to weighting, thus: correct CI = observed CI * design factor. The design factor is therefore the square root of the design effect. The effective N gives you insight into the decrease in power due to weighting.
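These quantities can be approximated from a set of weights. The sketch below assumes Kish's approximation, in which the design effect is n times the sum of squared weights divided by the squared sum of weights, and takes the effective N as n divided by the design effect. Whether this page uses exactly this formula is an assumption; the function name and weights are made up.

```python
import math


def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weights:
    deff = n * sum(w^2) / (sum(w))^2
    """
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return n * total_sq / (total * total)


# Made-up weights: half the sample weighted down, half weighted up.
weights = [0.5] * 50 + [1.5] * 50

design_effect = kish_design_effect(weights)   # factor on the variance
design_factor = math.sqrt(design_effect)      # factor on the standard error / CI
effective_n = len(weights) / design_effect    # assumed definition: n / deff

print(design_effect, design_factor, effective_n)
```

With equal weights the design effect is 1 and the effective N equals the sample size; the more the weights vary, the larger the design effect and the smaller the effective N.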

Raking is an alternative to cell weighting. With this technique, the marginal distributions of two sets of variables are made equal between the sample and the population in a stepwise procedure. The cell weights are then estimated as the differences between the observed joint table distribution before raking and the table distribution under the model of independence after raking. Raking has three advantages. First, less knowledge about the population is required: one does not have to know the inside of the table. Second, raking can be used on smaller samples. Third, there will generally be less variance in a raked set of weights, so the design effect implicit in a raked set of weights will be smaller than the design effect implicit in a set of weights obtained by cell weighting. The results of raking will mostly be comparable with the results of other alternatives to cell weighting. The price to pay for raking is that information on the joint distribution of the variables in the population is ignored, which can be a source of bias, that is, of structural errors in estimating population parameters. In weighting, however, one has to balance the bias introduced by weighting less, as in the case of raking, against the loss of precision introduced by weighting more, as in the case of cell weighting.
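The stepwise procedure can be sketched as iterative proportional fitting on a two-way table: rows are scaled to the row targets, then columns to the column targets, and this is repeated until both margins agree. This is a minimal illustration under stated assumptions, not the implementation behind this page. The function name `rake`, the stopping rule, and the example numbers are hypothetical, and expressing each cell weight as the ratio of the raked count to the observed count is one common convention assumed here.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rake a two-way table of counts to the given margins.

    Assumes both sets of targets sum to the same total.
    """
    fitted = [[float(x) for x in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, row in enumerate(fitted):
            row_sum = sum(row)
            if row_sum > 0:
                factor = row_targets[i] / row_sum
                fitted[i] = [x * factor for x in row]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            if col_sum > 0:
                factor = target / col_sum
                for row in fitted:
                    row[j] *= factor
        # Columns now match; stop once the rows also match.
        if all(abs(sum(row) - t) < tol
               for row, t in zip(fitted, row_targets)):
            break
    return fitted


observed = [[20, 30],
            [25, 25]]          # made-up sample counts
row_targets = [40.0, 60.0]     # population margin for the row variable
col_targets = [50.0, 50.0]     # population margin for the column variable

raked = rake(observed, row_targets, col_targets)
# Assumed convention: weight per cell = raked count / observed count.
weights = [[r / o for r, o in zip(raked_row, obs_row)]
           for raked_row, obs_row in zip(raked, observed)]
print(raked, weights)
```

Only the two margins are needed as input, which is the sense in which raking requires less knowledge about the population than cell weighting.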

Graham Kalton & Ismael Flores-Cervantes. Weighting Methods. Journal of Official Statistics 2003;19:81-97.

Leslie Kish. Methods for Design Effects. Journal of Official Statistics 1995;11:55-77.

Leslie Kish. Weighting for Unequal P_i. Journal of Official Statistics 1992;8:183-200.

All software and text copyright by SISA