**Simple Interactive Statistical Analysis**

Input.

Put the observed number of events in the top box and the expected number in the second box. The observed number should be an integer; the expected number can be real or integer. If the observed and the expected numbers covary, give the covariance in the third box. In that case you get an adjusted Fieller estimate of the confidence interval.

There is a free spreadsheet to do an SMR/CMF analysis on population data available here.

There is an inexpensive Windows program for Demographic and Epidemiological analysis, including the SMR, available here. Download the free demonstration version available on that page!

Explanation.

This module was inspired by two papers, one by Liddell and one by Silcocks. The purpose of the procedure is to estimate the confidence interval for a rate ratio. In practice, this will often be the Standardised Mortality Ratio or Standardised Morbidity Ratio (SMR), or the Comparative Mortality Figure (CMF). The rate ratio is best suited to studying events in a constant domain where the denominator (i.e. the population *at risk*) is very large. For those who work with events in smaller samples this module is less interesting, and the risk ratio or odds ratio will probably be more appropriate. The risk ratio and the odds ratio are implemented on the SISA website by the procedures t-test and two by two tables, and in the SISA-tables program.

In making comparisons, our instinct is to use subtraction. For example, in comparing the numbers of pairs of breeding rare birds in different years, we tend to say that there are 20 more pairs this year than last year. It is a "good" year. The advantage of doing it this way is that it is easy to comprehend: it is easy to picture the pairs of breeding birds. The disadvantage is that it ignores the element of scale; 20 more pairs while the usual number is 500 is quite a different matter from having 20 more pairs while the usual number is 50. That is why we prefer to use ratios, or division. We take the number of birds we have observed this year and compare it with the number we had expected to observe on the grounds of previous experience. Thus, if we expected 500 breeding pairs, and have in fact observed 520 pairs this year, then the ratio is 520/500 = 1.04, an increase of 4%. However, if we expected 50 breeding pairs, and have in fact observed 70 pairs, the ratio is 70/50 = 1.40, an increase of 40%. The next step is to use the generalizing statistical approach. This year, the number of breeding pairs of rare birds is 40% greater than the average in "ordinary" years, within a certain margin of confidence. The confidence interval given in the output gives you an impression of the precision of the estimate. If the value 1 (one) is __not__ included in the confidence interval, the result is said to be statistically significant. There is then a difference between the number of birds observed this year and the number expected, and the difference is not solely due to chance fluctuation.
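The arithmetic of the bird example can be sketched in a few lines of Python. The function name `compare` is made up for this illustration and is not part of the module.

```python
def compare(observed, expected):
    """Return the difference, the ratio and the percentage change."""
    difference = observed - expected
    ratio = observed / expected
    percent_change = (ratio - 1) * 100
    return difference, ratio, percent_change

# The two scenarios from the text: the same difference, very different ratios.
for obs, expd in [(520, 500), (70, 50)]:
    diff, ratio, pct = compare(obs, expd)
    print(f"{obs} vs {expd}: difference={diff}, ratio={ratio:.2f}, change={pct:.0f}%")
```

Both cases show a difference of 20 pairs, but the ratios are 1.04 and 1.40.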

The module presents various approximations for the confidence interval for the rate ratio. First, the exact confidence interval for the rate ratio by way of a Chi-square transformation of the Poisson is given. Liddell discusses this method. As implemented here, the method yields the Poisson confidence interval to a precision of about four significant digits. If the number of events exceeds 80, precision will decrease rapidly and it might be a better idea to use the Poisson process approximation, also discussed by Liddell. The program issues a warning when this seems advisable.
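As an illustration only, the exact Poisson limits can be found by solving the Poisson tail equations directly with bisection; this gives the same limits as the Chi-square formulation but is not necessarily how the module computes them. The function names below are assumptions made for this sketch, and the expected number is treated as error free.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    total = sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(1.0, total)

def solve_mu(k, target):
    """Find mu with poisson_cdf(k, mu) == target (the CDF decreases in mu)."""
    lo, hi = 0.0, k + 10 * math.sqrt(k + 1) + 20  # generous upper bracket
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(observed, expected, conf=0.95):
    """Exact confidence limits for observed/expected, expected error free."""
    alpha = 1 - conf
    # Lower limit: P(X >= observed | mu) = alpha/2; zero when nothing was observed.
    lower = 0.0 if observed == 0 else solve_mu(observed - 1, 1 - alpha / 2)
    # Upper limit: P(X <= observed | mu) = alpha/2.
    upper = solve_mu(observed, alpha / 2)
    return lower / expected, upper / expected

low, high = exact_poisson_ci(70, 50.0)
print(f"ratio = {70 / 50:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For small counts this direct search is fast; for very large counts the sums become slow, which is in line with the advice above to switch to an approximation.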

Silcocks has pointed out that an important assumption, namely that the expectation is theoretical and error free, is not valid when calculating the SMR in epidemiology and demography. The age-specific rates in the standard population on which the calculation of the expectation is based are empirical observations, which will show random fluctuation. Likewise, in the breeding bird example, it is correct to use the Poisson to compare this year's number of breeding birds with the "usual" number, a theoretical concept. However, it is not correct to make the same comparison between two years, using last year's observation as the expectation. Silcocks proposes an exact procedure based on the Binomial/Incomplete Beta to estimate the confidence interval while taking into account that there may be an error in the expectation. Silcocks's ideas are implemented here, although technically the module works in a slightly different way. The procedure provides a binomial confidence interval around the rate ratio over a very large range of numbers and with high precision.
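One common way to obtain a binomial interval for the ratio of two counts that are both subject to error is to condition on their total: given the total, the observed count is binomial with probability p = ratio/(1 + ratio). The sketch below follows that idea under the simplifying assumption that both counts are integers; it may differ in detail from Silcocks's formulation and from the module, and all function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for moderate n."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_p(k, n, target):
    """Find p with binom_cdf(k, n, p) == target (the CDF decreases in p)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, conf=0.95):
    """Exact (Clopper-Pearson) limits for a binomial proportion k/n."""
    alpha = 1 - conf
    lower = 0.0 if k == 0 else solve_p(k - 1, n, 1 - alpha / 2)
    upper = 1.0 if k == n else solve_p(k, n, alpha / 2)
    return lower, upper

def conditional_ratio_ci(observed, expected_count, conf=0.95):
    """Limits for observed/expected_count, both treated as random counts."""
    p_lo, p_hi = exact_binomial_ci(observed, observed + expected_count, conf)
    to_ratio = lambda p: p / (1 - p) if p < 1 else math.inf
    return to_ratio(p_lo), to_ratio(p_hi)

# This year's 70 pairs compared with last year's 50 pairs (an observation, not a theory).
low, high = conditional_ratio_ci(70, 50)
print(f"ratio = {70 / 50:.2f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Because the expectation is now allowed to fluctuate, this interval is wider than the Poisson interval for the same two numbers.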

Silcocks discusses in the same paper the use of the Fieller interval as an alternative to the Binomial method. The Fieller Confidence Interval has two applications:

1) The Fieller estimate is a method to estimate the approximate confidence interval around a rate ratio when there is error in both the observed and the expected value and the number of cases is relatively large. For very large numbers, the Fieller provides a good method to approximate the results of the exact Binomial/Incomplete Beta method, which is also presented in the module. The Incomplete Beta method is the more suitable exact method for estimating the confidence interval around a rate ratio with error in both the observed and the expected values, but the method may not work well for a very large number of observations.

2) The Fieller method makes it possible to take covariance between the observation and the expectation into account. This would be relevant if the observations are a subset of the data on the basis of which the expectation was calculated. For example, the Standardised Mortality Ratio is often calculated by applying data of the national population to the local population. The mortality in Hampshire could be compared with the national mortality by using an expectation calculated by applying the national age-specific death rates (the standard) to the population of Hampshire (the index). A major crash on the motorway in Hampshire would show up both in the observed number of deaths in Hampshire and in the national death rates, which are used to calculate the expectation.

Silcocks (1994) proposes the following covariance parameter "q" for the Fieller in such a case:

$$q = \frac{\sum_i d_i \, n_i / N_i}{d_{tot}}$$

where $d_i$ is the number of deaths in the index population in the i-th age band, $n_i$ is the number of individuals in the i-th age band of the index population who are part of the standard population, $N_i$ is the number of individuals in the i-th age band of the standard population, and $d_{tot}$ is the total number of deaths in the index population. The q-covariance parameter is entered in the program in the bottom right "(Co-)Variance" box. The default setting of this parameter is zero. The Fieller will give an estimate only if the expected value is larger than one. Thus, real cases or a mean of real cases is considered and not a proportion.
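The q parameter defined above is a weighted sum over age bands and is easy to compute before entering it in the program. The sketch below uses invented numbers for three age bands; the function name and the data are purely illustrative.

```python
def covariance_q(deaths_index, index_in_standard, standard_population):
    """q = sum(d_i * n_i / N_i) / d_tot, following the definition in the text.

    deaths_index:        d_i, deaths in the index population per age band
    index_in_standard:   n_i, index individuals who are part of the standard
    standard_population: N_i, individuals in the standard population
    """
    d_tot = sum(deaths_index)
    weighted = sum(d * n / N for d, n, N in
                   zip(deaths_index, index_in_standard, standard_population))
    return weighted / d_tot

# Hypothetical data for three age bands (not taken from any real population).
d = [2, 5, 13]
n = [1000, 800, 500]
N = [100000, 80000, 40000]
q = covariance_q(d, n, N)
print(f"q = {q:.6f}")
```

When the index population is only a small part of the standard population, q is close to zero, which matches the program's default setting.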

Lastly, several approximations are presented: the Wald approximation (Rothman and Greenland, 1998); the square root, normal deviate and Poisson process approximations (Liddell, 1984); and a logarithmic estimate.
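For orientation, the textbook forms of two of these approximations, the Wald interval and the logarithmic interval, are sketched below. Both assume the observed count is Poisson and the expectation is error free; the module's own formulas may differ in detail, and the function names are made up for this example.

```python
import math
from statistics import NormalDist

def wald_ci(observed, expected, conf=0.95):
    """Ratio +/- z * sqrt(observed) / expected."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    ratio = observed / expected
    half_width = z * math.sqrt(observed) / expected
    return ratio - half_width, ratio + half_width

def log_ci(observed, expected, conf=0.95):
    """Ratio multiplied and divided by exp(z / sqrt(observed))."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    ratio = observed / expected
    factor = math.exp(z / math.sqrt(observed))
    return ratio / factor, ratio * factor

for name, func in [("Wald", wald_ci), ("log", log_ci)]:
    low, high = func(70, 50.0)
    print(f"{name}: ({low:.3f}, {high:.3f})")
```

The Wald interval is symmetric around the ratio, while the logarithmic interval is symmetric on the log scale and can never fall below zero.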

Technical Discussion.

SISA's usual Chi-square distribution is used for the exact Poisson confidence interval; see the discussion in the significance module. SMR-Exact is a demanding procedure. Its precision is limited to four digits for expectations under about 80, and it should not be used above that number. SISA's usual Binomial procedure is used for the exact Binomial confidence interval. Here the choice is between precision and speed: the precision is now set at six digits, but any level of precision can be achieved depending on what your computer or software can manage. Higher precision can be achieved using the SISA-tables program. Both procedures require the observed number to be an integer. If it is not an integer, the procedures will echo the values for the nearest integer smaller than the observed value. You can use real observations if you like; most of the approximations will handle them. Real-valued expectations are treated as such with high precision in all modules, exact and approximate.

Further Reading.

Liddell FD. Simple exact analysis of the standardised mortality ratio.

Rothman KJ, Greenland S.

Silcocks P. Estimating confidence limits on a standardised mortality ratio when the expected number is not error free.