Enter the desired alpha level, a decimal number between zero and one, in the top box. Enter the number of comparisons, a positive whole number, in the second box.

Optionally, set the mean r (correlation) to zero for full Bonferroni correction, or to a value between 0 and 1 for partial Bonferroni correction.

*The following section applies to the full SISA version of this procedure only.*

A further option is to give the degrees of freedom, to obtain the critical value for t instead of the critical value for z. The degrees of freedom should be the number of cases in the study minus one. If degrees of freedom are given, the t-value for the comparison of k>2 independent means according to Scheffé's method is also reported. The degrees of freedom in this case should also be the number of cases minus one.
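As a rough illustration of what a critical value for z means at a given alpha level, the sketch below uses only the Python standard library. The function name `z_critical` and the choice of a two-sided test by default are assumptions made for this example, not a description of SISA's internals. Critical values for t or for Scheffé's method need t and F quantiles, which the standard library does not provide, so they are left out here.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_sided: bool = True) -> float:
    """Critical z for a given alpha level (hypothetical helper, not SISA code)."""
    # For a two-sided test the alpha is split evenly over both tails.
    tail = alpha / 2.0 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)


# Unadjusted alpha, and an alpha lowered for five comparisons (0.05 / 5).
for level in (0.05, 0.01):
    print(level, round(z_critical(level), 3))
```

A lower alpha level gives a larger critical value, which is why each individual test becomes harder to pass after adjustment.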

Sidak's correction is the much-preferred method. If you do comparisons on more than one outcome variable, you should consider the correlation between these variables. Only in the specific case of comparing k>2 independent means should you use Scheffé's method.

The Bonferroni correction/adjustment procedure is the most basic of the SISA procedures. However, Bonferroni correction concerns an issue about which there is much ongoing discussion: whether, when more than one test is done in a particular study, the alpha level should be adjusted downward to account for capitalization on chance.

The alpha level is the chance researchers accept of making a type one error. A type one error is the error of incorrectly declaring a difference, effect or relationship to be true when chance alone produced the observed state of events. Customarily the alpha level is set at 0.05, meaning that in no more than one in twenty statistical tests the test will show 'something' while in fact there is nothing. When more than one statistical test is done, the chance of finding at least one test statistically significant due to chance fluctuation in the total experiment, and so of incorrectly declaring a difference or relationship to be true, increases. In five independent tests the chance of finding at least one difference or relationship significant due to chance fluctuation equals 0.22, or about one in five. In ten tests this chance increases to 0.40, which is about two in five. Using the Bonferroni method, the alpha level of each individual test is adjusted downwards to ensure that the overall (experiment-wise) risk for a number of tests remains at 0.05. Even if more than one test is done, the risk of incorrectly finding a difference or effect significant stays at no more than 0.05.
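The figures above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the tests are independent; the function names are chosen for this example only. The Bonferroni adjustment divides alpha by the number of tests, while Sidak's adjustment solves the experiment-wise formula exactly for the per-test alpha.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha under Bonferroni: alpha divided by the number of tests."""
    return alpha / n_tests


def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha under Sidak: keeps the experiment-wise risk at alpha exactly."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


for n in (1, 5, 10):
    print(
        n,
        round(familywise_error(0.05, n), 3),   # unadjusted experiment-wise risk
        round(bonferroni_alpha(0.05, n), 5),
        round(sidak_alpha(0.05, n), 5),
    )
```

The two adjusted alpha levels are nearly identical for small numbers of tests, with Sidak's value always slightly larger than Bonferroni's.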

Although the logic is beautiful, there is a serious drawback. If the chance of making a type one error (incorrectly declaring a difference) on an individual test is reduced, the chance of making a type two error is increased: no effect or difference is declared, while in fact there is an effect. Thus, by reducing the chance of type one errors for individual tests, i.e. the chance of introducing ineffective medical treatments or ineffective improvements, the chance of type two errors is increased, i.e. the chance that effective treatments or improved production methods are not discovered. So, when is Bonferroni correction used correctly and when is it used incorrectly?

Scenario one. If a crosstabulation of two variables produces more than two means in the dependent variable, then, if multiple tests are done, Bonferroni adjustment should be applied. Mostly this concerns one-way analysis of variance. This is the case, for example, if we want to compare three religious groups on their attitudes towards alcohol use, or four groups of medical specialists on their usage of pain relief strategies after surgery. There is an extensive literature on this case, and there is a multitude of different tests and methods to lower the experiment-wise error rate. However advanced and well thought out many of these methods are, Bonferroni correction will often be the best choice. One of the more advanced methods, Scheffé's method, is reported at the bottom of the table if the degrees of freedom (based on the number of cases in the study) are given. Scheffé's method is not very powerful, and more powerful methods are available in many statistical packages. Scheffé's method has the advantage that if the overall F-test in a one-way analysis of variance is not significant, then none of the individual comparisons will be significant. Most of the methods to adjust for multiple comparisons in k means are based on the assumption that you want to compare any mean with any other mean, so these methods mostly presume that you want to do c=k*(k-1)/2 comparisons in k means. Thus, if you want to compare Christians with other religions, but not the other religions against each other, the simple Bonferroni method is better.
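The difference between all pairwise comparisons and a smaller planned set can be made concrete with a short sketch. The group labels below are hypothetical placeholders, and the per-comparison alpha is the simple Bonferroni value, alpha divided by the number of comparisons actually made.

```python
from itertools import combinations

ALPHA = 0.05

# Hypothetical group labels for k = 4 groups.
groups = ["Christian", "Other A", "Other B", "Other C"]

# Every mean against every other mean: c = k * (k - 1) / 2 comparisons.
all_pairs = list(combinations(groups, 2))

# Only the first group against each of the others: k - 1 comparisons.
planned = [(groups[0], other) for other in groups[1:]]

print(len(all_pairs), round(ALPHA / len(all_pairs), 5))
print(len(planned), round(ALPHA / len(planned), 5))
```

With fewer comparisons to correct for, each planned comparison is tested at a less stringent alpha level, which is where the gain in power comes from.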

Scenario two. If a single hypothesis of no effect is tested using more than one test, and the hypothesis is rejected if one of the tests shows statistical significance, Bonferroni correction should be applied. For example, in a factory there are five points where quality control is applied to samples of a product, and the product is rejected for the market if a sample is below the benchmark on just one of these five tests. The chance of rejection at each of the control points should then be adjusted downward to keep the overall chance of incorrect rejection at a predefined level. As another example, in a randomized controlled trial (RCT) a group of patients on a new anti-diabetic drug is compared with a group of patients on placebo. To study the new drug's effectiveness, blood sugar level is determined in three different locations of the patients' bodies. If a statistically significant difference between the treatment and the control group is found on just one of these tests, the drug is considered effective. Each of the tests should be made less sensitive to ensure that the risk of a false positive, the risk of incorrectly declaring the drug effective and giving future patients pointless medication, does not become unacceptably high due to repeated testing. Basically, scenario two is not considered problematic and you should apply Bonferroni correction in such cases.
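The quality control example can be checked with a small simulation. This is an illustrative sketch only: it assumes five independent control points, a product that is in fact up to standard, and a fixed false-alarm chance at each point. The function name and the number of simulated batches are arbitrary choices for this example.

```python
import random


def simulate_false_rejection(p_per_check: float, n_checks: int,
                             n_batches: int = 100_000, seed: int = 1) -> float:
    """Share of good batches rejected because at least one check gave a false alarm."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_batches):
        # A batch is rejected as soon as any single check fails by chance.
        if any(rng.random() < p_per_check for _ in range(n_checks)):
            rejected += 1
    return rejected / n_batches


unadjusted = simulate_false_rejection(0.05, 5)      # each check at 0.05
adjusted = simulate_false_rejection(0.05 / 5, 5)    # Bonferroni: each check at 0.01
print(round(unadjusted, 3), round(adjusted, 3))
```

Without adjustment the simulated overall rejection rate should land near the 0.22 mentioned earlier; with each check at 0.01 it should stay just below 0.05.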

Scenario two with correlated multiple outcomes. If you test for the significance of a hypothesis using variables that are mutually correlated, the Bonferroni correction is too conservative. Take the extreme example of an RCT in which a number of outcome variables are fully correlated. In that case, knowing the outcome of a single test of a difference between the control and experimental group on a single variable would be sufficient to know the outcome of the tests on the other outcome variables, and the usual Bonferroni correction would be far too conservative. In the case of correlated outcome variables, a corrected alpha is required that lies between no correction at all and full Bonferroni correction. SISA allows you to add the mean correlation between variables as a parameter. To obtain it, take the usual triangular matrix (without the diagonal) of the correlations between the outcome variables, sum the correlations and divide the result by the number of correlations used. A mean correlation of zero ('0') gives you full Bonferroni adjustment, a mean correlation of one gives no adjustment at all, and for other values of the correlation you will get a corrected alpha that lies between the two extremes. Note that in the classic quality control example discussed above, which uses repeated independent samples, correlation should not be considered. In the example about multiple outcomes in an RCT, with multiple measurements on the same subject, correlation must be considered.
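The mean correlation described above, and one way it could feed into the adjustment, can be sketched as follows. The correlation matrix is made-up example data. The adjustment formula, which replaces the number of tests k by k raised to the power (1 - mean r) inside a Sidak-type correction, is one formula from the multiple-endpoint literature with the stated behaviour at the extremes; that SISA uses exactly this formula is an assumption of this sketch.

```python
def mean_offdiagonal(corr: list[list[float]]) -> float:
    """Mean of the correlations below the diagonal of a square matrix."""
    n = len(corr)
    values = [corr[i][j] for i in range(n) for j in range(i)]
    return sum(values) / len(values)


def partial_alpha(alpha: float, k: int, mean_r: float) -> float:
    """Correlation-adjusted per-test alpha (ASSUMED formula, see lead-in)."""
    # mean_r = 0 leaves k unchanged (full correction);
    # mean_r = 1 reduces the effective number of tests to 1 (no correction).
    effective_k = k ** (1.0 - mean_r)
    return 1.0 - (1.0 - alpha) ** (1.0 / effective_k)


# Hypothetical correlations between three outcome variables.
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
r_bar = mean_offdiagonal(corr)
for r in (0.0, r_bar, 1.0):
    print(round(r, 2), round(partial_alpha(0.05, 3, r), 5))
```

At a mean correlation of zero this formula returns the Sidak value, which is marginally larger than the plain Bonferroni value of alpha divided by k.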

One of the problems with scenario two is that, under Bonferroni correction, one could argue that all null hypotheses that are the subject of the adjustment should be rejected together if only one of them is found to be false. This is known as "the global null hypothesis". For example, in the case of the blood sugar tests mentioned above, the drug will be declared effective if only one test shows statistical significance, regardless of the fact that the other two tests might not be significant. Few scientists who apply Bonferroni adjustment are prepared to do this, and they generally like to keep open the option of considering tests on their individual merits, which brings us to scenario three.

Scenario three is much more disputed. This is the case when more than one hypothesis is evaluated in a single study, each hypothesis with a single test. If the alpha level of each test is set at 0.05, about one in twenty of the hypotheses tested can be expected to be significant due to chance fluctuation alone. For example, in a lifestyle study, blood pressure, television viewing behavior, leisure time physical activity, and cigarette smoking are studied. Explanatory variables are age, gender, occupation and ethnic background. Now, if one is interested in the general question of whether the background variables are related to the lifestyle variables, and to that end a number of comparisons are made, this is scenario two and Bonferroni correction should be used. However, if one is interested in the specific relationship between, say, gender and television viewing, and the specific hypothesis is tested that the respondents' gender is not predictive of television viewing behavior, then Bonferroni correction should __not__ be used. Most statisticians are of the opinion that the study of a single topic or hypothesis, when it uses pre-defined statements and existing theory, should not be affected by what goes on elsewhere in the world, or elsewhere in the study concerned, for that matter. Each little study done in the context of a larger study should be considered on its own merits. However, this point of view is not universally supported, and particularly in medicine there is an opinion that each test in a study should be considered in the light of the number of tests done in the study as a whole.

Scenario four concerns the situation in which non-predefined hypotheses are pursued using many tests, one test for each hypothesis. Basically this is the situation of data 'dredging' or 'fishing': many among us will recognize *correlation variables=all* or *t-test groups=sex(2) variables=all*. Above all, this should not be done. Bonferroni correction is difficult in this situation, as the alpha level would have to be lowered very considerably (potentially by a factor of r*(r-1)/2, where r is the number of variables), and most standard statistical packages are not able to report p-values small enough to do it. SISA's advice, if you want to go ahead with it anyway, is to test at the 0.05 level for each test. After a relationship has been found, and if this relationship is theoretically meaningful, the relationship should be confirmed in a separate study. This can be done after new data are collected or, in the same study, by using the 'split sample' method. The sample is split in two: one half is used to do the 'dredging', the other half is used to confirm the relationships found. The disadvantage of the split sample method is that you lose power (use the power procedure to estimate how much). A Bayesian method can be used if you want to formally incorporate the result of the original study or dredging in the confirmation process. But don't put too high a value on your original finding.
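To give a feel for the numbers involved, the sketch below counts the pairwise tests implied by correlating every variable with every other, shows the Bonferroni alpha that would result, and splits a sample into an exploration half and a confirmation half. The function names, the example of 50 variables and the fixed random seed are assumptions chosen for illustration.

```python
import random


def n_pairwise_tests(n_variables: int) -> int:
    """Number of tests when every variable is tested against every other."""
    return n_variables * (n_variables - 1) // 2


def split_sample(case_ids, seed: int = 0):
    """Randomly split cases into an exploration half and a confirmation half."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


tests = n_pairwise_tests(50)
print(tests, 0.05 / tests)   # per-test alpha if full Bonferroni were applied

explore, confirm = split_sample(range(100))
print(len(explore), len(confirm))
```

Relationships found in the exploration half would then be tested again, as predefined hypotheses, in the confirmation half only.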

Perneger TV. What's wrong with Bonferroni adjustments. *British Medical Journal* 1998;316:1236-1238.

Sankoh AJ, Huque MF, Dubey SD. Some comments on frequently used multiple endpoint adjustment methods in clinical trials. *Statistics in Medicine* 1997;16:2529-2542.