Input should be the pursued alpha level, a decimal number between 'zero' and 'one' in the top box. The number of comparisons, a positive integer number without decimals, is given in the second box.

Optional, one can set the mean r (correlation) to zero for full Bonferoni correction and to a value between 0 and 1 for partial Bonferroni correction.

A further option is to give the degrees of freedom to obtain the critical value for t, instead of the critical value for z. The degrees of freedom should be the number of cases in the study minus one. Also, if degrees of freedom are given the t-value is given for the comparison of k>2 independent means, according to Scheffé's method. The degrees of freedom in this case should also be the number of cases minus one.

The Holm method is a top-down stepwise procedure whereby the Bonferroni procedure is recalculated time after time for the hypothesis left to test. Comparing observed p-values with critical p-values step by step, as soon as a non-significant p-value is found all remaining p-values are declared non-significant. This cryptic description is illustrated in the table.

order | Holm-B | Holm-S | B-H | your-p | Holm | B-H |

1 | 0.00714286 | 0.00730083 | 0.00714286 | 0.00001 | smallest | smallest |

2 | 0.00833333 | 0.00851244 | 0.01428571 | 0.00099 | ↓ | ↑ |

3 | 0.01 | 0.01020622 | 0.02142857 | 0.003 | ↓ | ↑ |

4 | 0.0125 | 0.01274146 | 0.02857143 | 0.0145 | ↓ | ↑ |

5 | 0.01666667 | 0.01695243 | 0.03571429 | 0.016 | ↓ | ↑ |

6 | 0.025 | 0.02532057 | 0.04285714 | 0.06 | ↓ | ↑ |

7 | 0.05 | 0.05 | 0.05 | 0.13 | largest | largest |

For the Holm method you order your p-values from smallest to largest in the "your p" column. According to the Holm procedure you start in the first row, in this row your-p is smaller than the critical p, the p-value is declared statistically significant. Continue doing this until you reach the 4th row. In this row your-p is larger than the critical p. Declare this p and all p-values larger than this p non-significant, including the p-value in the fifth row, which is lower than the critical p-value. Holm-B is based on the Bonferroni procedure, Holm-S on the Sidak procedure.

Bonferroni adjustment procedures (generally named "family-wise error rate (FWER)" procedures) are correct if you want to control the occurrence of a single false positive, one or more incorrectly significant declared results, in a family of N tests. False discovery rate (FDR) procedures are based on the notion of having a prior expected proportion of false positives among k significant declared tests. If the p-values are from independent tests the binomial cumulative function gives you the FEWR given a certain FDR. Say, you expect 5% of 10 positive outcomes to be false positive, then there is a 40% probability of one or more false positives, an 8.6% probability of two or more false positives, etc. ^{$}

One such FDR procedure is the Benjamini-Hoghberg procedure. The p in the alpha box now stands for proportion, and no longer for probability. It is the proportion of false positive results in the statistically significant results. This proportion is sometimes set higher as the usual 0.05, dependent of the cost-benefit of a particular outcome (McDonald, 2009). However, this would increase the number of false positives. In the Benjamini-Hochberg procedure you work bottom-up. The p-value in the 7th row is larger than the critical p-value and declared non-significant. Same for the 6th row. However, in the 5th row your-p is smaller than the critical p-value. You stop testing and declare this p-value and all p-values smaller than this p-value statistically significant. This test requires the p-values to be independent. Thus, the test is valid if you compare the p-value of the comparison A<->B with the p-value of the comparison C<->D. However, the test is mostly not valid in comparing the p-value A<->B with the p-value A<->C. If A is larger than B it is more likely to be also larger than C.

Benjamini and Yukutieli proposed the divisor "c" in case the assumptions of the B-H procedure are not met. In the case of independence or positive correlation of the tests c=1. So nothing changes. In the case of arbitrary or unknown dependence you need to divide the B-H critical values by the Benjamini-Yukutieli divisor in the output. The step-up method further works the same. In the case of the table above the Benjamini-Yukutieli divisor=2.593 and the critical values then become 0.00714/2.593=0.0028; 0.0142/2.593=0.0055; 0.0083; 0.011; 0.0138; 0.0165 and 0.0193 respectively. With Benjamini and Yukutieli correction only the p-values in the first three rows are declared statistically significant.

The best choice in any case would be the Holm-Sidak. Note that the correlation adjusted methods are less well researched and not generally accepted. Further, consider that in many cases the best choice is not to use any of these procedures and to leave the p-values uncorrected. The Benjamini-Hoghberg is a conceptually unusual procedure only to be applied to answer the specific question of the false discovery rate.

There are a number of good overviews which discuss when (not) to apply Bonferroni adjustment, in particular the overview by Bender & Lange (2001) is advised. However, a number of rules might be helpful. If you have one question or hypothesis at the beginning and a single answer or conclusion at the end, and more than one look at the data or multiple statistical tests in between, Bonferroni adjustment should very seriously be considered. If besides your one question and conclusion you have some new observations and questions at the end, then the statistics or observations which generated the new questions should certainly not be Bonferroni corrected. The new observations will have to be confirmed on a separate dataset. More complex, if you have a vague and general question at the beginning and a number of conclusions and new questions at the end, with a couple of looks at the data in between, then Bonferroni adjustment is probably not appropriate. But maybe it is.

There are two quite general situations in research, multiple groups and multiple outcomes. A multiple group situation is for example when there is a group which receives no treatment, compared with a group on current treatment, compared with group new treatment A compared with group new treatment B. Anova is the most often used method of analysis in this case. Note, the Anova itself will never be Bonferroni corrected but the multiple comparisons which follow will be. If the conclusion would be that there is an effect if there is a statistically significant difference between any combination of two groups, then the analysis has to be Bonferroni corrected, with ~criticalP/6. (=n(n-1)/2=4(4-1)/2=6)

This is fine, because mostly the tests between the groups will be independent. Mostly, but not always. Say treatment A and treatment B are different doses of the same active ingredient, then there is probably dependence. If there is statistical significance between current treatment and treatment A, there is a much increased probability that there is also statistical significance between current treatment and treatment B. The Bonferroni adjustment then would have to be ~criticalP/(6*correction factor) whereby 0<correction factor<1. If in treatment group A and treatment group B there would be the same result these two groups would have to be collapsed into one and the number of comparisons would be 3 instead of 6.

The correction factor is difficult to estimate in the multiple group situation. However, some attempt has to be made in this situation to assess the level of dependence between the groups in the study and to correct for it. In the above mentioned example one might choose to make the divisor 4.5 instead of 6.

Multiple outcomes is when more than one outcome is measured on each study subject, for example for each person in the study we observe 1) dizziness, 2) depression, 3) fatigue and 4) headaches. The treatment is concluded to be effective if there is an effect in one of these outcomes. Clear case of Bonferroni adjustment required. Full Bonferroni adjustment can be used if there is no dependence between the outcome variables. Thus, the treatment has an effect on headaches, but might or might not have an effect on any of the other outcomes. No Bonferroni adjustment should be applied if the outcomes are totally dependent. If the treatment has an effect on dizziness it will have a similarly powerful effect on any of the other outcomes. Basically, the above mentioned medical problems are just different names for one medical phenomenon. In practice there will always be some dependence in multiple outcome designs and the appropriate Bonferroni adjustment will be on a curve between no correction and full correction. This is illustrated in the figure:

The top right corner of the figure, where all the lines come together, shows the example of no Bonferroni adjustment. Full Bonferroni adjustment can be seen on the left, were the distance between the lines is largest. Given the example of the above mentioned four medical problems, the average in a triangular matrix of the six correlations between the outcome measures could be 0.3. We do four comparisons between the experimental and the control group, then the critical value for each comparison should be according to the red line in the figure about 0.025. Adjustment according to this figure is given in the webpage if you give a correlation and check the box.

In the SISA correlated Bonferroni program the default is the D/AP procedure as discussed by Sankoh et.al. (1997). The results for the D/AP method are shown in the second figure. What the figure shows is that the D/AP procedure is relatively conservative and has a tendency towards the full Bonferroni adjustment in the case of moderate correlation between outcome variables, particularly when the number of comparisons is large.

Bender, Ralf, & Stefan Lange. Adjusting for multiple testing—when and how?. Journal of clinical epidemiology 54.4 (2001): 343-349.

McDonald, John H. Multiple comparisons. In: McDonald, John H. Handbook of biological statistics. Vol. 2. Baltimore, MD: Sparky House Publishing, 2009.

Perneger TV. What is wrong with Bonferroni adjustments. *British Medical Journal* 1998;136:1236-1238.

Rothman, Kenneth J. No adjustments are needed for multiple comparisons. *Epidemiology* 1.1 (1990): 43-46.

Sankoh AJ, Huque MF, & Dubey SD. Some comments on frequently used multiple endpoint adjustments methods in clinical trials. *Statistics in Medicine* 1997;16:2529-2542.

$: When proportion FDR of N positively tested independent results is expected to be false then the probability of at least one false positive result FWER=1-(1-FDR)^{N}; FDR=1-(1-FWER)^{1/N}