Compare two independent samples
Select options and hit the calculate button.
Compare a single sample with the population
Select the one-sample option and any other options, then hit the calculate button.
For an explanation of the Pairwise t-test or the t-test for two correlated samples, consult the pairwise help page.
The t-test covers a number of procedures for comparing two averages. It can be used to compare the difference in weight between two groups on different diets, the proportion of patients suffering complications after two different types of operation, or the number of traffic accidents at two busy junctions. You can compare ’continuous’ averages, which are not restricted to values between zero and one; examples are the mean length or weight of two groups of people. The certainty with which these averages are measured is expressed by the standard deviation. You can also compare ’proportion’ averages, essentially a number divided by a larger number. Examples are the proportion of people suffering complications after two different types of operation (number of complications divided by the number of operations), or the proportion of a manufactured product that is damaged under two different methods of production (number damaged divided by the number manufactured). The certainty of these averages is directly related to the number of cases observed. More discussion of proportion averages can be found on the Binomial help page. Finally, counted averages are discussed on the Poisson help page.
The t-test gives the probability of finding a difference between the two means at least as large as the one observed if, in reality, there were no difference, i.e. if the difference were caused by chance alone. It is customary to say that if this probability is less than 0.05 the difference is ’significant’: it is unlikely to be caused by chance alone.
Strictly speaking, the t-test is not valid for testing the difference between two proportions. However, the t-test for proportions has been extensively studied, has been found to be robust, and is widely and successfully used with proportional data. There is one exception: if one of the proportions is very close to zero or one, you will do better with Fisher’s exact test.
Both one-sided and double-sided probabilities are given. A one-sided test assumes that, before doing the test, you had a hypothesis that a specific one of the two means (or proportions) was larger than the other. If you had no such prior hypothesis and only aim to test for a possible difference between the means, you need a double-sided test; the double-sided p-value is usually twice the one-sided p-value.
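The relation between one- and double-sided probabilities can be illustrated with a large-sample normal approximation. This is a minimal sketch: the function name `normal_p_values` is hypothetical, and the normal distribution stands in for the t-distribution, which it approaches as the degrees of freedom grow.

```python
import math

def normal_p_values(z):
    """One- and double-sided p-values for a standard normal test statistic z.

    Assumes a large sample, so the t-distribution is approximated by the normal.
    """
    one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # P(Z >= |z|), one tail
    double_sided = 2 * one_sided                          # both tails
    return one_sided, double_sided

one, two = normal_p_values(1.96)
print(round(one, 3), round(two, 3))  # 0.025 0.05
```

The one-sided value here is for the direction of the observed difference; it is only legitimate if that direction was hypothesised before looking at the data.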
The program provides a number of additional statistics if you check the corresponding options:
The population attributable risk fraction (PARF). The PARF is the fraction of those found to be ’diseased’ in a population that can be attributed to a risk factor. For example, to determine the effect of smoking on cancer mortality, fill in the proportion of cancer deaths among non-smokers in the top Mean 1 box and the proportion of cancer deaths among smokers in the second Mean 2 box. The PARF then gives the proportion of cancer deaths attributable to smoking. Note that this is valid only for the population studied, and only if the full input is representative of this population. This means not only that Mean 1 and Mean 2 need to be unbiased estimators, but also that the ratio N1/N2 must equal the ratio of non-smokers to smokers in the population. The second confidence interval (RRCI), given if additional confidence intervals are requested, is obtained by substituting the confidence interval for the Risk Ratio into the formula for the PARF. The procedure as implemented assumes that the exposed group is in the Mean 2 box and that the proportion in this box is higher than the proportion in the Mean 1 box.
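The PARF can be computed from the four inputs described above. The sketch below assumes Levin's standard formula, with the exposure prevalence taken as N2/(N1+N2), which is why the input ratio N1/N2 must mirror the population. The function names `parf` and `parf_from_rr` are hypothetical, and SISA's own implementation may differ in detail.

```python
def parf_from_rr(rr, exposed_share):
    """Levin's formula: PARF from a risk ratio and the share of exposed people."""
    excess = exposed_share * (rr - 1)
    return excess / (1 + excess)

def parf(mean1, n1, mean2, n2):
    """PARF from the input boxes.

    mean1, n1: proportion diseased and size of the unexposed group (Mean 1 box)
    mean2, n2: proportion diseased and size of the exposed group (Mean 2 box)
    """
    exposed_share = n2 / (n1 + n2)  # assumed to equal exposure prevalence in the population
    rr = mean2 / mean1              # risk ratio, exposed versus unexposed
    return parf_from_rr(rr, exposed_share)

# Hypothetical figures: 10% cancer deaths among non-smokers, 30% among smokers, equal groups
print(round(parf(0.10, 100, 0.30, 100), 3))  # 0.5
```

Because `parf_from_rr` increases with the risk ratio, substituting the lower and upper limits of a risk-ratio confidence interval gives PARF limits in the spirit of the RRCI described above.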
Number Needed to Treat (NNT) is a measure that is becoming increasingly popular in the medical field. The NNT is the reciprocal of the absolute risk difference (ard = |proportion1 - proportion2|) and expresses the number of persons who need to be treated to ’cure’ one person.
The Number Needed to Treat has some very appealing properties for interpretation, particularly in combination with cost calculations. An example of the use of the NNT: if no treatment is given 20% die; with treatment 15% die. NNT = 20 (1/|0.2 - 0.15|), so we need to treat 20 people to save one life. Now suppose we develop a preventive program in a completely different area of health care and succeed in bringing mortality down from 45% to 44.5%. NNT = 200 (1/|0.45 - 0.445|), so we need to apply our preventive program to 200 people, on average, to save one life. This does not seem very effective compared with treatment.
However, treatment costs $200 per person and prevention costs $10 per person. The cost per life saved is therefore $4000 (20*200) for treatment against $2000 (200*10) for prevention. Prevention is twice as cost-effective, and given a limited budget it should take precedence over treatment.
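The worked example above can be reproduced in a few lines. A minimal sketch; the function names `nnt` and `cost_per_life_saved` are hypothetical, and results are rounded because the proportions are not exact in binary floating point.

```python
def nnt(p1, p2):
    """Number Needed to Treat: reciprocal of the absolute risk difference."""
    return 1 / abs(p1 - p2)

def cost_per_life_saved(p1, p2, cost_per_person):
    """Cost of treating NNT people, i.e. the cost of saving one life."""
    return nnt(p1, p2) * cost_per_person

treatment = cost_per_life_saved(0.20, 0.15, 200)   # NNT 20 at $200 per person
prevention = cost_per_life_saved(0.45, 0.445, 10)  # NNT 200 at $10 per person
print(round(treatment), round(prevention))  # 4000 2000
```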
In this way the NNT allows many useful comparisons. A paper by Schulzer and Mancini gives some examples.
The default confidence interval for the Number Needed to Treat is calculated according to a method first suggested by Cook and Sackett. This method is based on inverting the confidence interval of the difference between the two means. Because the confidence interval for a difference between two means can be calculated in different ways, the same applies to the confidence interval for the NNT. Check the C.I. option to get some alternative confidence intervals for the NNT, and read below about these different intervals. The default method suggested by Cook and Sackett is the one most often used in practice. If you check both the CI option and the NNT option, you also get the confidence interval for the NNT suggested by Schulzer and Mancini. This one is theoretically interesting and is based on the geometric distribution, which reflects the notion that a doctor has to ’wait’ on average NNT patients before seeing one ’cured’ patient. Please note that, whatever method is used, confidence intervals for the NNT are nonsensical if the difference between the two means is not statistically significant, i.e., if the one-sided probability of the t-value is more than 0.025 (in the case of a 95% confidence interval). The confidence interval for the NNT should NEVER be used for hypothesis testing; it is there for your information only. Use the t-test for hypothesis testing. Read more about the problems with the confidence interval for the NNT here.
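The inversion idea can be sketched as follows. This assumes a simple Wald interval with the normal value z = 1.96 for the risk difference; SISA's default may use a different interval or the t-distribution, and the function name `nnt_with_ci` is hypothetical.

```python
import math

def nnt_with_ci(p1, n1, p2, n2, z=1.96):
    """NNT with a CI obtained by inverting a Wald CI for the risk difference."""
    d = abs(p1 - p2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    lo, hi = d - z * se, d + z * se
    if lo <= 0:
        # The CI for the difference includes zero: an NNT interval is not meaningful
        return 1 / d, None
    # Taking reciprocals swaps the limits: the upper difference gives the lower NNT
    return 1 / d, (1 / hi, 1 / lo)

# Hypothetical example: 80% of 70 versus 60% of 80
print(nnt_with_ci(0.8, 70, 0.6, 80))  # NNT about 5, CI about (2.92, 17.4)
```

Returning `None` when the difference is not significant follows the warning above rather than producing an interval that runs through infinity.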
The More menu below the t-test output allows you to do a full table analysis, including odds ratios, risk ratios and four different types of chi-square, and to do an exact analysis using the Fisher procedure. For one-sample analysis the Binomial procedure is the exact alternative. The One Mean procedure can be used to calculate confidence intervals around the t-test parameters, and sample sizes can be calculated using the Sample Size procedure. Help for these procedures can be found on the Two by Two, Fisher, Binomial, One Mean and Sample Size help pages respectively.
By default SISA assumes that the variances are unequal and calculates Welch’s t-test. The Welch t-value generally differs somewhat from that of the traditional Student’s t-test (the two are identical when n1 = n2). The degrees of freedom for Welch’s t-test are calculated using a more complicated formula, the Welch-Satterthwaite equation, and are never more than the n1+n2-2 degrees of freedom of the Student’s t-test. If you check the ’Equal Var’ box SISA will calculate the traditional Student’s t-test with n1+n2-2 degrees of freedom. The Student’s t-test is more powerful than Welch’s t-test and should be used if the variances are equal. An F-test can be used to test whether the variances are equal.
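Both test statistics can be computed from the same summary input (means, standard deviations and group sizes). A minimal sketch; the function names `student_t` and `welch_t` are hypothetical, and only the t-value and degrees of freedom are returned. A p-value would then come from the t-distribution, for example `scipy.stats.t.sf`.

```python
import math

def student_t(m1, s1, n1, m2, s2, n2):
    """Traditional Student's t-test with pooled variance; df = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-test; df from the Welch-Satterthwaite equation."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df

# Hypothetical unequal groups: the two t-values differ noticeably
print(student_t(10, 2, 10, 8, 4, 40))  # t about 1.53, df 48
print(welch_t(10, 2, 10, 8, 4, 40))    # t about 2.24, df about 29.3
```

With equal group sizes the two t-values coincide; with unequal sizes and variances they can differ in either direction, while the Welch degrees of freedom never exceed n1+n2-2.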
Checking this option gives you additional confidence intervals for the difference between two proportions, the difference between two means and the NNT. Note that many researchers would prefer the default confidence intervals, the ones you get when you do not ask for additional confidence intervals. The CC-Wald confidence interval is the continuity-corrected version of the usual Wald confidence interval. It gives a slightly wider confidence interval and can be used when the number of cases is small. The Newcombe-Wilson (NW) hybrid score confidence interval was proposed by Newcombe (1998). It is based on the score-test approach, which makes different assumptions about the relationship between the sample and the population variance, and it gives a slightly narrower confidence interval. Although the method has some superior properties, it is rarely used.
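The three intervals for a difference between two proportions can be sketched with the textbook formulas described by Newcombe (1998): the Wald interval, its continuity-corrected version, and the hybrid score interval built from two Wilson intervals. The function names are hypothetical and z = 1.96 is assumed for 95% intervals; SISA's implementation may differ in detail.

```python
import math

Z = 1.96  # assumed normal value for 95% confidence

def wald(p1, n1, p2, n2, z=Z):
    d = p1 - p2
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d - half, d + half

def cc_wald(p1, n1, p2, n2, z=Z):
    lo, hi = wald(p1, n1, p2, n2, z)
    cc = 0.5 * (1 / n1 + 1 / n2)  # continuity correction widens both limits
    return lo - cc, hi + cc

def wilson(p, n, z=Z):
    """Wilson score interval for a single proportion."""
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (centre - spread) / denom, (centre + spread) / denom

def newcombe_hybrid(p1, n1, p2, n2, z=Z):
    """Hybrid score interval: combines the Wilson limits of both proportions."""
    l1, u1 = wilson(p1, n1, z)
    l2, u2 = wilson(p2, n2, z)
    d = p1 - p2
    return (d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2),
            d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2))

# Example: 56/70 versus 48/80
for method in (wald, cc_wald, newcombe_hybrid):
    print(method.__name__, [round(x, 4) for x in method(56 / 70, 70, 48 / 80, 80)])
```

For this example the Wald interval is about (0.058, 0.342), the continuity-corrected one about (0.044, 0.356) and the hybrid score interval about (0.052, 0.334).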
There are various ways to calculate the number of degrees of freedom for the two-sample t-test. In the case of unequal variances Welch’s t-test is mostly used, and its degrees of freedom are calculated with the Welch-Satterthwaite equation. The number of degrees of freedom for the Student’s t-test equals n1+n2-2. Under the equal-variance assumption this is the correct number of degrees of freedom for the Student’s t-test. However, if the variances of the two groups differ, this number of degrees of freedom is too large. Using n1+n2-2 degrees of freedom then leads to a difference being declared statistically significant too easily, a higher chance of a Type I error. Wonnacott and Wonnacott, among others, suggest using n1-1 or n2-1, whichever is smaller. Unfortunately, this makes it too difficult to declare a difference statistically significant, an increased chance of a Type II error. Should you wish to use this method anyway (not recommended!), you can do your calculations and use the Significance procedure on the SISA website to calculate the p-value.
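For reference, the three degrees-of-freedom choices discussed above can be written out, with s1 and s2 the sample standard deviations of the two groups. The bounds on the Welch value are a known property of the Welch-Satterthwaite equation: it always lies between the conservative Wonnacott choice and the Student value.

```latex
\[
\nu_{\text{Student}} = n_1 + n_2 - 2, \qquad
\nu_{\text{Wonnacott}} = \min(n_1, n_2) - 1
\]
\[
\nu_{\text{Welch}} =
  \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}
       {\dfrac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1}
        + \dfrac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}
\]
\[
\nu_{\text{Wonnacott}} \le \nu_{\text{Welch}} \le \nu_{\text{Student}}
\]
```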
There are two situations in which population analysis can be done: a) the ’historical’ situation, where some established opinion on the numerical value of a phenomenon exists; b) the situation where the numerical value is a population value, for example when the number of deaths in a community is exactly known. In case ’a’, ’exact’ is a relative concept and Bayesian methods might be preferred. In case ’b’ the methods proposed here are valid. Fill in the population proportion or average in the top box and the sample proportion or average in the second box. Input is much the same as above. ’Click’ the population button.
There are some subtle differences which we will discuss now.
For proportions, note that the underlying nature of the data is quite different from that of data used to test for a difference between two estimated proportions. With two estimated proportions, the data consist of a two by two table and all methods for table analysis apply. Comparing a population value with a sample estimate, by contrast, concerns data that compare an expected with an observed distribution in a one-dimensional array with two categories. In this case the exact alternative and most appropriate test is not the Fisher but the Binomial test. Use it in SISA online, use SISA’s MS-DOS version if you have a large sample size, or use a Poisson approximation if you have a very large sample. The method used in this population procedure is the normal approximation to the Binomial.
Give the population proportion in the top box and the sample proportion in the second box. Only one number of cases has to be given (the sample size), and you do not have to give standard deviations. If you want descriptive statistics, click the ’calculate’ button. Odds ratios, risk ratios and the NNT are valid. However, standard errors, confidence intervals and significance tests for these ratios and similar statistics are not valid. In the procedure as implemented here no continuity correction is applied.
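The normal approximation described above can be sketched next to the exact binomial tail it approximates. This assumes the standard error is computed from the population proportion (the usual score form) and, as stated above, no continuity correction; the function names are hypothetical.

```python
import math

def one_sample_proportion_z(p_pop, p_sample, n):
    """Normal approximation to the binomial, without continuity correction."""
    se = math.sqrt(p_pop * (1 - p_pop) / n)        # SE under the population value
    z = (p_sample - p_pop) / se
    p_double = math.erfc(abs(z) / math.sqrt(2))    # double-sided normal p-value
    return z, p_double

def exact_binomial_upper(p_pop, k, n):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p_pop)."""
    return sum(math.comb(n, i) * p_pop**i * (1 - p_pop) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical: population proportion 0.5, 60 successes in a sample of 100
z, p = one_sample_proportion_z(0.5, 0.60, 100)
print(round(z, 2), round(p, 4))                        # 2.0 0.0455
print(round(exact_binomial_upper(0.5, 60, 100), 4))    # 0.0284
```

The exact value is one-sided; doubling it gives a double-sided exact p-value of about 0.057, somewhat larger than the normal approximation in this example.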
For averages the analysis suggested here is valid and correct for its purpose. Give the population average in the top box and the sample average in the second box. Only one number of cases has to be given (the sample size). If you give a standard deviation in the standard deviation 1 box, it is considered to be the population standard deviation and the normal distribution is applied. If you give a standard deviation in the standard deviation 2 box (the bottom box), it is considered to be the sample standard deviation and the t-distribution is used. If you give no standard deviation at all, the averages are considered to be Poisson-distributed rates and the standard deviation is taken to be the square root of the average. No continuity correction is applied.
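The three cases can be summarised in one function that picks the distribution from whichever standard deviation is supplied. A minimal sketch: the function name `one_sample_test` is hypothetical, the precedence when both standard deviations are given is an assumption, and for the Poisson case the square root of the population average is assumed as the standard deviation.

```python
import math

def one_sample_test(pop_mean, sample_mean, n, pop_sd=None, sample_sd=None):
    """Return (distribution, test statistic, degrees of freedom or None)."""
    if pop_sd is not None:
        # Standard deviation 1 box: known population SD, normal distribution
        se = pop_sd / math.sqrt(n)
        return "normal", (sample_mean - pop_mean) / se, None
    if sample_sd is not None:
        # Standard deviation 2 box: sample SD, t-distribution with n - 1 df
        se = sample_sd / math.sqrt(n)
        return "t", (sample_mean - pop_mean) / se, n - 1
    # No SD: Poisson rates, SD = sqrt(average); population average used (assumption)
    se = math.sqrt(pop_mean) / math.sqrt(n)
    return "normal", (sample_mean - pop_mean) / se, None

print(one_sample_test(100, 103, 36, pop_sd=15))     # normal, z about 1.2
print(one_sample_test(100, 103, 36, sample_sd=12))  # t about 1.5 with 35 df
print(one_sample_test(4, 5, 16))                    # Poisson case, z about 2.0
```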
A box plots option is given for continuous numerical outcome data, such as age or length. Box plots are used to compare the distribution of the data between the two groups: is the data equally distributed around the mean, and are the means or standard deviations influenced by outliers? Box plots require ample observations with many different values. They are not meant for categorical data such as ordered frequency tables or Likert scales. However, if there are many categories based on numerical values, box plots can also be useful to explore data in ordered tables. To offer the user some protection, box plots can only be requested if there are at least 15 observations in a group.
SISA uses a standard boxplot technique. The box runs from the first to the third quartile of the value-ordered data; the median (50%) is the line inside the box. The whiskers extend to the most extreme values lying within 1.5 times the interquartile range below the first quartile and above the third quartile. Outliers are values lying more than 1.5 times the interquartile range outside the box, and they are marked with a star. A star within the whiskers or the box itself marks the mean; if there is no such star, the mean lies outside this range.
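The box plot rules above can be sketched as a small summary function. The function name `boxplot_stats` is hypothetical, and Python's default quartile method (`statistics.quantiles`, 'exclusive') is an assumption; SISA may compute quartiles slightly differently.

```python
import statistics

MIN_OBS = 15  # box plots are only offered with at least 15 observations per group

def boxplot_stats(data):
    """Standard (Tukey-style) box plot summary for one group of observations."""
    if len(data) < MIN_OBS:
        raise ValueError("box plots need at least 15 observations per group")
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile method is an assumption
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "whisker_low": min(inside),   # most extreme values within 1.5 * IQR
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
        "mean": statistics.fmean(data),  # plotted as a star
    }

# Hypothetical data: 1..15 plus one extreme value
print(boxplot_stats(list(range(1, 16)) + [100]))
```

In this example the value 100 is flagged as an outlier, the whiskers run from 1 to 15, and the mean (13.75) falls within the upper whisker, so its star would be visible.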