
# Ordinal Testing

### Input.

• The Wald-Wolfowitz runs test requires three integer numbers: the two numbers of cases, one for each group compared, and the number of runs.

• The Mann-Whitney or Wilcoxon test requires three integer numbers: two numbers of cases and one rank sum. The rank sum will be larger than, or in exceptional cases equal to, the number of cases in the ranking considered.

• The Kolmogorov-Smirnov test requires two integer numbers, the two numbers of cases, one for each group compared; and one proportional number, the largest observed proportional difference. The proportional number has a value between zero and one.

### Explanation.

Ordinal testing provides three tests for two-sample ordinal comparisons. It concerns a comparison between two groups, such as male and female or dead and alive, whose members are in some way ordered. The ordinal tests assess how probable it is that the two groups come from a single ordering, with the observed differences caused by chance fluctuation, or that the two groups come from two different orderings.

The Wald-Wolfowitz Runs Test assesses whether the number of 'runs' in an ordering is random or not. If the two groups come from different distributions, we would expect the number of runs to be smaller than a certain expected number. The number of runs is given as an integer value, a positive number without decimals, in the third box. The following data are used to illustrate how the test works.

Data:

```
rank order:   1 2 3 .......................................... 31
data:         M M M F F F M M M M F F M M M F F F F F F F M M F M M F F F F
run number:   1 1 1 2 2 2 3 3 3 3 4 4 5 5 5 6 6 6 6 6 6 6 7 7 8 9 9 10 10 10 10
```

The data concern 14 males and 17 females who are ordered from 1 to 31. There are ten runs (the number of runs increases by one every time a male follows a female or a female follows a male; a cluster of males or females grouped side by side is a single run). Data input for Wald-Wolfowitz equals N1=14, N2=17, and the number of runs, 10, in the third box. The program echoes the expected number of runs, which equals 16, the z-value, and the probability that the difference between the expected and the observed number could be caused by chance, p=0.01. The data show that the observed number of runs is too small: there are clusters of males and females in this ordering which are not caused by chance fluctuation. Visual inspection of the data shows that the males tend to be clustered towards the left-hand side of the scale, while the females are clustered towards the right-hand side.
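The normal approximation behind these numbers can be sketched in a few lines of Python. This is an illustrative reimplementation using the standard formulas for the mean and variance of the number of runs, not the program's own code; the function name `runs_test` is ours:

```python
from math import erf, sqrt

def runs_test(n1, n2, runs):
    """Wald-Wolfowitz runs test via the normal approximation (one-sided)."""
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1.0            # expected number of runs
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - expected) / sqrt(variance)
    p = 0.5 * (1.0 + erf(z / sqrt(2.0)))          # lower tail: too few runs
    return expected, z, p

expected, z, p = runs_test(14, 17, 10)            # the example's input
```

For the example this yields an expected number of about 16.35 runs, z ≈ -2.34 and a one-sided p of about 0.01, consistent with the values quoted above.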

The procedure gives three probability values: first a normal approximation, followed by a normal approximation with continuity correction. For up to a hundred runs an exact probability value is also presented; it might take your computer a few seconds to calculate the exact probability value. For a small number of runs you would prefer the exact estimate; for a larger number you would mostly use the approximation with continuity correction.
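The exact lower-tail probability can be obtained by enumerating the distribution of the number of runs. The sketch below assumes the standard combinatorial formula for that distribution (the helper name `exact_runs_p` is ours, not the program's); for the example it gives roughly 0.015:

```python
from math import comb

def exact_runs_p(n1, n2, runs):
    """Exact one-sided P(R <= runs) for the Wald-Wolfowitz runs test."""
    total = comb(n1 + n2, n1)          # all orderings of the two groups
    prob = 0.0
    for r in range(2, runs + 1):
        k = r // 2
        if r % 2 == 0:                 # even number of runs
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:                          # odd number of runs
            ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k)
                    + comb(n1 - 1, k) * comb(n2 - 1, k - 1))
        prob += ways / total
    return prob

p_exact = exact_runs_p(14, 17, 10)     # ≈ 0.015 for the example above
```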

The probability value presented is one-sided ('tailed'). There is no point in using a two-sided runs test: we are only interested in finding fewer runs than expected.

Test for randomness. One particular application of the Wald-Wolfowitz runs test is to see if observations occur randomly in time. Dichotomize the responses into high and low (usually on the median, which is the point or value above which we find exactly half of the observations). Order the responses in the order in which they were collected. You can then test whether higher and lower values are equally distributed in time. The test can be used to test for trend or seasonality in the data; however, it is not as powerful as the Durbin-Watson test or some of the techniques used in time series analysis.
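As a sketch of this recipe (with made-up measurements), dichotomize on the median and count the runs; the two group sizes and the run count are then the three inputs for the runs test described above:

```python
# hypothetical measurements, listed in the order they were collected
series = [3.1, 2.8, 3.5, 3.9, 4.2, 4.0, 2.5, 2.9, 3.0, 3.8, 4.1, 2.7]

ordered = sorted(series)
median = (ordered[len(series) // 2 - 1] + ordered[len(series) // 2]) / 2

signs = ['H' if x > median else 'L' for x in series]    # dichotomize on the median
runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))

n_high, n_low = signs.count('H'), signs.count('L')
# n_high, n_low and runs go into the Wald-Wolfowitz procedure
```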

The Mann-Whitney U Test or Wilcoxon Two Sample Test studies whether the sums of the rankings for two groups differ from an expected number. The sum of one ranking is given as an integer value in the third box. If the sum differs from the expectation, this means that one of the two groups has a tendency towards the lower-numbered ranks while the other group has a tendency towards the higher-numbered ranks. For example, using the data above, the rank numbers for males and females are:

Rank numbers for males: 1,2,3,7,8,9,10,13,14,15,23,24,26,27
Rank numbers for females: 4,5,6,11,12,16,17,18,19,20,21,22,25,28,29,30,31

The sum of the rank numbers for males equals 182 (1+2+3+7+8+...+26+27), while the sum of the rank numbers for females equals 314 (4+5+6+11+...+30+31). You give the program only one of the two sums, preferably the smaller one; the program will calculate the other sum for you. The program will then give you the expected sums for males and females, in this example 224 and 272 respectively, the standard deviation of the expected sums (25.19), and lastly the p-value of the observed divergence. In this example the p-value equals 0.04779, so our earlier observation holds at the 0.05 level: females tend towards the higher rank numbers while males tend towards the lower rank numbers.
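These figures can be checked with a short normal-approximation sketch (an illustrative reimplementation under the standard rank-sum formulas, not the program's own code; `rank_sum_test` is a name chosen here):

```python
from math import erf, sqrt

def rank_sum_test(n1, n2, rank_sum):
    """Wilcoxon rank-sum test via the normal approximation (one-sided)."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2.0                 # expected rank sum for group 1
    sd = sqrt(n1 * n2 * (n + 1) / 12.0)           # sd of the rank sum
    z = (rank_sum - expected) / sd
    p = 0.5 * (1.0 + erf(-abs(z) / sqrt(2.0)))    # one-sided
    return expected, sd, p

expected, sd, p = rank_sum_test(14, 17, 182)      # males' rank sum from the example
```

For the example this reproduces the expected sum of 224, the standard deviation of 25.19, and a one-sided p of about 0.048.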

The probability value presented is one-sided ('tailed'). Use this probability value if you are only interested in whether one of the two samples tends to cluster in a certain direction; in the case of the example, do females cluster towards the higher-numbered ranks? Multiply the probability value by two if you are interested in the two-sided test, i.e. if your a priori interest is in any difference: do females tend towards either the lower- or the higher-numbered ranks?

Kolmogorov-Smirnov Two Sample Test. The runs test and the two-sample test above presume that the observations are not 'tied', i.e. each observation has a unique position in the ranking which it does not share with another observation. The Kolmogorov-Smirnov test gives the likelihood of two ordered categorizations coming from different orderings or from the same ordering. Have a look at this table:

Do you agree or disagree with the following statement? (proportions between round brackets, cumulative proportions between square brackets)

|                            | Males            | Females          | Difference       |
|----------------------------|------------------|------------------|------------------|
| Totally agree              | 10 (0.12) [0.12] | 24 (0.26) [0.26] | 14 (0.14) [0.14] |
| Agree                      | 15 (0.18) [0.30] | 15 (0.17) [0.43] |  0 (0.01) [0.13] |
| Neither agree nor disagree | 19 (0.23) [0.53] | 21 (0.23) [0.66] |  2 (0.00) [0.13] |
| Disagree                   | 18 (0.21) [0.74] | 17 (0.19) [0.85] |  1 (0.02) [0.10] |
| Totally disagree           | 22 (0.26) [1.00] | 14 (0.15) [1.00] |  8 (0.11) [0.00] |
| Total                      | 84 (1.00)        | 91 (1.00)        |  7 (0.00)        |

The K-S test assesses whether the largest proportional cumulative difference in a table has been caused by chance fluctuation or not. In this case the difference equals [0.14] (top right cell). The program echoes the chi-square value associated with the largest proportional difference (Chi-2=3.673) and the p-value of the difference between the observed and the expected largest difference, with two degrees of freedom. The p-value in this example equals 0.15933; the difference in ordering between males and females may well have been caused by chance fluctuation.
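The calculation can be sketched directly from the counts in the table, using the common chi-square approximation for the two-sample K-S statistic. This is our own illustration, not the program's code, and the program's internal rounding gives the slightly different Chi-2=3.673 quoted above:

```python
from math import exp

males   = [10, 15, 19, 18, 22]     # counts per answer category
females = [24, 15, 21, 17, 14]

n1, n2 = sum(males), sum(females)
cum1 = cum2 = d_max = 0.0
for m, f in zip(males, females):
    cum1 += m / n1                 # cumulative proportions per group
    cum2 += f / n2
    d_max = max(d_max, abs(cum1 - cum2))

chi2 = 4.0 * d_max ** 2 * n1 * n2 / (n1 + n2)   # chi-square approximation, 2 df
p = exp(-chi2 / 2.0)                            # upper tail for chi-square, 2 df
```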

The probability value presented is single-sided. The literature considers that the Kolmogorov-Smirnov test has very little power, with a high chance of a type II error, i.e. of not finding a difference when there is one.

### Methodological Discussion.

What is the difference between these tests, and which test should one use when? The discussions are potentially endless, but we will try to give a short overview of the issues here. The Mann-Whitney test is the most powerful of the tests, and therefore the preferred one: if a difference exists in the real world, the Mann-Whitney test is the most likely to pick this difference up in the sample. Mostly, but not always. The Mann-Whitney is a test of the difference in central tendency between two distributions (like the t-test and most other parametric tests). Thus, if the average rank position differs between two rankings, the Mann-Whitney is the test to show it. However, the Mann-Whitney test does not work well if two distributions differ without that causing a difference in average rank position between the two groups. For example, males might be present primarily in both the higher and the lower ranks while females are particularly present in the middle ranks; in that case the averages for the two groups might well be the same, but the distributions by rank very different. For such data one requires a test which looks at dispersion. The runs test (the Chi-square is another example) is a good, although not very powerful, test for that aim.

Both the Mann-Whitney test and the runs test assume that the observations are not tied: each observation has a unique rank position. A number of methods have been proposed to overcome the problems of tied observations, two or more observations sharing a single rank. Please consult Blalock and Wonnacott and Wonnacott for discussions. However, whatever method of correction one uses, the number of ties should be small relative to the total number of observations. It is incorrect to apply the Mann-Whitney test or the runs test if most or all of the observations are tied. This is where the Kolmogorov-Smirnov test comes in, as a test for the difference between two ordered sets of categories.

In the case of relatively few ties the following method can be applied for the Mann-Whitney and the runs test. Say one has ten numbers ordered from small to large: 0, 1, 1, 2, 2, 2, 3, 3, 4, 5. The values 1, 2, and 3 are tied. The ten values could get ten different rank numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. In the case of ties we average the rank numbers over the tied values. This results in the following rank order: 1, 2.5, 2.5, 5, 5, 5, 7.5, 7.5, 9, 10. As you can see, the sum of the rank ordering is not disturbed by the ties.
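The averaging rule can be sketched as follows (the function name `tied_ranks` is ours; input is assumed to be sorted, as in the example):

```python
def tied_ranks(values):
    """Average ranks for a sorted list; tied values share the mean of their ranks."""
    ranks, i, n = [], 0, len(values)
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1                      # j - i equal values start at position i
        avg = (i + 1 + j) / 2.0         # mean of the ranks i+1 .. j
        ranks.extend([avg] * (j - i))
        i = j
    return ranks

tied_ranks([0, 1, 1, 2, 2, 2, 3, 3, 4, 5])
# → [1.0, 2.5, 2.5, 5.0, 5.0, 5.0, 7.5, 7.5, 9.0, 10.0]
```

The sum of these ranks is still 55, the sum of 1 through 10, so the rank-sum calculation is unaffected.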

Lastly, the output for all three tests provides one-sided probability values.

### Technical Discussion.

The Chi-square algorithm comes from Poole et al; the algorithm is also mentioned in the 'Epi-Info' manual (1994).

The procedure to approximate the significance of the z-value is based on algorithm '03' from Applied Statistics (1968). 