 # Means Table

### Explanation.

Means tabulates and summarizes a single continuous variable into categories as defined by a separate categorical explanatory factor variable. The result is comparable with the results of the SPSS Means procedure. Basic statistics such as the mean and standard deviation for each category are given and the means table can be used as the basis for a one-way anova procedure.

There are two input pages: one for individual data with one case per row, and one for the input of a means table.

Data for both pages have to be pasted or typed into the input field in a multi-column format. Columns are separated by spaces, returns, semicolons, colons and tabs, but NOT by commas or full stops. Data or tables can be pasted from spreadsheet programs such as Excel or Calc. Rows which do not follow the required format are ignored and counted as invalid. The number of invalid rows is given in the browser address box, directly after the PHP call.

The data input procedure requires two columns. The first column is treated as an explanatory, categorical (multinomial) factor variable, which explains the continuous numerical outcome or dependent variable in the second column. The factor variable in the first column can be a name, a number or text; the outcome variable in the second column must always be numerical, otherwise the case is invalid.
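As a rough illustration of the row format described above, the following Python sketch splits each row on the listed separators and keeps only rows that hold a label and a numerical value. The function name `parse_cases` is hypothetical. Treating each line as one case and skipping blank lines without counting them are assumptions; the actual program may tokenize its input differently.

```python
import re

# Separators named in the text: spaces, tabs, semicolons and colons.
# Commas and full stops are deliberately NOT treated as separators.
_SEPARATORS = re.compile(r"[ \t;:]+")

def parse_cases(text):
    """Return (cases, n_invalid) for one-case-per-row input.

    cases is a list of (label, value) tuples. A row is counted as invalid
    when it does not have exactly two fields or when the second field is
    not numerical.
    """
    cases, n_invalid = [], 0
    for line in text.splitlines():
        fields = [f for f in _SEPARATORS.split(line.strip()) if f]
        if not fields:
            continue  # assumption: blank lines are skipped, not counted
        if len(fields) != 2:
            n_invalid += 1
            continue
        label, raw = fields
        try:
            value = float(raw)  # "." or text raises ValueError
        except ValueError:
            n_invalid += 1
            continue
        cases.append((label, value))
    return cases, n_invalid
```

Labels are kept as text, so `1` and `a` are both valid categories, while a row such as `1 .` is rejected because its outcome is not a number.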

For the table input procedure you can enter four columns of data. The first column contains the category labels in the form of a name, number or text; the second column contains the category means as real numbers; the third column contains the standard deviations, also as real numbers; and the fourth column contains the number of cases or frequency for each category as an integer. Alternatively, three columns can be entered: the labels are then left out and automatically replaced by a default.

By default the program outputs the mean, the standard deviation and the number of cases (frequency of occurrence) for each category value. This output can be used as input for the t-test procedure if you want to check whether the mean difference between two categories is statistically significant.
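The default output can be sketched in Python as follows. This is an illustration, not the program's own code: the function name `means_table` is hypothetical, and the sample standard deviation with an n-1 denominator is an assumption, although it does reproduce the standard deviations shown in the example on this page.

```python
from collections import defaultdict
from statistics import mean, stdev

def means_table(cases):
    """Map each factor label to (mean, standard deviation, frequency)."""
    groups = defaultdict(list)
    for label, value in cases:
        groups[label].append(value)
    table = {}
    for label, values in groups.items():
        # The sample standard deviation needs at least two cases.
        sd = stdev(values) if len(values) > 1 else float("nan")
        table[label] = (mean(values), sd, len(values))
    return table

# The valid cases from the example data on this page.
cases = [("1", 1985.0), ("1", 3010.0), ("a", 1345.0), ("3", 2200.0),
         ("3", 1600.0), ("1", 2000.0), ("a", 2130.0), ("3", 3345.0),
         ("3", 2130.0)]

for label, (m, sd, n) in sorted(means_table(cases).items()):
    print(f"{label}\t{m:.2f}\t{sd:.2f}\t{n}")
```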

The following options can be checked:

### Options.

Read weights. Treats every third value, or the third column, as the case weight of the previous two values. The case weight must be numerical; if it is not, the case and its values are ignored and counted as invalid. A weighting-corrected means table is produced. For a discussion of data weighting and the correction applied please read this paper.

Sort Descending or Ascending. Sorts the categories of the factor variable or labels in the first column descending or ascending in the output.

Lowercase All. For the data input procedure, converts all non-numerical text characters of the factor variable to lowercase. Use this option if you want to categorize the data case-insensitively.

Solve problems into 99999.9. Replaces the data sequence -carriage return-line feed-tab- and the sequence -tab-carriage return-line feed- with 99999.9 if the missing item is a label, or deletes the case if it is a value. This will mostly solve the problem of system missing values in data copied and pasted from SPSS, but it might cause other problems.

### Additional Output.

Variance and standard errors for the mean are additional measures of the spread of the observed values around the mean. The standard error will decrease with increasing sample size.

Percentages give, for each factor variable category, the number of observations as a percentage of the total number of observations.
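The additional measures can be derived from the basic means table alone. The sketch below assumes the usual definitions (variance as the squared standard deviation, standard error as the standard deviation divided by the square root of the number of cases); the function name `additional_output` and the dictionary keys are hypothetical.

```python
from math import sqrt

def additional_output(table):
    """table: list of (label, mean, sd, n) rows, each with n > 0.

    Returns a dict mapping each label to its variance, standard error
    of the mean, and percentage of all observations.
    """
    total_n = sum(n for _, _, _, n in table)
    extras = {}
    for label, _mean, sd, n in table:
        extras[label] = {
            "variance": sd ** 2,
            "std_error": sd / sqrt(n),       # shrinks as n grows
            "percent": 100.0 * n / total_n,
        }
    return extras

table = [("1", 2331.67, 587.5, 3),
         ("3", 2318.75, 734.74, 4),
         ("a", 1737.5, 555.08, 2)]
extras = additional_output(table)
```

Because the standard error divides by the square root of the frequency, quadrupling the number of cases in a category halves its standard error when the standard deviation stays the same.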

A Confidence Interval is given for each category if you enter a numerical percentage value between 1 and 99.99 in the input box; 95 is the most commonly used value for this option. You can choose between a confidence interval based on the z-distribution and one based on the t-distribution. The confidence interval based on the z-distribution is mostly used. If the confidence intervals of two means do not overlap, the difference is significant at the given confidence level. If the confidence interval of a particular mean includes another mean, the difference between the two means is not significant. Use the t-test procedure to confirm your results for the difference between two means. Beware of capitalizing on chance: when comparing all these confidence intervals you are bound to find some significant differences purely on the basis of chance fluctuation. Please read the Bonferroni help page for a discussion of this topic and consider applying the Bonferroni procedure. Note that the means in this procedure are not correlated.
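A z-based interval of this kind can be sketched with the Python standard library. The sketch assumes the interval is the mean plus or minus a normal quantile times the standard error; the function names are hypothetical, and the t-based variant is left out because it needs a t-distribution quantile that the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level=95.0):
    """Two-sided z-based confidence interval for a category mean.

    level is a percentage between 1 and 99.99, as in the input box.
    """
    if not 1 <= level <= 99.99:
        raise ValueError("level must be between 1 and 99.99")
    # For level=95 this is the 0.975 quantile of the standard normal.
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width

def intervals_overlap(a, b):
    """True if the intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

ci_1 = z_confidence_interval(2331.67, 587.5, 3)   # category "1"
ci_a = z_confidence_interval(1737.5, 555.08, 2)   # category "a"
print(ci_1, ci_a, intervals_overlap(ci_1, ci_a))
```

With so few cases per category the intervals are wide, which is one reason to confirm any apparent difference with the t-test procedure.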

Show Rows limits or expands the number of rows displayed. Because the Means procedure cannot handle very large tables with many rows, you can limit the number of rows by entering an integer value in the "Show Rows" box. This box can also be used to exclude categories with particularly high or low values (after "Sort Descending"), such as missing value codes, from the analysis. If you want to input more data, the number of rows can be expanded by entering a higher number in this box. Do not forget to check the "Editable outcome" checkbox lower down.

### One-way Anova, F-test and t-tests.

The output page offers the option to apply a one-way anova with F-test to see if there is an overall statistically significant difference between the means. To see if there is a statistically significant difference between each pair of means, t-tests can be requested for each comparison. Please consult the one-way anova help page for further instructions.
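The F statistic of a one-way anova can be computed from a means table without the raw data. The following is a minimal sketch under the textbook definitions of the between-group and within-group sums of squares; the function name `one_way_f` is hypothetical, and the p-value is omitted because it requires an F-distribution function that the Python standard library does not provide.

```python
def one_way_f(table):
    """table: list of (mean, sd, n) rows, one per category.

    Returns (F, df_between, df_within). Rows with n == 0 are dropped.
    """
    rows = [(m, sd, n) for m, sd, n in table if n > 0]
    total_n = sum(n for _, _, n in rows)
    grand_mean = sum(m * n for m, _, n in rows) / total_n
    # Variation of the category means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in rows)
    # Variation of the cases around their own category mean.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in rows)
    df_between = len(rows) - 1
    df_within = total_n - len(rows)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df1, df2 = one_way_f([(2331.67, 587.5, 3),
                              (2318.75, 734.74, 4),
                              (1737.5, 555.08, 2)])
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```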

### Editable outcome.

If you request an editable outcome, the labels, means, standard deviations and frequencies are presented in editable boxes in the resulting means table. This option can be used to enter more data, in combination with requesting additional rows with the "Show Rows" option above. The categories can be given nicer names before the table is used for a one-way anova procedure. Also, if categories are given the same label, these categories can be combined in the one-way procedure. Lastly, rows where the number of cases or frequency is zero are ignored in the one-way procedure. This allows basic anova contrasts to be made.

### Example.

Paste this data into the data input procedure:

1 1985
1 3010
a 1345
3 2200
3 1600
1 2000
1 .
a 2130
3 3345
3 2130

Or this data into the table input procedure:

1 2331.67 587.5 3
3 2318.75 734.74 4
a 1737.5 555.08 2
b . 578.16 3

Produces this table (excluding the invalid value):

Means table

| | Labels | Mean | Stddev | Freq |
|---|---|---|---|---|
| r1: | 1 | 2331.67 | 587.5 | 3 |
| r2: | 3 | 2318.75 | 734.74 | 4 |
| r3: | a | 1737.5 | 555.08 | 2 |
| Total | | 2193.89 | 627.88 | 9 |
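The Total row can be recovered from the category rows alone, which is what the table input procedure has to do since it never sees the individual cases. The sketch below is one way to do this that agrees with the example figures: the total mean is the frequency-weighted mean, and the total standard deviation combines the within-category and between-category sums of squares. The function name `total_row` is hypothetical, and whether the program uses exactly this calculation is an assumption.

```python
from math import sqrt

def total_row(table):
    """table: list of (mean, sd, n) rows. Returns (mean, sd, n) overall."""
    total_n = sum(n for _, _, n in table)
    grand_mean = sum(m * n for m, _, n in table) / total_n
    # Spread of the cases around their own category mean.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in table)
    # Spread of the category means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in table)
    total_sd = sqrt((ss_within + ss_between) / (total_n - 1))
    return grand_mean, total_sd, total_n

mean_all, sd_all, n_all = total_row([(2331.67, 587.5, 3),
                                     (2318.75, 734.74, 4),
                                     (1737.5, 555.08, 2)])
print(f"Total\t{mean_all:.2f}\t{sd_all:.2f}\t{n_all}")
```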

### Limitations.

Formatting and tabulating large datasets might take a while, and the browser might show warnings while it does so. Just select "continue" and the computer will get there in the end.