
# Correlation/Regression procedure

### Explanation.

A regression/correlation study describes and measures the extent to which the outcomes of variables are related: are high or low values on one variable predictive of high or low values on another?

The procedure offers four options:

1. Describe and test a two-variable univariate regression
2. Describe and test a two-variable weighted regression
3. Analyse three variables and compare the dependent correlations
4. Analyse and compare two regressions for two groups

If you have already calculated one or more correlations and want to know their significance, confidence intervals, statistical power, and so on, please go to the correlations page.

### Input.

The columns required depend on the option chosen:

1. Two-variable option: two columns. The first column is treated as a numerical explanatory X variable, which explains the numerical dependent Y variable in the second column.
2. Weighted analysis: the same two columns plus a third column with numerical weights.
3. Three-variable option: three columns with numerical data.
4. Two-groups regression: three variables, of which the first is a column with group names. Names can be text or numbers. The first two names found are taken to indicate the two groups; any other name is ignored.

The columns in the input field must be separated by spaces, returns, semicolons, colons, or tabs. They can NOT be separated by commas or full stops. Cases with non-numerical values in numerical fields are ignored and added to the number of invalid cases.

The data can be copied from, for example, a spreadsheet or word processor, or typed manually into the "Input data here:" field. The data in the input field are read row by row: first the first row, then the second, the third, and so on.

To identify the variables in the output, give the variable names in the first row and check the "Var labels in 1st row" option. Names can be text or numbers.
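The reading rules above can be illustrated with a minimal Python sketch. It is not the procedure's actual code: the function name `read_cases`, its parameters, and the decision to treat the whole input as one stream of values are assumptions made for illustration only.

```python
import re

# Separators named in the text: spaces, returns, semicolons, colons, tabs.
# Commas and full stops are deliberately NOT separators.
SEPARATORS = re.compile(r"[ \t\r\n;:]+")

def read_cases(text, n_columns=2, labels_in_first_row=False):
    """Hypothetical reader: returns (labels, cases, n_invalid)."""
    # Assumption: the input is read row-wise as one stream of values.
    tokens = [t for t in SEPARATORS.split(text) if t]
    labels = None
    if labels_in_first_row:
        labels, tokens = tokens[:n_columns], tokens[n_columns:]
    cases, n_invalid = [], 0
    for i in range(0, len(tokens), n_columns):
        chunk = tokens[i:i + n_columns]
        try:
            if len(chunk) < n_columns:
                raise ValueError("incomplete case")
            cases.append(tuple(float(t) for t in chunk))
        except ValueError:
            # Non-numerical value in a numerical field: skip and count it.
            n_invalid += 1
    return labels, cases, n_invalid
```

In this sketch a value such as `1,5` is not split at the comma; it fails numeric conversion, so the case it belongs to is counted as invalid.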

### Output.

The output first gives some descriptives of the data and of the data read procedure, followed by the correlations between the variables. From this point one can move on to the correlations page for confidence intervals, sample size calculations, or to compare correlations. See the help page for the correlations page for instructions. Scatter plots with the results of a linear ordinary least squares (OLS) regression can be requested. There are also tables with the regression coefficients, the significance of the regression, and an Analysis of Variance. Finally, there are descriptives for the variables used.
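As a rough illustration of the quantities behind these tables, the following Python sketch computes a two-variable OLS fit, the correlation, and the sums of squares that make up an Analysis of Variance table. The function name `ols_summary` and the layout of its result are hypothetical; the procedure's own output may be organised differently.

```python
from statistics import mean

def ols_summary(x, y):
    """Hypothetical helper: two-variable OLS fit with ANOVA sums of squares."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)          # total sum of squares
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5                   # Pearson correlation
    ss_reg = slope * sxy                           # explained by the line
    ss_res = syy - ss_reg                          # left unexplained
    df_res = n - 2
    # F ratio with (1, n - 2) degrees of freedom; undefined for a perfect fit.
    f_stat = ss_reg / (ss_res / df_res)
    return {"slope": slope, "intercept": intercept, "r": r,
            "ss_reg": ss_reg, "ss_res": ss_res, "df_res": df_res,
            "f": f_stat}
```

The p-value for the F ratio would additionally require the F distribution, which is outside the Python standard library and is left out of the sketch.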

In the weighted analysis both the weighted and the unweighted regression are shown. The data points for the weighted and the unweighted analysis are in the same spot. However, in the unweighted analysis each data point carries the same weight, whereas in the weighted analysis some data points are more important than others. This explains the difference between the two lines.
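A small Python sketch may help show why the two lines differ. It fits a weighted least squares line and falls back to equal weights when none are given. The function name `fit_line` is hypothetical, and the sketch assumes the weights simply scale each point's contribution to the fit; how the procedure itself interprets the weights is not stated here.

```python
def fit_line(x, y, w=None):
    """Hypothetical helper: (intercept, slope) of a weighted least squares line."""
    if w is None:
        w = [1.0] * len(x)          # unweighted: every point counts equally
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of X
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of Y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Same three points, two different lines.
x, y = [0, 1, 2], [0, 1, 4]
unweighted = fit_line(x, y)
weighted = fit_line(x, y, w=[1, 1, 0])   # third point carries no weight
```

With the third point given zero weight, the weighted line passes through the first two points only, while the unweighted line is pulled towards the third.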

In the two-groups regression two figures are presented. The first figure shows a separate univariate regression line for each of the two groups. The second figure shows the regression held constant for the third variable, drawn as two parallel lines. The vertical distance between the two parallel lines is therefore the difference between the two groups on the dependent variable (Y-axis), holding the variable on the X-axis constant. At the same time, the slope of the two parallel lines is the same as the slope of the light blue line, holding constant the effect of the difference (in location) between the two groups.
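The parallel-lines figure can be sketched in Python as a common-slope fit: one pooled within-group slope and one intercept per group. This is an illustration under that assumption, not the procedure's confirmed method, and the function name `parallel_lines` is hypothetical.

```python
from statistics import mean

def parallel_lines(group, x, y):
    """Hypothetical helper: common slope, per-group intercepts, group difference."""
    # The first two names found indicate the groups; other names are ignored.
    names = []
    for g in group:
        if g not in names:
            names.append(g)
        if len(names) == 2:
            break
    sxx_total = sxy_total = 0.0
    means = {}
    for name in names:
        xs = [xi for gi, xi in zip(group, x) if gi == name]
        ys = [yi for gi, yi in zip(group, y) if gi == name]
        mx, my = mean(xs), mean(ys)
        # Pool the within-group sums of squares and cross-products.
        sxx_total += sum((xi - mx) ** 2 for xi in xs)
        sxy_total += sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
        means[name] = (mx, my)
    slope = sxy_total / sxx_total
    intercepts = {name: my - slope * mx for name, (mx, my) in means.items()}
    # Vertical distance between the two parallel lines.
    difference = intercepts[names[1]] - intercepts[names[0]]
    return slope, intercepts, difference
```

The returned `difference` corresponds to the distance between the two parallel lines, that is, the group difference on Y with X held constant.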

TBA plot, Bland-Altman plot, or Tukey mean-difference plot. This plots the difference between two variables on the Y-axis against the mean of the same two variables on the X-axis, so each point is $(x_i, y_i) = \left(\frac{v_{1i} + v_{2i}}{2},\; v_{1i} - v_{2i}\right)$. The plots are used to study the agreement between two different tests or instruments, represented by the two variables, on the same object or subject. The plot is known as a Bland-Altman plot in medicine and as a Tukey mean-difference plot in most other fields.

Important aspects to look for are:

1. Bias: the difference between the two measurements differs from zero.
2. Heteroscedasticity: the cloud of points widens or narrows depending on the value on the X-axis, which indicates that the reliability of a measurement changes with the value of the measurement.
3. Trend: the bias between the two measurements changes with the value of the measurement. To study a trend one can fit a regression line through the Tukey-Bland-Altman plot.

Besides the Tukey-Bland-Altman plot, the overall correlation and the scatter plot also give insight into the agreement between two instruments. A low correlation means that it is hard to predict values on one instrument from the other, so the two instruments cannot reliably be used interchangeably. A constant (flat) bias is generally a lesser problem, as one would normally expect instruments to be calibrated against some standard by adding or subtracting a (bias) value. If there is a trend in the Tukey-Bland-Altman plot there is a scale problem: the scale used on the instruments needs calibration.

Besides the option to fit a line, there is the option to plot the difference against only the first variable, v1, instead of the mean of the two variables. This one-variable option is used if the first variable is the gold standard against which the other variable is judged.
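The plotted points, the bias, and the trend line described above can be computed with a few lines of Python. This is a minimal sketch; the function names `tba_points` and `bias_and_trend` and the `against_first` flag are hypothetical names chosen for illustration.

```python
from statistics import mean

def tba_points(v1, v2, against_first=False):
    """Hypothetical helper: X and Y coordinates of the mean-difference plot."""
    # X is the mean of both variables, or v1 alone for the gold-standard option.
    xs = [a if against_first else (a + b) / 2 for a, b in zip(v1, v2)]
    ys = [a - b for a, b in zip(v1, v2)]   # Y is the difference v1 - v2
    return xs, ys

def bias_and_trend(xs, ys):
    """Hypothetical helper: mean difference (bias) and slope of the fitted line."""
    mx, bias = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - bias) for x, y in zip(xs, ys))
    return bias, sxy / sxx

# Two instruments that differ by a constant 2 units: bias but no trend.
xs, ys = tba_points([10, 20, 30], [8, 18, 28])
bias, trend = bias_and_trend(xs, ys)
```

A bias away from zero with a trend slope near zero points to a constant offset between the instruments, whereas a non-zero slope points to the scale problem described above.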