535: Regression Options
By default, the regression model for the CCA, PCR, MLR and GCM options is fitted using ordinary least squares (OLS). However, where the assumptions of OLS are invalid, it may be preferable to use a Generalised Linear Model (GLM). A GLM may be appropriate when the Y data are not normally distributed, and/or when the relationship with the predictor(s) is non-linear. In many cases it may be preferable to use a GLM rather than the Transform Y Data option.
As part of each GLM there is a link function, which relates the expected value of Y to the linear combination of the predictor(s). In the current version of CPT, the link function can be selected only for Poisson and gamma regression; otherwise a default link (the canonical link) is used. The link function can be selected by clicking on the Advanced button.
The following GLMs are available:
- Logistic regression: appropriate for Y data that represent yes/no outcomes, or for Y data that are probabilities. For example, the predictand may be the occurrence of an extreme event, with the Y data recorded as 0 if the event did not occur and 1 if the event did occur. CPT will return an error message if any of the non-missing Y data are outside the range 0 to 1. The link function is the logistic link, and so the regression equation is of the form:
- Y = 1 / [1 + exp[-(b0 + BX)]]
- Binomial regression: appropriate for Y data that represent counts with upper and lower limits. For example, the predictand may be the number of days with no rainfall over a fixed period. CPT will return an error message if any of the non-missing Y data are outside the range 0 to n, where n is the length of the period in days. The link function is the logistic link, and so the regression equation is of the form:
- Y = n / [1 + exp[-(b0 + BX)]]
The predictand is assumed to be discrete if binomial regression is selected, unless the Y units are indicative of a continuous distribution (see the cpt:units tag for further information).
- Poisson regression: appropriate for Y data that represent counts with no upper limit. For example, the predictand may be the number of storms over a fixed period. CPT will return an error message if any of the non-missing Y data are less than 0. The form of the regression equation depends on the link function (see further discussion below). The predictand is assumed to be discrete if Poisson regression is selected, unless the Y units are indicative of a continuous distribution (see the cpt:units tag for further information).
- Gamma regression: appropriate for Y data that have a lower limit of 0 and are positively skewed. For example, the predictand may be the amount of rainfall. CPT will return an error message if any of the non-missing Y data are less than 0. Gamma regression is similar to the gamma transformation option, but is a mathematically more appropriate way of fitting the model. When the Y data follow a gamma distribution, the error variance of the larger values is greater than that of the smaller values, and a GLM will account for this difference. The form of the regression equation depends on the link function (see further discussion below).
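The range restrictions described for the four GLMs can be summarised in a few lines of code. The following Python sketch is illustrative only and is not CPT's own implementation: the function name out_of_range, the family labels, and the use of None to mark missing values are all assumptions made for this example.

```python
def out_of_range(y_values, family, n=None):
    """Return the non-missing Y values that fall outside the valid range.

    Hypothetical helper: missing values are represented as None, and
    `family` is one of "logistic", "binomial", "poisson" or "gamma".
    """
    present = [y for y in y_values if y is not None]
    if family == "logistic":
        lower, upper = 0, 1                 # yes/no outcomes or probabilities
    elif family == "binomial":
        if n is None:
            raise ValueError("binomial regression needs the period length n")
        lower, upper = 0, n                 # counts bounded by the period length
    elif family in ("poisson", "gamma"):
        lower, upper = 0, float("inf")      # no upper limit
    else:
        raise ValueError(f"unknown family: {family}")
    return [y for y in present if y < lower or y > upper]


# Number of dry days in a 30-day period; 31 is not a possible count.
print(out_of_range([0, 31, 12, None], "binomial", n=30))  # prints [31]
```

An empty list corresponds to data that would pass the check; any returned value corresponds to a case in which an error message would be expected.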
For Poisson and gamma regression, the following link functions are available:
- Identity: Assumes that Y is a linear function of the predictor(s). If the identity link function is chosen (the default), the regression equation is the same as for ordinary least-squares regression (but a weighted form of regression is used):
- Y = b0 + BX
- Inverse: Assumes that Y is a reciprocal function of the predictor(s). The inverse link is the canonical link for gamma regression, but it does not ensure that the regression estimates are positive, and so a logarithmic link is sometimes preferred. If the inverse link function is selected, the regression equation is:
- Y = 1 / [b0 + BX]
- Logarithmic: Assumes that Y is an exponential function of the predictor(s). The logarithmic link is the canonical link for Poisson regression. If the logarithmic link function is selected, the regression equation is:
- Y = exp[b0 + BX]
- Square root: Assumes that Y is a squared function of the predictor(s). If the square root link function is selected, the regression equation is:
- Y = [b0 + BX] ** 2
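The four regression equations above differ only in how the linear predictor b0 + BX is converted to a fitted value of Y. The following Python sketch illustrates this; it is not CPT code, and the names INVERSE_LINKS and predict are hypothetical. B is treated as a list of coefficients and X as a list of predictor values.

```python
import math

# Each entry converts the linear predictor eta = b0 + BX to a fitted Y.
INVERSE_LINKS = {
    "identity": lambda eta: eta,               # Y = b0 + BX
    "inverse": lambda eta: 1.0 / eta,          # Y = 1 / [b0 + BX]
    "logarithmic": lambda eta: math.exp(eta),  # Y = exp[b0 + BX]
    "square root": lambda eta: eta ** 2,       # Y = [b0 + BX] ** 2
}


def predict(link, b0, coefs, x):
    """Fitted value of Y for one case, given the chosen link function."""
    eta = b0 + sum(b * xi for b, xi in zip(coefs, x))
    return INVERSE_LINKS[link](eta)


# The inverse link can return a negative estimate; the logarithmic link cannot.
print(predict("inverse", -2.0, [0.0], [1.0]))      # prints -0.5
print(predict("logarithmic", -2.0, [0.0], [1.0]))  # a small positive value
```

The last two lines show why a logarithmic link is sometimes preferred over the inverse link for gamma regression: a negative linear predictor gives a negative estimate under the inverse link, but a positive one under the logarithmic link.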
If a non-linear link function is used, the scatter plot shown in Tools ~ Models ~ Regression will no longer show a straight-line model fit, but the non-linear nature of the regression is indicated by the equation.
When using one of the GLMs, it is recommended that one of the information criteria be used as the goodness index.
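As an illustration of what an information criterion measures, the following Python sketch computes the Akaike (AIC) and Bayesian (BIC) information criteria from the log-likelihood of a Poisson model. The criteria and function names used here are assumptions for the purpose of the example; this page does not state which information criteria CPT provides or how it calculates them.

```python
import math


def poisson_log_likelihood(y, mu):
    """Log-likelihood of counts y given fitted Poisson means mu (all mu > 0)."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)
        for yi, mi in zip(y, mu)
    )


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Hypothetical storm counts and fitted means from a model with 2 parameters.
y = [0, 1, 2]
mu = [1.0, 1.0, 1.0]
ll = poisson_log_likelihood(y, mu)
print(aic(ll, 2), bic(ll, 2, len(y)))
```

Both criteria reward a high likelihood and penalise additional parameters, so lower values indicate a preferable model. Unlike a correlation, they are based on the likelihood of the fitted GLM rather than on an assumption of normally distributed errors.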