X and Y Modes
Both CCA and PCR prefilter the X data by decomposing them into empirical orthogonal functions (EOFs), or principal components. The CCA option prefilters the Y data as well. The MLR option does not have EOF options, but if either the CCA or the PCR option is selected it will be necessary to specify the number(s) of modes to retain, as a minimum and a maximum. The program prompts for the numbers of modes automatically after an input file is opened, but these values can be reset from the X and Y modes options under the Options menu item. Note that changing the number of modes will require CPT to reset if results have already been calculated.
If the maximum number of modes is larger than the corresponding minimum, CPT will identify the number of modes, between the specified minimum and maximum, that provides the best cross-validated results. The optimum number of modes is identified by producing cross-validated predictions for each of the series in the Y data file, and correlating these with the observed values. A "goodness of fit index" is calculated (see Goodness Index for details of how this index is calculated) and the number of modes with the highest index is defined as the optimum number.
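The search described above can be pictured as a simple loop over the candidate numbers of modes. The following is a minimal sketch only, not CPT's own code: `find_optimum_modes` and the `goodness` callable are hypothetical names, `goodness(nx, ny)` stands in for the whole cross-validation and goodness-index calculation, and the tie-breaking rule (the first combination tried wins) is an assumption.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def find_optimum_modes(
    goodness: Callable[[int, Optional[int]], float],
    x_range: Tuple[int, int],
    y_range: Optional[Tuple[int, int]] = None,
) -> Tuple[Tuple[int, Optional[int]], float]:
    """Return the (X modes, Y modes) pair with the highest goodness index.

    goodness is a hypothetical stand-in: given numbers of X and Y modes it
    should return the goodness index of the cross-validated predictions.
    x_range and y_range are inclusive (minimum, maximum) pairs. Pass
    y_range=None when the Y data are not prefiltered (as with PCR).
    """
    x_counts = range(x_range[0], x_range[1] + 1)
    y_counts = [None] if y_range is None else range(y_range[0], y_range[1] + 1)

    best_combo, best_score = None, float("-inf")
    for nx, ny in product(x_counts, y_counts):
        score = goodness(nx, ny)
        # Strict comparison: on a tie the earlier combination is kept
        # (an assumption made for this sketch).
        if score > best_score:
            best_combo, best_score = (nx, ny), score

    if best_combo is None:
        raise ValueError("the maximum must be at least the minimum")
    return best_combo, best_score
```

With a minimum of 1 and a maximum of 5 X modes, for example, the loop evaluates five candidate models, which is why fixing the minimum and maximum to the same value makes a rerun cheaper.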
If you plan to rerun the analysis at a later time, then once the optimum numbers of modes have been identified it is advisable to reset the minimum and maximum numbers to the optimal settings and save them (saving program settings is described in Saving the Settings), so that the program runs more efficiently next time.
There are a few restrictions on the numbers of modes:
- the maximum number of modes must be at least as large as the corresponding minimum number;
- the number of modes cannot exceed the minimum of the number of gridpoints/stations/series and of the length of the training period.
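As an illustration, the two restrictions can be written as a small check. This is a sketch with hypothetical names (`check_mode_settings` and its parameters); it is not how CPT itself validates the settings.

```python
def check_mode_settings(
    min_modes: int, max_modes: int, n_series: int, n_training: int
) -> int:
    """Check mode settings against the two restrictions listed above.

    n_series is the number of gridpoints/stations/series and n_training is
    the length of the training period. Returns the largest permitted number
    of modes, or raises ValueError if a restriction is broken.
    """
    limit = min(n_series, n_training)
    if max_modes < min_modes:
        raise ValueError("maximum number of modes is less than the minimum")
    if max_modes > limit:
        raise ValueError(f"number of modes cannot exceed {limit}")
    return limit
```

For a field of 200 gridpoints and a 30-year training period, `check_mode_settings(1, 5, 200, 30)` returns 30, so at most 30 modes could be requested.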
By default the principal components are calculated using the correlation matrix (the analysis is based on the standardised anomalies). However, it is possible to base the analysis on either the variance-covariance matrix (the anomalies) or the sums of squares and cross-products matrix (the raw values). To set the method for calculating the principal components, click on the Options menu item, select whether to set the X or the Y modes settings, click on the "Advanced" button, and then choose the required option. Whichever matrix is used, cosine latitude weighting is applied to gridded data fields. For station data, the points are given equal weight, and so the loadings will partly reflect the geographical distribution of the stations.
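The three options differ only in how the data are prepared before the decomposition, which the sketch below tries to make concrete. It is illustrative rather than a description of CPT's internals: `prepare_field` and its arguments are hypothetical names, and scaling each gridpoint by the square root of the cosine of its latitude (so that its contribution to the matrix is proportional to the cosine) is one common convention that is assumed here, not something stated above.

```python
import math
from statistics import mean, stdev


def prepare_field(data, matrix="correlation", latitudes=None):
    """Prepare a field for principal component analysis.

    data is a list of rows (one per time step), each holding one value per
    gridpoint or station. matrix selects the basis of the analysis:
      "correlation": standardised anomalies (the default)
      "covariance":  anomalies
      "sscp":        raw values (sums of squares and cross-products)
    latitudes (degrees, one per column) should be given for gridded data
    only; station data are left with equal weights.
    """
    columns = [list(col) for col in zip(*data)]
    prepared = []
    for j, col in enumerate(columns):
        if matrix == "sscp":
            values = col
        elif matrix == "covariance":
            m = mean(col)
            values = [v - m for v in col]
        elif matrix == "correlation":
            # A constant series has zero standard deviation and cannot be
            # standardised; this sketch does not handle that case.
            m, s = mean(col), stdev(col)
            values = [(v - m) / s for v in col]
        else:
            raise ValueError(f"unknown matrix option: {matrix!r}")
        if latitudes is not None:
            # Assumed convention: weight by sqrt(cos(latitude)).
            w = math.sqrt(max(0.0, math.cos(math.radians(latitudes[j]))))
            values = [w * v for v in values]
        prepared.append(values)
    # Transpose back to one row per time step.
    return [list(row) for row in zip(*prepared)]
```

With the correlation option every series ends up with unit variance before weighting, so high-variance locations do not dominate the leading modes; with the other two options they can.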