Input Files

CPT requires the input files to follow strictly one of three structures (gridded, station, and unreferenced or index), each of which is described on the following pages. Currently, the input files for each of these structures must be in ASCII (or text) format, although other formats are being developed and will be implemented in later releases of the software.

New input file formats were introduced in CPT version 10. The current version of CPT is able to read both the new formats (described in the next few pages) and the old files, but CPT versions 11+ are more flexible than version 10. The old formats are unable to support some of the new features of CPT, including multiple fields and so-called "EOF extensions" (typically the inclusion of additional predictors at different lags, which are called "lagged fields" in CPT). More importantly, if the old formats are used, CPT will in some cases be unable to identify appropriate start dates or the length of the training period correctly (see further discussion of the cpt:S and cpt:T tags). In versions 11+, the CPT version 10 file formats have been simplified somewhat, and both the old and the new formats have been made more flexible to allow improved handling and simpler construction of files.

All version 10+ file formats must begin with the following first few lines. The first line of an input file is always:

xmlns:cpt=http://iri.columbia.edu/CPT/v10/

This tag defines an XML namespace for the "cpt" prefix to be used in subsequent lines. This line should be copied exactly as shown to the top of every CPT version 10+ input file.

The second line indicates the number of "fields" in the file:

cpt:nfields=1

A "field" is a set of variables for the same meteorological parameter measured at different locations; for example, rainfall measurements at a set of stations, or a grid of sea-surface temperature records. In older versions of CPT only one field was permitted, but from CPT version 10 multiple fields can be used. Additional fields can represent different meteorological parameters, and/or the same parameter at a different lag. Identical parameters at different lags can be represented as separate fields or as lagged fields. Mathematically, lagged fields and fields are handled identically, and so it does not matter how they are represented. There are some restrictions on lagged fields, but they may be easier to format, especially in station and unreferenced files. The following pages describe the distinctions in more detail. How the different fields are set out in the file depends on the file type, as described in the following sections. If nfields=1, then this line can be omitted in CPT version 12+.

For probabilistic forecast input files the next line indicates the number of categories in the file:

cpt:ncats=3

Currently, the number of categories is constrained to be 3. For other input files this line should be omitted.
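
As an illustration of the opening lines described so far, the following Python sketch assembles them in order. It is not part of CPT: the function name build_cpt_header and its parameters are hypothetical, and it covers only the namespace, cpt:nfields and cpt:ncats lines.

```python
# Minimal sketch (not part of CPT): build the opening lines of a
# CPT version 10+ input file. Names here are hypothetical.

CPT_NAMESPACE_LINE = "xmlns:cpt=http://iri.columbia.edu/CPT/v10/"


def build_cpt_header(nfields=1, ncats=None):
    """Return the opening header lines as a list of strings.

    nfields -- number of fields in the file
    ncats   -- number of categories; pass a value only for
               probabilistic forecast input files
    """
    if nfields < 1:
        raise ValueError("nfields must be at least 1")
    lines = [CPT_NAMESPACE_LINE, f"cpt:nfields={nfields}"]
    if ncats is not None:
        # The documentation states that only 3 categories are supported.
        if ncats != 3:
            raise ValueError("the number of categories must currently be 3")
        lines.append(f"cpt:ncats={ncats}")
    return lines


if __name__ == "__main__":
    print("\n".join(build_cpt_header(nfields=1, ncats=3)))
```

For files that are not probabilistic forecast inputs, ncats would be left as None so that the cpt:ncats line is omitted.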

In the version 10 file formats, the next line contained the tag cpt:T followed by a list of all the dates in the file. This line is no longer required, and is ignored if present. CPT version 11+ does not include this line in output files.

The next line contains a series of CPT tags that set information about the immediately following block of data. This information depends on the file structure, and so this line is described separately for each format. The tags can appear in any order, but some tags are compulsory and others optional. Each tag is preceded by "cpt:" followed by the name of the tag, then "=" and the value that the tag takes. The tags associated with the different CPT data formats are described in detail in the following pages, where the meaning of each tag is explained as needed.
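
The "cpt:" + name + "=" + value pattern can be illustrated with a short Python sketch that reads such a line into a dictionary. It is not CPT's own parser: the function name parse_cpt_tags is hypothetical, and the sketch assumes that tags on a line are separated by commas and/or spaces and that values contain neither. The tag values in the usage example are invented for illustration.

```python
import re

# Assumption: tags are separated by commas and/or whitespace, and a
# value contains neither. This is a sketch, not CPT's own parser.
TAG_PATTERN = re.compile(r"cpt:(\w+)=([^,\s]+)")


def parse_cpt_tags(line):
    """Return a dict mapping tag names to their values (as strings).

    Because a dict is keyed by name, the order in which the tags
    appear on the line does not matter.
    """
    return {name: value for name, value in TAG_PATTERN.findall(line)}


if __name__ == "__main__":
    # Illustrative values only; the real value formats for each tag
    # are described with the individual file structures.
    print(parse_cpt_tags("cpt:T=1982-01, cpt:S=1981-11-01"))
```

Compulsory tags could then be checked by testing whether the expected names are present among the dictionary keys.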

It is important to consider which dates to include in the input datasets, since the analyses that CPT will perform depend upon how the data are structured. When using CPT for seasonal forecasting, the Y file would typically contain only one set of values per year unless you specifically wish to forecast multiple target periods synchronously. These Y values would normally be a seasonal total or average. However, in version 14+, if the Y file contains monthly data and all months are present, then CPT will either identify the appropriate season automatically or prompt for the season, and then calculate the seasonal totals or averages. CPT can identify the season automatically when using the Probabilistic Forecast Verification (PFV) option, or if the cpt:S tag is set in the X file.
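
To make the idea of deriving one seasonal value per year from monthly data concrete, here is a minimal Python sketch. It does not reproduce CPT's internal calculation: the function name seasonal_values, the (year, month) dictionary layout, and the choice to label each season by the year of its first month are all assumptions made for illustration.

```python
def seasonal_values(monthly, season_months, average=False):
    """Aggregate monthly values into one value per season.

    monthly       -- dict mapping (year, month) to a value
    season_months -- consecutive month numbers, e.g. [12, 1, 2]
    average       -- return the seasonal mean instead of the total

    Returns a dict keyed by the year of the season's first month.
    Seasons with any missing month are skipped.
    """
    first_month = season_months[0]
    start_years = sorted({y for (y, m) in monthly if m == first_month})
    result = {}
    for start_year in start_years:
        year = start_year
        previous = None
        values = []
        for month in season_months:
            # A drop in month number means the season crossed into
            # the next calendar year (e.g. December to January).
            if previous is not None and month < previous:
                year += 1
            previous = month
            if (year, month) not in monthly:
                values = None
                break
            values.append(monthly[(year, month)])
        if values is not None:
            total = sum(values)
            result[start_year] = total / len(values) if average else total
    return result


if __name__ == "__main__":
    # Invented rainfall figures for one December-February season.
    data = {(2000, 12): 50.0, (2001, 1): 70.0, (2001, 2): 30.0}
    print(seasonal_values(data, [12, 1, 2]))
```

Totals would suit a quantity such as rainfall, while average=True would suit a quantity such as temperature.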

For the X file, if all twelve months are present, CPT will read the data as if there were a total of 12 lagged fields. It would almost certainly be inappropriate to include all 12 months in the X file.

If CPT is being used on daily, pentadal, or dekadal data, lagged fields are not implicitly recognised, and so any desired lags have to be included as separate fields.