CMIP6 Guidance for Data Users

Karl E. Taylor, Paul J. Durack, Sasha Ames, Martina Stockhause, …

Document overview:

  1. Experiment design
  2. Model output specifications
  3. Accessing model output
  4. Terms of use and citation requirements
  5. Model and experiment documentation
  6. Reporting suspected errors
  7. Registering published work based on CMIP6
  8. CMIP6 organization and governance

1. Experiment design

The CMIP6 protocol and experiments are described in a special issue of Geoscientific Model Development with an overview of the design and scientific strategy provided in the lead article of that issue by Eyring et al. (2016).

Each model participating in CMIP6 will contribute results from the four DECK experiments (piControl, AMIP, abrupt4xCO2, and 1pctCO2) and the CMIP6 historical simulation. These experiments are the only ones directly overseen by the CMIP Panel, and together they constitute the ongoing (slowly evolving) “CMIP Activity”. They are described in Eyring et al. (2016).

In addition to the DECK and historical simulations, each modeling group may choose to contribute to any of the CMIP6 endorsed MIPs. See the GMD Special CMIP6 Issue for descriptions of each MIP and its experiment specifications. The official names of the currently endorsed CMIP6 MIP activities are recorded in a “json” file.

When called for by the experiment protocol, standard forcing data sets (e.g. Durack et al. (2018)) have been used. Any deviations from the standard forcing are supposed to be clearly documented.

Further documentation about CMIP6 experiments will be available from ES-DOC, and the reference controlled vocabularies used to define and identify these experiments are available in a “json” file and are also rendered in table form.
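As an illustration of how the experiment controlled vocabulary might be consumed programmatically, the sketch below parses a small JSON snippet and lists the experiments registered under a given activity. The embedded sample and its key names (`experiment_id`, `activity_id`, `experiment`) are assumptions modelled on the general layout of the CV file, not an authoritative copy, so check the actual file before relying on them.

```python
import json

# Hypothetical excerpt imitating the layout of the experiment CV file.
# The real file is much larger; key names and descriptions are assumptions.
SAMPLE_CV = """
{
  "experiment_id": {
    "historical": {"activity_id": ["CMIP"], "experiment": "simulation of the recent past"},
    "piControl": {"activity_id": ["CMIP"], "experiment": "pre-industrial control"},
    "ssp585": {"activity_id": ["ScenarioMIP"], "experiment": "high-end future scenario"}
  }
}
"""


def experiments_for_activity(cv_text, activity):
    """Return the sorted experiment names registered under one activity."""
    cv = json.loads(cv_text)
    return sorted(
        name
        for name, entry in cv["experiment_id"].items()
        if activity in entry.get("activity_id", [])
    )


print(experiments_for_activity(SAMPLE_CV, "CMIP"))
```

The same function could be pointed at the text of the real CV file once it has been downloaded, provided its layout matches the assumed one.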

2. Model output specifications

The CMIP6 Data Request defines the variables requested from each experiment and specifies the time intervals for which they are supposed to be reported. One way to peruse the lists of variables that should be available from at least some experiments is to display the Excel spreadsheet.

CMIP6 model output includes metadata and is structured similarly to CMIP5 output, but changes have been made to accommodate the more complex structure of CMIP6 and its data request. Some changes have been made to make it easier for users to find the data they need and to enable new services to be established providing, for example, model and experiment documentation and citation information.

As in CMIP5, all CMIP6 output has been written to netCDF files with one variable stored per file. The data have been “cmorized” (i.e., written in conformance with the CF-conventions and all the CMIP standards). The CMIP6 data requirements are defined and discussed in the following documents:

Note that in the above, controlled vocabularies (CV’s) play a key role in ensuring uniformity in the description of data sets across all models. For all but variable-specific information, reference CV’s are being maintained by PCMDI. These CV’s are relied on in constructing file names and directory structures, and they enable faceted searches of the CMIP6 archive as called for in the search requirements document.
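Because file names are assembled from CV-controlled facets, they can be split back into those facets. The sketch below assumes the file name template `<variable_id>_<table_id>_<source_id>_<experiment_id>_<member_id>_<grid_label>[_<time_range>].nc` and assumes that no individual facet contains an underscore. The function name `parse_cmip6_filename` and the example file name are illustrative, not part of any official tool.

```python
from typing import Dict

# Facet order assumed from the CMIP6 file name template (see lead-in).
FACETS = ["variable_id", "table_id", "source_id",
          "experiment_id", "member_id", "grid_label"]


def parse_cmip6_filename(filename: str) -> Dict[str, str]:
    """Split a CMIP6-style file name into its facets (illustrative helper)."""
    if not filename.endswith(".nc"):
        raise ValueError(f"not a netCDF file name: {filename!r}")
    parts = filename[:-3].split("_")
    # Time-independent fields omit the trailing time range.
    if len(parts) not in (len(FACETS), len(FACETS) + 1):
        raise ValueError(f"unexpected number of facets in {filename!r}")
    facets = dict(zip(FACETS, parts))
    if len(parts) == len(FACETS) + 1:
        facets["time_range"] = parts[-1]
    return facets


example = parse_cmip6_filename(
    "tas_Amon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
)
print(example["variable_id"], example["experiment_id"], example["time_range"])
```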

As indicated in the guidance specifications for output grids, weights will be provided to regrid all output to a few standard grids (e.g., 1x1 degree). All regridding information (weights, lats, lons, etc.) will be stored consistent with a standard format approved by the WIP.

3. Accessing model output

CMIP6 model output is available through a distributed data archive developed and operated by the Earth System Grid Federation (ESGF). Balaji et al. (2018) provide an overview of the design of additional infrastructure and the configuration of ESGF to support CMIP6. The data are hosted on a collection of nodes located at modeling centers or data centers across the world. The data can be accessed through any of the CMIP6 CoG web interfaces, which enable users to search across the entire distributed archive as if it were all centrally located.

See this summary table to view available experiments and models.

Here are the currently active CMIP6 CoG sites (all data can be accessed via any one of these):

To get to the search interface, click on “More search options” under the large red text near the center of the page. There are additional options for searching through the web interface (see “More Search Options” near the top right of the page), and there is also an API that can be used to perform searches. Tutorials are available by following the link labeled “Technical Support” near the top right of the page. Expert users may also want to use the ESGF Search RESTful API.
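For readers who prefer scripted searches, the following sketch builds a query URL for the ESGF Search RESTful API without sending any request. The host name `esgf-node.example.org` is a placeholder, and the endpoint path and parameter names (`project`, `type`, `latest`, `format`, `limit`, plus facet names such as `experiment_id` and `variable_id`) reflect our understanding of the ESGF search documentation; treat them as assumptions and verify them against the node you use.

```python
from urllib.parse import urlencode


def build_search_url(base_url, **facets):
    """Compose a dataset search URL from facet constraints (illustrative helper)."""
    params = {
        "project": "CMIP6",
        "type": "Dataset",                  # search datasets rather than files
        "latest": "true",                   # only the latest dataset versions
        "format": "application/solr+json",  # request a JSON response
        "limit": 10,                        # cap the number of results
    }
    params.update(facets)
    return f"{base_url}?{urlencode(params)}"


# Placeholder host: substitute the search endpoint of a real ESGF node.
url = build_search_url(
    "https://esgf-node.example.org/esg-search/search",
    experiment_id="historical",
    variable_id="tas",
)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the helper only assembles the query string.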

Globus is available for downloading some datasets and will provide much better performance for large data volumes. With the Globus Download option, ESGF will prepare a Python script for batch downloads, or you can monitor transfers for a “Web Download”. You can download an entire “data cart” in one step if all datasets in the cart are served by Globus. The Globus option requires you to establish a user account on ESGF (see “create account” at top right of CoG pages). Note also that a second logon with a Globus-enabled credential is required (N.B. Google IDs, as well as credentials from many institutions, are accepted).

4. Terms of use and citation requirements

To enable modeling groups and others who support CMIP6 to demonstrate its impact (and secure ongoing funding), you are required to cite and acknowledge those who have made CMIP6 possible. You must also abide by any licensing restrictions (see below).

Please carefully read and adhere to the CMIP6 Terms of Use.

CMIP6 model output datasets and forcing datasets should, according to the terms of use, be cited by any publication that makes use of them (see Data Citation Guidelines). Because CMIP6 data evolve, it is important to include the version in the data citation (the latest dataset version or, if that is not available, the latest data download date). Further information on the data citation concept for CMIP6 is available at cmip6cite.wdc-climate.de and described in Stockhause and Lautenschlager (2017).
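The version rule above can be sketched in code. The example assumes that an ESGF dataset identifier ends in a version facet of the form `.vYYYYMMDD` (optionally followed by `|` and a data node name) and falls back to the download date otherwise. The helper name `citation_version`, the example identifier, and the exact output wording are hypothetical; follow the Data Citation Guidelines for the authoritative citation format.

```python
import re
from datetime import date
from typing import Optional


def citation_version(dataset_id: str, download_date: Optional[date] = None) -> str:
    """Return the version statement to include in a data citation (illustrative)."""
    # Drop a trailing "|<data node>" suffix if the identifier carries one.
    core_id = dataset_id.split("|")[0]
    match = re.search(r"\.v(\d{8})$", core_id)
    if match:
        return f"Version {match.group(1)}"
    if download_date is not None:
        # No version available: fall back to the latest download date.
        return f"Downloaded {download_date.isoformat()}"
    raise ValueError("neither a dataset version nor a download date is available")


print(citation_version(
    "CMIP6.CMIP.NCAR.CESM2.historical.r1i1p1f1.Amon.tas.gn.v20190308"
))
```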

Data references give credit to the data providers and enable the traceability of research findings (see contribution to the CMIP6 Model Analysis Workshop). They are provided at two granularities: fine, at the experiment level (contribution by one model to one experiment), and coarse, at the model/MIP level (contribution by one model to one MIP). Data references can be found:

5. Model and experiment documentation

The controlled vocabularies contain basic information about the models, institutions, and experiments in CMIP6. The CMIP6 results will be fully documented and made accessible via the ES-DOC viewer and comparator interface (https://search.es-doc.org). Each CMIP6 model output file includes a global attribute called “further_info_url” which will link to a signpost web page providing simulation/ensemble information, model configuration details, current contact details, data citation details etc. This link is also selectable next to each dataset returned by the CMIP6 CoG search interface. ES-DOC will include documentation of:

6. Reporting suspected errors

Information about issues discovered in CMIP6 data is captured by the ES-DOC Errata Service. The Errata Service provides the ability to query modifications and/or corrections applied to CMIP6 data in two ways:

Any ESGF user can report an error to the appropriate modeling group (see the “contact” attribute in the netCDF files) or through the ESGF user mailing list. After a report is received, the corresponding data manager can create a new errata entry using a user-friendly form. A command line client is also available. The aim is to document the issue clearly and concisely. Through the PID integration, the errata service will include all affected datasets and files, provided the documentation is completed correctly.

7. Registering published work based on CMIP6

Please register on the CMIP6 publication database any articles you publish that make use of CMIP6 output.

8. CMIP6 organization and governance

The CMIP Panel, which is a standing subcommittee of the WCRP’s Working Group on Climate Modeling, provides overall guidance and oversight of CMIP activities. Notably, it determines which MIPs will participate in each phase of CMIP using the established selection criteria listed in Table 1 of Eyring et al. (2016). On its webpages the CMIP Panel provides additional information that may be of interest to CMIP6 participants, but only the CMIP6 Guide (this document) provides definitive documentation of CMIP6 technical requirements.

The endorsed MIPs are managed by independent committees, but acceptance of endorsement obligates them to follow CMIP’s technical requirements. Thus, across all MIPs, the modeling groups can prepare their model output following a common procedure.

The CMIP Panel has delegated responsibility for most of the technical requirements of CMIP to the WGCM Infrastructure Panel (WIP). The mission, rationale and Terms of Reference for the panel can be found here. The WIP has drafted a number of position papers summarizing CMIP6 requirements and specifications. Among these are the CMIP6 reference specifications for global attributes, filenames, directory structure and Data Reference Syntax (DRS). The WIP has also set up a CMIP Data Node Operations Team (CDNOT) to interface with data node managers responsible for serving CMIP6 data. This team provides a direct link from the panels establishing data node requirements to those implementing the requirements.

Information is under preparation describing the governance of the following:

Document version: 19 October 2022