
CDF2CIM Archive


The cdf2cim-archive hosts a collection of JSON files extracted from NetCDF files. The extraction process is an extension of the esgf-publisher: when the publisher is run with the appropriate flag, it scans a directory and creates a cdf2cim JSON file for each NetCDF file it finds. Each JSON file is given a unique identifier based on a hash of its contents.

Directory Layout

Each JSON file is associated with a project, model & experiment. Thus the archive's directory structure reflects the following organisational principle:

  |--> data
         |--> project
                |--> institute
                       |--> experiment
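
A minimal Python sketch of how a single document could be placed in this layout. The mapping of the project, institute and experiment levels onto the `mip_era`, `institution_id` and `experiment_id` fields is an assumption based on the example CMIP6 document, not something the archive specifies; `archive_dir` and `LEVEL_FIELDS` are hypothetical names.

```python
from pathlib import Path

# Hypothetical mapping from the directory levels (project, institute,
# experiment) to CMIP6 document fields; the real archive may differ.
LEVEL_FIELDS = ("mip_era", "institution_id", "experiment_id")


def archive_dir(root: Path, doc: dict) -> Path:
    """Return root/data/<project>/<institute>/<experiment> for a document."""
    parts = [str(doc[field]) for field in LEVEL_FIELDS]
    return root.joinpath("data", *parts)


doc = {"mip_era": "CMIP6", "institution_id": "AWI", "experiment_id": "control-1950"}
target = archive_dir(Path("/archive"), doc)  # /archive/data/CMIP6/AWI/control-1950 on POSIX
```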

Extracted JSON files


The name of each file is derived from a hash of its contents. Only a subset of the file's fields is included in the hash derivation. For example:



Here is an example file for CMIP6; its hash identifier is stored in the _hash_id field:

    {
        "_hash_id": "0a67f9cf0b7e2f5f4fe89d8401197975",
        "activity_id": [...],
        "branch_time_in_child": "2021-01-01T00:00:00Z",
        "branch_time_in_parent": "1950-01-01T00:00:00Z",
        "calendar": "standard",
        "dataset_versions": [...],
        "end_time": "2031-01-01T00:00:00Z",
        "experiment_id": "control-1950",
        "filenames": [...],
        "forcing_index": 2,
        "further_info_url": "",
        "initialization_index": 1,
        "institution_id": "AWI",
        "mip_era": "CMIP6",
        "parent_forcing_index": 2,
        "parent_initialization_index": 1,
        "parent_physics_index": 1,
        "parent_realization_index": 1,
        "physics_index": 1,
        "realization_index": 1,
        "source_id": "AWI-CM-1-1-LR",
        "start_time": "2021-01-01T00:00:00Z",
        "sub_experiment_id": "none"
    }
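
This README does not state which fields feed the hash or which algorithm is used. The 32-hex-digit _hash_id is consistent with a 128-bit digest such as MD5, so the sketch below assumes MD5 over a canonical (sorted-key, compact) JSON encoding of a hypothetical field subset. It illustrates the idea only and will not reproduce the identifier shown above; `HASHED_FIELDS` and `hash_id` are hypothetical names.

```python
import hashlib
import json

# Hypothetical subset of hashed fields; the archive's real subset is not
# listed in this README.
HASHED_FIELDS = ("mip_era", "institution_id", "source_id",
                 "experiment_id", "start_time", "end_time")


def hash_id(doc: dict, fields=HASHED_FIELDS) -> str:
    """Return an MD5 hex digest over a canonical JSON encoding of selected fields."""
    subset = {name: doc[name] for name in fields if name in doc}
    # Sorted keys and fixed separators make the encoding independent of
    # dictionary order and whitespace, so equal field values give equal hashes.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Because only the selected fields are hashed, changing a field outside the subset (such as further_info_url here) leaves the identifier, and therefore the file name, unchanged.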


The archived documents hosted on GitHub are stored in compressed form. The CLI commands described below can be used to compress and uncompress them.

Environment Variable

After cloning the archive with git clone, run the following from the root of the clone to assign the CDF2CIM_ARCHIVE_HOME environment variable:

export CDF2CIM_ARCHIVE_HOME=$(pwd)

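TIP: assign this environment variable in your ~/.bashrc file, replacing $(pwd) with the absolute path of the clone so that the value does not depend on the directory a new shell starts in.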
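
Scripts that work with the archive can read the variable instead of hard-coding a path. A minimal Python sketch follows; the function names are hypothetical, and only the variable name and the data subdirectory come from this README.

```python
import os
from pathlib import Path


def archive_home() -> Path:
    """Return the archive root from CDF2CIM_ARCHIVE_HOME, failing loudly if unset."""
    value = os.environ.get("CDF2CIM_ARCHIVE_HOME")
    if not value:
        raise RuntimeError("CDF2CIM_ARCHIVE_HOME is not set; export it from the clone root")
    return Path(value)


def data_dir() -> Path:
    # Extracted JSON documents live under <archive root>/data.
    return archive_home() / "data"
```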

Command Line Interface

The archive provides a command line interface for common operations, such as compressing and uncompressing the archived documents.


To activate the CLI, place the following in your ~/.bashrc file, replacing INSTALL_DIR with the installation directory:

source INSTALL_DIR/sh/activate



Compresses the set of documents within the INSTALL_DIR/data folder. The compressed output is written in 50MB chunks named INSTALL_DIR/blobs_xx.
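
This README does not describe how compression is implemented. As a rough sketch, assuming a gzipped tar of the data folder and two-digit zero-padded chunk suffixes (the real format of xx is not specified), the step could look like this. `compress` is a hypothetical function, not the CLI command itself.

```python
import io
import tarfile
from pathlib import Path

CHUNK_SIZE = 50 * 1024 * 1024  # 50MB chunks, as described above


def compress(install_dir: Path, chunk_size: int = CHUNK_SIZE) -> list[Path]:
    """Pack install_dir/data into a gzipped tar and split it into blobs_NN files."""
    # Remove chunks from a previous run so stale trailing chunks cannot survive.
    for old in install_dir.glob("blobs_*"):
        old.unlink()
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        tar.add(install_dir / "data", arcname="data")
    payload = buffer.getvalue()
    chunks = []
    for index, start in enumerate(range(0, len(payload), chunk_size)):
        path = install_dir / f"blobs_{index:02d}"
        path.write_bytes(payload[start:start + chunk_size])
        chunks.append(path)
    return chunks
```

For simplicity this holds the whole compressed archive in memory; a streaming variant would write the tar to a temporary file first and then copy it out chunk by chunk.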


Uncompresses the previously compressed chunks named INSTALL_DIR/blobs_xx and writes the documents to INSTALL_DIR/data.
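
A matching sketch for the reverse step, under the same assumptions (a gzipped tar split into chunks with zero-padded suffixes, so lexical order equals chunk order). `uncompress` is a hypothetical function, not the CLI command itself.

```python
import io
import tarfile
from pathlib import Path


def uncompress(install_dir: Path) -> Path:
    """Reassemble install_dir/blobs_NN chunks and extract them into install_dir."""
    # Zero-padded suffixes make sorted() return the chunks in write order.
    chunks = sorted(install_dir.glob("blobs_*"))
    if not chunks:
        raise FileNotFoundError(f"no blobs_* chunks found in {install_dir}")
    payload = b"".join(chunk.read_bytes() for chunk in chunks)
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        # Use the safer "data" extraction filter where this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(install_dir, filter="data")
        else:
            tar.extractall(install_dir)
    return install_dir / "data"
```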