Glossary

Key terms and concepts used throughout the Hera system.

Project

A Project is the central workspace in Hera. It represents a named container that groups all data (measurements, simulations, cached results) and configurations together. Every interaction with data goes through a Project instance, which manages three MongoDB collections and a local files directory.

from hera import Project
proj = Project(projectName="MY_PROJECT")

See Core Concepts: Project for technical details.

Toolkit

A Toolkit is a domain-specific module that extends the Project class with specialized functionality for a particular type of data or analysis. Examples include TopographyToolkit for elevation data, lowFreqToolKit for meteorological station data, and OFToolkit for OpenFOAM simulations.

Each toolkit provides:

Data access — Managing datasources of its domain
Analysis — Processing and computation methods
Presentation — Visualization and plotting capabilities

Once data is loaded into a project, you connect a suitable toolkit to perform operations on it.

See Core Concepts: abstractToolkit for the base class.
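To make the three-part layout concrete, here is a toy sketch of how data access, an `analysis` property and a `presentation` property can hang off one toolkit object. It is not Hera's abstractToolkit: the classes `ToyToolkit`, `ToyAnalysis`, `ToyPresentation` and the methods `add_data`, `get_data`, `mean` and `summary` are hypothetical, and data is kept in memory instead of in a project.

```python
# Hypothetical sketch of the three-part toolkit layout; these toy classes are
# not Hera's abstractToolkit and their method names are invented.


class ToyAnalysis:
    """Analysis layer: processing methods over the toolkit's data."""

    def __init__(self, toolkit):
        self.toolkit = toolkit

    def mean(self, name):
        values = self.toolkit.get_data(name)
        return sum(values) / len(values)


class ToyPresentation:
    """Presentation layer: turns analysis results into output (text here, plots in practice)."""

    def __init__(self, toolkit):
        self.toolkit = toolkit

    def summary(self, name):
        return f"{name}: mean={self.toolkit.analysis.mean(name):.2f}"


class ToyToolkit:
    """Data access lives on the toolkit; analysis and presentation are exposed as properties."""

    def __init__(self, projectName):
        self.projectName = projectName
        self._data = {}  # datasource name -> data, kept in memory for the sketch
        self._analysis = ToyAnalysis(self)
        self._presentation = ToyPresentation(self)

    @property
    def analysis(self):
        return self._analysis

    @property
    def presentation(self):
        return self._presentation

    def add_data(self, name, data):
        self._data[name] = data

    def get_data(self, name):
        return self._data[name]


tk = ToyToolkit(projectName="MY_PROJECT")
tk.add_data("STATION_A", [1.0, 2.0, 3.0])
print(tk.presentation.summary("STATION_A"))  # STATION_A: mean=2.00
```

Giving the analysis and presentation objects a back-reference to the toolkit lets them reach the same datasources without duplicating the data-access code.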

DataSource

A DataSource is a registered data entry within a toolkit. It represents external data (a file, URL, or Python object) along with all metadata needed for the toolkit to read and understand it. Each datasource has:

Name — A human-readable identifier (e.g., "YAVNEEL", "SRTMGL1")
Resource — Path to the actual data file
Data Format — How to read the data (e.g., "parquet", "geopandas")
Version — A (major, minor, patch) tuple for version management

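from hera import toolkitHome

# Obtain a toolkit bound to the project (here, the low-frequency meteorology toolkit)
toolkit = toolkitHome.getToolkit(toolkitHome.METEOROLOGY_LOWFREQ, projectName="MY_PROJECT")

# Register a datasource
toolkit.addDataSource("YAVNEEL", "/data/YAVNEEL.parquet", "parquet", version=[0, 0, 1])

# Retrieve data
df = toolkit.getDataSourceData("YAVNEEL")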

Repository

A Repository is a JSON file that declares a collection of datasources, configurations, and documents organized by toolkit name. It serves as a blueprint for populating a project with data.

When a repository is added to a project, its contents are automatically loaded, registering all declared datasources, setting configurations, and creating the necessary MongoDB documents.

{
    "MeteoLowFreq": {
        "Config": { "stationType": "IMS" },
        "Datasource": {
            "YAVNEEL": {
                "isRelativePath": "True",
                "item": {
                    "resource": "measurements/meteorology/YAVNEEL.parquet",
                    "dataFormat": "parquet"
                }
            }
        }
    }
}

See Data Layer: Repository JSON for the full format.
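The nesting of a repository file (toolkit name, then a `Datasource` section, then one entry per datasource with an `item`) can be walked with the standard library alone. The sketch below is a hypothetical helper, not Hera's loader: the function names are invented, and it assumes that a relative `resource` is resolved against the directory containing the JSON file and that `isRelativePath` may be the string `"True"` as in the example.

```python
import json
import os


def load_repository(repo_path):
    """Parse a repository JSON file and return (contents, directory it lives in)."""
    with open(repo_path) as f:
        return json.load(f), os.path.dirname(os.path.abspath(repo_path))


def iter_datasources(repo, base_dir):
    """Yield (toolkit name, datasource name, resource, dataFormat) for every
    entry under each toolkit's "Datasource" section."""
    for toolkit_name, section in repo.items():
        for ds_name, entry in section.get("Datasource", {}).items():
            item = entry["item"]
            resource = item["resource"]
            # Assumption: relative resources are resolved against the JSON file's directory.
            # str(...).lower() accepts both the string "True" and a JSON boolean true.
            if str(entry.get("isRelativePath", "False")).lower() == "true":
                resource = os.path.join(base_dir, resource)
            yield toolkit_name, ds_name, resource, item["dataFormat"]
```

Applied to the example above with `base_dir="/repos/demo"`, this yields a single tuple for `MeteoLowFreq`/`YAVNEEL` whose resource is `/repos/demo/measurements/meteorology/YAVNEEL.parquet`. The `Config` section is ignored here because it holds settings, not datasources.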

Document

A Document is a MongoDB record that represents a piece of data in Hera. Every document has:

| Field | Description |
|---|---|
| `projectName` | The project it belongs to |
| `_cls` | Type discriminator: `Metadata.Measurements`, `Metadata.Simulations`, or `Metadata.Cache` |
| `type` | Application-defined type tag (e.g., `"ToolkitDataSource"`) |
| `resource` | Path to the actual data file or inline value |
| `dataFormat` | How to interpret the resource (e.g., `"parquet"`, `"JSON_dict"`) |
| `desc` | Free-form metadata dictionary |

Documents are the fundamental unit of data organization in Hera.
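As an illustration, documents with these fields can be modelled as plain dictionaries and selected by matching top-level fields plus keys inside `desc`. This is a hypothetical sketch: the `desc` keys (`datasourceName`, `version`) and the resource paths are invented, and `find_documents` is not a Hera function. In MongoDB itself, a key inside `desc` would be queried with dot notation such as `"desc.datasourceName"`.

```python
# Hypothetical documents shaped like the table above; the desc keys and paths are invented.
documents = [
    {
        "projectName": "MY_PROJECT",
        "_cls": "Metadata.Measurements",
        "type": "ToolkitDataSource",
        "resource": "/data/YAVNEEL.parquet",
        "dataFormat": "parquet",
        "desc": {"datasourceName": "YAVNEEL", "version": [0, 0, 1]},
    },
    {
        "projectName": "MY_PROJECT",
        "_cls": "Metadata.Measurements",
        "type": "ToolkitDataSource",
        "resource": "/data/YAVNEEL_v2.parquet",
        "dataFormat": "parquet",
        "desc": {"datasourceName": "YAVNEEL", "version": [0, 0, 2]},
    },
]


def find_documents(docs, desc=None, **fields):
    """Return docs whose top-level fields equal `fields` and whose desc
    contains every key/value pair in `desc`."""
    desc = desc or {}
    return [
        doc
        for doc in docs
        if all(doc.get(k) == v for k, v in fields.items())
        and all(doc.get("desc", {}).get(k) == v for k, v in desc.items())
    ]


v2 = find_documents(documents, _cls="Metadata.Measurements", desc={"version": [0, 0, 2]})
print([d["resource"] for d in v2])  # ['/data/YAVNEEL_v2.parquet']
```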

Collection

A Collection is a MongoDB collection that stores documents of a particular type. Hera uses three collections:

| Collection | Class | Purpose |
|---|---|---|
| Measurements | `Measurements_Collection` | Observational data, toolkit datasources |
| Simulations | `Simulations_Collection` | Simulation model outputs |
| Cache | `Cache_Collection` | Intermediate results, configurations |

Each collection provides addDocument(), getDocuments(), and deleteDocuments() methods.
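The split into three collections, each stamping its own `_cls` onto what it stores, can be sketched with an in-memory class. This is a hypothetical stand-in rather than `Measurements_Collection` and friends: the snake_case methods only mirror the add/get/delete trio, and their parameters (including `doc_type`) are invented for the sketch.

```python
class ToyCollection:
    """In-memory stand-in for one collection; every document it stores
    gets the collection's _cls and the project name automatically."""

    def __init__(self, projectName, cls_name):
        self.projectName = projectName
        self.cls_name = cls_name  # e.g. "Metadata.Cache"
        self._docs = []

    def add_document(self, resource, dataFormat, doc_type, desc=None):
        doc = {
            "projectName": self.projectName,
            "_cls": self.cls_name,
            "type": doc_type,
            "resource": resource,
            "dataFormat": dataFormat,
            "desc": dict(desc or {}),
        }
        self._docs.append(doc)
        return doc

    def get_documents(self, **query):
        # Exact-match filter on top-level fields
        return [d for d in self._docs if all(d.get(k) == v for k, v in query.items())]

    def delete_documents(self, **query):
        doomed = self.get_documents(**query)
        doomed_ids = {id(d) for d in doomed}  # remove by identity, not equality
        self._docs = [d for d in self._docs if id(d) not in doomed_ids]
        return doomed


# One instance per collection, mirroring the three-collection split
measurements = ToyCollection("MY_PROJECT", "Metadata.Measurements")
simulations = ToyCollection("MY_PROJECT", "Metadata.Simulations")
cache = ToyCollection("MY_PROJECT", "Metadata.Cache")
```

Because `_cls` is fixed per collection, callers never set the type discriminator by hand; they only choose which collection to write to.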

Config

Config is a per-project key-value store for settings and parameters. It is stored as a special Cache document with type = "<projectName>__config__". Toolkits use config to store defaults (e.g., default datasource name, default CRS).

# Set configuration
proj.setConfig(defaultSRTM="SRTMGL1", defaultCRS=4326)

# Get configuration
config = proj.getConfig()
print(config["defaultSRTM"])  # "SRTMGL1"
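The idea of keeping all settings in one Cache document whose `type` is `"<projectName>__config__"` can be sketched as follows. Two details are assumptions, not documented Hera behavior: the values are kept in the document's `desc` dictionary, and each call merges new keys into the existing config instead of replacing it. `ToyConfigStore`, `set_config` and `get_config` are hypothetical names.

```python
class ToyConfigStore:
    """Hypothetical: keeps a project's config in a single cache-style document."""

    def __init__(self, projectName):
        self.projectName = projectName
        self.cache_docs = []  # stands in for the Cache collection

    @property
    def config_type(self):
        return f"{self.projectName}__config__"

    def _config_doc(self):
        # Find the one config document, creating it on first use
        for doc in self.cache_docs:
            if doc["type"] == self.config_type:
                return doc
        doc = {"projectName": self.projectName, "_cls": "Metadata.Cache",
               "type": self.config_type, "desc": {}}
        self.cache_docs.append(doc)
        return doc

    def set_config(self, **kwargs):
        # Assumption: new keys are merged into the existing config
        self._config_doc()["desc"].update(kwargs)

    def get_config(self):
        return dict(self._config_doc()["desc"])  # copy, so callers cannot mutate the store


store = ToyConfigStore("MY_PROJECT")
store.set_config(defaultSRTM="SRTMGL1", defaultCRS=4326)
store.set_config(defaultCRS=2039)
print(store.get_config())  # {'defaultSRTM': 'SRTMGL1', 'defaultCRS': 2039}
```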

Counter

A Counter is an atomic integer stored within the project config, used for generating sequential IDs. Counters are thread-safe and support atomic read-and-increment operations.

# Define a counter
proj.setCounter("experimentID", defaultValue=0)

# Atomically get and increment
current_id = proj.getCounterAndAdd("experimentID", addition=1)
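The essential property of a read-and-increment counter is that reading the current value and storing the incremented one happen as one indivisible step, so no two callers receive the same ID. The in-process sketch below shows that with a `threading.Lock`; `ToyCounters` and its methods are hypothetical, and returning the value from before the increment is an assumption about `getCounterAndAdd`. A lock only serializes threads within one Python process; a counter shared by several processes would instead need the database to perform the increment atomically, for example MongoDB's `$inc` operator in a single find-and-modify update.

```python
import threading


class ToyCounters:
    """Hypothetical in-process counter store (not Hera's config-backed counters)."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def set_counter(self, name, default_value=0):
        with self._lock:
            self._values[name] = default_value

    def get_counter_and_add(self, name, addition=1):
        # Read and increment under one lock so no two callers see the same value.
        # Assumption: the value from before the increment is returned.
        with self._lock:
            current = self._values[name]
            self._values[name] = current + addition
            return current


counters = ToyCounters()
counters.set_counter("experimentID", default_value=0)
print(counters.get_counter_and_add("experimentID"))  # 0
print(counters.get_counter_and_add("experimentID"))  # 1
```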

Version

A Version is a three-element [major, minor, patch] value, passed as a list such as [0, 0, 2], used to manage multiple versions of the same datasource. The system supports:

Explicit versioning — Request a specific version: getDataSourceData("YAVNEEL", version=[0, 0, 2])
Default version — Set a default for a datasource: setDataSourceDefaultVersion("YAVNEEL", [0, 0, 2])
Latest version — If no version is specified and no default is set, the highest version is returned
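The lookup order above (explicit request, then the datasource's default, then the highest registered version) can be expressed as a small pure function. `resolve_version` is a hypothetical helper, not Hera's implementation, and raising `LookupError` for an unregistered version is an assumption about how such a miss should be reported.

```python
def resolve_version(available, requested=None, default=None):
    """Pick a datasource version: explicit request, then default, then latest.

    available: the [major, minor, patch] versions registered for one datasource.
    """
    versions = [tuple(v) for v in available]
    if not versions:
        raise LookupError("no versions registered")
    for candidate in (requested, default):
        if candidate is not None:
            candidate = tuple(candidate)
            if candidate not in versions:
                raise LookupError(f"version {list(candidate)} is not registered")
            return list(candidate)
    # Tuples compare element by element, so max() yields the highest version
    return list(max(versions))


available = [[0, 0, 1], [0, 0, 2], [0, 1, 0]]
print(resolve_version(available))                        # [0, 1, 0]
print(resolve_version(available, default=[0, 0, 2]))     # [0, 0, 2]
print(resolve_version(available, requested=[0, 0, 1],
                      default=[0, 0, 2]))                # [0, 0, 1]
```

Comparing versions as integer tuples rather than strings matters: it orders [0, 0, 10] above [0, 0, 9], which a string comparison of "0.0.10" and "0.0.9" would get wrong.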

ToolkitHome

ToolkitHome is the singleton registry that manages all available toolkits. It maintains a static dictionary of built-in toolkits and supports dynamic registration of custom toolkits at runtime. Access it via:

from hera import toolkitHome

# Get a toolkit instance
tk = toolkitHome.getToolkit(toolkitHome.METEOROLOGY_LOWFREQ, projectName="MY_PROJECT")

See Core Concepts: ToolkitHome for technical details.
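The registry pattern described here (a static table of built-ins, runtime registration of custom toolkits, and one shared instance imported by everyone) can be sketched in a few lines. Everything below is hypothetical: `ToyToolkitHome`, `EchoToolkit`, `register_toolkit`, `get_toolkit`, and the string value given to `METEOROLOGY_LOWFREQ` are illustrative stand-ins, not Hera's ToolkitHome API.

```python
class EchoToolkit:
    """Placeholder toolkit that only remembers its project."""

    def __init__(self, projectName):
        self.projectName = projectName


class ToyToolkitHome:
    METEOROLOGY_LOWFREQ = "MeteoLowFreq"  # hypothetical constant value

    # Static table of built-in toolkits, shared by every instance
    _builtin = {METEOROLOGY_LOWFREQ: EchoToolkit}

    def __init__(self):
        self._custom = {}  # toolkits registered at runtime

    def register_toolkit(self, name, cls):
        if name in self._builtin or name in self._custom:
            raise ValueError(f"toolkit {name!r} is already registered")
        self._custom[name] = cls

    def get_toolkit(self, name, projectName):
        cls = self._custom.get(name) or self._builtin.get(name)
        if cls is None:
            raise KeyError(name)
        return cls(projectName=projectName)


# Module-level singleton, analogous to importing a shared registry object
toolkit_home = ToyToolkitHome()
tk = toolkit_home.get_toolkit(ToyToolkitHome.METEOROLOGY_LOWFREQ, projectName="MY_PROJECT")
print(type(tk).__name__, tk.projectName)  # EchoToolkit MY_PROJECT
```

Rejecting duplicate names keeps a custom toolkit from silently shadowing a built-in one; whether the real registry allows overriding is not stated here.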

dataToolkit

The dataToolkit is a special toolkit responsible for repository management. It operates on the defaultProject and handles:

Registering repository JSON files
Loading all datasources from a repository into a project
Resolving relative paths in repository JSONs

See Data Layer: Repository Pipeline for details.

datatypes

The datatypes class defines all supported data format constants and the dispatch logic for reading/writing data. Each format constant (e.g., PARQUET, NETCDF_XARRAY, GEOPANDAS) maps to a specific reader/writer implementation.

See Data Layer: datatypes for the complete format table.
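The constant-to-reader mapping is a classic dispatch table. The sketch below is hypothetical: `ToyDatatypes` is not Hera's datatypes class, the `STRING` format is invented, and only the JSON reader is exercised without extra packages; the parquet reader imports pandas lazily and calls `pandas.read_parquet`.

```python
import json


def _read_parquet(resource):
    # Lazy import: pandas is only needed if a parquet resource is actually read
    import pandas as pd
    return pd.read_parquet(resource)


class ToyDatatypes:
    JSON_DICT = "JSON_dict"
    STRING = "string"  # invented for the sketch
    PARQUET = "parquet"

    # Dispatch table: format constant -> reader function
    _readers = {JSON_DICT: json.loads, STRING: str, PARQUET: _read_parquet}

    @classmethod
    def read(cls, resource, dataFormat):
        try:
            reader = cls._readers[dataFormat]
        except KeyError:
            raise ValueError(f"unsupported dataFormat: {dataFormat!r}") from None
        return reader(resource)


print(ToyDatatypes.read('{"stationType": "IMS"}', ToyDatatypes.JSON_DICT))  # {'stationType': 'IMS'}
```

Adding a format then means adding one constant and one table entry, with no change to `read` itself.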

Analysis Layer

The Analysis Layer is a property of each toolkit (toolkit.analysis) that provides domain-specific data processing methods. For example:

lowFreqToolKit.analysis.addDatesColumns() — Add temporal columns to meteorological data
TopographyToolkit.analysis.calculateStatistics() — Compute elevation statistics

Presentation Layer

The Presentation Layer is a property of each toolkit (toolkit.presentation) that provides visualization and plotting methods. For example:

lowFreqToolKit.presentation.dailyPlots.plotScatter() — Scatter plot of daily data
lowFreqToolKit.presentation.seasonalPlots.plotProbContourf_bySeason() — Seasonal probability contours

Expected Output

An Expected Output is a serialized file containing the known-correct result of a test. These files are organized into result sets (named directories) and are used by the comparison helpers to validate test results.

See Testing Flow: Expected Output Management for details.

Result Set

A Result Set is a named collection of expected output files, stored in a directory under expected/. The default result set is called BASELINE. Alternative result sets can be created for regression testing.
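One way to picture expected outputs grouped into result sets is a path layout of `expected/<result set>/<test name>` plus a tolerant comparison. This is a sketch under assumptions: the JSON file format, the `.json` naming, the helper names (`expected_path`, `save_expected`, `matches_expected`) and the restriction to flat dictionaries of numbers are all invented here and are not Hera's comparison helpers.

```python
import json
import math
import os

DEFAULT_RESULT_SET = "BASELINE"


def expected_path(test_name, result_set=DEFAULT_RESULT_SET, root="expected"):
    """Assumed layout: <root>/<result set>/<test name>.json"""
    return os.path.join(root, result_set, f"{test_name}.json")


def save_expected(result, test_name, result_set=DEFAULT_RESULT_SET, root="expected"):
    path = expected_path(test_name, result_set, root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(result, f)


def matches_expected(result, test_name, result_set=DEFAULT_RESULT_SET,
                     root="expected", rel_tol=1e-9):
    """Compare a flat dict of numbers against the stored expected output."""
    with open(expected_path(test_name, result_set, root)) as f:
        expected = json.load(f)
    if expected.keys() != result.keys():
        return False
    # Relative tolerance absorbs floating-point noise between runs
    return all(math.isclose(result[k], expected[k], rel_tol=rel_tol) for k in expected)
```

Passing a different `result_set` (for example a hypothetical `"ALT"`) reads and writes a parallel directory, so an alternative set can hold different expected values without touching `BASELINE`.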