Glossary
Key terms and concepts used throughout the Hera system.
Project
A Project is the central workspace in Hera. It represents a named container that groups all data (measurements, simulations, cached results) and configurations together. Every interaction with data goes through a Project instance, which manages three MongoDB collections and a local files directory.
See Core Concepts: Project for technical details.
Toolkit
A Toolkit is a domain-specific module that extends the Project class with specialized functionality for a particular type of data or analysis. Examples include TopographyToolkit for elevation data, lowFreqToolKit for meteorological station data, and OFToolkit for OpenFOAM simulations.
Each toolkit provides:
- Data access: managing the datasources of its domain
- Analysis: processing and computation methods
- Presentation: visualization and plotting capabilities
Once data is loaded into a project, you connect a suitable toolkit to perform operations on it.
See Core Concepts: abstractToolkit for the base class.
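The inheritance relationship can be pictured with a small, self-contained sketch. `ToyProject` and `ToyToolkit` are hypothetical stand-ins, not Hera classes: documents live in a Python list instead of MongoDB, and the `desc` keys (`toolkit`, `datasourceName`, `version`) are assumptions made for illustration. `addDataSource` mirrors the call shown under DataSource; `getDataSourceDocument` is invented here.

```python
class ToyProject:
    """Hypothetical stand-in for a Project: a named workspace holding documents."""

    def __init__(self, projectName: str):
        self.projectName = projectName
        self.documents: list[dict] = []  # stands in for the MongoDB collections


class ToyToolkit(ToyProject):
    """Hypothetical toolkit: inherits the project workspace and adds domain methods."""

    def __init__(self, projectName: str, toolkitName: str):
        super().__init__(projectName)
        self.toolkitName = toolkitName

    def addDataSource(self, name, resource, dataFormat, version=(0, 0, 1)):
        # Record the datasource as a document tagged with this toolkit's name
        # (the desc layout is an assumption, not Hera's schema)
        self.documents.append({
            "projectName": self.projectName,
            "type": "ToolkitDataSource",
            "resource": resource,
            "dataFormat": dataFormat,
            "desc": {"toolkit": self.toolkitName, "datasourceName": name,
                     "version": tuple(version)},
        })

    def getDataSourceDocument(self, name):
        # Hypothetical helper: first matching datasource document, no version handling
        for doc in self.documents:
            if doc["desc"]["datasourceName"] == name:
                return doc
        return None
```

Because the toolkit is a subclass, anything that expects a project (its name, its documents) works on a toolkit instance too.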
DataSource
A DataSource is a registered data entry within a toolkit. It represents external data (a file, URL, or Python object) along with all metadata needed for the toolkit to read and understand it. Each datasource has:
- Name: a human-readable identifier (e.g., "YAVNEEL", "SRTMGL1")
- Resource: path to the actual data file
- Data Format: how to read the data (e.g., "parquet", "geopandas")
- Version: a (major, minor, patch) tuple for version management
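```python
from hera import toolkitHome
# Obtain a toolkit bound to a project (here, the low-frequency meteorology toolkit)
toolkit = toolkitHome.getToolkit(toolkitHome.METEOROLOGY_LOWFREQ, projectName="MY_PROJECT")
# Register a datasource: name, resource path, data format, and version
toolkit.addDataSource("YAVNEEL", "/data/YAVNEEL.parquet", "parquet", version=[0, 0, 1])
# Retrieve the data, read according to its data format
df = toolkit.getDataSourceData("YAVNEEL")
```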
Repository
A Repository is a JSON file that declares a collection of datasources, configurations, and documents organized by toolkit name. It serves as a blueprint for populating a project with data.
When a repository is added to a project, its contents are automatically loaded, registering all declared datasources, setting configurations, and creating the necessary MongoDB documents.
```json
{
  "MeteoLowFreq": {
    "Config": { "stationType": "IMS" },
    "Datasource": {
      "YAVNEEL": {
        "isRelativePath": "True",
        "item": {
          "resource": "measurements/meteorology/YAVNEEL.parquet",
          "dataFormat": "parquet"
        }
      }
    }
  }
}
```
See Data Layer: Repository JSON for the full format.
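To make the nesting concrete, the following sketch walks a parsed repository and lists what it declares, assuming only the layout in the example (toolkit name, then "Config" and "Datasource", then datasource name, then an entry with "isRelativePath" and "item"). The functions `iter_datasources` and `iter_configs` are hypothetical helpers, not part of Hera; the actual loading is done by the dataToolkit.

```python
import json


def iter_datasources(repository: dict):
    """Hypothetical helper: yield (toolkitName, datasourceName, item, isRelative)."""
    for toolkitName, section in repository.items():
        for dsName, entry in section.get("Datasource", {}).items():
            # The example stores the flag as the string "True", so compare as text
            isRelative = str(entry.get("isRelativePath", "False")).lower() == "true"
            yield toolkitName, dsName, entry["item"], isRelative


def iter_configs(repository: dict):
    """Hypothetical helper: yield (toolkitName, configDict) for each "Config" block."""
    for toolkitName, section in repository.items():
        if "Config" in section:
            yield toolkitName, section["Config"]


example = json.loads("""
{"MeteoLowFreq": {"Config": {"stationType": "IMS"},
  "Datasource": {"YAVNEEL": {"isRelativePath": "True",
    "item": {"resource": "measurements/meteorology/YAVNEEL.parquet", "dataFormat": "parquet"}}}}}
""")
for toolkitName, name, item, isRelative in iter_datasources(example):
    print(toolkitName, name, item["dataFormat"], isRelative)  # prints: MeteoLowFreq YAVNEEL parquet True
```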
Document
A Document is a MongoDB record that represents a piece of data in Hera. Every document has:
| Field | Description |
|---|---|
| projectName | The project it belongs to |
| _cls | Type discriminator: Metadata.Measurements, Metadata.Simulations, or Metadata.Cache |
| type | Application-defined type tag (e.g., "ToolkitDataSource") |
| resource | Path to the actual data file or inline value |
| dataFormat | How to interpret the resource (e.g., "parquet", "JSON_dict") |
| desc | Free-form metadata dictionary |
Documents are the fundamental unit of data organization in Hera.
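The fields above can be summarized as a record type. This is an illustrative sketch only, not Hera's actual document class: `DocumentRecord` and the validation of `_cls` against the three listed values are assumptions added to show how the discriminator separates the collections.

```python
from dataclasses import dataclass, field
from typing import Any

# Discriminator values taken from the _cls description above
VALID_CLS = {"Metadata.Measurements", "Metadata.Simulations", "Metadata.Cache"}


@dataclass
class DocumentRecord:
    """Hypothetical shape of a Hera document; not the real database class."""
    projectName: str
    _cls: str
    type: str
    resource: Any          # a file path, or the value itself for inline data
    dataFormat: str
    desc: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject records whose discriminator does not name a known collection
        if self._cls not in VALID_CLS:
            raise ValueError(f"unknown document class: {self._cls}")
```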
Collection
A Collection is a MongoDB collection that stores documents of a particular type. Hera uses three collections:
| Collection | Class | Purpose |
|---|---|---|
| Measurements | Measurements_Collection | Observational data, toolkit datasources |
| Simulations | Simulations_Collection | Simulation model outputs |
| Cache | Cache_Collection | Intermediate results, configurations |
Each collection provides addDocument(), getDocuments(), and deleteDocuments() methods.
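The add/get/delete pattern can be sketched with an in-memory class. `InMemoryCollection` is hypothetical: it stores dictionaries in a list and filters by exact field equality, whereas the real collection classes query MongoDB and their exact signatures may differ from the keyword-argument style assumed here.

```python
from typing import Any


class InMemoryCollection:
    """Hypothetical stand-in for a collection such as Measurements_Collection."""

    def __init__(self, cls_name: str):
        self.cls_name = cls_name                # e.g. "Metadata.Measurements"
        self._docs: list[dict[str, Any]] = []

    def addDocument(self, projectName: str, **fields: Any) -> dict:
        # Every document is stamped with its project and collection discriminator
        doc = {"projectName": projectName, "_cls": self.cls_name, **fields}
        self._docs.append(doc)
        return doc

    def getDocuments(self, projectName: str, **query: Any) -> list[dict]:
        # Documents must belong to the project and match every query key exactly
        return [d for d in self._docs
                if d["projectName"] == projectName
                and all(d.get(k) == v for k, v in query.items())]

    def deleteDocuments(self, projectName: str, **query: Any) -> int:
        # Remove the matching documents and report how many were deleted
        doomed = {id(d) for d in self.getDocuments(projectName, **query)}
        self._docs = [d for d in self._docs if id(d) not in doomed]
        return len(doomed)
```

Filtering on projectName in every query is what keeps several projects isolated inside the same shared collection.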
Config
Config is a per-project key-value store for settings and parameters. It is stored as a special Cache document with type = "<projectName>__config__". Toolkits use config to store defaults (e.g., default datasource name, default CRS).
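```python
# `proj` is a Project instance (a toolkit also works, since toolkits extend Project)
# Set configuration values as keyword arguments
proj.setConfig(defaultSRTM="SRTMGL1", defaultCRS=4326)
# Read the whole configuration back
config = proj.getConfig()
print(config["defaultSRTM"])  # prints SRTMGL1
```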
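The storage layout described above (one Cache document per project, tagged with type "<projectName>__config__") can be sketched as follows. `ToyCacheConfig` is hypothetical, and keeping the key-value pairs in the document's `desc` field is an assumption made for the illustration, not a statement about Hera's schema.

```python
class ToyCacheConfig:
    """Hypothetical sketch: config kept as a single Cache document per project."""

    def __init__(self, projectName: str):
        self.projectName = projectName
        self.cache: list[dict] = []   # stands in for the Cache collection

    def _configDoc(self) -> dict:
        configType = f"{self.projectName}__config__"
        for doc in self.cache:
            if doc["type"] == configType:
                return doc
        # Create the config document lazily on first access
        doc = {"projectName": self.projectName, "_cls": "Metadata.Cache",
               "type": configType, "desc": {}}   # assumption: values live in desc
        self.cache.append(doc)
        return doc

    def setConfig(self, **kwargs) -> None:
        # Merge new keys into the stored dictionary, overwriting existing ones
        self._configDoc()["desc"].update(kwargs)

    def getConfig(self) -> dict:
        # Return a copy so callers cannot mutate the stored document by accident
        return dict(self._configDoc()["desc"])
```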
Counter
A Counter is an atomic integer stored within the project config, used for generating sequential IDs. Counters are thread-safe and support atomic read-and-increment operations.
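```python
# `proj` is a Project instance (a toolkit also works, since toolkits extend Project)
# Define a counter with its starting value
proj.setCounter("experimentID", defaultValue=0)
# Atomically read the current value and add `addition` to the stored counter
current_id = proj.getCounterAndAdd("experimentID", addition=1)
```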
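The read-and-increment semantics can be illustrated with an in-process lock. This is a sketch under two assumptions: that `getCounterAndAdd` returns the value before the addition (fetch-and-add), and that a lock is an acceptable stand-in. Since Hera keeps counters in the project config, its real atomicity mechanism may be a database-level update rather than a Python lock. `ToyCounters` is hypothetical.

```python
import threading


class ToyCounters:
    """Hypothetical counters kept in a config dict, guarded by a lock."""

    def __init__(self):
        self._config: dict[str, int] = {}
        self._lock = threading.Lock()

    def setCounter(self, name: str, defaultValue: int = 0) -> None:
        with self._lock:
            self._config[name] = defaultValue

    def getCounterAndAdd(self, name: str, addition: int = 1) -> int:
        # Read and update under one lock so no two callers receive the same value.
        # Assumption: the value *before* the addition is returned.
        with self._lock:
            current = self._config[name]
            self._config[name] = current + addition
            return current
```

If the read and the write were done as two separate locked steps, two threads could read the same value before either wrote, which is exactly the duplicate-ID bug an atomic counter prevents.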
Version
A Version is a three-element (major, minor, patch) tuple, passed as a list such as [0, 0, 2], used to manage multiple versions of the same datasource. The system supports:
- Explicit versioning: request a specific version, e.g. getDataSourceData("YAVNEEL", version=[0, 0, 2])
- Default version: set a default for a datasource, e.g. setDataSourceDefaultVersion("YAVNEEL", [0, 0, 2])
- Latest version: if no version is specified and no default is set, the highest version is returned
ToolkitHome
ToolkitHome is the singleton registry that manages all available toolkits. It maintains a static dictionary of built-in toolkits and supports dynamic registration of custom toolkits at runtime. Access it via:
```python
from hera import toolkitHome

# Get a toolkit instance
tk = toolkitHome.getToolkit(toolkitHome.METEOROLOGY_LOWFREQ, projectName="MY_PROJECT")
```
See Core Concepts: ToolkitHome for technical details.
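A registry of this kind can be sketched as a class-level dictionary of built-ins plus a per-instance dictionary for runtime registrations, exposed through one module-level instance. Everything here is hypothetical: `ToyToolkitHome` and `registerToolkit` are illustrative names, and letting custom registrations take precedence over built-ins is a design choice of the sketch, not documented Hera behavior.

```python
class ToyToolkitHome:
    """Hypothetical registry: built-in toolkits in a class-level dict, plus runtime additions."""

    # Static table of built-in toolkits (name -> class); left empty in this sketch
    _builtin: dict[str, type] = {}

    def __init__(self):
        self._custom: dict[str, type] = {}

    def registerToolkit(self, name: str, toolkitClass: type) -> None:
        # Hypothetical method name for dynamic registration
        self._custom[name] = toolkitClass

    def getToolkit(self, name: str, projectName: str, **kwargs):
        # Custom registrations are checked first (an assumption of this sketch)
        toolkitClass = self._custom.get(name) or self._builtin.get(name)
        if toolkitClass is None:
            raise KeyError(f"toolkit {name!r} is not registered")
        # Each call builds a toolkit bound to the requested project
        return toolkitClass(projectName=projectName, **kwargs)


# Module-level instance: every importer shares this one registry
toyToolkitHome = ToyToolkitHome()
```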
dataToolkit
The dataToolkit is a special toolkit responsible for repository management. It operates on the defaultProject and handles:
- Registering repository JSON files
- Loading all datasources from a repository into a project
- Resolving relative paths in repository JSONs
See Data Layer: Repository Pipeline for details.
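Relative-path resolution can be sketched as below, assuming (not confirmed by this glossary) that a relative "resource" is interpreted against the directory containing the repository JSON file. `resolve_resource` is a hypothetical helper; it accepts the flag either as the string "True" used in the repository example or as a Python boolean.

```python
import os


def resolve_resource(repositoryFile: str, resource: str, isRelativePath) -> str:
    """Hypothetical helper: turn a repository "resource" entry into a usable path.

    Assumption: relative paths are resolved against the directory that
    contains the repository JSON file.
    """
    # The example repository stores the flag as the string "True"
    relative = str(isRelativePath).strip().lower() == "true"
    if not relative:
        return resource
    baseDir = os.path.dirname(os.path.abspath(repositoryFile))
    # normpath collapses segments such as ".." without touching the filesystem
    return os.path.normpath(os.path.join(baseDir, resource))
```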
datatypes
The datatypes class defines all supported data format constants and the dispatch logic for reading/writing data. Each format constant (e.g., PARQUET, NETCDF_XARRAY, GEOPANDAS) maps to a specific reader/writer implementation.
See Data Layer: datatypes for the complete format table.
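The format-to-reader dispatch can be sketched with a decorator-populated table. To stay dependency-free the sketch registers only two formats: "JSON_dict" (mentioned in the Document table; treating its resource as inline JSON text or a dict is an assumption) and a hypothetical "string" format. In a real setup a constant like PARQUET would map to a reader such as pandas.read_parquet. `READERS`, `reader`, and `getData` are illustrative names, not Hera's API.

```python
import json
from typing import Any, Callable

# Hypothetical registry: format constant -> reader function
READERS: dict[str, Callable[[Any], Any]] = {}


def reader(formatName: str):
    """Decorator that registers a reader function for a format constant."""
    def register(func):
        READERS[formatName] = func
        return func
    return register


@reader("JSON_dict")
def read_json_dict(resource):
    # Assumption: inline JSON text, or an already-parsed dictionary
    return json.loads(resource) if isinstance(resource, str) else dict(resource)


@reader("string")
def read_string(resource):
    # Hypothetical format: return the resource as plain text
    return str(resource)


def getData(resource, dataFormat: str):
    # Look up the reader for this format and apply it to the resource
    try:
        readerFunc = READERS[dataFormat]
    except KeyError:
        raise ValueError(f"unsupported data format: {dataFormat}") from None
    return readerFunc(resource)
```

Adding a format then means writing one function and decorating it, without touching the dispatch code.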
Analysis Layer
The Analysis Layer is a property of each toolkit (toolkit.analysis) that provides domain-specific data processing methods. For example:
- lowFreqToolKit.analysis.addDatesColumns(): add temporal columns to meteorological data
- TopographyToolkit.analysis.calculateStatistics(): compute elevation statistics
Presentation Layer
The Presentation Layer is a property of each toolkit (toolkit.presentation) that provides visualization and plotting methods. For example:
- lowFreqToolKit.presentation.dailyPlots.plotScatter(): scatter plot of daily data
- lowFreqToolKit.presentation.seasonalPlots.plotProbContourf_bySeason(): seasonal probability contours
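One way to structure the two layers is as helper objects that keep a back-reference to their toolkit, so analysis reads data through the toolkit and presentation can reuse analysis results. The classes and the methods `mean` and `textSummary` are hypothetical stand-ins for real methods like addDatesColumns() or plotScatter(); only the `toolkit.analysis` / `toolkit.presentation` property pattern comes from the text above.

```python
class ToyAnalysis:
    """Hypothetical analysis layer: computes results from the toolkit's data."""

    def __init__(self, toolkit):
        self._toolkit = toolkit  # back-reference gives access to datasources

    def mean(self, name):
        values = self._toolkit.getDataSourceData(name)
        return sum(values) / len(values)


class ToyPresentation:
    """Hypothetical presentation layer: renders results, reusing the analysis layer."""

    def __init__(self, toolkit):
        self._toolkit = toolkit

    def textSummary(self, name):
        # A text "plot" keeps the sketch free of plotting dependencies
        return f"{name}: mean={self._toolkit.analysis.mean(name):.2f}"


class ToyLayeredToolkit:
    def __init__(self, data):
        self._data = data  # stands in for registered datasources
        self._analysis = ToyAnalysis(self)
        self._presentation = ToyPresentation(self)

    def getDataSourceData(self, name):
        return self._data[name]

    @property
    def analysis(self):
        return self._analysis

    @property
    def presentation(self):
        return self._presentation
```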
Expected Output
An Expected Output is a serialized file containing the known-correct result of a test. These files are organized into result sets (named directories) and are used by the comparison helpers to validate test results.
See Testing Flow: Expected Output Management for details.
Result Set
A Result Set is a named collection of expected output files, stored in a directory under expected/. The default result set is called BASELINE. Alternative result sets can be created for regression testing.
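Saving and comparing against expected outputs can be sketched with the standard library. The layout `<root>/expected/<resultSet>/<testName>.json` and the JSON serialization are assumptions for illustration; the glossary only states that result sets are directories under expected/ and that the default is BASELINE. The function names are hypothetical, not Hera's comparison helpers.

```python
import json
from pathlib import Path


def expected_path(root, resultSet, testName) -> Path:
    # Layout assumption: <root>/expected/<resultSet>/<testName>.json
    return Path(root) / "expected" / resultSet / f"{testName}.json"


def save_expected(root, testName, result, resultSet="BASELINE") -> Path:
    # Serialize a known-correct result into the chosen result set
    path = expected_path(root, resultSet, testName)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, sort_keys=True, indent=2))
    return path


def compare_to_expected(root, testName, result, resultSet="BASELINE") -> bool:
    """Return True if `result` equals the stored expected output."""
    expected = json.loads(expected_path(root, resultSet, testName).read_text())
    # Round-trip through JSON so tuples and lists compare the same way
    return json.loads(json.dumps(result)) == expected
```

Passing a different `resultSet` name keeps an alternative set of expectations side by side with BASELINE, which is the regression-testing use described above.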