Repository JSON Schema Reference
Complete schema documentation for repository JSON files.
Top-Level Structure
A repository JSON is a dictionary mapping toolkit names to their configuration and data sections:
{
  "<ToolkitName>": {
    "Config": { ... },
    "DataSource": { ... },
    "Measurements": { ... },
    "Simulations": { ... },
    "Cache": { ... },
    "Function": { ... }
  },
  "<AnotherToolkitName>": { ... }
}
Rules:
- Toolkit names must match registered toolkit names (case-sensitive)
- Each toolkit section is independent
- Section order doesn't matter
- Sections are optional (only include what you need)
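As a minimal sketch of the structure above, the following snippet parses a repository JSON string and lists the sections each toolkit declares. The sample content, the configuration key, and the helper name list_sections are illustrative assumptions, not part of the library.

```python
import json

# Hypothetical repository content, used only for illustration.
repository_text = """
{
  "MeteoLowFreq": {
    "Config": {"defaultStation": "YAVNEEL"},
    "DataSource": {}
  },
  "OpenFOAM": {
    "Simulations": {}
  }
}
"""


def list_sections(repository):
    """Map each toolkit name to the sorted list of sections it declares."""
    return {name: sorted(sections) for name, sections in repository.items()}


repository = json.loads(repository_text)
sections_by_toolkit = list_sections(repository)
for toolkit_name, section_names in sections_by_toolkit.items():
    print(toolkit_name, section_names)
```

Because every section is optional, a consumer of this structure should iterate over the keys that are present rather than expect all six.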
Section Types
Config Section
Purpose: Set toolkit configuration via toolkit.setConfig(**configDict)
Schema:
{
  "<ToolkitName>": {
    "Config": {
      "<key1>": "<value1>",
      "<key2>": 42,
      "<key3>": {
        "nested": "config"
      }
    }
  }
}
Handler: _handle_Config → Calls toolkit.setConfig(**configDict)
Example:
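A minimal sketch of how a Config section could reach setConfig through keyword unpacking. StubToolkit and the configuration keys are hypothetical; only the setConfig(**configDict) call shape comes from the handler description above.

```python
class StubToolkit:
    """Hypothetical stand-in for a toolkit; it only records its configuration."""

    def __init__(self):
        self.config = {}

    def setConfig(self, **kwargs):
        self.config.update(kwargs)


# Hypothetical Config section for a toolkit named "MeteoLowFreq".
repository = {
    "MeteoLowFreq": {
        "Config": {
            "defaultStation": "YAVNEEL",
            "samplingMinutes": 10,
            "plot": {"style": "default"},
        }
    }
}

toolkit = StubToolkit()
config_dict = repository["MeteoLowFreq"]["Config"]
toolkit.setConfig(**config_dict)  # each top-level key becomes a keyword argument
```

Only the top-level keys are unpacked; nested values such as "plot" arrive as ordinary dictionaries.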
DataSource Section
Purpose: Register versioned datasources accessible via toolkit.getDataSourceData()
Schema:
{
  "<ToolkitName>": {
    "DataSource": {
      "<datasourceName>": {
        "isRelativePath": "True" | "False" | true | false,
        "item": {
          "resource": "<path or value>",
          "dataFormat": "<format constant>",
          "version": [<major>, <minor>, <patch>],
          "desc": {
            "<key>": "<value>",
            ...
          },
          "overwrite": true | false // Optional, default: false
        }
      }
    }
  }
}
Handler: _handle_DataSource → Calls toolkit.addDataSource(...)
Required Fields:
- resource — Path to data file or inline value
- dataFormat — One of the datatypes constants (e.g., "parquet", "geopandas")
Optional Fields:
- version — Version tuple [major, minor, patch] (default: [0, 0, 1])
- desc — Metadata dictionary (default: {})
- overwrite — Overwrite existing datasource (default: false)
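The required and optional fields listed above can be sketched as a small normalization step. The function name with_datasource_defaults is hypothetical and this is not the library's handler; it only applies the documented defaults to a copy of the item.

```python
def with_datasource_defaults(item):
    """Return a copy of a DataSource item with the documented defaults filled in."""
    missing = [field for field in ("resource", "dataFormat") if field not in item]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    completed = dict(item)  # shallow copy so the caller's dict is left untouched
    completed.setdefault("version", [0, 0, 1])
    completed.setdefault("desc", {})
    completed.setdefault("overwrite", False)
    return completed


item = {"resource": "measurements/YAVNEEL.parquet", "dataFormat": "parquet"}
completed_item = with_datasource_defaults(item)
```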
Path Resolution:
- If isRelativePath is "True" or true: resource is resolved relative to repository JSON directory
- If isRelativePath is "False" or false: resource is used as absolute path
Example:
{
  "MeteoLowFreq": {
    "DataSource": {
      "YAVNEEL": {
        "isRelativePath": "True",
        "item": {
          "resource": "measurements/YAVNEEL.parquet",
          "dataFormat": "parquet",
          "version": [0, 0, 1],
          "desc": {
            "stationName": "YAVNEEL",
            "latitude": 31.7683,
            "longitude": 35.2137
          }
        }
      }
    }
  }
}
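Since isRelativePath accepts both the strings "True"/"False" and JSON booleans, any code reading it has to normalize the value before testing it. The helper below is a sketch of one way to do that; parse_is_relative_path is a hypothetical name, not a function from the library.

```python
def parse_is_relative_path(value):
    """Convert the accepted isRelativePath forms to a bool."""
    if isinstance(value, bool):
        return value
    if value in ("True", "False"):
        return value == "True"
    raise ValueError(
        f"isRelativePath must be 'True', 'False', true, or false; got {value!r}"
    )


flags = [parse_is_relative_path(v) for v in ("True", "False", True, False)]
```

The explicit comparison matters: a plain truthiness test would treat the non-empty string "False" as true.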
Measurements Section
Purpose: Add raw measurement documents to the Measurements collection
Schema:
{
  "<ToolkitName>": {
    "Measurements": {
      "<documentName>": {
        "isRelativePath": "True" | "False" | true | false,
        "item": {
          "resource": "<path or value>",
          "dataFormat": "<format constant>",
          "type": "<document type>",
          "desc": {
            "<key>": "<value>",
            ...
          }
        }
      }
    }
  }
}
Handler: _DocumentHandler → Calls toolkit.addMeasurementsDocument(...)
Required Fields:
- resource — Path to data file or inline value
- dataFormat — Format constant
- type — Application-defined document type (e.g., "Experiment_rawData")
Differences from DataSource:
- No version field (documents are not versioned)
- Requires type field
- Creates a Measurements collection document (not a ToolkitDataSource)
Example:
{
  "MeteoLowFreq": {
    "Measurements": {
      "raw_export_2024": {
        "isRelativePath": "True",
        "item": {
          "resource": "exports/raw_2024.parquet",
          "dataFormat": "parquet",
          "type": "Experiment_rawData",
          "desc": {
            "exportDate": "2024-12-01",
            "source": "IMS"
          }
        }
      }
    }
  }
}
Simulations Section
Purpose: Add simulation output documents to the Simulations collection
Schema:
{
  "<ToolkitName>": {
    "Simulations": {
      "<documentName>": {
        "isRelativePath": "True" | "False" | true | false,
        "item": {
          "resource": "<path or value>",
          "dataFormat": "<format constant>",
          "type": "<document type>",
          "desc": {
            "<key>": "<value>",
            ...
          }
        }
      }
    }
  }
}
Handler: _DocumentHandler → Calls toolkit.addSimulationsDocument(...)
Same structure as Measurements, but creates documents in the Simulations collection.
Example:
{
  "OpenFOAM": {
    "Simulations": {
      "wind_simulation_001": {
        "isRelativePath": "True",
        "item": {
          "resource": "simulations/wind_001.nc",
          "dataFormat": "netcdf_xarray",
          "type": "WindProfile",
          "desc": {
            "simulationDate": "2024-11-15",
            "solver": "simpleFoam"
          }
        }
      }
    }
  }
}
Cache Section
Purpose: Add cached/computed documents to the Cache collection
Schema:
{
  "<ToolkitName>": {
    "Cache": {
      "<documentName>": {
        "isRelativePath": "True" | "False" | true | false,
        "item": {
          "resource": "<path or value>",
          "dataFormat": "<format constant>",
          "type": "<document type>",
          "desc": {
            "<key>": "<value>",
            ...
          }
        }
      }
    }
  }
}
Handler: _DocumentHandler → Calls toolkit.addCacheDocument(...)
Same structure as Measurements/Simulations, but creates documents in the Cache collection.
Example:
{
  "MeteoLowFreq": {
    "Cache": {
      "processed_stats": {
        "isRelativePath": "True",
        "item": {
          "resource": "cache/statistics.json",
          "dataFormat": "JSON_dict",
          "type": "ProcessedStatistics",
          "desc": {
            "computedDate": "2024-11-20",
            "method": "hourly_distribution"
          }
        }
      }
    }
  }
}
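Because Measurements, Simulations and Cache share one item layout, a single routine can serve all three by choosing the toolkit method from the section name. The sketch below illustrates that idea. RecordingToolkit is a hypothetical stand-in, and passing the item fields as keyword arguments is an assumption, since the actual add*Document signatures are not given here.

```python
# Section name -> toolkit method name, as listed in the handler descriptions.
DOCUMENT_METHODS = {
    "Measurements": "addMeasurementsDocument",
    "Simulations": "addSimulationsDocument",
    "Cache": "addCacheDocument",
}


class RecordingToolkit:
    """Hypothetical stand-in that records which add*Document method was called."""

    def __init__(self):
        self.calls = []

    def addMeasurementsDocument(self, **kwargs):
        self.calls.append(("Measurements", kwargs))

    def addSimulationsDocument(self, **kwargs):
        self.calls.append(("Simulations", kwargs))

    def addCacheDocument(self, **kwargs):
        self.calls.append(("Cache", kwargs))


def add_documents(toolkit, section_name, section):
    """Call the add*Document method matching section_name for every entry."""
    method = getattr(toolkit, DOCUMENT_METHODS[section_name])
    for entry in section.values():
        method(**entry["item"])  # assumption: item fields map to keyword arguments


cache_section = {
    "processed_stats": {
        "isRelativePath": "True",
        "item": {
            "resource": "cache/statistics.json",
            "dataFormat": "JSON_dict",
            "type": "ProcessedStatistics",
            "desc": {"method": "hourly_distribution"},
        },
    }
}
recorder = RecordingToolkit()
add_documents(recorder, "Cache", cache_section)
```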
Function Section
Purpose: Call named functions on the dataToolkit instance
Schema:
{
  "<ToolkitName>": {
    "Function": {
      "<functionName>": {
        "params": {
          "<param1>": "<value1>",
          "<param2>": 42,
          ...
        }
      }
    }
  }
}
Or for multiple calls:
{
  "<ToolkitName>": {
    "Function": {
      "<functionName>": [
        {
          "params": { ... }
        },
        {
          "params": { ... }
        }
      ]
    }
  }
}
Handler: _handle_Function → Calls dataToolkit.<functionName>(**params, overwrite=overwrite)
Requirements:
- Function must exist on dataToolkit instance
- Function signature must accept overwrite parameter
- The value under a function name can be a single object with params (single call) or a list of such objects (multiple calls)
Example:
{
  "MeteoLowFreq": {
    "Function": {
      "initializeToolkit": {
        "params": {
          "autoLoadDefaults": true
        }
      }
    }
  }
}
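The single-call and multiple-call forms can be handled uniformly by wrapping a lone entry in a list before dispatching. The following is a sketch of that approach under stated assumptions: StubDataToolkit and run_function_section are hypothetical names, and the call shape function(**params, overwrite=overwrite) follows the handler description above.

```python
class StubDataToolkit:
    """Hypothetical stand-in exposing one function that accepts overwrite."""

    def initializeToolkit(self, autoLoadDefaults=False, overwrite=False):
        return ("initializeToolkit", autoLoadDefaults, overwrite)


def run_function_section(data_toolkit, function_section, overwrite=False):
    """Call every function named in a Function section and collect the results."""
    results = []
    for function_name, entries in function_section.items():
        if isinstance(entries, dict):
            entries = [entries]  # single call: treat it as a one-element list
        function = getattr(data_toolkit, function_name)
        for entry in entries:
            results.append(function(**entry.get("params", {}), overwrite=overwrite))
    return results


single = {"initializeToolkit": {"params": {"autoLoadDefaults": True}}}
multiple = {
    "initializeToolkit": [
        {"params": {"autoLoadDefaults": True}},
        {"params": {"autoLoadDefaults": False}},
    ]
}
single_results = run_function_section(StubDataToolkit(), single)
multiple_results = run_function_section(StubDataToolkit(), multiple, overwrite=True)
```

In this sketch a params dictionary that itself contains an "overwrite" key would raise a TypeError, because the same keyword would be passed twice.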
Path Resolution Rules
How basedir is Determined
The basedir is the directory containing the repository JSON file.
Example:
- Repository JSON at: /home/user/repos/my_repo.json
- basedir = /home/user/repos/
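One way to derive basedir from the location of the repository JSON is with the standard os.path functions. This is a sketch and may differ from how the library computes it; repository_basedir is a hypothetical helper name.

```python
import os


def repository_basedir(repository_json_path):
    """Return the directory that contains the repository JSON file."""
    return os.path.dirname(os.path.abspath(repository_json_path))


basedir = repository_basedir("/home/user/repos/my_repo.json")
print(basedir)
```

os.path.dirname returns the directory without a trailing separator, which makes no difference once the result is passed to os.path.join.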
Relative Path Resolution
For each item with isRelativePath: "True" or isRelativePath: true:
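import os

# basedir and resource come from the item being loaded;
# isRelativePath is assumed to be already converted to a bool
if isRelativePath:
    absolute_path = os.path.join(basedir, resource)
else:
    absolute_path = resource  # Used as-is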
Example:
- Repository JSON: /home/user/repos/my_repo.json
- resource: "data/file.parquet"
- isRelativePath: "True"
- Resolved to: /home/user/repos/data/file.parquet
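The worked example above can be reproduced with a small function. resolve_resource is a hypothetical name and the shared-drive path is invented for illustration; the join logic mirrors the rule stated in this section.

```python
import os


def resolve_resource(basedir, resource, is_relative_path):
    """Join resource onto basedir when it is relative, else return it unchanged."""
    if is_relative_path:
        return os.path.join(basedir, resource)
    return resource


resolved = resolve_resource("/home/user/repos", "data/file.parquet", True)
untouched = resolve_resource("/home/user/repos", "/mnt/shared/file.parquet", False)
print(resolved)
print(untouched)
```

Note that os.path.join discards basedir when the second argument is itself an absolute path, so an absolute resource that is mistakenly marked as relative is not placed under basedir.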
Absolute Path Handling
For items with isRelativePath: "False" or isRelativePath: false:
- Path is used exactly as specified
- No modification is performed
- Useful for shared network drives or fixed locations
Validation Rules
Toolkit Validation
- Toolkit must exist: Name must match a registered toolkit
- Auto-registration: If auto_register_missing=True, attempts to register from JSON hints or DB documents
Section Validation
- Valid section names: Must be one of Config, DataSource, Measurements, Simulations, Cache, Function
- Case-sensitive: Section names are title-cased internally
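The section-name rule can be checked before loading with a simple set comparison. The sketch below uses an exact, case-sensitive match; unknown_sections is a hypothetical helper and does not reproduce any internal name normalization the loader may perform.

```python
VALID_SECTIONS = {
    "Config",
    "DataSource",
    "Measurements",
    "Simulations",
    "Cache",
    "Function",
}


def unknown_sections(toolkit_entry):
    """Return the section names in a toolkit entry that are not recognized."""
    return sorted(name for name in toolkit_entry if name not in VALID_SECTIONS)


# "datasource" and "Plots" are deliberately invalid names for illustration.
entry = {"Config": {}, "datasource": {}, "Plots": {}}
problems = unknown_sections(entry)
```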
DataSource Item Validation
- Required fields: resource, dataFormat
- Version format: Must be a list of 3 integers: [major, minor, patch]
- isRelativePath: Must be "True", "False", true, or false
Document Item Validation
- Required fields: resource, dataFormat, type
- Type field: Must be a non-empty string
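The item-level rules for DataSource items and document items can be sketched as one checker that returns a list of violations. The function names and error strings are hypothetical and do not match the library's messages. Rejecting booleans as version parts is an added assumption, since Python treats bool as a subclass of int.

```python
def is_valid_version(version):
    """True if version is a list of exactly three integers."""
    return (
        isinstance(version, list)
        and len(version) == 3
        # assumption: booleans are rejected even though bool subclasses int
        and all(isinstance(p, int) and not isinstance(p, bool) for p in version)
    )


def item_errors(item, is_document):
    """Collect rule violations for a DataSource item or a document item."""
    required = ["resource", "dataFormat"] + (["type"] if is_document else [])
    errors = [f"missing required field: {name}" for name in required if name not in item]
    if is_document and "type" in item:
        if not isinstance(item["type"], str) or not item["type"]:
            errors.append("type must be a non-empty string")
    if not is_document and "version" in item and not is_valid_version(item["version"]):
        errors.append("version must be a list of 3 integers")
    return errors


datasource_errors = item_errors(
    {"resource": "a.parquet", "dataFormat": "parquet", "version": [1, 0]}, False
)
document_errors = item_errors(
    {"resource": "a.parquet", "dataFormat": "parquet", "type": ""}, True
)
```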
Path Validation
- Relative paths: Directory must exist when resolved
- Absolute paths: File/directory must exist (checked during getData())
Complete Schema Diagram
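[Diagram: each section type mapped to the call it triggers. Only fragments survive: the document sections lead to the add*Document calls (Simulations or Cache among them), and the Function section leads to a dataToolkit call.]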
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Unknown Handler X | Section name doesn't match expected handlers | Use one of: Config, DataSource, Measurements, Simulations, Cache, Function |
| Toolkit X not found | Toolkit not registered | Register toolkit or enable auto_register_missing |
| Source X already exists | Datasource exists and overwrite=False | Set overwrite: true in item or use --overwrite flag |
| isRelativePath must be defined | Missing or invalid isRelativePath | Set to "True", "False", true, or false |
| resource path not found | Resolved path doesn't exist | Check path, ensure isRelativePath is correct |
Implementation Details
The repository loading is implemented in hera/utils/data/toolkit.py:
- loadAllDatasourcesInRepositoryJSONToProject(): Main entry point
- _handle_Config(): Config section handler
- _handle_DataSource(): DataSource section handler
- _DocumentHandler(): Measurements/Simulations/Cache handler
- _handle_Function(): Function section handler
- _makeItemPathAbsolute(): Path resolution logic
See Also
- Repository Examples — Complete working examples
- Data Layer: Repository Pipeline — How repositories are processed
- Best Practices: Repository Structure — Organization guidelines