Repository JSON Examples¶
Complete, annotated examples of repository JSON files for different use cases.
Minimal Example¶
A single toolkit with one datasource:
{
"MeteoLowFreq": {
"DataSource": {
"YAVNEEL": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/YAVNEEL.parquet",
"dataFormat": "parquet",
"version": [0, 0, 1],
"desc": {
"stationName": "YAVNEEL",
"type": "lowfreq"
}
}
}
}
}
}
Explanation:
- MeteoLowFreq — Toolkit name (must match registered toolkit)
- DataSource — Section type (creates a ToolkitDataSource document)
- YAVNEEL — Datasource name (used in getDataSourceData("YAVNEEL"))
- isRelativePath: "True" — Path is relative to repository JSON file location
- resource — Path to the data file
- dataFormat: "parquet" — How to read the file
- version: [0, 0, 1] — Version tuple (major, minor, patch)
- desc — Free-form metadata dictionary
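To make the nesting concrete, the following sketch parses the minimal example and lists its datasources. It uses only the standard library and does not call hera; the helper name `list_datasources` is hypothetical and chosen for illustration.

```python
import json

# Same structure as the minimal example above, embedded as a string.
REPOSITORY_TEXT = """
{
  "MeteoLowFreq": {
    "DataSource": {
      "YAVNEEL": {
        "isRelativePath": "True",
        "item": {
          "resource": "measurements/meteorology/lowfreqdata/YAVNEEL.parquet",
          "dataFormat": "parquet",
          "version": [0, 0, 1],
          "desc": {"stationName": "YAVNEEL", "type": "lowfreq"}
        }
      }
    }
  }
}
"""


def list_datasources(repository):
    """Return one (toolkit, name, dataFormat, version) tuple per datasource."""
    rows = []
    for toolkit_name, sections in repository.items():
        # Toolkits without a DataSource section contribute nothing.
        for name, entry in sections.get("DataSource", {}).items():
            item = entry["item"]
            rows.append(
                (toolkit_name, name, item["dataFormat"], tuple(item["version"]))
            )
    return rows


repository = json.loads(REPOSITORY_TEXT)
datasources = list_datasources(repository)
print(datasources)
```

The three levels (toolkit, section, entry) map directly onto the three nested dictionaries that the loop walks.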
Multi-Toolkit Example¶
Multiple toolkits with different section types:
{
"GIS_Raster_Topography": {
"Config": {
"defaultSRTM": "SRTMGL1"
},
"DataSource": {
"SRTMGL1": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/raster/",
"dataFormat": "string",
"version": [0, 0, 1],
"desc": {
"defaultSRTM": "SRTMGL1"
}
}
}
}
},
"MeteoLowFreq": {
"DataSource": {
"YAVNEEL": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/YAVNEEL.parquet",
"dataFormat": "parquet",
"version": [0, 0, 1],
"desc": {
"stationName": "YAVNEEL",
"type": "lowfreq"
}
}
}
}
},
"GIS_Demography": {
"DataSource": {
"lamas_population": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/vector/population_lamas.shp",
"dataFormat": "geopandas",
"version": [0, 0, 1],
"desc": {
"source": "LAMAS",
"year": 2020
}
}
}
}
}
}
Key Points:
- Each toolkit section is independent
- Config section sets toolkit configuration (via toolkit.setConfig())
- Different data formats: string, parquet, geopandas
- All paths are relative (isRelativePath: "True")
Versioned Datasources¶
Multiple versions of the same datasource:
{
"MeteoLowFreq": {
"DataSource": {
"YAVNEEL": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/YAVNEEL_v1.parquet",
"dataFormat": "parquet",
"version": [0, 0, 1],
"desc": {
"stationName": "YAVNEEL",
"processedDate": "2024-01-15"
}
}
},
"YAVNEEL": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/YAVNEEL_v2.parquet",
"dataFormat": "parquet",
"version": [0, 0, 2],
"desc": {
"stationName": "YAVNEEL",
"processedDate": "2024-03-20",
"qualityControl": "passed"
}
}
}
}
}
}
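Note: Both entries have the same datasource name ("YAVNEEL") but different versions. The loader is meant to create two separate documents, one per version; use setDataSourceDefaultVersion() to choose which version is returned by default. Be aware that standard JSON parsers, including Python's json.load, keep only the last occurrence of a repeated key within one object, so if the file is read that way only the [0, 0, 2] entry reaches the loader. After loading, verify that both versions were actually registered.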
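Python's json module keeps only the last value when a key is repeated inside one object, which matters for a layout like the one above where "YAVNEEL" appears twice. The sketch below shows that behaviour and one way to detect repeated keys before loading. The helper names `reject_duplicates` and `has_duplicate_keys` are illustrative and not part of hera.

```python
import json

# Reduced version of the example above: the same key appears twice.
TEXT = (
    '{"DataSource": {'
    '"YAVNEEL": {"version": [0, 0, 1]}, '
    '"YAVNEEL": {"version": [0, 0, 2]}}}'
)

# Default parsing: the later duplicate silently replaces the earlier one.
parsed = json.loads(TEXT)


def reject_duplicates(pairs):
    """object_pairs_hook that raises when a JSON object repeats a key."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def has_duplicate_keys(text):
    """Return True if any object in the JSON text repeats a key."""
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
    except ValueError:
        return True
    return False


print(len(parsed["DataSource"]), has_duplicate_keys(TEXT))
```

Running a check like this on a repository file before loading shows whether entries would be dropped at parse time.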
All Section Types¶
Complete example showing all supported section types:
{
"MeteoLowFreq": {
"Config": {
"defaultStation": "YAVNEEL",
"timezone": "Asia/Jerusalem"
},
"DataSource": {
"YAVNEEL": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/YAVNEEL.parquet",
"dataFormat": "parquet",
"version": [0, 0, 1],
"desc": {
"stationName": "YAVNEEL"
}
}
}
},
"Measurements": {
"raw_export_2024": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/exports/raw_2024.parquet",
"dataFormat": "parquet",
"type": "Experiment_rawData",
"desc": {
"exportDate": "2024-12-01",
"source": "IMS"
}
}
}
},
"Simulations": {
"wind_simulation_001": {
"isRelativePath": "True",
"item": {
"resource": "simulations/wind/wind_001.nc",
"dataFormat": "netcdf_xarray",
"type": "WindProfile",
"desc": {
"simulationDate": "2024-11-15",
"solver": "simpleFoam"
}
}
}
},
"Cache": {
"processed_stats": {
"isRelativePath": "True",
"item": {
"resource": "cache/statistics.json",
"dataFormat": "JSON_dict",
"type": "ProcessedStatistics",
"desc": {
"computedDate": "2024-11-20",
"method": "hourly_distribution"
}
}
}
},
"Function": {
"initializeToolkit": {
"params": {
"autoLoadDefaults": true
}
}
}
}
}
Section Types Explained:
| Section | Handler | Action |
|---|---|---|
| Config | _handle_Config | Calls toolkit.setConfig(**values) |
| DataSource | _handle_DataSource | Calls toolkit.addDataSource(...) |
| Measurements | _DocumentHandler | Calls toolkit.addMeasurementsDocument(...) |
| Simulations | _DocumentHandler | Calls toolkit.addSimulationsDocument(...) |
| Cache | _DocumentHandler | Calls toolkit.addCacheDocument(...) |
| Function | _handle_Function | Calls a named function with parameters |
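The section-to-handler mapping above can be pictured as a lookup table. The sketch below is an illustration of that idea only, not hera's actual dispatch code: the handler and method names are copied from the table, while `SECTION_HANDLERS` and `plan_sections` are hypothetical names.

```python
# Section name -> (handler name, toolkit method), copied from the table above.
SECTION_HANDLERS = {
    "Config": ("_handle_Config", "setConfig"),
    "DataSource": ("_handle_DataSource", "addDataSource"),
    "Measurements": ("_DocumentHandler", "addMeasurementsDocument"),
    "Simulations": ("_DocumentHandler", "addSimulationsDocument"),
    "Cache": ("_DocumentHandler", "addCacheDocument"),
    "Function": ("_handle_Function", None),  # calls a named function instead
}


def plan_sections(toolkit_sections):
    """Return (section, handler, method) for each section, in file order."""
    plan = []
    for section in toolkit_sections:
        if section not in SECTION_HANDLERS:
            # Mirrors the "Unknown Handler X" error listed later on this page.
            raise ValueError(f"Unknown Handler {section}")
        handler, method = SECTION_HANDLERS[section]
        plan.append((section, handler, method))
    return plan


print(plan_sections({"Config": {}, "Cache": {}}))
```

Section names are matched exactly here, so a misspelling such as "Datasource" is rejected rather than silently ignored.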
Relative vs Absolute Paths¶
Example showing both path resolution patterns:
{
"GIS_Raster_Topography": {
"DataSource": {
"SRTMGL1_relative": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/raster/",
"dataFormat": "string",
"version": [0, 0, 1],
"desc": {}
}
},
"SRTMGL1_absolute": {
"isRelativePath": "False",
"item": {
"resource": "/data/shared/GIS/SRTM/",
"dataFormat": "string",
"version": [0, 0, 1],
"desc": {}
}
}
}
}
}
Path Resolution:
If the repository JSON is at /home/user/repos/my_repo.json:
- Relative path (isRelativePath: "True"): resource: "measurements/GIS/raster/" is resolved to /home/user/repos/measurements/GIS/raster/
- Absolute path (isRelativePath: "False"): resource: "/data/shared/GIS/SRTM/" is used as-is: /data/shared/GIS/SRTM/
Best Practice
Use relative paths (isRelativePath: "True") when the repository JSON and data files are in the same directory tree. This makes the repository portable and easier to share.
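The path resolution rule described above can be sketched with pathlib. This is an illustration under two assumptions, not hera's implementation: the flag is compared against the exact string "True", and `resolve_resource` is a hypothetical helper name. PurePosixPath is used so the result does not depend on the operating system running the example.

```python
from pathlib import PurePosixPath


def resolve_resource(repository_json_path, entry):
    """Resolve entry["item"]["resource"] against the repository file location."""
    resource = PurePosixPath(entry["item"]["resource"])
    # The flag is stored as the string "True" or "False", not a JSON boolean.
    if entry["isRelativePath"] == "True":
        return PurePosixPath(repository_json_path).parent / resource
    return resource


relative_entry = {
    "isRelativePath": "True",
    "item": {"resource": "measurements/GIS/raster/"},
}
absolute_entry = {
    "isRelativePath": "False",
    "item": {"resource": "/data/shared/GIS/SRTM/"},
}

repo_file = "/home/user/repos/my_repo.json"
# pathlib normalises away the trailing slash of a directory path.
print(resolve_resource(repo_file, relative_entry))
print(resolve_resource(repo_file, absolute_entry))
```

Because the base is the parent directory of the JSON file, moving the whole directory tree keeps every relative entry valid.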
Real-World Example: Complete Project Setup¶
A realistic repository for a meteorological analysis project:
{
"GIS_Raster_Topography": {
"Config": {
"defaultSRTM": "SRTMGL1",
"defaultCRS": 4326
},
"DataSource": {
"SRTMGL1": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/raster/SRTM/",
"dataFormat": "string",
"version": [0, 0, 1],
"desc": {
"resolution": "30m",
"source": "NASA"
}
}
}
}
},
"GIS_LandCover": {
"Config": {
"defaultLandCover": "lc_mcd12q1"
},
"DataSource": {
"lc_mcd12q1": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/raster/lc_mcd12q1.tif",
"dataFormat": "string",
"version": [0, 0, 1],
"desc": {
"year": 2020,
"source": "MODIS"
}
}
}
}
},
"MeteoLowFreq": {
"DataSource": {
"YAVNEEL": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/YAVNEEL.parquet",
"dataFormat": "parquet",
"version": [0, 0, 1],
"desc": {
"stationName": "YAVNEEL",
"type": "lowfreq",
"latitude": 31.7683,
"longitude": 35.2137
}
}
},
"TEL_AVIV": {
"isRelativePath": "True",
"item": {
"resource": "measurements/meteorology/lowfreqdata/TEL_AVIV.parquet",
"dataFormat": "parquet",
"version": [0, 0, 1],
"desc": {
"stationName": "TEL_AVIV",
"type": "lowfreq",
"latitude": 32.0853,
"longitude": 34.7818
}
}
}
}
},
"GIS_Demography": {
"DataSource": {
"lamas_population": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/vector/population_lamas.shp",
"dataFormat": "geopandas",
"version": [0, 0, 1],
"desc": {
"source": "LAMAS",
"year": 2020,
"crs": 2039
}
}
}
}
}
}
Usage:
# Register the repository
hera-project repository add meteo_project /path/to/repository.json
# Load into a project
hera-project repository load meteo_project MY_PROJECT --overwrite
Or via Python:
from hera.utils.data.toolkit import dataToolkit
import json

# Parse the repository file into a plain dictionary
with open("repository.json") as f:
    repo_json = json.load(f)

dt = dataToolkit()
dt.loadAllDatasourcesInRepositoryJSONToProject(
    projectName="MY_PROJECT",
    repositoryJSON=repo_json,
    basedir="/path/to/repo/dir",
    overwrite=True,
)
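When loading from Python, the file path and the base directory have to be kept in step. The sketch below reads a repository file and derives the base directory from its location. It assumes that basedir is meant to be the directory containing the repository JSON, which is consistent with the relative-path rule but is not confirmed here; `read_repository` is a hypothetical helper, and the temporary file stands in for a real repository.

```python
import json
import tempfile
from pathlib import Path


def read_repository(path):
    """Return (parsed JSON, directory containing the file as a string)."""
    path = Path(path).resolve()
    with path.open(encoding="utf-8") as f:
        repo_json = json.load(f)
    return repo_json, str(path.parent)


# Demonstration with a throwaway file instead of a real repository.
with tempfile.TemporaryDirectory() as tmp:
    repo_path = Path(tmp) / "repository.json"
    repo_path.write_text('{"MeteoLowFreq": {"DataSource": {}}}', encoding="utf-8")
    repo_json, basedir = read_repository(repo_path)
    same_dir = Path(basedir) == Path(tmp).resolve()

print(repo_json, same_dir)
```

The two return values could then be passed as the repositoryJSON and basedir arguments of the call shown above.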
Common Patterns¶
Pattern 1: Multiple Data Formats¶
{
"MyToolkit": {
"DataSource": {
"csv_data": {
"isRelativePath": "True",
"item": {
"resource": "data/input.csv",
"dataFormat": "csv_pandas",
"version": [0, 0, 1],
"desc": {}
}
},
"netcdf_data": {
"isRelativePath": "True",
"item": {
"resource": "data/output.nc",
"dataFormat": "netcdf_xarray",
"version": [0, 0, 1],
"desc": {}
}
},
"geojson_data": {
"isRelativePath": "True",
"item": {
"resource": "data/boundaries.geojson",
"dataFormat": "JSON_geopandas",
"version": [0, 0, 1],
"desc": {}
}
}
}
}
}
Pattern 2: Inline Configuration¶
{
"MyToolkit": {
"Config": {
"setting1": "value1",
"setting2": 42,
"setting3": {
"nested": "config"
}
}
}
}
The Config section is passed directly to toolkit.setConfig(**configDict).
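Keyword unpacking means each top-level key of the Config section becomes one keyword argument, while nested objects stay intact as single values. The sketch below demonstrates this with a stand-in class; `StubToolkit` is hypothetical and only mimics a method that accepts keyword arguments, it is not the real toolkit.

```python
class StubToolkit:
    """Hypothetical stand-in that records the keyword arguments it receives."""

    def __init__(self):
        self.config = {}

    def setConfig(self, **kwargs):
        self.config.update(kwargs)


# The Config section from the pattern above, as a Python dictionary.
config_section = {
    "setting1": "value1",
    "setting2": 42,
    "setting3": {"nested": "config"},
}

toolkit = StubToolkit()
toolkit.setConfig(**config_section)

# Three keyword arguments were passed; the nested dict arrived as one value.
print(sorted(toolkit.config))
print(toolkit.config["setting3"])
```

Because the keys become keyword names, they must be strings, which JSON object keys always are.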
Pattern 3: String Resources (Directory Paths)¶
Some toolkits accept directory paths as strings:
{
"GIS_Raster_Topography": {
"DataSource": {
"SRTMGL1": {
"isRelativePath": "True",
"item": {
"resource": "measurements/GIS/raster/",
"dataFormat": "string",
"version": [0, 0, 1],
"desc": {}
}
}
}
}
}
The toolkit will search this directory for files (e.g., .hgt files for SRTM).
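To check what a directory resource would expose, the directory can be listed by file extension. The sketch below is only an approximation of such a search: how the toolkit actually scans the directory is not specified here, `find_tiles` is a hypothetical helper, and the file names are made-up examples.

```python
import tempfile
from pathlib import Path


def find_tiles(directory, suffix=".hgt"):
    """Return sorted names of files in directory that end with suffix."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("N32E035.hgt", "N33E035.hgt", "readme.txt"):
        (Path(tmp) / name).touch()
    tiles = find_tiles(tmp)

print(tiles)
```

A quick listing like this helps confirm that the resource path points at the directory holding the data files rather than at its parent.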
Validation and Error Handling¶
The repository loader validates:
- Toolkit existence: the toolkit must be registered or auto-registrable
- Section names: must be one of Config, DataSource, Measurements, Simulations, Cache, Function
- Required fields: resource and dataFormat are required for DataSource items
- Path resolution: relative paths must be resolvable (the directory must exist)
- Version format: must be a list of 3 integers, [major, minor, patch]
Common Errors:
- Unknown Handler X: the section name doesn't match any of the expected handlers
- Toolkit X not found: the toolkit is not registered and auto-registration failed
- Source X already exists: the datasource exists and overwrite=False
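Some of the checks listed above can be run on a repository dictionary before it is handed to the loader. The sketch below covers section names, required DataSource fields, and the version format. It is a rough pre-flight check and not the loader's own validation: toolkit registration and path existence are not tested, and `check_repository` and `is_version` are hypothetical names.

```python
ALLOWED_SECTIONS = {
    "Config", "DataSource", "Measurements", "Simulations", "Cache", "Function",
}


def is_version(value):
    """True for a list of exactly three integers (booleans excluded)."""
    return (
        isinstance(value, list)
        and len(value) == 3
        and all(isinstance(p, int) and not isinstance(p, bool) for p in value)
    )


def check_repository(repository):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    for toolkit, sections in repository.items():
        for section, entries in sections.items():
            if section not in ALLOWED_SECTIONS:
                problems.append(f"{toolkit}: unknown section {section!r}")
                continue
            if section != "DataSource":
                continue  # only DataSource items are checked in this sketch
            for name, entry in entries.items():
                item = entry.get("item", {})
                for field in ("resource", "dataFormat"):
                    if field not in item:
                        problems.append(f"{toolkit}/{name}: missing {field!r}")
                version = item.get("version")
                if version is not None and not is_version(version):
                    problems.append(f"{toolkit}/{name}: bad version {version!r}")
    return problems


bad = {
    "T": {
        "Datasource": {},  # wrong capitalisation
        "DataSource": {"X": {"item": {"resource": "a", "version": [0, 1]}}},
    }
}
problems = check_repository(bad)
print(problems)
```

Collecting all problems in a list, instead of raising on the first one, makes it easier to fix a repository file in a single pass.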
See Also¶
- Repository JSON Schema Reference — Complete schema documentation
- Best Practices: Repository Structure — Organization guidelines
- Data Layer: Repository Pipeline — How repositories are loaded