Datasource
A datasource is external data that the toolkit needs.
The data is usually loaded from a data repository. Note that the loaded data is specific to a project, so it must be loaded separately for each project.
Internally, a datasource is implemented as a data item saved as a measurement document. Its type and resource are determined by the user who added it.
First, let's add a toolkit. For this example we use the GIS_DEMOGRAPHY toolkit.
from hera import toolkitHome
toolkitName = toolkitHome.GIS_DEMOGRAPHY
projectName = "The-Project-Name"
toolkit_specific_parameters = dict() # empty for this presentation.
tk = toolkitHome.getToolkit(toolkitName=toolkitName,
                            projectName=projectName,
                            **toolkit_specific_parameters)
Adding a datasource
A datasource is added to a toolkit with the addDataSource method.
The parameters are:
dataSourceName : str
    The name of the datasource.
resource : str
    The path to the data file of the datasource.
dataFormat : str
    The format of the data.
version : tuple, default (0,0,1)
    The version of the datasource. This allows you to add different versions of the datasource and access the correct one.
overwrite : bool, default False
    If True, overwrite the existing datasource (of the input version). If False, raise an exception if the datasource exists (with the input version).
Additional parameters
    Additional parameter names and their values for the datasource.
tk.addDataSource(dataSourceName="thedata",
                 resource="path-to-data",
                 dataFormat=tk.datatypes.STRING,
                 version=(0,0,1),
                 overwrite=True)
Adding another version of the datasource
tk.addDataSource(dataSourceName="thedata",
                 resource="path-to-data-2",
                 dataFormat=tk.datatypes.STRING,
                 overwrite=True,
                 version=(0,0,2),
                 key="value")
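The interplay between version and overwrite described above can be illustrated without a database. The following is only a minimal sketch under stated assumptions: DataSourceRegistry is a hypothetical in-memory stand-in written for this page, not the toolkit's implementation, and the ValueError it raises is an assumption (the toolkit may raise a different exception type).

```python
# Hypothetical in-memory stand-in that mimics the documented semantics of
# version and overwrite. It is NOT the toolkit's implementation.
class DataSourceRegistry:
    def __init__(self):
        # Maps (name, version) -> description dictionary.
        self._items = {}

    def add(self, dataSourceName, resource, dataFormat,
            version=(0, 0, 1), overwrite=False, **kwargs):
        key = (dataSourceName, tuple(version))
        if key in self._items and not overwrite:
            # Assumption: the exception type is chosen for illustration only.
            raise ValueError(
                f"Datasource {dataSourceName!r} version {version} already exists"
            )
        # Additional keyword parameters are stored next to the core fields.
        self._items[key] = dict(resource=resource, dataFormat=dataFormat,
                                version=tuple(version), **kwargs)
        return self._items[key]

    def versions(self, dataSourceName):
        # All stored versions of one datasource, lowest first.
        return sorted(v for (name, v) in self._items if name == dataSourceName)


registry = DataSourceRegistry()
registry.add("thedata", "path-to-data", "string", version=(0, 0, 1))
registry.add("thedata", "path-to-data-2", "string", version=(0, 0, 2), key="value")
print(registry.versions("thedata"))  # [(0, 0, 1), (0, 0, 2)]
```

In this sketch, adding version (0,0,1) a second time fails unless overwrite=True is passed, while adding a new version never conflicts with the existing ones.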
Listing the datasources
The datasources that were added to the project are listed with
tk.getDataSourceTable()
It is possible to filter the datasources by the additional key/value parameters that were given when they were added
tk.getDataSourceTable(key="value")
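Conceptually, such a filter keeps only the records whose stored parameters equal every key/value pair that was passed. The sketch below shows the idea on plain dictionaries; filter_by_params and the sample records are hypothetical names invented for this illustration, and the toolkit may well perform the filtering inside the database query instead.

```python
def filter_by_params(records, **filters):
    """Keep the records that contain every requested key with an equal value."""
    return [
        record for record in records
        if all(key in record and record[key] == value
               for key, value in filters.items())
    ]


# Hypothetical records, shaped like the two datasources added earlier.
records = [
    {"datasourceName": "thedata", "version": (0, 0, 1)},
    {"datasourceName": "thedata", "version": (0, 0, 2), "key": "value"},
]

matching = filter_by_params(records, key="value")
print(matching)
```

With no filters the function returns every record, which mirrors calling the listing function without arguments.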
Alternatively, it is possible to get the datasources as a list of dictionaries.
tk.getDataSourceMap()
Getting the datasource
It is possible to retrieve either the metadata document (datasource document) or the data itself.
To get the datasource document we use
import json
datasourceName = "thedata"
doc = tk.getDataSourceDocument(datasourceName=datasourceName)
print(json.dumps(doc.desc,indent=4))
If the version is not specified, the function returns the highest version.
To get a specific version of the datasource, pass the version argument
doc = tk.getDataSourceDocument(datasourceName=datasourceName,version=(0,0,1))
print(json.dumps(doc.desc,indent=4))
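Because versions are tuples, "highest version" can be understood as ordinary tuple comparison, which compares element by element from left to right. The sketch below illustrates that rule; select_version is a hypothetical helper written for this page, and it is an assumption that the toolkit resolves the default version in exactly this way.

```python
def select_version(available, version=None):
    """Return the requested version, or the highest one when none is requested."""
    if version is None:
        # Tuples compare element by element, so (0, 1, 0) > (0, 0, 2).
        return max(available)
    version = tuple(version)
    if version not in available:
        raise KeyError(f"Version {version} was not found")
    return version


available = [(0, 0, 1), (0, 0, 2), (0, 1, 0)]
print(select_version(available))             # (0, 1, 0)
print(select_version(available, (0, 0, 1)))  # (0, 0, 1)
```

One benefit of tuples over version strings is that (0, 0, 10) sorts above (0, 0, 9), whereas the strings "0.0.10" and "0.0.9" would sort the other way.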
The data itself is obtained with the getData method of the document
doc.getData()
It is possible to get a list of all the datasource documents
tk.getDataSourceDocumentsList()
It is also possible to get the datasource data directly
tk.getDataSourceData(datasourceName=datasourceName)
Delete datasource
A specific version of a datasource is deleted with
tk.deleteDataSource(datasourceName=datasourceName,version=(0,0,1))
tk.getDataSourceTable()