Datasource
A datasource is external data that a toolkit needs.
The data is usually loaded through a repository of data. Note that loaded data is specific to a project, so it must be loaded separately for each project.
Internally, a datasource is implemented as a data item saved as a measurement document. The type and the resource are determined by the user who added the datasource.
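As a rough illustration of that layout, the sketch below models the stored document with a plain dataclass. The field names mirror the documents printed on this page; the `DataSourceRecord` class itself is hypothetical and is not part of hera.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataSourceRecord:
    """Hypothetical stand-in for the measurement document of a datasource."""
    projectName: str
    resource: str       # set by the user: where the data lives
    dataFormat: str     # set by the user: how to interpret the resource
    desc: dict = field(default_factory=dict)  # toolkit, name, version, extras
    type: str = "ToolkitDataSource"


record = DataSourceRecord(
    projectName="The-Project-Name",
    resource="path-to-data",
    dataFormat="string",
    desc={"toolkit": "Demography",
          "datasourceName": "thedata",
          "version": [0, 0, 1]},
)
print(json.dumps(asdict(record), indent=4))
```

The user-facing metadata (name, version, and any additional parameters) sits under `desc`, while `resource` and `dataFormat` are stored at the top level.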
First, let's add a toolkit. For this example we use the GIS_DEMOGRAPHY toolkit.
from hera import toolkitHome
toolkitName = toolkitHome.GIS_DEMOGRAPHY
projectName = "The-Project-Name"
toolkit_specific_parameters = dict() # empty for this presentation.
tk = toolkitHome.getToolkit(toolkitName=toolkitName,
                            projectName=projectName,
                            **toolkit_specific_parameters)
Adding a datasource
A datasource is added to a toolkit with the addDataSource method.
The parameters are:
dataSourceName : str
    The name of the datasource.
resource : str
    The path to the data file of the datasource.
dataFormat : str
    The format of the data.
version : tuple, default (0,0,1)
    The version of the datasource. This allows you to add different versions of the datasource and access the correct one.
overwrite : bool, default False
    If True, overwrite the existing datasource (of the input version). If False, raise an exception if the datasource (with the input version) already exists.
Additional parameters
    Names and values of additional parameters for the datasource.
tk.addDataSource(dataSourceName="thedata",
                 resource="path-to-data",
                 dataFormat=tk.datatypes.STRING,
                 version=(0, 0, 1),
                 overwrite=True)
<Measurements: {
"_cls": "Metadata.Measurements",
"projectName": "The-Project-Name",
"desc": {
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
1
]
},
"type": "ToolkitDataSource",
"resource": "path-to-data",
"dataFormat": "string"
}>
Adding another version of the datasource, this time with an additional key/value parameter:
tk.addDataSource(dataSourceName="thedata",
                 resource="path-to-data-2",
                 dataFormat=tk.datatypes.STRING,
                 overwrite=True,
                 version=(0, 0, 2),
                 key="value")
<Measurements: {
"_cls": "Metadata.Measurements",
"projectName": "The-Project-Name",
"desc": {
"key": "value",
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
2
]
},
"type": "ToolkitDataSource",
"resource": "path-to-data-2",
"dataFormat": "string"
}>
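The interplay of `version` and `overwrite` can be sketched with a small in-memory registry. This is only an illustration of the behaviour described above, assuming datasources are identified by name and version; `InMemoryDataSources` and its `add` method are hypothetical and do not reflect how hera stores documents.

```python
class InMemoryDataSources:
    """Hypothetical registry; entries are keyed by (name, version)."""

    def __init__(self):
        self._records = {}

    def add(self, dataSourceName, resource, dataFormat,
            version=(0, 0, 1), overwrite=False, **extra):
        key = (dataSourceName, tuple(version))
        if key in self._records and not overwrite:
            raise ValueError(
                f"datasource {dataSourceName!r} version {tuple(version)} exists")
        record = {"resource": resource,
                  "dataFormat": dataFormat,
                  "desc": {"datasourceName": dataSourceName,
                           "version": list(version),
                           **extra}}
        self._records[key] = record
        return record


registry = InMemoryDataSources()
registry.add("thedata", "path-to-data", "string")
registry.add("thedata", "path-to-data-2", "string",
             version=(0, 0, 2), key="value")

try:
    # Same name and version, overwrite left at False: rejected.
    registry.add("thedata", "another-path", "string", version=(0, 0, 1))
    rejected = False
except ValueError:
    rejected = True

# With overwrite=True the existing version is replaced.
replaced = registry.add("thedata", "another-path", "string",
                        version=(0, 0, 1), overwrite=True)
```

In this sketch a new version never collides with an older one, so `overwrite` only matters when the same name and version are added twice.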
Listing the datasources
The datasources that were added to the project are listed with
tk.getDataSourceTable()
| | dataFormat | resource | toolkit | datasourceName | version | key |
|---|---|---|---|---|---|---|
| 0 | string | path-to-data | Demography | thedata | [0, 0, 1] | NaN |
| 1 | string | path-to-data-2 | Demography | thedata | [0, 0, 2] | value |
It is possible to filter the datasources by their additional key/value parameters:
tk.getDataSourceTable(key="value")
| | dataFormat | resource | key | toolkit | datasourceName | version |
|---|---|---|---|---|---|---|
| 0 | string | path-to-data-2 | value | Demography | thedata | [0, 0, 2] |
Alternatively, it is possible to get the datasources as a list of dictionaries.
tk.getDataSourceMap()
[{'dataFormat': 'string',
'resource': 'path-to-data',
'toolkit': 'Demography',
'datasourceName': 'thedata',
'version': [0, 0, 1]},
{'dataFormat': 'string',
'resource': 'path-to-data-2',
'key': 'value',
'toolkit': 'Demography',
'datasourceName': 'thedata',
'version': [0, 0, 2]}]
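Because the map is a plain list of dictionaries, it can be filtered with ordinary Python. The sketch below applies a key/value filter to the list shown above; `filter_datasources` is a hypothetical helper, not a hera function.

```python
datasource_map = [
    {"dataFormat": "string",
     "resource": "path-to-data",
     "toolkit": "Demography",
     "datasourceName": "thedata",
     "version": [0, 0, 1]},
    {"dataFormat": "string",
     "resource": "path-to-data-2",
     "key": "value",
     "toolkit": "Demography",
     "datasourceName": "thedata",
     "version": [0, 0, 2]},
]


def filter_datasources(entries, **filters):
    """Keep the entries whose fields equal every given key/value pair."""
    return [entry for entry in entries
            if all(entry.get(name) == value
                   for name, value in filters.items())]


with_key = filter_datasources(datasource_map, key="value")
by_name = filter_datasources(datasource_map, datasourceName="thedata")
```

Entries that lack the requested key are simply dropped, which matches the filtered table above, where only the version carrying `key` remains.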
Getting the datasource
It is possible to retrieve either the metadata document (datasource document) or the data itself.
To get the datasource document, use
import json
datasourceName = "thedata"
doc = tk.getDataSourceDocument(datasourceName=datasourceName)
print(json.dumps(doc.desc,indent=4))
{
"key": "value",
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
2
]
}
If the version is not specified, the function returns the highest version.
To get a particular version of the datasource, pass the version explicitly:
doc = tk.getDataSourceDocument(datasourceName=datasourceName,version=(0,0,1))
print(json.dumps(doc.desc,indent=4))
{
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
1
]
}
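One plausible reading of "highest version" is tuple ordering, where versions are compared element by element. The sketch below makes that assumption explicit; `select_version` is a hypothetical helper, and the ordering actually used by the toolkit may differ.

```python
def select_version(descriptions, version=None):
    """Return the description matching `version`, or the highest if None."""
    if version is None:
        # Tuples compare element by element, so (0, 0, 10) > (0, 0, 2).
        return max(descriptions, key=lambda d: tuple(d["version"]))
    for description in descriptions:
        if tuple(description["version"]) == tuple(version):
            return description
    raise KeyError(f"version {tuple(version)} not found")


descriptions = [
    {"datasourceName": "thedata", "version": [0, 0, 1]},
    {"datasourceName": "thedata", "version": [0, 0, 2], "key": "value"},
]

latest = select_version(descriptions)
first = select_version(descriptions, version=(0, 0, 1))
```

Comparing tuples rather than strings keeps multi-digit components in numeric order.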
The data itself is obtained with the getData method of the document:
doc.getData()
'path-to-data'
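The result above is the resource string itself, presumably because the datasource was stored with the string format. A minimal sketch of such format-based dispatch follows, assuming each format maps to a loader function; the `LOADERS` table, the `get_data` helper, and the `"json"` format name are hypothetical and only illustrate the idea.

```python
import json
import os
import tempfile


def load_json(resource):
    """Read and parse a JSON file located at `resource`."""
    with open(resource) as handle:
        return json.load(handle)


# Hypothetical mapping from format name to loader.
LOADERS = {
    "string": lambda resource: resource,  # the resource itself is the data
    "json": load_json,                    # the resource is a path to a file
}


def get_data(resource, data_format):
    """Load `resource` according to `data_format`."""
    if data_format not in LOADERS:
        raise ValueError(f"unknown data format: {data_format!r}")
    return LOADERS[data_format](resource)


as_string = get_data("path-to-data", "string")

# Write a small JSON file to show a format where the resource is read from disk.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.json")
    with open(path, "w") as handle:
        json.dump({"population": 100}, handle)
    as_json = get_data(path, "json")
```

Under this assumption, the stored `dataFormat` decides whether the resource is returned as-is or opened and parsed.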
It is possible to get a list of all the datasource documents:
tk.getDataSourceDocumentsList()
[<Measurements: {
"_cls": "Metadata.Measurements",
"projectName": "The-Project-Name",
"desc": {
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
1
]
},
"type": "ToolkitDataSource",
"resource": "path-to-data",
"dataFormat": "string"
}>, <Measurements: {
"_cls": "Metadata.Measurements",
"projectName": "The-Project-Name",
"desc": {
"key": "value",
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
2
]
},
"type": "ToolkitDataSource",
"resource": "path-to-data-2",
"dataFormat": "string"
}>]
It is also possible to get the datasource data directly. As with the document, the highest version is used when no version is given:
tk.getDataSourceData(datasourceName=datasourceName)
'path-to-data-2'
Delete datasource
A datasource is deleted with deleteDataSource, which returns the deleted document:
tk.deleteDataSource(datasourceName=datasourceName,version=(0,0,1))
<Measurements: {
"_cls": "Metadata.Measurements",
"projectName": "The-Project-Name",
"desc": {
"toolkit": "Demography",
"datasourceName": "thedata",
"version": [
0,
0,
1
]
},
"type": "ToolkitDataSource",
"resource": "path-to-data",
"dataFormat": "string"
}>
After the deletion, the table lists only the remaining version:
tk.getDataSourceTable()
| | dataFormat | resource | key | toolkit | datasourceName | version |
|---|---|---|---|---|---|---|
| 0 | string | path-to-data-2 | value | Demography | thedata | [0, 0, 2] |