Data Layer API¶
MongoDB-backed data storage: collections, documents, data handlers, and function caching.
Collections¶
AbstractCollection¶
hera.datalayer.collection.AbstractCollection
¶
Bases: object
Abstract collection that contains documents of a certain type
Source code in hera/datalayer/collection.py
type
property
¶
The collection type (e.g. 'Measurements', 'Simulations', 'Cache'), or None for all.
Returns:
| Type | Description |
|---|---|
| str or None | |
__init__(ctype=None, connectionName=None)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ctype | str or None | Collection type name (e.g. 'Measurements'). None for all types. | None |
| connectionName | str or None | Optional database connection alias. | None |
Source code in hera/datalayer/collection.py
getDocumentsAsDict(projectName, with_id=False, **query)
¶
Returns a dict whose 'documents' key maps to a list of the documents that match the query, each in dict form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| projectName | str | The project name. | required |
| with_id | bool | Whether the 'id' key should be included in the documents. | False |
| query | | Query arguments. | {} |
Returns:
| Type | Description |
|---|---|
| dict | A dict with a 'documents' key whose value is a list of dicts representing the documents that fulfill the query. |
Source code in hera/datalayer/collection.py
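The shape of the returned value can be illustrated with a minimal sketch. The helper `documents_as_dict` below is a hypothetical stand-in that operates on plain dicts; it is not hera's implementation and does not touch MongoDB. It only mirrors the documented result layout and the effect of `with_id`.

```python
def documents_as_dict(documents, with_id=False):
    """Hypothetical stand-in: wrap plain-dict documents in a {'documents': [...]} envelope."""
    result = []
    for doc in documents:
        record = dict(doc)  # copy so the input documents are not modified
        if not with_id:
            record.pop("id", None)  # drop the 'id' key unless it was requested
        result.append(record)
    return {"documents": result}


# Illustrative records only; the field values are made up for the example.
docs = [
    {"id": "1", "projectName": "demo", "type": "Measurements", "desc": {"station": "A"}},
    {"id": "2", "projectName": "demo", "type": "Measurements", "desc": {"station": "B"}},
]

without_ids = documents_as_dict(docs)
with_ids = documents_as_dict(docs, with_id=True)
```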
getDocuments(projectName, resource=None, dataFormat=None, type=None, **desc)
¶
Get the documents that satisfy the given query. If projectName is None, search over all projects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| projectName | str | The project name. | required |
| resource | | The data resource. | None |
| dataFormat | str | The data format. | None |
| type | str | The type which the data belongs to. | None |
| desc | | Other metadata arguments. | {} |
Returns:
| Type | Description |
|---|---|
| list | List of documents that fulfill the query. |
Source code in hera/datalayer/collection.py
getProjectList()
¶
Returns the list of unique project names in this collection.
Returns:
| Type | Description |
|---|---|
| list of str | |
getDocumentByID(id)
¶
Returns a document by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | The document ID. | required |
Returns:
| Type | Description |
|---|---|
| document | The document with the relevant ID. |
Source code in hera/datalayer/collection.py
addDocument(projectName, resource='', dataFormat='string', type='', desc={})
¶
Adds a document to the database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| projectName | str | The project to add the document to. | required |
| resource | | The data of the document. | '' |
| dataFormat | str | The data format type. See datahandler for the available types. | 'string' |
| desc | dict | Holds any additional fields that describe the | {} |
| type | str | The type of the data. | '' |
Returns:
| Type | Description |
|---|---|
| mongoengine document | |
Source code in hera/datalayer/collection.py
addDocumentFromJSON(json_data)
¶
Adds a document from a JSON string representation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| json_data | str | A JSON string representing the document. | required |
Source code in hera/datalayer/collection.py
deleteDocuments(projectName, **query)
¶
Deletes documents that satisfy the given query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| projectName | str | The project name. | required |
| query | | Other query arguments. | {} |
Returns:
| Type | Description |
|---|---|
| list | Dictionary with the data that was removed. |
Source code in hera/datalayer/collection.py
deleteDocumentByID(id)
¶
Deletes a document by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id | str | The document ID. | required |
Returns:
| Type | Description |
|---|---|
| dict | The record that was deleted. |
Source code in hera/datalayer/collection.py
Measurements_Collection¶
hera.datalayer.collection.Measurements_Collection
¶
Bases: AbstractCollection
Collection that contains measurement documents.
Source code in hera/datalayer/collection.py
__init__(connectionName=None)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| connectionName | str or None | Optional database connection alias. | None |
Source code in hera/datalayer/collection.py
Simulations_Collection¶
hera.datalayer.collection.Simulations_Collection
¶
Bases: AbstractCollection
Collection that contains simulation documents.
Source code in hera/datalayer/collection.py
__init__(connectionName=None)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| connectionName | str or None | Optional database connection alias. | None |
Source code in hera/datalayer/collection.py
Cache_Collection¶
hera.datalayer.collection.Cache_Collection
¶
Bases: AbstractCollection
Collection that contains cache documents.
Source code in hera/datalayer/collection.py
__init__(connectionName=None)
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| connectionName | str or None | Optional database connection alias. | None |
Source code in hera/datalayer/collection.py
Data Handlers¶
datatypes¶
hera.datalayer.datahandler.datatypes
¶
Registry of supported data format constants and dispatch logic for data handlers.
Each constant (e.g. STRING, PARQUET, HDF) identifies a data format.
Use getHandler(formatName) to retrieve the corresponding DataHandler_* class,
or getDataFormatName(obj) to auto-detect the format from a Python object.
Source code in hera/datalayer/datahandler.py
get_obj_or_instance_fullName(obj)
staticmethod
¶
Returns the fully qualified name of a class or instance, including its module.
Examples:
>>> get_full_name(SomeClass)
'package.module.SomeClass'
>>> get_full_name(SomeClass())
'package.module.SomeClass'
Source code in hera/datalayer/datahandler.py
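One common way to build such a name is sketched below. It is an illustration only, not the code in hera/datalayer/datahandler.py; the helper name `full_name` is hypothetical, and the sketch assumes the name is composed from `__module__` and `__qualname__`.

```python
import inspect
from collections import OrderedDict


def full_name(obj_or_class):
    """Return 'module.QualName' for a class, or for the class of an instance."""
    # Use the object itself if it is a class, otherwise fall back to its type.
    cls = obj_or_class if inspect.isclass(obj_or_class) else type(obj_or_class)
    return f"{cls.__module__}.{cls.__qualname__}"


name_from_class = full_name(OrderedDict)
name_from_instance = full_name(OrderedDict())
```

Both calls give the same string, which is what makes the name usable as a lookup key regardless of whether a class or an instance is passed.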
getDataFormatName(obj_or_class)
staticmethod
¶
Tries to find the hera datatype name for the object.
If it cannot be found, the general object type is used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj_or_class | object or type | | required |
Returns:
| Type | Description |
|---|---|
| A dict with | |
Source code in hera/datalayer/datahandler.py
getDataFormatExtension(obj_or_class)
staticmethod
¶
Tries to find the hera datatype name for the object.
If it cannot be found, the general object type is used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj_or_class | object or type | | required |
Returns:
| Type | Description |
|---|---|
| A dict with | |
Source code in hera/datalayer/datahandler.py
guessHandler(obj_or_class)
staticmethod
¶
Auto-detect the data format and return the appropriate handler class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj_or_class | object or type | The data object or class to detect the format for. | required |
Returns:
| Type | Description |
|---|---|
| DataHandler class | The handler class for the detected format. |
Source code in hera/datalayer/datahandler.py
getHandler(objectType)
staticmethod
¶
Return the DataHandler class for the given data format name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| objectType | str | A data format name (e.g. | required |
Returns:
| Type | Description |
|---|---|
| DataHandler class | |
Raises:
| Type | Description |
|---|---|
| ValueError | If no handler exists for the given type. |
Source code in hera/datalayer/datahandler.py
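The lookup-by-name behavior described here, including the ValueError for unknown formats, follows a familiar registry pattern. The sketch below shows that pattern in isolation. The handler classes, the `HANDLERS` mapping, and the format strings are all hypothetical placeholders, not hera's actual registry or constants.

```python
class StringHandler:
    """Hypothetical stand-in for a DataHandler_* class."""


class ParquetHandler:
    """Hypothetical stand-in for a DataHandler_* class."""


# Hypothetical registry that maps format names to handler classes.
HANDLERS = {
    "string": StringHandler,
    "parquet": ParquetHandler,
}


def get_handler(format_name):
    """Return the handler class registered for format_name, or raise ValueError."""
    try:
        return HANDLERS[format_name]
    except KeyError:
        raise ValueError(f"No handler for data format {format_name!r}") from None


handler_cls = get_handler("parquet")
```

Returning the class rather than an instance leaves it to the caller to decide how and when the handler is used.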
Documents¶
MetadataFrame¶
hera.datalayer.document.metadataDocument.MetadataFrame
¶
Bases: object
A basic structure for a document.
Each document is related to a project and described by the following fields:
- type : str : The type of the document. This is a helper attribute that is used to query the data.
- resource : str : The resource that the document represents. This can be either a path to a file on disk or the data itself.
- dataFormat : str : The format of the data. Taken from the datatypes class.
- desc : dict : A dictionary of arbitrary format that holds the metadata of the record.
- id : str : The ID of the record in the DB.
Source code in hera/datalayer/document/metadataDocument.py
asDict(with_id=False)
¶
Convert the document to a plain dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| with_id | bool | If True, include the | False |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary representation of the document. |
Source code in hera/datalayer/document/metadataDocument.py
getData(**kwargs)
¶
Returns the data of the document.
The kwargs are passed to the datahandler. See the datahandler class for your specific data type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kwargs | dict | | {} |
Returns:
| Type | Description |
|---|---|
| | Object according to the datahandler. |
Source code in hera/datalayer/document/metadataDocument.py
nonDBMetadataFrame¶
hera.datalayer.document.metadataDocument.nonDBMetadataFrame
¶
Bases: object
A wrapper class to use when the data is not loaded into the DB.
This class will be used when getting data from local files.
Source code in hera/datalayer/document/metadataDocument.py
__init__(data, projectName=None, type=None, resource=None, dataFormat=None, **desc)
¶
Initialize a non-database metadata frame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | object | The data to wrap. | required |
| projectName | str | The project name. | None |
| type | str | The document type. | None |
| resource | str | The resource path or identifier. | None |
| dataFormat | str | The data format name. | None |
| desc | dict | Additional metadata fields. | {} |
Source code in hera/datalayer/document/metadataDocument.py
getData(**kwargs)
¶
Return the wrapped data object.
Returns:
| Type | Description |
|---|---|
| object | The data passed at initialization. |
__getitem__(item)
¶
Access document attributes by key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| item | str | The attribute name. | required |
Returns:
| Type | Description |
|---|---|
| object | |
Caching¶
hera.datalayer.autocache
¶
cacheDecorators
¶
Internal implementation of the function caching mechanism.
Wraps a function call, serializes its arguments, checks the database for a cached result, and stores the result if not already cached.
Source code in hera/datalayer/autocache.py
is_mongo_serializable(obj)
staticmethod
¶
Check whether obj can be BSON-encoded for MongoDB storage.
Source code in hera/datalayer/autocache.py
obj_to_txt(obj)
staticmethod
¶
Serialize obj to a base64-encoded text string via pickle.
txt_to_obj(txt)
staticmethod
¶
Deserialize an object from a base64-encoded text string.
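The round trip described by obj_to_txt and txt_to_obj can be sketched with the standard library alone. This is a minimal illustration of pickle combined with base64, assuming ASCII text output; the actual implementation in hera/datalayer/autocache.py may differ in details such as the pickle protocol.

```python
import base64
import pickle


def obj_to_txt(obj):
    """Pickle obj and return the bytes as a base64-encoded ASCII string."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def txt_to_obj(txt):
    """Decode a base64 string produced by obj_to_txt and unpickle the object."""
    return pickle.loads(base64.b64decode(txt.encode("ascii")))


original = {"levels": (1, 2, 3), "label": "demo"}
encoded = obj_to_txt(original)
restored = txt_to_obj(encoded)
```

Text encoding of this kind is useful for values that cannot be stored directly as BSON, such as tuples or custom objects. Note that unpickling executes code embedded in the payload, so it should only be applied to text from a trusted source.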
__init__(func, dataFormat, projectName=None, postProcessFunction=None, getDataParams={}, storeDataParams={})
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | callable | The function whose results are cached. | required |
| dataFormat | str or None | Storage format for the cached data. | required |
| projectName | str or None | Project that owns the cache collection. | None |
| postProcessFunction | callable or None | Optional transform applied to the result before returning. | None |
| getDataParams | dict | Extra keyword arguments forwarded to | {} |
| storeDataParams | dict | Extra keyword arguments forwarded when saving data. | {} |
Source code in hera/datalayer/autocache.py
__call__(*args, **kwargs)
¶
Execute the function, returning a cached result when available.
Source code in hera/datalayer/autocache.py
checkIfFunctionIsCached(call_info)
¶
Check if the function and the parameters are stored in the DB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| call_info | dict | A dict describing the function call, with the keys functionName and functionParameters. | required |
Returns:
| Type | Description |
|---|---|
| | None if the data does not exist, the data otherwise. |
Source code in hera/datalayer/autocache.py
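A call_info dict of this kind could be assembled as sketched below. Only the two key names, functionName and functionParameters, come from the description above. How hera actually derives their values is not shown in this page, so the use of `inspect.signature` and the helper name `build_call_info` are assumptions made for illustration.

```python
import inspect


def build_call_info(func, *args, **kwargs):
    """Hypothetical helper: describe a call as a dict usable as a cache lookup key."""
    # Bind the arguments to parameter names so positional and keyword calls match.
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include defaults so omitted arguments still appear
    return {
        "functionName": f"{func.__module__}.{func.__qualname__}",
        "functionParameters": dict(bound.arguments),
    }


def area(width, height=2):
    return width * height


info_positional = build_call_info(area, 3)
info_keyword = build_call_info(area, width=3, height=2)
```

Normalizing the arguments this way means `area(3)` and `area(width=3, height=2)` describe the same call, so they would map to the same cache entry.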
saveFunctionCache(call_info, data)
¶
Save the data to the disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | | | required |
Source code in hera/datalayer/autocache.py
clearAllFunctionsCache(projectName=None)
¶
Remove the cache of all functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| projectName | | | None |
clearFunctionCache(functionName, projectName=None)
¶
Removes all cache documents of the function, together with their data on disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| functionName | str | The name of the function. | required |
| projectName | str | The name of the project that holds the cache. If None, load the name from the caseConfiguration. | None |
Source code in hera/datalayer/autocache.py
cacheFunction(_func=None, *, returnFormat=None, projectName=None, postProcessFunction=None, getDataParams={}, storeDataParams={})
¶
Decorator that caches a function's return value in the project database.
On first call, the function executes and its result is saved as a cache document. On subsequent calls with the same arguments, the cached result is returned instead of re-executing the function.
Can be used with or without arguments:

    @cacheFunction
    def my_func(x):
        ...

    @cacheFunction(returnFormat=datatypes.PARQUET, projectName="myproject")
    def my_func(x):
        ...
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| returnFormat | str | The data format to use when storing the result. If None, auto-detected. | None |
| projectName | str | The project to store the cache in. If None, loaded from caseConfiguration. | None |
| postProcessFunction | callable | A function applied to the result before returning it. | None |
| getDataParams | dict | Extra keyword arguments passed to | {} |
| storeDataParams | dict | Extra keyword arguments passed when saving to cache. | {} |
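The `_func=None, *` signature is what lets one decorator work both bare and with keyword arguments. The sketch below shows that mechanism with an in-memory dict standing in for the project database. It is a simplified illustration, not hera's implementation: `cache_function`, `_CACHE`, and `post_process` are hypothetical names, and applying the post-processing step to cached results as well as fresh ones is an assumption.

```python
import functools
import inspect

_CACHE = {}  # in-memory stand-in for the project's cache collection


def cache_function(_func=None, *, post_process=None):
    """Cache results per argument set; usable as @cache_function or @cache_function(...)."""

    def decorate(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Normalize the call so f(3) and f(x=3) share one cache entry.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            key = (func.__qualname__, repr(sorted(bound.arguments.items())))
            if key not in _CACHE:
                _CACHE[key] = func(*args, **kwargs)  # first call: execute and store
            result = _CACHE[key]
            return post_process(result) if post_process else result

        return wrapper

    if _func is None:
        return decorate  # called with arguments: return the real decorator
    return decorate(_func)  # used bare: _func is the decorated function


calls = []


@cache_function
def square(x):
    calls.append(x)  # record real executions to make cache hits visible
    return x * x


@cache_function(post_process=str)
def cube(x):
    return x ** 3


first = square(3)
second = square(x=3)  # served from the cache, square is not executed again
```

When the decorator is applied bare, Python passes the function as the first positional argument. When it is called with keyword arguments, `_func` stays None and the inner decorator is returned instead. The keyword-only marker `*` keeps the two cases from being confused.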