Projects Documentation¶

Welcome to the Projects documentation. In this tutorial, you will learn about projects and their basic operations. We will cover how to create a project, load documents into it, and work with them using toolkits.

What is a Project?¶

You can think of a project as a collection of data sources and other documents (such as measurement documents) organized in one place.
By adding a document to a project, users can connect the corresponding toolkit without needing to remember the paths to the data they want to work with.
This provides a flexible and convenient way to manage and work with various types of data in a single, unified environment.

Creating a Project¶

Creating a Project using Command Line Interface (CLI)¶
The user can easily create a Project using the following command:

>> hera-project project create <PROJECT_NAME> --directory <DIRECTORY_PATH> --noRepositories <ANY_CHAR>

Executing this command creates a new project and a JSON configuration file. Before discussing this file, let’s go over the command arguments:

- PROJECT_NAME: The name of the project to create.
- directory (Optional): The directory in which to save the configuration file. If not specified, the file is created in the directory where the user executes the command.
- noRepositories (Optional): If given (with any value), the repositories are not loaded into the project (repositories are discussed below). Default is False.

Example:
To create a project named 'MY_FIRST_PROJECT', with the configuration file created in the current directory and without using repositories, type:

>> hera-project project create MY_FIRST_PROJECT --noRepositories 1
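When the command has to be issued from a script, the argument list can be assembled programmatically. The following is a minimal sketch that uses only the flags described above; the helper name `build_create_command` is hypothetical and not part of Hera.

```python
import shlex
from typing import List, Optional


def build_create_command(project_name: str,
                         directory: Optional[str] = None,
                         no_repositories: bool = False) -> List[str]:
    """Assemble the argument list for `hera-project project create`.

    Hypothetical helper: it only mirrors the flags described in the text.
    """
    command = ["hera-project", "project", "create", project_name]
    if directory is not None:
        command += ["--directory", directory]
    if no_repositories:
        # The flag takes an arbitrary value; "1" mirrors the example above.
        command += ["--noRepositories", "1"]
    return command


command = build_create_command("MY_FIRST_PROJECT", no_repositories=True)
print(shlex.join(command))
# prints: hera-project project create MY_FIRST_PROJECT --noRepositories 1

# To actually run it (requires Hera to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems when the project name or directory contains spaces.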

Initiate the Project using Python¶
We can access the Project class in Python and perform various operations.
Note: In Hera, specifying a non-existent project in a Python script (without first creating it through the CLI as shown earlier) does not raise an error. The project is created automatically when the first document is loaded into it.

However:
1. No configuration file will be created.
2. Repository loading (which we will discuss shortly) will not be applied.

Let's now see how to initiate projects using Python.

First, we need to import the Project object:

In [1]:

from hera import Project

Initialize from a configuration file¶

If the caseConfiguration file has been created and is located in the same directory as the script, we can initiate the project as follows:

In [2]:

proj = Project()
print(proj.projectName)

MY_FIRST_PROJECT

As noted above, this works only if a JSON configuration file has been created and the code is run from the directory that contains it.

Initialize by Project Name¶

If the caseConfiguration file is not in your current directory, you can still access a project from anywhere on your local machine by using only the project name, like this:

In [4]:

proj = Project(projectName="MY_FIRST_PROJECT")
print(proj.projectName)

MY_FIRST_PROJECT

Again, if no project exists with the specified name, Hera will not throw an error. The project will only be created automatically when the first document is loaded into it.

Loading Documents into a Project (Manually)¶

Let's see how we can load data sources and measurement documents into a project.

Load Data Sources to Projects¶

To load a data source into a project, the user must first connect the corresponding toolkit to the project and then use a function named addDataSource().

For demonstration purposes, let’s use an example of loading a LandCover data source into a project. We have already seen a similar example in the previous documentation.
First, we connect a LandCover toolkit to the project we just created, 'MY_FIRST_PROJECT':

In [6]:

from hera import toolkitHome

toolkitName = toolkitHome.GIS_LANDCOVER
projectName = "MY_FIRST_PROJECT"

landcover_toolkit = toolkitHome.getToolkit(
                    toolkitName=toolkitName,
                    projectName=projectName)

Using the addDataSource() function, we can add a LandCover data source to the project.
As we saw in the previous documentation, the data source structure is composed of several fields. Here, we specify them manually in the code:

In [7]:

dataSourceName = 'MY_LANDCOVER_DATASOURCE'
resource = './prefixaac.tif'
dataFormat = 'geotiff'
version = (1,0,0)
overwrite = True
kwargs = dict(year=2021,type=1)

landcover_toolkit.addDataSource(
    dataSourceName=dataSourceName,
    resource=resource,
    dataFormat=dataFormat,
    version=version,
    overwrite=overwrite,
    kwargs=kwargs
)

Out[7]:

<Measurements: {
    "_cls": "Metadata.Measurements",
    "projectName": "MY_FIRST_PROJECT",
    "desc": {
        "kwargs": {
            "year": 2021,
            "type": 1
        },
        "toolkit": "LandCoverToolkit",
        "datasourceName": "MY_LANDCOVER_DATASOURCE",
        "version": [
            1,
            0,
            0
        ]
    },
    "type": "ToolkitDataSource",
    "resource": "./prefixaac.tif",
    "dataFormat": "geotiff"
}>

Reminder:

- dataSourceName: The name we want to assign to the data source.
- resource: The path to the actual data (in this case, the file is in this folder).
- dataFormat: The format of the data. It can be string, parquet, and more (in this case, 'geotiff').
- version: The version of the data source.
- overwrite: Whether to overwrite an existing data source with the same name.
- kwargs: Additional metadata related to the data source.
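When several data sources have to be added, it can help to keep these fields together in one object instead of in loose variables. The sketch below assumes nothing about Hera beyond the argument names listed above; the `DataSourceSpec` class is hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class DataSourceSpec:
    """Hypothetical container for the addDataSource() arguments listed above."""
    dataSourceName: str
    resource: str
    dataFormat: str
    version: Tuple[int, int, int]
    overwrite: bool
    kwargs: Dict[str, Any] = field(default_factory=dict)


spec = DataSourceSpec(
    dataSourceName="MY_LANDCOVER_DATASOURCE",
    resource="./prefixaac.tif",
    dataFormat="geotiff",
    version=(1, 0, 0),
    overwrite=True,
    kwargs={"year": 2021, "type": 1},
)

# Convert the spec to a plain dict whose keys match the keyword arguments.
arguments = asdict(spec)

# With a toolkit instance at hand (not created in this sketch):
# landcover_toolkit.addDataSource(**arguments)
```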

Now we have successfully added a new LandCover data source to the project. We can verify this by using the getDataSourceList() function:

In [8]:

landcover_toolkit.getDataSourceList()

Out[8]:

['MY_LANDCOVER_DATASOURCE']

Disadvantages of Adding Documents Manually¶

While adding documents manually is quick and straightforward, it has a significant disadvantage.
Suppose you want to create another project and use the same documents. In that case, you will need to repeat the entire process of adding the documents again.
If you have multiple data sources and documents to use across various projects, you’ll need to repeatedly add them, remembering all the paths and associated information, which can become tedious and error-prone.
To overcome this problem, the Hera system provides a feature called Repository, which we will cover next.

Repositories¶

Repositories are essentially lists of data sources and documents. They are stored in a JSON file, which keeps track of all the data sources and documents.
If a Repository is added to the repositories list in the system, a newly created project will automatically include all the data sources and documents from this repository (and from all other repositories in the list). This eliminates the need for users to manually add data sources and documents each time or remember their paths.

Structure¶

For demonstration, let's use an example of a repository consisting of two data sources:

{
  "GIS_LandCover": {
    "Config": {
      "defaultLandCover": "Type-1"
    },
    "DataSource": {
      "Type-1": {
        "isRelativePath": "True",
        "item": {
          "resource": "prefixaac.tif",
          "dataFormat": "geotiff",
          "desc": {
            "year": 2021,
            "type": 1
          }
        }
      }
    }
  },
  "GIS_Tiles": {
    "Config": {
      "defaultTileServer": "http://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}"
    },
    "DataSource": {
      "THE_LOCAL_TILE_SERVER": {
        "isRelativePath": "False",
        "item": {
          "resource": "http://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}",
          "dataFormat": "string"
        }
      }
    }
  }
}

As you can see, the repository contains a list of data sources for each toolkit. Two notable differences from the general data source structure are the isRelativePath field and the Config field:

isRelativePath: Indicates whether the data source path is relative to the location of the Repository JSON file. If set to false, the system assumes the path is absolute.
Config: A JSON element that specifies which data source is the default. This is useful when the user has multiple data sources in a domain, as it allows the user to work with the default data source without specifying its name each time (this will be discussed further).
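To make the role of isRelativePath concrete, the following sketch resolves the resource of each data source against the directory that holds the repository file. It illustrates the rule described above and is not Hera's own loading code; the function names and the directory `/data/repositories` are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, Tuple


def as_bool(value: Any) -> bool:
    """Accept JSON booleans as well as the "True"/"False" strings used in the example."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def resolve_resources(repository: Dict[str, Any],
                      repository_dir: Path) -> Dict[Tuple[str, str], str]:
    """Map (toolkit, data source name) to the resolved resource of each entry."""
    resolved = {}
    for toolkit, section in repository.items():
        for name, entry in section.get("DataSource", {}).items():
            resource = entry["item"]["resource"]
            if as_bool(entry.get("isRelativePath", False)):
                # Relative resources are anchored at the repository file's directory.
                resource = str(repository_dir / resource)
            resolved[(toolkit, name)] = resource
    return resolved


repository = {
    "GIS_LandCover": {
        "DataSource": {
            "Type-1": {
                "isRelativePath": "True",
                "item": {"resource": "prefixaac.tif", "dataFormat": "geotiff"},
            }
        }
    },
    "GIS_Tiles": {
        "DataSource": {
            "THE_LOCAL_TILE_SERVER": {
                "isRelativePath": "False",
                "item": {
                    "resource": "http://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}",
                    "dataFormat": "string",
                },
            }
        }
    },
}

resolved = resolve_resources(repository, Path("/data/repositories"))
for key, resource in resolved.items():
    print(key, "->", resource)
```

The tile server entry is left untouched because its isRelativePath value is "False", while the GeoTIFF path is joined to the repository directory.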

Adding a Repository to the Repositories list¶

When a project is created, all documents and data sources from each repository in the Repositories list are added to it.
A user who anticipates using several documents across multiple projects should first define the documents in a repository and then add the repository to the repositories list. From then on, whenever a project is created, all of those documents are included automatically.
Important Note: This functionality applies only when the project is created using the CLI. If the user creates a project by adding a document to a non-existent project (as demonstrated above), the documents in the repositories will not be added.

Repository Basic Functions¶

To add a repository to the repositories list, use the following CLI command:

>> hera-project repository add <PATH_TO_REPOSITORY> --overwrite <ANY_CHAR>

To display the repositories list, use:

>> hera-project repository list

To remove a repository from the repositories list, use:

>> hera-project repository remove <PATH_TO_REPOSITORY>

To display the items in a specific repository, use:

>> hera-project repository show <REPOSITORY_NAME>
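A repository file can also be generated from Python before it is registered. The sketch below writes a small repository, following the structure shown earlier, with the standard json module and prints the matching add command. The file name `my_repository.json` and the temporary directory are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Repository content following the structure shown in this document.
repository = {
    "GIS_LandCover": {
        "Config": {"defaultLandCover": "Type-1"},
        "DataSource": {
            "Type-1": {
                "isRelativePath": "True",
                "item": {
                    "resource": "prefixaac.tif",
                    "dataFormat": "geotiff",
                    "desc": {"year": 2021, "type": 1},
                },
            }
        },
    }
}

# A temporary directory keeps the sketch free of side effects;
# in practice the file would sit next to the data it refers to.
repository_file = Path(tempfile.mkdtemp()) / "my_repository.json"
repository_file.write_text(json.dumps(repository, indent=2))

# Reading the file back confirms that it is valid JSON before registering it.
assert json.loads(repository_file.read_text()) == repository

# The command to run in a shell where Hera is installed:
print(f"hera-project repository add {repository_file}")
```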

CLI for Projects¶

Here are some basic project commands to use in the CLI:

Listing the Projects in the system:

>> hera-project project list

Updating a Project with Repository Documents: Useful if a project was created before a new repository was added to the repositories list, and the user wishes to update the project with the documents from that repository:

>> hera-project project updateRepositories --projectName <PROJECT_NAME> --overwrite <ANY_CHAR>

Arguments:

- --projectName (Optional): The project to update with the current repositories list. If not specified, all projects in the system are updated.
- --overwrite (Optional): Whether to overwrite the project's existing documents with those from the current repositories list. Default is False.
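Updating a chosen subset of projects lends itself to a small loop. The sketch below builds one updateRepositories command per project and runs it only when `dry_run` is disabled; the function `update_repositories` and the project names are hypothetical, and only the flags listed above are used.

```python
import subprocess
from typing import Iterable, List


def update_repositories(project_names: Iterable[str],
                        overwrite: bool = False,
                        dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one updateRepositories command per project."""
    commands = []
    for name in project_names:
        command = ["hera-project", "project", "updateRepositories",
                   "--projectName", name]
        if overwrite:
            # The flag takes an arbitrary value; "1" is used here.
            command += ["--overwrite", "1"]
        commands.append(command)
        if not dry_run:
            # Requires Hera to be installed; raises if the command fails.
            subprocess.run(command, check=True)
    return commands


# Dry run: nothing is executed, the commands are only printed.
for command in update_repositories(["PROJECT_A", "PROJECT_B"], overwrite=True):
    print(" ".join(command))
```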

Command Line Interface (CLI) in Hera¶

What is CLI? CLI stands for Command Line Interface: interacting with the system by typing text commands instead of using graphical menus and buttons.

In Hera, the CLI allows you to quickly and efficiently perform tasks like:

- Creating a project
- Adding or removing repositories
- Listing available projects
- Updating project repositories

Why Use CLI?

- Common tasks run with a single command, without writing a script.
- Easy automation for repeated tasks.
- Remote management of Hera projects.

Example: Creating a new project using CLI

>> hera-project project create MY_FIRST_PROJECT --noRepositories 1

This command will:

- Create a new project named MY_FIRST_PROJECT.
- Skip loading any default repositories at creation time.

Summary: Using the CLI helps you interact directly with Hera's management system in a simple, fast, and powerful way.