# Runtime Environment

## Specifying requirements

In addition to the set of packages provided by the base Docker image, you can specify a list of additional packages to install with a `requirements.txt` file. This can be done in a dataset configuration or in a task configuration.

```yaml
# file: naip/dataset.yaml
name: naip
image: pctasks-basic:latest
code:
  requirements: ${{ local.path("requirements.txt") }}
```

This should be a plain text file following pip's requirements file format.

```
# file: naip/requirements.txt
git+https://github.com/stactools-packages/naip@dd703d010115b400e45ae8b1ca18816966e38231
```

The path specified in `code.requirements` should be relative to the `dataset.yaml` file.
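Requirements can also be attached at the task level. The sketch below assumes a task definition accepts the same `code` block as the dataset configuration; the file name and the `id` field are illustrative only.

```yaml
# file: naip/task.yaml (hypothetical)
# Sketch only: assumes task definitions accept the same `code` block as
# dataset configurations; `id` and the file name are illustrative.
id: process-naip
image: pctasks-basic:latest
code:
  requirements: ${{ local.path("requirements.txt") }}
```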

## Uploading Code

You can make a Python module (a single `.py` file) or package (a possibly nested directory containing an `__init__.py` file) available to the workers that execute your code by specifying the `code.src` option on your dataset. The path specified by `code.src` should be relative to the `dataset.yaml`, using the `local.path(relative_path)` templater, or an absolute path.

Suppose you have a dataset configuration file `naip/dataset.yaml` with an accompanying `dataset.py` file. By setting the `code.src` option to `dataset.py`, that module will be included in the worker's runtime.

```yaml
# file: naip/dataset.yaml
name: naip
image: pctasks-basic:latest
code:
  src: ${{ local.path("dataset.py") }}
```

For single-file modules, the code is importable using the module name: `import dataset` in this case.

Packages are importable using the name of the top-level directory. Given a layout like

```
mypackage/
  __init__.py
  module_a.py
  module_b.py
```

Your code is importable with `import mypackage`, `from mypackage import module_a`, etc.
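To ship a package rather than a single module, point `code.src` at the package's top-level directory. This is a minimal sketch assuming the directory path is passed to `local.path` in the same way as a single file:

```yaml
# file: naip/dataset.yaml
# Sketch only: assumes a package directory is referenced the same way as a
# single-file module.
name: naip
image: pctasks-basic:latest
code:
  src: ${{ local.path("mypackage") }}
```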

Behind the scenes, when you submit a workflow generated from this `dataset.yaml`, the module or package is uploaded to Azure Blob Storage. Before executing your task, the worker downloads it and places it in a location that is importable by the Python interpreter. The uploaded module or package takes priority over any existing module with the same import name.