Storage

PCTasks provides a storage abstraction that simplifies working with local files, Azure Blob Storage, and SAS tokens. This user guide covers the key concepts you should know when working with pctasks.core.storage.

The Storage class

The pctasks.core.storage.Storage class is the main abstraction over file systems. You can use a Storage instance to read and write files, walk files, get file info, and more. See the reference documentation for a list of all the methods provided.

Currently there are storage implementations for the local file system and Azure Blob Storage.

Paths vs URIs vs URLs vs Authenticated URLs

These terms, which generally mean similar things, have specific definitions and differences in PCTasks.

path

A “path” is a file location that is relative to the root of the Storage instance. For example, if the storage root is the storage account “myaccount”, container “mycontainer”, with a path prefix of “blobs/”, then the path “file1.txt” corresponds to the URI “blob://myaccount/mycontainer/blobs/file1.txt”.
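The relationship between a storage root and a path can be sketched in plain Python. The helper below, resolve_path, is a hypothetical name used only for illustration; it is not part of PCTasks and only shows how a relative path combines with a root URI.

```python
def resolve_path(root_uri: str, path: str) -> str:
    """Join a storage-relative path onto a root URI.

    Illustrative helper only; this is not a PCTasks function.
    """
    # Trim slashes at the join point so that a root ending in "/" and a
    # path starting with "/" do not produce a doubled separator.
    return f"{root_uri.rstrip('/')}/{path.lstrip('/')}"


# Root: account "myaccount", container "mycontainer", prefix "blobs/".
root = "blob://myaccount/mycontainer/blobs/"
print(resolve_path(root, "file1.txt"))
```

The same path would resolve to a different URI under a different storage root, which is why a path is only meaningful together with the Storage instance it belongs to.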

URI

A URI is a resource ID that can use non-http schemes. This is generally used to reference Azure Blob Storage locations more easily than with their full Azure URLs. For instance, blob://myaccount/mycontainer/blobs/file1.txt is a URI that translates to the URL https://myaccount.blob.core.windows.net/mycontainer/blobs/file1.txt.
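The translation from a blob:// URI to its Azure URL can be illustrated with the standard library alone. This is a minimal sketch of the mapping described above, not the conversion code PCTasks itself uses; the function name blob_uri_to_url is an assumption made for this example.

```python
from urllib.parse import urlparse


def blob_uri_to_url(uri: str) -> str:
    """Translate a blob:// URI into an Azure Blob Storage https URL.

    Illustrative helper only; this is not a PCTasks function.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "blob":
        raise ValueError(f"Expected a blob:// URI, got: {uri}")
    # For "blob://myaccount/mycontainer/blobs/file1.txt", netloc is the
    # storage account and path is "/mycontainer/blobs/file1.txt".
    account = parsed.netloc
    return f"https://{account}.blob.core.windows.net{parsed.path}"


print(blob_uri_to_url("blob://myaccount/mycontainer/blobs/file1.txt"))
```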

URL

A URL is a resource ID that uses the http or https scheme.

Authenticated URL

An authenticated URL is a URL with a token appended. This currently applies only to Azure Storage, where a SAS token is appended to the URL. You can get the authenticated URL with the Storage.get_authenticated_url method, but only if the Azure Storage instance was created using a SAS token.
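To make "appended with a token" concrete, the sketch below attaches a SAS token to a URL as a query string. It assumes the usual Azure convention that a SAS token is a set of query parameters. The helper name append_sas_token and the token value are made up for illustration, and this is not how Storage.get_authenticated_url is necessarily implemented.

```python
def append_sas_token(url: str, sas_token: str) -> str:
    """Append a SAS token to a URL as query parameters.

    Illustrative helper only; this is not a PCTasks function.
    """
    # Tokens are sometimes copied with a leading "?"; normalize that away.
    token = sas_token.lstrip("?")
    # Use "&" if the URL already carries a query string, otherwise "?".
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{token}"


url = "https://myaccount.blob.core.windows.net/mycontainer/blobs/file1.txt"
placeholder_token = "sv=2021-08-06&sig=PLACEHOLDER"  # not a real token
print(append_sas_token(url, placeholder_token))
```

Because the token grants access to anyone holding the URL, authenticated URLs should be treated as secrets and kept out of logs.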

StorageFactory

A StorageFactory creates Storage instances from a URI. It can be configured with SAS tokens: when a URI refers to a storage account and container for which the factory has a token, the Storage instance is created using that token.
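The token lookup described here can be pictured as a mapping keyed by storage account and container. The class below is a toy model of that idea only: ToyTokenRegistry, its constructor, and its token_for method are hypothetical names, and the real StorageFactory configuration may be structured differently.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlparse


class ToyTokenRegistry:
    """Toy model of per-container SAS token lookup (not the PCTasks API)."""

    def __init__(self, tokens: Dict[Tuple[str, str], str]) -> None:
        # Keys are (storage account, container) pairs; values are SAS tokens.
        self._tokens = dict(tokens)

    def token_for(self, uri: str) -> Optional[str]:
        """Return the token registered for the URI's account and container."""
        parsed = urlparse(uri)
        if parsed.scheme != "blob":
            # Local paths and other schemes never have a SAS token.
            return None
        account = parsed.netloc
        # The container is the first path segment after the account.
        container = parsed.path.lstrip("/").split("/", 1)[0]
        return self._tokens.get((account, container))


registry = ToyTokenRegistry({("myaccount", "mycontainer"): "sig=PLACEHOLDER"})
print(registry.token_for("blob://myaccount/mycontainer/blobs/file1.txt"))
print(registry.token_for("blob://myaccount/othercontainer/file2.txt"))
```

In this model, a URI with no registered token simply yields None, which corresponds to creating the Storage instance without a SAS token.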

You can also call get_storage_for_file on the factory, which returns a tuple whose first element is the Storage instance and whose second element is the path of the file within that storage.

You can also skip instantiating a StorageFactory and use utility functions that create storage from a default instance:

from pctasks.core.storage import get_storage

storage = get_storage("/home/user")

and

from pctasks.core.storage import get_storage_for_file

storage, path = get_storage_for_file("blob://myaccount/mycontainer/blobs/file1.txt")
assert storage.get_uri() == "blob://myaccount/mycontainer/blobs"
assert path == "file1.txt"