Datasets API#

To create an object of this class, use get_dataset_api.

DatasetApi #

add #

add(path: str, name: str, value: str)

Deprecated.

Attach a name/value tag to a path.

A tag consists of a name/value pair. Tag names are unique identifiers. The value of a tag can be any valid JSON: primitives, arrays, or JSON objects.

PARAMETER DESCRIPTION
path

Path to add the tag to.

TYPE: str

name

Name of the tag to be added.

TYPE: str

value

Value of the tag to be added.

TYPE: str

chmod #

chmod(remote_path: str, permissions: str) -> dict

Change permissions of a file or a directory in the Hopsworks Filesystem.

PARAMETER DESCRIPTION
remote_path

Path to change the permissions of.

TYPE: str

permissions

Permissions string, for example "u+x".

TYPE: str

RETURNS DESCRIPTION
dict

The updated dataset metadata.

TYPE: dict

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
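
As a rough sketch of how chmod might be called, the helper below grants the owner execute permission using the "u+x" string from the parameter description. The name `make_executable` is hypothetical, and `dataset_api` is assumed to be the object returned by `project.get_dataset_api()`.

```python
def make_executable(dataset_api, remote_path: str) -> dict:
    """Grant the owner execute permission on a path (hypothetical helper)."""
    # "u+x" is the example permissions string from the parameter description.
    return dataset_api.chmod(remote_path, "u+x")
```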

copy #

copy(
    source_path: str,
    destination_path: str,
    overwrite: bool = False,
)

Copy a file or directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
PARAMETER DESCRIPTION
source_path

The source path to copy.

TYPE: str

destination_path

The destination path.

TYPE: str

overwrite

Overwrite destination if exists.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
hopsworks.client.exceptions.DatasetException

If the destination path already exists and overwrite is not set to True.

hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
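
To illustrate the overwrite flag, here is a minimal sketch that keeps a backup copy of a file next to the original and replaces any earlier backup. The helper name `backup_copy` and the ".bak" suffix are assumptions made for illustration; `dataset_api` is assumed to come from `project.get_dataset_api()`.

```python
def backup_copy(dataset_api, path: str, suffix: str = ".bak") -> str:
    """Copy a file to '<path><suffix>', replacing an earlier backup (hypothetical helper)."""
    destination = path + suffix
    # overwrite=True avoids the DatasetException raised when the destination exists.
    dataset_api.copy(path, destination, overwrite=True)
    return destination
```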

delete #

delete(path: str, name: str)

Deprecated.

Delete a tag.

Tag names are unique identifiers.

PARAMETER DESCRIPTION
path

Path to delete the tag from.

TYPE: str

name

Name of the tag to be removed.

TYPE: str

download #

download(
    path: str,
    local_path: str | None = None,
    overwrite: bool | None = False,
    chunk_size: int = DEFAULT_DOWNLOAD_FLOW_CHUNK_SIZE,
) -> str

Download a file from the Hopsworks Filesystem; by default it is saved to the current working directory.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
PARAMETER DESCRIPTION
path

Path in Hopsworks filesystem to the file.

TYPE: str

local_path

Path where to download the file in the local filesystem.

TYPE: str | None DEFAULT: None

overwrite

Overwrite local file if exists.

TYPE: bool | None DEFAULT: False

chunk_size

Download chunk size in bytes, defaults to 1 MB.

TYPE: int DEFAULT: DEFAULT_DOWNLOAD_FLOW_CHUNK_SIZE

RETURNS DESCRIPTION
str

The path to the downloaded file.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
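
The following sketch shows one way local_path and overwrite might be combined to download into a chosen local directory. It assumes that local_path may name the target file and that remote paths use "/" as the separator; the helper name `download_to_dir` is hypothetical.

```python
import os
import posixpath


def download_to_dir(dataset_api, remote_path: str, local_dir: str) -> str:
    """Download a remote file into local_dir, replacing an existing copy (hypothetical helper)."""
    # Create the local target directory if it is missing.
    os.makedirs(local_dir, exist_ok=True)
    # Assumption: local_path may be the full path of the file to write.
    target = os.path.join(local_dir, posixpath.basename(remote_path))
    return dataset_api.download(remote_path, local_path=target, overwrite=True)
```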

exists #

exists(path: str) -> bool

Check if a file exists in the Hopsworks Filesystem.

PARAMETER DESCRIPTION
path

Path to check.

TYPE: str

RETURNS DESCRIPTION
bool

True if exists, otherwise False.

get #

get(path: str)

Deprecated.

Get dataset metadata.

PARAMETER DESCRIPTION
path

Path to check.

TYPE: str

RETURNS DESCRIPTION

Dataset metadata.

get_tags #

get_tags(path: str, name: str | None = None) -> dict

Deprecated.

Get the tags.

Gets all tags if no tag name is specified.

PARAMETER DESCRIPTION
path

Path to get the tags from.

TYPE: str

name

Tag name.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
dict

Tag names and values.

list #

list(
    path: str, offset: int = 0, limit: int = 1000
) -> list[str]

List the files and directories from a path in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

# list all files in the Resources dataset
files = dataset_api.list("/Resources")

# list all datasets in the project
files = dataset_api.list("/")
PARAMETER DESCRIPTION
path

Path in Hopsworks filesystem to the directory.

TYPE: str

offset

The number of entities to skip.

TYPE: int DEFAULT: 0

limit

Max number of the returned entities.

TYPE: int DEFAULT: 1000

RETURNS DESCRIPTION
list[str]

List of paths to the files and directories in the provided path.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
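
Because limit caps the number of returned entries, a directory with more entries than the limit has to be read page by page using offset. The sketch below assumes that a page shorter than the requested limit marks the end of the listing; the helper name `list_all` is hypothetical.

```python
def list_all(dataset_api, path: str, page_size: int = 1000) -> list:
    """Collect every entry under path by paging with offset and limit (hypothetical helper)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    entries = []
    offset = 0
    while True:
        page = dataset_api.list(path, offset=offset, limit=page_size)
        entries.extend(page)
        # Assumption: a short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return entries
        offset += len(page)
```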

mkdir #

mkdir(path: str) -> str

Create a directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

directory_path = dataset_api.mkdir("Resources/my_dir")
PARAMETER DESCRIPTION
path

Path to directory.

TYPE: str

RETURNS DESCRIPTION
str

Path to the created directory.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
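
This reference does not say how mkdir behaves when the directory already exists, so a cautious caller might check with exists first. The sketch below shows that pattern; the helper name `ensure_directory` is hypothetical.

```python
def ensure_directory(dataset_api, path: str) -> bool:
    """Create path only if it is missing; return True when it was created (hypothetical helper)."""
    if dataset_api.exists(path):
        return False
    dataset_api.mkdir(path)
    return True
```

The check and the creation are two separate requests, so another client could still create the directory in between.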

move #

move(
    source_path: str,
    destination_path: str,
    overwrite: bool = False,
)

Move a file or directory in the Hopsworks Filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
PARAMETER DESCRIPTION
source_path

The source path to move.

TYPE: str

destination_path

The destination path.

TYPE: str

overwrite

Overwrite destination if exists.

TYPE: bool DEFAULT: False

RAISES DESCRIPTION
hopsworks.client.exceptions.DatasetException

If the destination path already exists and overwrite is not set to True.

hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
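
A rename within the same directory can be expressed as a move whose destination shares the source's parent. This is a minimal sketch, assuming remote paths use "/" as the separator; the helper name `rename_in_place` is hypothetical.

```python
import posixpath


def rename_in_place(dataset_api, path: str, new_name: str) -> str:
    """Rename a file or directory without changing its parent (hypothetical helper)."""
    # Strip a trailing slash so that dirname returns the parent of a directory path.
    parent = posixpath.dirname(path.rstrip("/"))
    destination = posixpath.join(parent, new_name)
    dataset_api.move(path, destination)
    return destination
```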

path_exists #

path_exists(remote_path: str) -> bool

Deprecated, use exists instead.

Check if a path exists in datasets.

PARAMETER DESCRIPTION
remote_path

Path to check.

TYPE: str

RETURNS DESCRIPTION
bool

True if exists, otherwise False.

read_content #

read_content(path: str, dataset_type: str = 'DATASET')

Read the content of a file.

PARAMETER DESCRIPTION
path

The path to the file to read.

TYPE: str

dataset_type

The type of dataset, can be DATASET or HIVEDB; defaults to DATASET. HIVEDB type is used to read files from Apache Hive.

TYPE: str DEFAULT: 'DATASET'

RETURNS DESCRIPTION

An object with content attribute containing the file content as bytes, or None if the file was not found.
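
Since the return value is either None or an object whose content attribute holds bytes, callers that want text need to handle both cases. The sketch below does so; the helper name `read_text` and the UTF-8 default are assumptions made for illustration.

```python
def read_text(dataset_api, path: str, encoding: str = "utf-8"):
    """Return the file content as text, or None if the file was not found (hypothetical helper)."""
    result = dataset_api.read_content(path)
    if result is None:
        return None
    # content holds the raw bytes of the file.
    return result.content.decode(encoding)
```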

remove #

remove(path: str)

Remove a path in the Hopsworks Filesystem.

PARAMETER DESCRIPTION
path

Path to remove.

TYPE: str

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.

rm #

rm(remote_path: str)

Deprecated, use remove instead.

Remove a path in the Hopsworks Filesystem.

PARAMETER DESCRIPTION
remote_path

Path to remove.

TYPE: str

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.

unzip #

unzip(
    remote_path: str,
    block: bool = False,
    timeout: int | None = 120,
)

Unzip an archive in the dataset.

PARAMETER DESCRIPTION
remote_path

Path to file or directory to unzip.

TYPE: str

block

Whether the operation should be blocking until complete.

TYPE: bool DEFAULT: False

timeout

Timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.

TYPE: int | None DEFAULT: 120

RETURNS DESCRIPTION

Whether the operation completed within the specified timeout; if non-blocking, always returns True.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
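
A common sequence is to upload an archive and then unzip it, waiting for the extraction to finish. The sketch below assumes that the path returned by upload can be passed directly as remote_path; the helper name `upload_and_unzip` is hypothetical.

```python
def upload_and_unzip(dataset_api, local_archive: str, upload_path: str, timeout: int = 120) -> bool:
    """Upload an archive, then unzip it and wait for completion (hypothetical helper)."""
    # Assumption: the path returned by upload is accepted by unzip as remote_path.
    remote_archive = dataset_api.upload(local_archive, upload_path)
    # block=True waits for at most `timeout` seconds; the result tells whether unzip finished in time.
    return dataset_api.unzip(remote_archive, block=True, timeout=timeout)
```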

upload #

upload(
    local_path: str,
    upload_path: str,
    overwrite: bool = False,
    chunk_size: int = DEFAULT_UPLOAD_FLOW_CHUNK_SIZE,
    simultaneous_uploads: int = DEFAULT_UPLOAD_SIMULTANEOUS_UPLOADS,
    simultaneous_chunks: int = DEFAULT_UPLOAD_SIMULTANEOUS_CHUNKS,
    max_chunk_retries: int = DEFAULT_UPLOAD_MAX_CHUNK_RETRIES,
    chunk_retry_interval: int = 1,
) -> str

Upload a file or directory to the Hopsworks filesystem.

import hopsworks

project = hopsworks.login()

dataset_api = project.get_dataset_api()

# upload a file to Resources dataset
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")

# upload a directory to Resources dataset
uploaded_file_path = dataset_api.upload("my_dir", "Resources")
PARAMETER DESCRIPTION
local_path

Local path to file or directory to upload, can be relative or absolute.

TYPE: str

upload_path

Path to directory where to upload the file in Hopsworks Filesystem.

TYPE: str

overwrite

Overwrite file or directory if exists.

TYPE: bool DEFAULT: False

chunk_size

Upload chunk size in bytes, defaults to 10 MB.

TYPE: int DEFAULT: DEFAULT_UPLOAD_FLOW_CHUNK_SIZE

simultaneous_chunks

Number of simultaneous chunks to upload for each file upload.

TYPE: int DEFAULT: DEFAULT_UPLOAD_SIMULTANEOUS_CHUNKS

simultaneous_uploads

Number of simultaneous files to be uploaded for directories.

TYPE: int DEFAULT: DEFAULT_UPLOAD_SIMULTANEOUS_UPLOADS

max_chunk_retries

Maximum number of retries for a chunk.

TYPE: int DEFAULT: DEFAULT_UPLOAD_MAX_CHUNK_RETRIES

chunk_retry_interval

Chunk retry interval in seconds.

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
str

The path to the uploaded file or directory.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
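
The chunking parameters can be set per call. The sketch below passes illustrative values for a large file on an unreliable connection; the numbers are assumptions chosen for the example, not recommendations from the library, and the helper name `upload_large_file` is hypothetical.

```python
def upload_large_file(dataset_api, local_path: str, upload_path: str) -> str:
    """Upload with larger chunks and more retries (hypothetical helper, illustrative values)."""
    return dataset_api.upload(
        local_path,
        upload_path,
        overwrite=True,                # replace an existing remote copy
        chunk_size=20 * 1024 * 1024,   # 20 MB per chunk, in bytes
        simultaneous_chunks=4,         # chunks in flight per file
        max_chunk_retries=5,           # retries allowed per chunk
        chunk_retry_interval=2,        # seconds between retries
    )
```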

upload_feature_group #

upload_feature_group(feature_group, path, dataframe)

Upload a dataframe to a path in Parquet format using feature group metadata.

Note

This method is a legacy method kept for backwards-compatibility; do not use it in new code.

zip #

zip(
    remote_path: str,
    destination_path: str | None = None,
    block: bool = False,
    timeout: int | None = 120,
) -> bool

Zip a file or directory in the dataset.

PARAMETER DESCRIPTION
remote_path

Path to file or directory to zip.

TYPE: str

destination_path

Path where the zip archive is stored.

TYPE: str | None DEFAULT: None

block

Whether the operation should be blocking until complete.

TYPE: bool DEFAULT: False

timeout

Timeout in seconds for the blocking, defaults to 120; if None, the blocking is unbounded.

TYPE: int | None DEFAULT: 120

RETURNS DESCRIPTION
bool

Whether the operation completed within the specified timeout; if non-blocking, always returns True.

RAISES DESCRIPTION
hopsworks.client.exceptions.RestAPIError

If the backend encounters an error when handling the request.
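
With block=True the boolean result indicates whether zipping finished before the timeout, so a caller may want to turn a False result into an error. This is a minimal sketch of that pattern; the helper name `zip_and_wait` is hypothetical.

```python
def zip_and_wait(dataset_api, remote_path: str, timeout: int = 120) -> None:
    """Zip a path and raise if it does not finish within the timeout (hypothetical helper)."""
    finished = dataset_api.zip(remote_path, block=True, timeout=timeout)
    if not finished:
        raise TimeoutError(
            f"Zipping {remote_path!r} did not finish within {timeout} seconds"
        )
```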