Datasets API#
To create an object of this class, call project.get_dataset_api().
DatasetApi #
add #
Deprecated.
Attach a name/value tag to the given path.
A tag consists of a name/value pair. Tag names are unique identifiers. The value of a tag can be any valid JSON: a primitive, an array, or a JSON object.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to add the tag. TYPE: |
| name | Name of the tag to be added. TYPE: |
| value | Value of the tag to be added. TYPE: |
chmod #
Change permissions of a file or a directory in the Hopsworks Filesystem.
| PARAMETER | DESCRIPTION |
|---|---|
| remote_path | Path to change the permissions of. TYPE: |
| permissions | Permissions string, for example TYPE: |
| RETURNS | DESCRIPTION |
|---|---|
| dict | The updated dataset metadata. TYPE: |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
copy #
Copy a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.copy("Resources/myfile.txt", "Logs/myfile.txt")
| PARAMETER | DESCRIPTION |
|---|---|
| source_path | The source path to copy. TYPE: |
| destination_path | The destination path. TYPE: |
| overwrite | Overwrite the destination if it exists. TYPE: |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.DatasetException | If the destination path already exists and overwrite is not set to |
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
delete #
download #
download(
path: str,
local_path: str | None = None,
overwrite: bool | None = False,
chunk_size: int = DEFAULT_DOWNLOAD_FLOW_CHUNK_SIZE,
) -> str
Download a file from the Hopsworks Filesystem to the local filesystem, by default into the current working directory.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
downloaded_file_path = dataset_api.download("Resources/my_local_file.txt")
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path in the Hopsworks Filesystem to the file. TYPE: str |
| local_path | Local filesystem path to download the file to. TYPE: str \| None |
| overwrite | Overwrite the local file if it exists. TYPE: bool \| None |
| chunk_size | Download chunk size in bytes, defaults to 1 MB. TYPE: int |
| RETURNS | DESCRIPTION |
|---|---|
| str | The path to the downloaded file. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
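The local_path and overwrite parameters can be combined to download into a specific local directory. The following is a minimal sketch, not part of the library: the helper name download_into is hypothetical, dataset_api is assumed to be the object returned by project.get_dataset_api(), and it assumes that local_path accepts a full target file path.

```python
import os
import posixpath


def download_into(dataset_api, remote_path, local_dir, overwrite=False):
    """Download remote_path into local_dir and return the local file path.

    Hypothetical helper; assumes `local_path` accepts a full target file path.
    """
    # Make sure the local target directory exists before downloading.
    os.makedirs(local_dir, exist_ok=True)
    # Remote paths use forward slashes, so take the file name with posixpath.
    target = os.path.join(local_dir, posixpath.basename(remote_path))
    return dataset_api.download(remote_path, local_path=target, overwrite=overwrite)
```

It could be called as download_into(dataset_api, "Resources/my_local_file.txt", "downloads", overwrite=True).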
exists #
get #
get(path: str)
Deprecated.
Get dataset metadata.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to check. TYPE: str |
| RETURNS | DESCRIPTION |
|---|---|
| Dataset metadata. |
get_tags #
list #
List the files and directories from a path in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# list all files in the Resources dataset
files = dataset_api.list("/Resources")
# list all datasets in the project
files = dataset_api.list("/")
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path in the Hopsworks Filesystem to the directory. TYPE: |
| offset | The number of entities to skip. TYPE: |
| limit | Maximum number of entities to return. TYPE: |
| RETURNS | DESCRIPTION |
|---|---|
| list[str] | List of paths to the files and directories in the provided path. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
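For large directories, offset and limit can be used to page through the results. The sketch below is illustrative only: the helper name list_all is hypothetical, dataset_api is assumed to be the object returned by project.get_dataset_api(), and it assumes that a page shorter than limit means there are no more entries.

```python
def list_all(dataset_api, path, page_size=100):
    """Collect every entry under path by paging with offset and limit.

    Hypothetical helper; assumes a page shorter than page_size is the last one.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    entries = []
    offset = 0
    while True:
        # Fetch the next page, skipping the entries already collected.
        page = dataset_api.list(path, offset=offset, limit=page_size)
        entries.extend(page)
        if len(page) < page_size:
            break
        offset += len(page)
    return entries
```

It could be called as list_all(dataset_api, "/Resources", page_size=50).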
mkdir #
Create a directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.mkdir("Resources/my_dir")
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to the directory. TYPE: |
| RETURNS | DESCRIPTION |
|---|---|
| str | Path to the created directory. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
move #
Move a file or directory in the Hopsworks Filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
directory_path = dataset_api.move("Resources/myfile.txt", "Logs/myfile.txt")
| PARAMETER | DESCRIPTION |
|---|---|
| source_path | The source path to move. TYPE: |
| destination_path | The destination path. TYPE: |
| overwrite | Overwrite the destination if it exists. TYPE: |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.DatasetException | If the destination path already exists and overwrite is not set to |
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
path_exists #
read_content #
Read the content of a file.
| PARAMETER | DESCRIPTION |
|---|---|
| path | The path to the file to read. TYPE: |
| dataset_type | The type of dataset, can be TYPE: |
| RETURNS | DESCRIPTION |
|---|---|
| An object with |
remove #
remove(path: str)
Remove a path in the Hopsworks Filesystem.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to remove. TYPE: str |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
rm #
rm(remote_path: str)
Deprecated, use remove instead.
Remove a path in the Hopsworks Filesystem.
| PARAMETER | DESCRIPTION |
|---|---|
| remote_path | Path to remove. TYPE: str |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
unzip #
Unzip an archive in the dataset.
| PARAMETER | DESCRIPTION |
|---|---|
| remote_path | Path to file or directory to unzip. TYPE: |
| block | Whether the operation should be blocking until complete. TYPE: |
| timeout | Timeout in seconds for the blocking, defaults to 120; if TYPE: |
| RETURNS | DESCRIPTION |
|---|---|
| | Whether the operation completed in the specified timeout; if non-blocking, always returns |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
upload #
upload(
local_path: str,
upload_path: str,
overwrite: bool = False,
chunk_size: int = DEFAULT_UPLOAD_FLOW_CHUNK_SIZE,
simultaneous_uploads: int = DEFAULT_UPLOAD_SIMULTANEOUS_UPLOADS,
simultaneous_chunks: int = DEFAULT_UPLOAD_SIMULTANEOUS_CHUNKS,
max_chunk_retries: int = DEFAULT_UPLOAD_MAX_CHUNK_RETRIES,
chunk_retry_interval: int = 1,
) -> str
Upload a file or directory to the Hopsworks filesystem.
import hopsworks
project = hopsworks.login()
dataset_api = project.get_dataset_api()
# upload a file to Resources dataset
uploaded_file_path = dataset_api.upload("my_local_file.txt", "Resources")
# upload a directory to Resources dataset
uploaded_file_path = dataset_api.upload("my_dir", "Resources")
| PARAMETER | DESCRIPTION |
|---|---|
| local_path | Local path to the file or directory to upload, can be relative or absolute. TYPE: str |
| upload_path | Path to the directory in the Hopsworks Filesystem to upload to. TYPE: str |
| overwrite | Overwrite the file or directory if it exists. TYPE: bool |
| chunk_size | Upload chunk size in bytes, defaults to 10 MB. TYPE: int |
| simultaneous_chunks | Number of simultaneous chunks to upload for each file upload. TYPE: int |
| simultaneous_uploads | Number of simultaneous files to be uploaded for directories. TYPE: int |
| max_chunk_retries | Maximum number of retries for a chunk. TYPE: int |
| chunk_retry_interval | Chunk retry interval in seconds. TYPE: int |
| RETURNS | DESCRIPTION |
|---|---|
| str | The path to the uploaded file or directory. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
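Because chunk_size is expressed in bytes, it can be convenient to compute it from a size in mebibytes when tuning large uploads. The following is a sketch under stated assumptions: the helper name upload_tuned and the default values 32 and 5 are illustrative choices, not recommendations from the library, and dataset_api is assumed to be the object returned by project.get_dataset_api().

```python
MIB = 1024 * 1024  # bytes per mebibyte


def upload_tuned(dataset_api, local_path, upload_path, chunk_mib=32,
                 max_chunk_retries=5, overwrite=False):
    """Upload with a chunk size given in MiB and a custom retry budget.

    Hypothetical helper; the default values are illustrative only.
    """
    if chunk_mib < 1:
        raise ValueError("chunk_mib must be at least 1")
    return dataset_api.upload(
        local_path,
        upload_path,
        overwrite=overwrite,
        chunk_size=chunk_mib * MIB,  # chunk_size is given in bytes
        max_chunk_retries=max_chunk_retries,
    )
```

It could be called as upload_tuned(dataset_api, "my_local_file.txt", "Resources", chunk_mib=64).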
upload_feature_group #
upload_feature_group(feature_group, path, dataframe)
Upload a dataframe to a path in Parquet format using feature group metadata.
Note
This method is a legacy method kept for backwards-compatibility; do not use it in new code.
zip #
zip(
remote_path: str,
destination_path: str | None = None,
block: bool = False,
timeout: int | None = 120,
) -> bool
Zip a file or directory in the dataset.
| PARAMETER | DESCRIPTION |
|---|---|
| remote_path | Path to file or directory to zip. TYPE: str |
| destination_path | Path to store the zip archive. TYPE: str \| None |
| block | Whether the operation should be blocking until complete. TYPE: bool |
| timeout | Timeout in seconds for the blocking, defaults to 120; if TYPE: int \| None |
| RETURNS | DESCRIPTION |
|---|---|
| bool | Whether the operation completed in the specified timeout; if non-blocking, always returns |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | If the backend encounters an error when handling the request. |
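When block is set, the boolean return value reports whether the operation finished within the timeout, so callers may want to check it explicitly. The following is a minimal sketch: the helper name zip_and_wait is hypothetical, raising TimeoutError is a design choice of this example rather than library behavior, and dataset_api is assumed to be the object returned by project.get_dataset_api().

```python
def zip_and_wait(dataset_api, remote_path, destination_path=None, timeout=300):
    """Zip remote_path, block until done, and fail loudly on timeout.

    Hypothetical helper; the library itself only returns a boolean.
    """
    completed = dataset_api.zip(
        remote_path,
        destination_path=destination_path,
        block=True,       # wait for the zip operation to finish
        timeout=timeout,  # seconds to wait before giving up
    )
    if not completed:
        raise TimeoutError(
            "zipping %s did not complete within %s seconds" % (remote_path, timeout)
        )
```

It could be called as zip_and_wait(dataset_api, "Resources/my_dir", timeout=600).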