Deployment#
You can obtain a ModelServing using Project.get_model_serving. Then you can create a deployment using ModelServing.create_deployment, and retrieve existing deployments using ModelServing.get_deployment, ModelServing.get_deployment_by_id, and ModelServing.get_deployments.
You can also create a deployment by deploying a model via Model.deploy or from a predictor via Predictor.deploy.
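The retrieval and creation paths above can be sketched end to end. The snippet below is a minimal illustration of the call pattern only: the `Stub*` classes stand in for a live Hopsworks connection, and the deployment name `fraud_model` is a made-up example. With a real cluster you would start from `hopsworks.login()` and `project.get_model_serving()`.

```python
# Minimal sketch of the deployment retrieval/creation flow. The stub classes
# mimic the Hopsworks objects so the example runs without a cluster.
class StubDeployment:
    def __init__(self, name):
        self.name = name

class StubModelServing:
    def __init__(self):
        self._deployments = {}

    def create_deployment(self, predictor):
        # register a new deployment for the given predictor spec
        d = StubDeployment(predictor["name"])
        self._deployments[d.name] = d
        return d

    def get_deployment(self, name):
        # retrieve an existing deployment by name
        return self._deployments.get(name)

ms = StubModelServing()                      # stands in for project.get_model_serving()
created = ms.create_deployment({"name": "fraud_model"})
fetched = ms.get_deployment("fraud_model")
print(fetched is created)  # True
```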
Deployment #
Metadata object representing a deployment in Model Serving.
NOT_FOUND_ERROR_CODE class-attribute instance-attribute #
NOT_FOUND_ERROR_CODE = 240000
api_protocol property writable #
api_protocol
API protocol enabled in the deployment (e.g., HTTP or GRPC).
artifact_files_path property #
artifact_files_path
Path of the artifact files deployed by the predictor.
artifact_path property #
artifact_path
Path of the model artifact deployed by the predictor.
Deprecated
Artifact versions are deprecated in favor of deployment versions.
artifact_version property writable #
artifact_version
Artifact version deployed by the predictor.
Deprecated
Artifact versions are deprecated in favor of deployment versions.
config_file property writable #
config_file
Model server configuration file passed to the model deployment.
It can be accessed via CONFIG_FILE_PATH environment variable from a predictor or transformer script. For LLM deployments without a predictor script, this file is used to configure the vLLM engine.
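A predictor script can load the file through the `CONFIG_FILE_PATH` environment variable. The sketch below simulates that by writing a config file and setting the variable itself; in a real deployment Model Serving sets the variable for you. The JSON format and the `max_batch_size` key are assumptions for illustration only, as the actual content depends on the file you uploaded.

```python
import json
import os
import tempfile

# Simulate the file that Model Serving would mount for the deployment.
config = {"max_batch_size": 32}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(config, f)
    path = f.name

# In a real deployment this variable is set by Model Serving, not by you.
os.environ["CONFIG_FILE_PATH"] = path

# Inside a predictor/transformer script, read the config back.
with open(os.environ["CONFIG_FILE_PATH"]) as f:
    loaded = json.load(f)

print(loaded["max_batch_size"])  # 32
```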
inference_batcher property writable #
inference_batcher
Configuration of the inference batcher attached to this predictor.
inference_logger property writable #
inference_logger
Configuration of the inference logger attached to this predictor.
requested_instances property #
requested_instances
Total number of requested instances in the deployment.
delete #
delete(force=False)
Delete the deployment.
| PARAMETER | DESCRIPTION |
|---|---|
| force | Force the deletion of the deployment. If the deployment is running, it will be stopped and deleted automatically. DEFAULT: False |
Warning
A call to this method does not ask for a second confirmation.
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
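The `force` flag matters only while the deployment is running. The stub below sketches the semantics described above (stop first, or pass `force=True`); the `RunningError` name is invented for the sketch, while the real API surfaces errors as `hopsworks.client.exceptions.RestAPIError`.

```python
class RunningError(Exception):
    """Illustrative stand-in for the error a backend might raise."""

class StubDeployment:
    def __init__(self):
        self.running = True
        self.deleted = False

    def stop(self):
        self.running = False

    def delete(self, force=False):
        if self.running:
            if not force:
                raise RunningError("deployment is running; stop it or pass force=True")
            self.stop()  # force: stop automatically, then delete
        self.deleted = True

d = StubDeployment()
try:
    d.delete()            # running and force=False -> refused
except RunningError:
    pass
d.delete(force=True)      # stopped and deleted in one call
print(d.deleted)  # True
```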
download_artifact_files #
download_artifact_files(local_path=None)
Download the artifact files served by the deployment.
| PARAMETER | DESCRIPTION |
|---|---|
| local_path | Path in the local filesystem where the artifact files are downloaded. DEFAULT: None |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
get_logs #
get_logs(component='predictor', tail=10)
Print the deployment logs of the predictor or transformer.
| PARAMETER | DESCRIPTION |
|---|---|
| component | Deployment component to get the logs from (e.g., predictor or transformer). DEFAULT: 'predictor' |
| tail | Number of most recent lines to retrieve from the logs. DEFAULT: 10 |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
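The `tail` parameter selects the most recent lines, in the spirit of Unix `tail`. The sketch below shows the equivalent selection on a plain list of log lines (the lines themselves are fabricated for the example):

```python
from collections import deque

# Fabricated log lines standing in for a deployment's log output.
log_lines = [f"2024-01-01 00:00:{i:02d} INFO request served" for i in range(60)]

tail = 10
recent = list(deque(log_lines, maxlen=tail))  # keep only the last `tail` lines

print(len(recent))                   # 10
print(recent[-1] == log_lines[-1])   # True
```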
get_state #
get_state() -> PredictorState
Get the current state of the deployment.
| RETURNS | DESCRIPTION |
|---|---|
| PredictorState | The current state of the deployment. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
is_created #
is_created() -> bool
Check whether the deployment is created.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the deployment is created, False otherwise. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
is_running #
is_running(or_idle=True, or_updating=True) -> bool
Check whether the deployment is ready to handle inference requests.
| PARAMETER | DESCRIPTION |
|---|---|
| or_idle | Whether the idle state is considered as running. DEFAULT: True |
| or_updating | Whether the updating state is considered as running. DEFAULT: True |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the deployment is ready to handle inference requests, False otherwise. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
is_stopped #
is_stopped(or_created=True) -> bool
Check whether the deployment is stopped.
| PARAMETER | DESCRIPTION |
|---|---|
| or_created | Whether the creating and created states are considered as stopped. DEFAULT: True |
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the deployment is stopped, False otherwise. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
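`is_running()` and `is_stopped()` lend themselves to client-side polling, for example after starting a deployment without awaiting it. A generic helper is sketched below against a stub that reports running after a couple of polls; `wait_until_running` and `StubDeployment` are illustrative names, not part of the library.

```python
import time

class StubDeployment:
    """Stand-in that starts reporting running after a few is_running() calls."""
    def __init__(self, ready_after=2):
        self._calls = 0
        self._ready_after = ready_after

    def is_running(self, or_idle=True, or_updating=True):
        self._calls += 1
        return self._calls >= self._ready_after

def wait_until_running(deployment, timeout=10.0, poll_interval=0.01):
    """Poll is_running() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if deployment.is_running():
            return True
        time.sleep(poll_interval)
    return False

print(wait_until_running(StubDeployment()))  # True
```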
predict #
predict(data=None, inputs=None)
Send inference requests to the deployment.
One of the data or inputs parameters must be set. If both are set, inputs is ignored.
Example
# login into Hopsworks using hopsworks.login()
# get Hopsworks Model Serving handle
ms = project.get_model_serving()
# retrieve deployment by name
my_deployment = ms.get_deployment("my_deployment")
# (optional) retrieve model input example
my_model = project.get_model_registry().get_model(my_deployment.model_name, my_deployment.model_version)
# make predictions using model inputs (single or batch)
predictions = my_deployment.predict(inputs=my_model.input_example)
# or using more sophisticated inference request payloads
data = { "instances": [ my_model.input_example ], "key2": "value2" }
predictions = my_deployment.predict(data)
| PARAMETER | DESCRIPTION |
|---|---|
| data | Payload dictionary for the inference request, including the model input(s). TYPE: Dict |
| inputs | Model inputs used in the inference requests. |
| RETURNS | DESCRIPTION |
|---|---|
| Dict | The predictions returned by the model server. |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
save #
save(await_update: int | None = 600)
Persist this deployment including the predictor and metadata to Model Serving.
| PARAMETER | DESCRIPTION |
|---|---|
| await_update | If the deployment is running, awaiting time (in seconds) for the running instances to be updated. If the running instances are not updated within this timespan, the call to this method returns while the update continues in the background. TYPE: int or None DEFAULT: 600 |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
start #
start(await_running: int | None = 600)
Start the deployment.
| PARAMETER | DESCRIPTION |
|---|---|
| await_running | Awaiting time (in seconds) for the deployment to start. If the deployment has not started within this timespan, the call to this method returns while it deploys in the background. TYPE: int or None DEFAULT: 600 |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
stop #
stop(await_stopped: int | None = 600)
Stop the deployment.
| PARAMETER | DESCRIPTION |
|---|---|
| await_stopped | Awaiting time (in seconds) for the deployment to stop. If the deployment has not stopped within this timespan, the call to this method returns while it stops in the background. TYPE: int or None DEFAULT: 600 |
| RAISES | DESCRIPTION |
|---|---|
| hopsworks.client.exceptions.RestAPIError | In case the backend encounters an issue. |
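Putting start(), get_state(), and stop() together: the stub below models the lifecycle the methods above describe. The state names are illustrative strings; a real deployment reports its state through a PredictorState object.

```python
class StubDeployment:
    """Illustrative stand-in tracking the lifecycle states described above."""
    def __init__(self):
        self.status = "CREATED"

    def start(self, await_running=600):
        # a real call may return while the deployment is still starting
        self.status = "RUNNING"

    def stop(self, await_stopped=600):
        self.status = "STOPPED"

    def get_state(self):
        return self.status

d = StubDeployment()
d.start()
print(d.get_state())  # RUNNING
d.stop()
print(d.get_state())  # STOPPED
```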