Feature View#

You can create a feature view using FeatureStore.create_feature_view, and retrieve them using FeatureStore.get_feature_view and FeatureStore.get_feature_views.

FeatureView #

Metadata class for Hopsworks feature views.

Feature view is a logical grouping of features, defined by a query over feature groups.

description `property` `writable` #

description: str | None

Description of the feature view.

feature_logging `property` #

feature_logging: FeatureLogging | None

Feature logging feature groups of this feature view.

feature_store_name `property` #

feature_store_name: str | None

Name of the feature store in which the feature group is located.

features `property` #

features: list[
    training_dataset_feature.TrainingDatasetFeature
]

Schema of untransformed features in the Feature view. (alias).

featurestore_id `property` `writable` #

featurestore_id: int

Feature store id.

id `property` `writable` #

id: int

Feature view id.

inference_helper_columns `property` `writable` #

inference_helper_columns: list[str]

The helper column sof the feature view.

Can be a composite of multiple features.

labels `property` `writable` #

labels: list[str]

The labels/prediction feature of the feature view.

Can be a composite of multiple features.

logging_enabled `property` `writable` #

logging_enabled: bool

Whether feature logging is enabled for the feature view.

model_dependent_transformations `property` #

model_dependent_transformations: dict[str, Callable]

Get Model-Dependent transformations as a dictionary mapping transformed feature names to transformation function.

name `property` `writable` #

name: str

Name of the feature view.

on_demand_transformations `property` #

on_demand_transformations: dict[str, Callable]

Get On-Demand transformations as a dictionary mapping on-demand feature names to transformation function.

primary_keys `property` #

primary_keys: set[str]

Set of primary key names that is required as keys in input dict object for get_feature_vector(s) method.

When there are duplicated primary key names and prefix is not defined in the query, prefix is generated and prepended to the primary key name in this format "fgId_{feature_group_id}_{join_index}" where join_index is the order of the join.

query `property` `writable` #

query: query.Query

Query of the feature view.

request_parameters `property` #

request_parameters: list[str]

Get request parameters required for the for on-demand transformations atatched to the feature view.

schema `property` `writable` #

schema: list[
    training_dataset_feature.TrainingDatasetFeature
]

Schema of untransformed features in the Feature view.

serving_keys `property` `writable` #

serving_keys: list[skm.ServingKey]

All primary keys of the feature groups included in the query.

training_helper_columns `property` `writable` #

training_helper_columns: list[str]

The helper column sof the feature view.

Can be a composite of multiple features.

transformation_functions `property` `writable` #

transformation_functions: list[TransformationFunction]

Get transformation functions.

version `property` `writable` #

version: int

Version number of the feature view.

add_tag #

add_tag(name: str, value: Any) -> None

Attach a tag to a feature view.

A tag consists of a name and value pair. Tag names are unique identifiers across the whole cluster. The value of a tag can be any valid json - primitives, arrays or json objects.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# attach a tag to a feature view
feature_view.add_tag(name="tag_schema", value={"key", "value"})

PARAMETER	DESCRIPTION
`name`	Name of the tag to be added. TYPE: `str`
`value`	Value of the tag to be added. TYPE: `Any`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

add_training_dataset_tag #

add_training_dataset_tag(
    training_dataset_version: int,
    name: str,
    value: dict[str, Any] | tag.Tag,
) -> None

Attach a tag to a training dataset.

Example

# get feature store instance
fs = ...

# get feature feature view instance
feature_view = fs.get_feature_view(...)

# attach a tag to a training dataset
feature_view.add_training_dataset_tag(
    training_dataset_version=1,
    name="tag_schema",
    value={"key", "value"}
)

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`name`	Name of the tag to be added. TYPE: `str`
`value`	Value of the tag to be added. TYPE: `dict[str, Any] \| tag.Tag`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

clean `staticmethod` #

clean(
    feature_store_id: int,
    feature_view_name: str,
    feature_view_version: str,
) -> None

Delete the feature view and all associated metadata and training data.

This can delete corrupted feature view which cannot be retrieved due to a corrupted query for example.

Example

# delete a feature view and all associated metadata
from hsfs.feature_view import FeatureView

FeatureView.clean(
    feature_store_id=1,
    feature_view_name='feature_view_name',
    feature_view_version=1
)

Potentially dangerous operation

This operation drops all metadata associated with this version of the feature view and related training dataset and materialized data in HopsFS.

PARAMETER	DESCRIPTION
`feature_store_id`	ID of feature store. TYPE: `int`
`feature_view_name`	Name of feature view. TYPE: `str`
`feature_view_version`	Version of feature view. TYPE: `str`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

compute_on_demand_features #

compute_on_demand_features(
    feature_vector: list[Any]
    | list[list[Any]]
    | pd.DataFrame
    | pl.DataFrame
    | None = None,
    request_parameters: list[dict[str, Any]]
    | dict[str, Any]
    | None = None,
    transformation_context: dict[str, Any] = None,
    return_type: Literal[
        "list", "numpy", "pandas", "polars"
    ] = None,
)

Function computes on-demand features present in the feature view.

PARAMETER	DESCRIPTION
`feature_vector`	`Union[List[Any], List[List[Any]], pd.DataFrame, pl.DataFrame]`. The feature vector to be transformed. TYPE: `list[Any] \| list[list[Any]] \| pd.DataFrame \| pl.DataFrame \| None` DEFAULT: `None`
`request_parameters`	Request parameters required by on-demand transformation functions to compute on-demand features present in the feature view. TYPE: `list[dict[str, Any]] \| dict[str, Any] \| None` DEFAULT: `None`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`
`return_type`	`"list"`, `"pandas"`, `"polars"` or `"numpy"`. Defaults to the same type as the input feature vector. TYPE: `Literal['list', 'numpy', 'pandas', 'polars']` DEFAULT: `None`

RETURNS	DESCRIPTION
	`Union[List[Any], List[List[Any]], pd.DataFrame, pl.DataFrame]`: The feature vector that contains all on-demand features in the feature view.

create_alert #

create_alert(receiver: str, status: str, severity: str)

Create an alert for this feature view.

Example

# get feature store instance
fs = ...
# get feature view instance
feature_view = fs.get_feature_view(...)
# create an alert
alert = feature_view.create_alert(
    receiver="email",
    status="feature_monitor_shift_undetected",
    severity="info",
)

PARAMETER	DESCRIPTION
`receiver`	str. The receiver of the alert. TYPE: `str`
`status`	str. The status that will trigger the alert. Can be "feature_monitor_shift_undetected" or "feature_monitor_shift_detected". TYPE: `str`
`severity`	str. The severity of the alert. Can be "info", "warning" or "critical". TYPE: `str`

RETURNS	DESCRIPTION
	The created FeatureViewAlert object.

RAISES	DESCRIPTION
`ValueError`	If the status is not valid.
`ValueError`	If the severity is not valid.
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

create_feature_logger #

create_feature_logger()

Create an asynchronous feature logger for logging features in Hopsworks serving deployments.

Example

# get feature logger
feature_logger = feature_view.create_feature_logger()

# initialize feature view for serving with feature logger
feature_view.init_serving(1, feature_logger=feature_logger)

# log features
feature_view.log(...)

RAISES	DESCRIPTION
`hopsworks.client.exceptions.FeatureStoreException`	If not running in a Hopsworks serving deployment.

create_feature_monitoring #

create_feature_monitoring(
    name: str,
    feature_name: str,
    description: str | None = None,
    start_date_time: int
    | str
    | datetime
    | date
    | pd.Timestamp
    | None = None,
    end_date_time: int
    | str
    | datetime
    | date
    | pd.Timestamp
    | None = None,
    cron_expression: str | None = "0 0 12 ? * * *",
) -> fmc.FeatureMonitoringConfig

Enable feature monitoring to compare statistics on snapshots of feature data over time.

Experimental

Public API is subject to change, this feature is not suitable for production use-cases.

Example

# fetch feature view
fg = fs.get_feature_view(name="my_feature_view", version=1)
# enable feature monitoring
my_config = fg.create_feature_monitoring(
    name="my_monitoring_config",
    feature_name="my_feature",
    description="my monitoring config description",
    cron_expression="0 0 12 ? * * *",
).with_detection_window(
    # Data inserted in the last day
    time_offset="1d",
    window_length="1d",
).with_reference_window(
    # compare to a given value
    specific_value=0.5,
).compare_on(
    metric="mean",
    threshold=0.5,
).save()

PARAMETER	DESCRIPTION
`name`	Name of the feature monitoring configuration. name must be unique for all configurations attached to the feature group. TYPE: `str`
`feature_name`	Name of the feature to monitor. TYPE: `str`
`description`	Description of the feature monitoring configuration. TYPE: `str \| None` DEFAULT: `None`
`start_date_time`	Start date and time from which to start computing statistics. TYPE: `int \| str \| datetime \| date \| pd.Timestamp \| None` DEFAULT: `None`
`end_date_time`	End date and time at which to stop computing statistics. TYPE: `int \| str \| datetime \| date \| pd.Timestamp \| None` DEFAULT: `None`
`cron_expression`	Cron expression to use to schedule the job. The cron expression must be in UTC and follow the Quartz specification. Default is '0 0 12 ? * * ', every day at 12pm UTC. TYPE:* `str \| None` DEFAULT: `'0 0 12 ? * * *'`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.FeatureStoreException`	If the feature view is not registered in Hopsworks

RETURNS	DESCRIPTION
`fmc.FeatureMonitoringConfig`	`FeatureMonitoringConfig` Configuration with minimal information about the feature monitoring. Additional information are required before feature monitoring is enabled.

create_statistics_monitoring #

create_statistics_monitoring(
    name: str,
    feature_name: str | None = None,
    description: str | None = None,
    start_date_time: int
    | str
    | datetime
    | date
    | pd.Timestamp
    | None = None,
    end_date_time: int
    | str
    | datetime
    | date
    | pd.Timestamp
    | None = None,
    cron_expression: str | None = "0 0 12 ? * * *",
) -> fmc.FeatureMonitoringConfig

Run a job to compute statistics on snapshot of feature data on a schedule.

Experimental

Public API is subject to change, this feature is not suitable for production use-cases.

Example

# fetch feature view
fv = fs.get_feature_view(name="my_feature_view", version=1)
# enable statistics monitoring
my_config = fv._create_statistics_monitoring(
    name="my_config",
    start_date_time="2021-01-01 00:00:00",
    description="my description",
    cron_expression="0 0 12 ? * * *",
).with_detection_window(
    # Statistics computed on 10% of the last week of data
    time_offset="1w",
    row_percentage=0.1,
).save()

PARAMETER	DESCRIPTION
`name`	Name of the feature monitoring configuration. name must be unique for all configurations attached to the feature view. TYPE: `str`
`feature_name`	Name of the feature to monitor. If not specified, statistics will be computed for all features. TYPE: `str \| None` DEFAULT: `None`
`description`	Description of the feature monitoring configuration. TYPE: `str \| None` DEFAULT: `None`
`start_date_time`	Start date and time from which to start computing statistics. TYPE: `int \| str \| datetime \| date \| pd.Timestamp \| None` DEFAULT: `None`
`end_date_time`	End date and time at which to stop computing statistics. TYPE: `int \| str \| datetime \| date \| pd.Timestamp \| None` DEFAULT: `None`
`cron_expression`	Cron expression to use to schedule the job. The cron expression must be in UTC and follow the Quartz specification. Default is '0 0 12 ? * * ', every day at 12pm UTC. TYPE:* `str \| None` DEFAULT: `'0 0 12 ? * * *'`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.FeatureStoreException`	If the feature view is not registered in Hopsworks

RETURNS	DESCRIPTION
`fmc.FeatureMonitoringConfig`	`FeatureMonitoringConfig` Configuration with minimal information about the feature monitoring. Additional information are required before feature monitoring is enabled.

create_train_test_split #

create_train_test_split(
    test_size: float | None = None,
    train_start: str | int | datetime | date | None = "",
    train_end: str | int | datetime | date | None = "",
    test_start: str | int | datetime | date | None = "",
    test_end: str | int | datetime | date | None = "",
    storage_connector: storage_connector.StorageConnector
    | None = None,
    location: str | None = "",
    description: str | None = "",
    extra_filter: filter.Filter
    | filter.Logic
    | None = None,
    data_format: str | None = "parquet",
    coalesce: bool | None = False,
    seed: int | None = None,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    write_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[int, job.Job]

Create the metadata for a training dataset and save the corresponding training data into location.

The training data is split into train and test set at random or according to time ranges. The training data can be retrieved by calling feature_view.get_train_test_split.

Create random splits

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# create a train-test split dataset
version, job = feature_view.create_train_test_split(
    test_size=0.2,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

Create time series splits by specifying date as string

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
train_start = "2022-01-01 00:00:00"
train_end = "2022-06-06 23:59:59"
test_start = "2022-06-07 00:00:00"
test_end = "2022-12-25 23:59:59"

# create a train-test split dataset
version, job = feature_view.create_train_test_split(
    train_start=train_start,
    train_end=train_end,
    test_start=test_start,
    test_end=test_end,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

Create time series splits by specifying date as datetime object

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"

train_start = datetime.strptime("2022-01-01 00:00:00", date_format)
train_end = datetime.strptime("2022-06-06 23:59:59", date_format)
test_start = datetime.strptime("2022-06-07 00:00:00", date_format)
test_end = datetime.strptime("2022-12-25 23:59:59" , date_format)

# create a train-test split dataset
version, job = feature_view.create_train_test_split(
    train_start=train_start,
    train_end=train_end,
    test_start=test_start,
    test_end=test_end,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

Write training dataset to external storage

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get storage connector instance
external_storage_connector = fs.get_storage_connector("storage_connector_name")

# create a train-test split dataset
version, job = feature_view.create_train_test_split(
    train_start=...,
    train_end=...,
    test_start=...,
    test_end=...,
    storage_connector = external_storage_connector,
    description=...,
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format=...
)

Data Formats

The feature store currently supports the following data formats for training datasets:

tfrecord
csv
tsv
parquet
avro
orc
json

Currently not supported petastorm, hdf5 and npy file formats.

Warning, the following code will fail because category column contains sparse values and training dataset may not have all values available in test split.

import pandas as pd

df = pd.DataFrame({
    'category_col':['category_a','category_b','category_c','category_d'],
    'numeric_col': [40,10,60,40]
})

feature_group = fs.get_or_create_feature_group(
    name='feature_group_name',
    version=1,
    primary_key=['category_col']
)

feature_group.insert(df)

label_encoder = fs.get_transformation_function(name='label_encoder')

feature_view = fs.create_feature_view(
    name='feature_view_name',
    query=feature_group.select_all(),
    transformation_functions={'category_col':label_encoder}
)

feature_view.create_train_test_split(
    test_size=0.5
)
# Output: KeyError: 'category_c'

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`test_size`	size of test set. TYPE: `float \| None` DEFAULT: `None`
`train_start`	Start event time for the train split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`train_end`	End event time for the train split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_start`	Start event time for the test split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_end`	End event time for the test split query, exclusive. Strings should be formatted in one of the following ormats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`storage_connector`	Storage connector defining the sink location for the training dataset, defaults to `None`, and materializes training dataset on HopsFS. TYPE: `storage_connector.StorageConnector \| None` DEFAULT: `None`
`location`	Path to complement the sink storage connector with, e.g if the storage connector points to an S3 bucket, this path can be used to define a sub-directory inside the bucket to place the training dataset. Defaults to `""`, saving the training dataset at the root defined by the storage connector. TYPE: `str \| None` DEFAULT: `''`
`description`	A string describing the contents of the training dataset to improve discoverability for Data Scientists, defaults to empty string `""`. TYPE: `str \| None` DEFAULT: `''`
`extra_filter`	Additional filters to be attached to the training dataset. The filters will be also applied in `get_batch_data`. TYPE: `filter.Filter \| filter.Logic \| None` DEFAULT: `None`
`data_format`	The data format used to save the training dataset, defaults to `"parquet"`-format. TYPE: `str \| None` DEFAULT: `'parquet'`
`coalesce`	If true the training dataset data will be coalesced into a single partition before writing. The resulting training dataset will be a single file per split. Default False. TYPE: `bool \| None` DEFAULT: `False`
`seed`	Optionally, define a seed to create the random splits with, in order to guarantee reproducability, defaults to `None`. TYPE: `int \| None` DEFAULT: `None`
`statistics_config`	A configuration object, or a dictionary with keys "`enabled`" to generally enable descriptive statistics computation for this feature group, `"correlations`" to turn on feature correlation computation and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`write_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. When using the `python` engine, write_options can contain the following entries: * key `use_spark` and value `True` to materialize training dataset with Spark instead of Hopsworks Feature Query Service. * key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. * key `wait_for_job` and value `True` or `False` to configure whether or not to the save call should return only after the Hopsworks Job has finished. By default it waits. Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
td_version, `Job`	Tuple of training dataset version and job. When using the `python` engine, it returns the Hopsworks Job that was launched to create the training dataset.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

create_train_validation_test_split #

create_train_validation_test_split(
    validation_size: float | None = None,
    test_size: float | None = None,
    train_start: str | int | datetime | date | None = "",
    train_end: str | int | datetime | date | None = "",
    validation_start: str
    | int
    | datetime
    | date
    | None = "",
    validation_end: str | int | datetime | date | None = "",
    test_start: str | int | datetime | date | None = "",
    test_end: str | int | datetime | date | None = "",
    storage_connector: storage_connector.StorageConnector
    | None = None,
    location: str | None = "",
    description: str | None = "",
    extra_filter: filter.Filter
    | filter.Logic
    | None = None,
    data_format: str | None = "parquet",
    coalesce: bool | None = False,
    seed: int | None = None,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    write_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[int, job.Job]

Create the metadata for a training dataset and save the corresponding training data into location.

The training data is split into train, validation, and test set at random or according to time range. The training data can be retrieved by calling feature_view.get_train_validation_test_split.

Create random splits

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# create a train-validation-test split dataset
version, job = feature_view.create_train_validation_test_split(
    validation_size=0.3,
    test_size=0.2,
    description='Description of a dataset',
    data_format='csv'
)

Create time series splits by specifying date as string

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
train_start = "2022-01-01 00:00:00"
train_end = "2022-06-01 23:59:59"
validation_start = "2022-06-02 00:00:00"
validation_end = "2022-07-01 23:59:59"
test_start = "2022-07-02 00:00:00"
test_end = "2022-08-01 23:59:59"

# create a train-validation-test split dataset
version, job = feature_view.create_train_validation_test_split(
    train_start=train_start,
    train_end=train_end,
    validation_start=validation_start,
    validation_end=validation_end,
    test_start=test_start,
    test_end=test_end,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

Create time series splits by specifying date as datetime object

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"

train_start = datetime.strptime("2022-01-01 00:00:00", date_format)
train_end = datetime.strptime("2022-06-06 23:59:59", date_format)
validation_start = datetime.strptime("2022-06-02 00:00:00", date_format)
validation_end = datetime.strptime("2022-07-01 23:59:59", date_format)
test_start = datetime.strptime("2022-06-07 00:00:00", date_format)
test_end = datetime.strptime("2022-12-25 23:59:59", date_format)

# create a train-validation-test split dataset
version, job = feature_view.create_train_validation_test_split(
    train_start=train_start,
    train_end=train_end,
    validation_start=validation_start,
    validation_end=validation_end,
    test_start=test_start,
    test_end=test_end,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

Write training dataset to external storage

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get storage connector instance
external_storage_connector = fs.get_storage_connector("storage_connector_name")

# create a train-validation-test split dataset
version, job = feature_view.create_train_validation_test_split(
    train_start=...,
    train_end=...,
    validation_start=...,
    validation_end=...,
    test_start=...,
    test_end=...,
    description=...,
    storage_connector = external_storage_connector,
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format=...
)

Data Formats

The feature store currently supports the following data formats for training datasets:

tfrecord
csv
tsv
parquet
avro
orc
json

Currently not supported petastorm, hdf5 and npy file formats.

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`validation_size`	size of validation set. TYPE: `float \| None` DEFAULT: `None`
`test_size`	size of test set. TYPE: `float \| None` DEFAULT: `None`
`train_start`	Start event time for the train split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`train_end`	End event time for the train split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`validation_start`	Start event time for the validation split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`validation_end`	End event time for the validation split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_start`	Start event time for the test split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_end`	End event time for the test split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`storage_connector`	Storage connector defining the sink location for the training dataset, defaults to `None`, and materializes training dataset on HopsFS. TYPE: `storage_connector.StorageConnector \| None` DEFAULT: `None`
`location`	Path to complement the sink storage connector with, e.g if the storage connector points to an S3 bucket, this path can be used to define a sub-directory inside the bucket to place the training dataset. Defaults to `""`, saving the training dataset at the root defined by the storage connector. TYPE: `str \| None` DEFAULT: `''`
`description`	A string describing the contents of the training dataset to improve discoverability for Data Scientists, defaults to empty string `""`. TYPE: `str \| None` DEFAULT: `''`
`extra_filter`	Additional filters to be attached to the training dataset. The filters will be also applied in `get_batch_data`. TYPE: `filter.Filter \| filter.Logic \| None` DEFAULT: `None`
`data_format`	The data format used to save the training dataset, defaults to `"parquet"`-format. TYPE: `str \| None` DEFAULT: `'parquet'`
`coalesce`	If true the training dataset data will be coalesced into a single partition before writing. The resulting training dataset will be a single file per split. Default False. TYPE: `bool \| None` DEFAULT: `False`
`seed`	Optionally, define a seed to create the random splits with, in order to guarantee reproducability, defaults to `None`. TYPE: `int \| None` DEFAULT: `None`
`statistics_config`	A configuration object, or a dictionary with keys "`enabled`" to generally enable descriptive statistics computation for this feature group, `"correlations`" to turn on feature correlation computation and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`write_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. When using the `python` engine, write_options can contain the following entries: * key `use_spark` and value `True` to materialize training dataset with Spark instead of Hopsworks Feature Query Service. * key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. * key `wait_for_job` and value `True` or `False` to configure whether or not to the save call should return only after the Hopsworks Job has finished. By default it waits. Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
td_version, `Job`	Tuple of training dataset version and job. When using the `python` engine, it returns the Hopsworks Job that was launched to create the training dataset.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

create_training_data #

create_training_data(
    start_time: str | int | datetime | date | None = "",
    end_time: str | int | datetime | date | None = "",
    storage_connector: storage_connector.StorageConnector
    | None = None,
    location: str | None = "",
    description: str | None = "",
    extra_filter: filter.Filter
    | filter.Logic
    | None = None,
    data_format: str | None = "parquet",
    coalesce: bool | None = False,
    seed: int | None = None,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    write_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[int, job.Job]

Create the metadata for a training dataset and save the corresponding training data into location.

The training data can be retrieved by calling feature_view.get_training_data.

Create training dataset

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# create a training dataset
version, job = feature_view.create_training_data(
    description='Description of a dataset',
    data_format='csv',
    # async creation in order not to wait till finish of the job
    write_options={"wait_for_job": False}
)

Create training data specifying date range with dates as strings

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
start_time = "2022-01-01 00:00:00"
end_time = "2022-06-06 23:59:59"

# create a training dataset
version, job = feature_view.create_training_data(
    start_time=start_time,
    end_time=end_time,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

# When we want to read the training data, we need to supply the training data version returned by the create_training_data method:
X_train, X_test, y_train, y_test = feature_view.get_training_data(version)

Create training data specifying date range with dates as datetime objects

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"

start_time = datetime.strptime("2022-01-01 00:00:00", date_format)
end_time = datetime.strptime("2022-06-06 23:59:59", date_format)

# create a training dataset
version, job = feature_view.create_training_data(
    start_time=start_time,
    end_time=end_time,
    description='Description of a dataset',
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format='csv'
)

Write training dataset to external storage

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get storage connector instance
external_storage_connector = fs.get_storage_connector("storage_connector_name")

# create a train-test split dataset
version, job = feature_view.create_training_data(
    start_time=...,
    end_time=...,
    storage_connector = external_storage_connector,
    description=...,
    # you can have different data formats such as csv, tsv, tfrecord, parquet and others
    data_format=...
)

Data Formats

The feature store currently supports the following data formats for training datasets:

tfrecord
csv
tsv
parquet
avro
orc
json

Currently not supported petastorm, hdf5 and npy file formats.

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`start_time`	Start event time for the training dataset query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e., Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`end_time`	End event time for the training dataset query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e., Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`storage_connector`	Storage connector defining the sink location for the training dataset, defaults to `None`, and materializes training dataset on HopsFS. TYPE: `storage_connector.StorageConnector \| None` DEFAULT: `None`
`location`	Path to complement the sink storage connector with, e.g., if the storage connector points to an S3 bucket, this path can be used to define a sub-directory inside the bucket to place the training dataset. Defaults to `""`, saving the training dataset at the root defined by the storage connector. TYPE: `str \| None` DEFAULT: `''`
`description`	A string describing the contents of the training dataset to improve discoverability for Data Scientists. TYPE: `str \| None` DEFAULT: `''`
`extra_filter`	Additional filters to be attached to the training dataset. The filters will be also applied in `get_batch_data`. TYPE: `filter.Filter \| filter.Logic \| None` DEFAULT: `None`
`data_format`	The data format used to save the training dataset. TYPE: `str \| None` DEFAULT: `'parquet'`
`coalesce`	If true the training dataset data will be coalesced into a single partition before writing. The resulting training dataset will be a single file per split. TYPE: `bool \| None` DEFAULT: `False`
`seed`	Optionally, define a seed to create the random splits with, in order to guarantee reproducability. TYPE: `int \| None` DEFAULT: `None`
`statistics_config`	A configuration object, or a dictionary with keys: `"enabled"` to generally enable descriptive statistics computation for this feature group, `"correlations"` to turn on feature correlation computation, and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`write_options`	Additional options as key/value pairs to pass to the execution engine. For `spark` engine: Dictionary of read options for Spark. When using the `python` engine, write_options can contain the following entries: key `use_spark` and value `True` to materialize training dataset with Spark instead of Hopsworks Feature Query Service. key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. key `wait_for_job` and value `True` or `False` to configure whether or not to the save call should return only after the Hopsworks Job has finished. By default it waits. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`transformation_context`	A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`td_version`	training dataset version TYPE: `int`
`Job`	When using the `python` engine, the Hopsworks Job that was launched to create the training dataset. TYPE: `job.Job`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

delete #

delete() -> None

Delete current feature view, all associated metadata and training data.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# delete a feature view
feature_view.delete()

Potentially dangerous operation

This operation drops all metadata associated with this version of the feature view and related training dataset and materialized data in HopsFS.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

delete_all_training_datasets #

delete_all_training_datasets() -> None

Delete all training datasets. This will delete both metadata and training data.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# delete all training datasets
feature_view.delete_all_training_datasets()

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

delete_log #

delete_log(transformed: bool | None = None) -> None

Delete the logged feature data for the current feature view.

PARAMETER	DESCRIPTION
`transformed`	Whether to delete transformed logs. Defaults to None. Delete both transformed and untransformed logs. TYPE: `bool \| None` DEFAULT: `None`

Example

# delete log
feature_view.delete_log()

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to delete the log.

delete_tag #

delete_tag(name: str) -> None

Delete a tag attached to a feature view.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# delete a tag
feature_view.delete_tag('name_of_tag')

PARAMETER	DESCRIPTION
`name`	Name of the tag to be removed. TYPE: `str`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

delete_training_dataset #

delete_training_dataset(
    training_dataset_version: int,
) -> None

Delete a training dataset. This will delete both metadata and training data.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# delete a training dataset
feature_view.delete_training_dataset(
    training_dataset_version=1
)

PARAMETER	DESCRIPTION
`training_dataset_version`	Version of the training dataset to be removed. TYPE: `int`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

delete_training_dataset_tag #

delete_training_dataset_tag(
    training_dataset_version: int, name: str
) -> None

Delete a tag attached to a training dataset.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# delete training dataset tag
feature_view.delete_training_dataset_tag(
    training_dataset_version=1,
    name='name_of_dataset'
)

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`name`	Name of the tag to be removed. TYPE: `str`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

enable_logging #

enable_logging(
    extra_log_columns: Feature | dict[str, str] = None,
) -> None

Enable feature logging for the current feature view.

This method activates logging of features.

PARAMETER	DESCRIPTION
`extra_log_columns`	`Union[Feature, List[Dict[str, str]]]` Additional columns to be logged. Any duplicate columns will be ignored. TYPE: `Feature \| dict[str, str]` DEFAULT: `None`

Enable feature logging

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# enable logging
feature_view.enable_logging()

Enable feature logging and add extra log columns

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# enable logging with two extra log columns
feature_view.enable_logging(extra_log_columns=[{"name": "logging_col_1", "type": "string"},
                                               {"name": "logging_col_2", "type": "int"}])

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue

find_neighbors #

find_neighbors(
    embedding: list[int | float],
    feature: Feature | None = None,
    k: int | None = 10,
    filter: Filter | Logic | None = None,
    external: bool | None = None,
    return_type: Literal[
        "list", "polars", "pandas"
    ] = "list",
) -> list[list[Any]]

Finds the nearest neighbors for a given embedding in the vector database.

If filter is specified, or if embedding feature is stored in default project index, the number of results returned may be less than k. Try using a large value of k and extract the top k items from the results if needed.

Duplicate column error in Polars

If the feature view has duplicate column names, attempting to create a polars DataFrame will raise an error. To avoid this, set return_type to "list" or "pandas".

PARAMETER	DESCRIPTION
`embedding`	The target embedding for which neighbors are to be found. TYPE: `list[int \| float]`
`feature`	The feature used to compute similarity score. Required only if there are multiple embeddings. TYPE: `Feature \| None` DEFAULT: `None`
`k`	The number of nearest neighbors to retrieve. TYPE: `int \| None` DEFAULT: `10`
`filter`	A filter expression to restrict the search space. TYPE: `Filter \| Logic \| None` DEFAULT: `None`
`external`	If set to `True`, the connection to the online feature store is established using the same host as for the `host` parameter in the `hopsworks.login()` method. If set to `False`, the online feature store storage connector is used which relies on the private IP. Defaults to `True` if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to `False`. TYPE: `bool \| None` DEFAULT: `None`
`return_type`	The format in which to return the neighbors. TYPE: `Literal['list', 'polars', 'pandas']` DEFAULT: `'list'`

RETURNS	DESCRIPTION
`list[list[Any]]`	The nearest neighbor feature vectors.

Example

embedding_index = EmbeddingIndex()
embedding_index.add_embedding(name="user_vector", dimension=3)
fg = fs.create_feature_group(
            name='air_quality',
            embedding_index=embedding_index,
            version=1,
            primary_key=['id1'],
            online_enabled=True,
        )
fg.insert(data)
fv = fs.create_feature_view("air_quality", fg.select_all())
fv.find_neighbors(
    [0.1, 0.2, 0.3],
    k=5,
)

# apply filter
fg.find_neighbors(
    [0.1, 0.2, 0.3],
    k=5,
    feature=fg.user_vector,  # optional
    filter=(fg.id1 > 10) & (fg.id1 < 30)
)

from_response_json `classmethod` #

from_response_json(
    json_dict: dict[str, Any],
) -> FeatureView

Function that constructs the class object from its json serialization.

PARAMETER	DESCRIPTION
`json_dict`	`Dict[str, Any]`. Json serialized dictionary for the class. TYPE: `dict[str, Any]`

RETURNS	DESCRIPTION
`FeatureView`	`TransformationFunction`: Json deserialized class object.

get_alert #

get_alert(alert_id: int)

Get an alert for this feature view by ID.

PARAMETER	DESCRIPTION
`alert_id`	The id of the alert to get. TYPE: `int`

RETURNS	DESCRIPTION
	A single FeatureViewAlert object is returned.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_alerts #

get_alerts()

Get all alerts for this feature view.

RETURNS	DESCRIPTION
	List[FeatureViewAlert] or Alert.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_batch_data #

get_batch_data(
    start_time: str | int | datetime | date | None = None,
    end_time: str | int | datetime | date | None = None,
    read_options: dict[str, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    inference_helper_columns: bool = False,
    dataframe_type: Literal[
        "default",
        "spark",
        "pandas",
        "polars",
        "numpy",
        "python",
    ] = "default",
    transformed: bool | None = True,
    transformation_context: dict[str, Any] = None,
    logging_data: bool = False,
    **kwargs,
) -> (
    TrainingDatasetDataFrameTypes
    | HopsworksLoggingMetadataType
)

Get a batch of data from an event time interval from the offline feature store.

Batch data for the last 24 hours

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
import datetime
start_date = (datetime.datetime.now() - datetime.timedelta(hours=24))
end_date = (datetime.datetime.now())

# get a batch of data
df = feature_view.get_batch_data(
    start_time=start_date,
    end_time=end_date
)

Log Batch data for the last 24 hours

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
import datetime
start_date = (datetime.datetime.now() - datetime.timedelta(hours=24))
end_date = (datetime.datetime.now())

# get a batch of data
df = feature_view.get_batch_data(
    start_time=start_date,
    end_time=end_date,
    logging_data=True
)

# make predictions using the batch data
predictions = model.predict(df)

# log the batch data
feature_view.log(df, predictions=predictions)

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`start_time`	Start event time for the batch query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e., Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `None`
`end_time`	End event time for the batch query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e., Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `None`
`read_options`	User provided read options for python engine, defaults to `{}`: key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}`. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`primary_key`	Whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	Whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`inference_helper_columns`	Whether to include inference helper columns or not. Inference helper columns are a list of feature names in the feature view, defined during its creation, that may not be used in training the model itself but can be used during batch or online inference for extra information. If inference helper columns were not defined in the feature view `inference_helper_columns=True` will not any effect. Defaults to `False`, no helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	The type of the returned dataframe. Defaults to `"default"`, which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `Literal['default', 'spark', 'pandas', 'polars', 'numpy', 'python']` DEFAULT: `'default'`
`transformed`	Setting to `False` returns the untransformed feature vectors. TYPE: `bool \| None` DEFAULT: `True`
`transformation_context`	A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`
`logging_data`	Setting this to `True` return batch data with logging metadata. The batch data will contain only contain the required features. The logging metadata is available as part of an additional attribute `hopsworks_logging_metadata` of the returned object. The logging metadata contains the untransformed features, transformed features, inference helpers, serving keys, request parameters and event time. The batch data object returned can be passed to `feature_view.log()` to log the feature vectors along with all the logging metadata. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame`	The spark dataframe containing the feature data. TYPE: `TrainingDatasetDataFrameTypes \| HopsworksLoggingMetadataType`
`TrainingDatasetDataFrameTypes \| HopsworksLoggingMetadataType`	pyspark.DataFrame: A Spark DataFrame.
`TrainingDatasetDataFrameTypes \| HopsworksLoggingMetadataType`	pandas.DataFrame: A Pandas DataFrame.
`TrainingDatasetDataFrameTypes \| HopsworksLoggingMetadataType`	polars.DataFrame: A Polars DataFrame.
`TrainingDatasetDataFrameTypes \| HopsworksLoggingMetadataType`	numpy.ndarray: A two-dimensional Numpy array.
`list`	A two-dimensional Python list. TYPE: `TrainingDatasetDataFrameTypes \| HopsworksLoggingMetadataType`

get_batch_query #

get_batch_query(
    start_time: str | int | datetime | date | None = None,
    end_time: str | int | datetime | date | None = None,
) -> str

Get a query string of the batch query.

Batch query for the last 24 hours

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
import datetime
start_date = (datetime.datetime.now() - datetime.timedelta(hours=24))
end_date = (datetime.datetime.now())

# get a query string of batch query
query_str = feature_view.get_batch_query(
    start_time=start_date,
    end_time=end_date
)
# print query string
print(query_str)

PARAMETER	DESCRIPTION
`start_time`	Start event time for the batch query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e., Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `None`
`end_time`	End event time for the batch query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e., Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	The batch query.

get_feature_monitoring_configs #

get_feature_monitoring_configs(
    name: str | None = None,
    feature_name: str | None = None,
    config_id: int | None = None,
) -> (
    fmc.FeatureMonitoringConfig
    | list[fmc.FeatureMonitoringConfig]
    | None
)

Fetch feature monitoring configs attached to the feature view.

If no arguments is provided the method will return all feature monitoring configs attached to the feature view, meaning all feature monitoring configs that are attach to a feature in the feature view. If you wish to fetch a single config, provide the its name. If you wish to fetch all configs attached to a particular feature, provide the feature name.

Example

# fetch your feature view
fv = fs.get_feature_view(name="my_feature_view", version=1)
# fetch all feature monitoring configs attached to the feature view
fm_configs = fv.get_feature_monitoring_configs()
# fetch a single feature monitoring config by name
fm_config = fv.get_feature_monitoring_configs(name="my_config")
# fetch all feature monitoring configs attached to a particular feature
fm_configs = fv.get_feature_monitoring_configs(feature_name="my_feature")
# fetch a single feature monitoring config with a particular id
fm_config = fv.get_feature_monitoring_configs(config_id=1)

PARAMETER	DESCRIPTION
`name`	If provided fetch only the feature monitoring config with the given name. Defaults to None. TYPE: `str \| None` DEFAULT: `None`
`feature_name`	If provided, fetch only configs attached to a particular feature. Defaults to None. TYPE: `str \| None` DEFAULT: `None`
`config_id`	If provided, fetch only the feature monitoring config with the given id. Defaults to None. TYPE: `int \| None` DEFAULT: `None`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request
`hopsworks.client.exceptions.FeatureStoreException`	If the feature view is not registered in Hopsworks
`ValueError`	if both name and feature_name are provided.
`TypeError`	if name or feature_name are not string or None.

RETURNS	DESCRIPTION
`fmc.FeatureMonitoringConfig \| list[fmc.FeatureMonitoringConfig] \| None`	Union[`FeatureMonitoringConfig`, List[`FeatureMonitoringConfig`], None] A list of feature monitoring configs. If name provided, returns either a single config or None if not found.

get_feature_monitoring_history #

get_feature_monitoring_history(
    config_name: str | None = None,
    config_id: int | None = None,
    start_time: int | str | datetime | date | None = None,
    end_time: int | str | datetime | date | None = None,
    with_statistics: bool | None = True,
) -> list[fmr.FeatureMonitoringResult]

Fetch feature monitoring history for a given feature monitoring config.

Example

# fetch your feature view
fv = fs.get_feature_view(name="my_feature_group", version=1)
# fetch feature monitoring history for a given feature monitoring config
fm_history = fv.get_feature_monitoring_history(
    config_name="my_config",
    start_time="2020-01-01",
)
# or use the config id
fm_history = fv.get_feature_monitoring_history(
    config_id=1,
    start_time=datetime.now() - timedelta(weeks=2),
    end_time=datetime.now() - timedelta(weeks=1),
    with_statistics=False,
)

PARAMETER	DESCRIPTION
`config_name`	The name of the feature monitoring config to fetch history for. Defaults to None. TYPE: `str \| None` DEFAULT: `None`
`config_id`	The id of the feature monitoring config to fetch history for. Defaults to None. TYPE: `int \| None` DEFAULT: `None`
`start_date`	The start date of the feature monitoring history to fetch. Defaults to None.
`end_date`	The end date of the feature monitoring history to fetch. Defaults to None.
`with_statistics`	Whether to include statistics in the feature monitoring history. Defaults to True. If False, only metadata about the monitoring will be fetched. TYPE: `bool \| None` DEFAULT: `True`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	In case the backend encounters an issue
`hopsworks.client.exceptions.FeatureStoreException`	If the feature view is not registered in Hopsworks
`ValueError`	if both config_name and config_id are provided.
`TypeError`	if config_name or config_id are not respectively string, int or None.

RETURNS	DESCRIPTION
`list[fmr.FeatureMonitoringResult]`	List[`FeatureMonitoringResult`] A list of feature monitoring results containing the monitoring metadata as well as the computed statistics for the detection and reference window if requested.

get_feature_vector #

get_feature_vector(
    entry: dict[str, Any] | None = None,
    passed_features: dict[str, Any] | None = None,
    external: bool | None = None,
    return_type: Literal[
        "list", "polars", "numpy", "pandas"
    ] = "list",
    allow_missing: bool = False,
    force_rest_client: bool = False,
    force_sql_client: bool = False,
    transform: bool | None = True,
    on_demand_features: bool | None = True,
    request_parameters: dict[str, Any] | None = None,
    transformation_context: dict[str, Any] = None,
    logging_data: bool = False,
) -> (
    list[Any]
    | pd.DataFrame
    | np.ndarray
    | pl.DataFrame
    | HopsworksLoggingMetadataType
)

Returns assembled feature vector from online feature store.

Call feature_view.init_serving before this method if the following configurations are needed:

The training dataset version of the transformation statistics.
Additional configurations of online serving engine.

Missing primary key entries

If the provided primary key entry can't be found in one or more of the feature groups used by this feature view the call to this method will raise an exception. Alternatively, setting allow_missing to True returns a feature vector with missing values.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get assembled serving vector as a python list
feature_view.get_feature_vector(
    entry = {"pk1": 1, "pk2": 2}
)

# get assembled serving vector as a pandas dataframe
feature_view.get_feature_vector(
    entry = {"pk1": 1, "pk2": 2},
    return_type = "pandas"
)

# get assembled serving vector as a numpy array
feature_view.get_feature_vector(
    entry = {"pk1": 1, "pk2": 2},
    return_type = "numpy"
)

Get feature vector with user-supplied features

# get feature store instance
fs = ...
# get feature view instance
feature_view = fs.get_feature_view(...)

# the application provides a feature value 'app_attr'
app_attr = ...

# get a feature vector
feature_view.get_feature_vector(
    entry = {"pk1": 1, "pk2": 2},
    passed_features = { "app_feature" : app_attr }
)

Logging feature vector

# get feature store instance
fs = ...
# get feature view instance
feature_view = fs.get_feature_view(...)

# the application provides a feature value 'app_attr'
app_attr = ...

# get a feature vector
feature_vector = feature_view.get_feature_vector(
    entry = {"pk1": 1, "pk2": 2},
    passed_features = { "app_feature" : app_attr },
    logging_data = True
)

# make predictions using the feature vector
predictions = model.predict(feature_vector)

# log the feature vector
feature_view.log(feature_vector, predictions=predictions)

PARAMETER	DESCRIPTION
`entry`	Dictionary of feature group primary key and values provided by serving application. Set of required primary keys is `feature_view.primary_keys`. If the required primary keys is not provided, it will look for name of the primary key in feature group in the entry. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`passed_features`	Dictionary of feature values provided by the application at runtime. They can replace features values fetched from the feature store as well as providing feature values which are not available in the feature store. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`external`	If set to `True`, the connection to the online feature store is established using the same host as for the `host` parameter in the `hopsworks.login()` method. If set to `False`, the online feature store storage connector is used which relies on the private IP. Defaults to `True` if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to `False`. TYPE: `bool \| None` DEFAULT: `None`
`return_type`	In which format to return the feature vector. TYPE: `Literal['list', 'polars', 'numpy', 'pandas']` DEFAULT: `'list'`
`force_rest_client`	If set to True, reads from online feature store using the REST client if initialised. TYPE: `bool` DEFAULT: `False`
`force_sql_client`	If set to True, reads from online feature store using the SQL client if initialised. TYPE: `bool` DEFAULT: `False`
`allow_missing`	Setting to `True` returns feature vectors with missing values. TYPE: `bool` DEFAULT: `False`
`transform`	If set to `True`, model-dependent transformations are applied to the feature vector, and `on_demand_feature` is automatically set to `True`, ensuring the inclusion of on-demand features. If set to `False`, the function returns the feature vector without applying any model-dependent transformations. TYPE: `bool \| None` DEFAULT: `True`
`on_demand_features`	Setting this to `False` returns untransformed feature vectors without any on-demand features. TYPE: `bool \| None` DEFAULT: `True`
`request_parameters`	Request parameters required by on-demand transformation functions to compute on-demand features present in the feature view. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`transformation_context`	A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`
`logging_data`	Setting this to `True` return feature vector with logging metadata. The feature vector will contain only the required features. The logging metadata is available as part of an additional attribute `hopsworks_logging_metadata` of the returned object. The logging metadata contains the untransformed features, transformed features, inference helpers, serving keys, request parameters and event time. The feature vector object returned can be passed to `feature_view.log()` to log the feature vector along with all the logging metadata. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`list[Any] \| pd.DataFrame \| np.ndarray \| pl.DataFrame \| HopsworksLoggingMetadataType`	Returned `list`, `pd.DataFrame`, `polars.DataFrame` or `np.ndarray` (the exact type dependends on `return_type`) contains feature values related to provided primary keys, ordered according to positions of this features in the feature view query.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.FeatureStoreException`	When primary key entry cannot be found in one or more of the feature groups used by this feature view.

get_feature_vectors #

get_feature_vectors(
    entry: list[dict[str, Any]] | None = None,
    passed_features: list[dict[str, Any]] | None = None,
    external: bool | None = None,
    return_type: Literal[
        "list", "polars", "numpy", "pandas"
    ] = "list",
    allow_missing: bool = False,
    force_rest_client: bool = False,
    force_sql_client: bool = False,
    transform: bool | None = True,
    on_demand_features: bool | None = True,
    request_parameters: list[dict[str, Any]] | None = None,
    transformation_context: dict[str, Any] = None,
    logging_data: bool = False,
) -> (
    list[list[Any]]
    | pd.DataFrame
    | np.ndarray
    | pl.DataFrame
    | HopsworksLoggingMetadataType
)

Returns assembled feature vectors in batches from online feature store.

Call feature_view.init_serving before this method if the following configurations are needed.

The training dataset version of the transformation statistics.
Additional configurations of online serving engine.

Missing primary key entries

If any of the provided primary key elements in entry can't be found in any of the feature groups, no feature vector for that primary key value will be returned. If it can be found in at least one but not all feature groups used by this feature view the call to this method will raise an exception. Alternatively, setting allow_missing to True returns feature vectors with missing values.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get assembled serving vectors as a python list of lists
feature_view.get_feature_vectors(
    entry = [
        {"pk1": 1, "pk2": 2},
        {"pk1": 3, "pk2": 4},
        {"pk1": 5, "pk2": 6}
    ]
)

# get assembled serving vectors as a pandas dataframe
feature_view.get_feature_vectors(
    entry = [
        {"pk1": 1, "pk2": 2},
        {"pk1": 3, "pk2": 4},
        {"pk1": 5, "pk2": 6}
    ],
    return_type = "pandas"
)

# get assembled serving vectors as a numpy array
feature_view.get_feature_vectors(
    entry = [
        {"pk1": 1, "pk2": 2},
        {"pk1": 3, "pk2": 4},
        {"pk1": 5, "pk2": 6}
    ],
    return_type = "numpy"
)

Logging feature vectors

# get feature store instance
fs = ...
# get feature view instance
feature_view = fs.get_feature_view(...)

# the application provides a feature value 'app_attr'
app_attr = ...

# get a feature vectors
feature_vectors = feature_view.get_feature_vectors(
    entry = [
        {"pk1": 1, "pk2": 2},
        {"pk1": 3, "pk2": 4},
        {"pk1": 5, "pk2": 6}
    ],
    logging_data = True
)

# make predictions using the feature vectors
predictions = model.predict(feature_vectors)

# log the feature vectors
feature_view.log(feature_vectors, predictions=predictions)

PARAMETER	DESCRIPTION
`entry`	A list of dictionary of feature group primary key and values provided by serving application. Set of required primary keys is `feature_view.primary_keys`. If the required primary keys is not provided, it will look for name of the primary key in feature group in the entry. TYPE: `list[dict[str, Any]] \| None` DEFAULT: `None`
`passed_features`	A list of dictionary of feature values provided by the application at runtime. They can replace features values fetched from the feature store as well as providing feature values which are not available in the feature store. TYPE: `list[dict[str, Any]] \| None` DEFAULT: `None`
`external`	If set to `True`, the connection to the online feature store is established using the same host as for the `host` parameter in the `hopsworks.login()` method. If set to `False`, the online feature store storage connector is used which relies on the private IP. Defaults to `True` if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to `False`. TYPE: `bool \| None` DEFAULT: `None`
`return_type`	The format in which to return the feature vectors. TYPE: `Literal['list', 'polars', 'numpy', 'pandas']` DEFAULT: `'list'`
`force_sql_client`	If set to `True`, reads from online feature store using the SQL client if initialised. TYPE: `bool` DEFAULT: `False`
`force_rest_client`	If set to `True`, reads from online feature store using the REST client if initialised. TYPE: `bool` DEFAULT: `False`
`allow_missing`	Setting to `True` returns feature vectors with missing values. TYPE: `bool` DEFAULT: `False`
`transform`	If set to `True`, model-dependent transformations are applied to the feature vector, and `on_demand_feature` is automatically set to `True`, ensuring the inclusion of on-demand features. If set to `False`, the function returns the feature vector without applying any model-dependent transformations. TYPE: `bool \| None` DEFAULT: `True`
`on_demand_features`	Setting this to `False` returns untransformed feature vectors without any on-demand features. TYPE: `bool \| None` DEFAULT: `True`
`request_parameters`	Request parameters required by on-demand transformation functions to compute on-demand features present in the feature view. TYPE: `list[dict[str, Any]] \| None` DEFAULT: `None`
`transformation_context`	A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`
`logging_data`	Setting this to `True` return feature vector with logging metadata. The feature vectors will contain only contain the required features. The logging metadata is available as part of an additional attribute `hopsworks_logging_metadata` of the returned object. The logging metadata contains the untransformed features, transformed features, inference helpers, serving keys, request parameters and event time. The feature vector object returned can be passed to `feature_view.log()` to log the feature vectors along with all the logging metadata. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`list[list[Any]] \| pd.DataFrame \| np.ndarray \| pl.DataFrame \| HopsworksLoggingMetadataType`	Returned `List[list]`, `pd.DataFrame`, `polars.DataFrame` or `np.ndarray` (depending on the `return_type`) contains feature values related to provided primary keys, ordered according to positions of this features in the feature view query.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.FeatureStoreException`	When primary key entry cannot be found in one or more of the feature groups used by this feature view.

get_inference_helper #

get_inference_helper(
    entry: dict[str, Any],
    external: bool | None = None,
    return_type: Literal[
        "pandas", "dict", "polars"
    ] = "pandas",
    force_rest_client: bool = False,
    force_sql_client: bool = False,
) -> pd.DataFrame | pl.DataFrame | dict[str, Any]

Returns assembled inference helper column vectors from online feature store.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get assembled inference helper column vector
feature_view.get_inference_helper(
    entry = {"pk1": 1, "pk2": 2}
)

PARAMETER	DESCRIPTION
`entry`	Dictionary of feature group primary key and values provided by serving application. Set of required primary keys is `feature_view.primary_keys`. TYPE: `dict[str, Any]`
`external`	If set to `True`, the connection to the online feature store is established using the same host as for the `host` parameter in the `hopsworks.login()` method. If set to `False`, the online feature store storage connector is used which relies on the private IP. Defaults to `True` if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to `False`. TYPE: `bool \| None` DEFAULT: `None`
`return_type`	The format in which to return the dataframe. TYPE: `Literal['pandas', 'dict', 'polars']` DEFAULT: `'pandas'`

RETURNS	DESCRIPTION
`pd.DataFrame \| pl.DataFrame \| dict[str, Any]`	The dataframe.

RAISES	DESCRIPTION
`Exception`	When primary key entry cannot be found in one or more of the feature groups used by this feature view.

get_inference_helpers #

get_inference_helpers(
    entry: list[dict[str, Any]],
    external: bool | None = None,
    return_type: Literal[
        "pandas", "dict", "polars"
    ] = "pandas",
    force_sql_client: bool = False,
    force_rest_client: bool = False,
) -> list[dict[str, Any]] | pd.DataFrame | pl.DataFrame

Returns assembled inference helper column vectors in batches from online feature store.

Missing primary key entries

If any of the provided primary key elements in entry can't be found in any of the feature groups, no inference helper column vectors for that primary key value will be returned. If it can be found in at least one but not all feature groups used by this feature view the call to this method will raise an exception.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get assembled inference helper column vectors
feature_view.get_inference_helpers(
    entry = [
        {"pk1": 1, "pk2": 2},
        {"pk1": 3, "pk2": 4},
        {"pk1": 5, "pk2": 6}
    ]
)

PARAMETER	DESCRIPTION
`entry`	A list of dictionary of feature group primary key and values provided by serving application. Set of required primary keys is `feature_view.primary_keys`. TYPE: `list[dict[str, Any]]`
`external`	If set to `True`, the connection to the online feature store is established using the same host as for the `host` parameter in the `hopsworks.login()` method. If set to `False`, the online feature store storage connector is used which relies on the private IP. Defaults to `True` if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to `False`. TYPE: `bool \| None` DEFAULT: `None`
`return_type`	The format in which to return the dataframes. TYPE: `Literal['pandas', 'dict', 'polars']` DEFAULT: `'pandas'`

RETURNS	DESCRIPTION
`list[dict[str, Any]] \| pd.DataFrame \| pl.DataFrame`	Returned `pd.DataFrame`, `polars.DataFrame` or `List[dict]` (depending on `return_type`) contains feature values related to provided primary keys, ordered according to positions of this features in the feature view query.

RAISES	DESCRIPTION
`Exception`	When primary key entry cannot be found in one or more of the feature groups used by this feature view.

get_last_accessed_training_dataset #

get_last_accessed_training_dataset()

Get the last accessed training dataset version used for this feature view.

Note

The value does not take into account other connections to Hopsworks. If multiple clients do training datasets operations, each will have its own view of the last accessed dataset. Also, the last accessed training dataset is not necessarily the newest one with the highest version.

get_log_timeline #

get_log_timeline(
    wallclock_time: str
    | int
    | datetime
    | datetime.date
    | None = None,
    limit: int | None = None,
    transformed: bool | None = False,
) -> dict[str, dict[str, str]]

Retrieve the log timeline for the current feature view.

PARAMETER	DESCRIPTION
`wallclock_time`	Specific time to get the log timeline for. Can be a string, integer, datetime, or date. Defaults to None. TYPE: `str \| int \| datetime \| datetime.date \| None` DEFAULT: `None`
`limit`	Maximum number of entries to retrieve. Defaults to None. TYPE: `int \| None` DEFAULT: `None`
`transformed`	Whether to include transformed logs. Defaults to False. TYPE: `bool \| None` DEFAULT: `False`

Example

# get log timeline
log_timeline = feature_view.get_log_timeline(limit=10)

RETURNS	DESCRIPTION
`dict[str, dict[str, str]]`	`Dict[str, Dict[str, str]]`. Dictionary object of commit metadata timeline, where Key is commit id and value
`dict[str, dict[str, str]]`	is `Dict[str, str]` with key value pairs of date committed on, number of rows updated, inserted and deleted.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to retrieve the log timeline.

get_models #

get_models(
    training_dataset_version: int | None = None,
) -> list[Model]

Get the generated models using this feature view, based on explicit provenance.

Only the accessible models are returned. For more items use the base method, see get_models_provenance.

PARAMETER	DESCRIPTION
`training_dataset_version`	Filter generated models based on the used training dataset version. TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[Model]`	List of models.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_models_provenance #

get_models_provenance(
    training_dataset_version: int | None = None,
) -> explicit_provenance.Links

Get the generated models using this feature view, based on explicit provenance.

These models can be accessible or inaccessible. Explicit provenance does not track deleted generated model links, so deleted will always be empty. For inaccessible models, only a minimal information is returned.

PARAMETER	DESCRIPTION
`training_dataset_version`	Filter generated models based on the used training dataset version. TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`explicit_provenance.Links`	Object containing the section of provenance graph requested or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_newest_model #

get_newest_model(
    training_dataset_version: int | None = None,
) -> Model | None

Get the latest generated model using this feature view, based on explicit provenance.

Search only through the accessible models. For more items use the base method, see get_models_provenance.

PARAMETER	DESCRIPTION
`training_dataset_version`	Filter generated models based on the used training dataset version. TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Model \| None`	Newest Generated Model or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_parent_feature_groups #

get_parent_feature_groups() -> (
    explicit_provenance.Links | None
)

Get the parents of this feature view, based on explicit provenance.

Parents are feature groups or external feature groups. These feature groups can be accessible, deleted or inaccessible. For deleted and inaccessible feature groups, only minimal information is returned.

RETURNS	DESCRIPTION
`explicit_provenance.Links \| None`	`Links`: Object containing the section of provenance graph requested or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

get_tag #

get_tag(name: str) -> tag.Tag | None

Get the tags of a feature view.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get a tag of a feature view
name = feature_view.get_tag('tag_name')

PARAMETER	DESCRIPTION
`name`	Name of the tag to get. TYPE: `str`

RETURNS	DESCRIPTION
`tag.Tag \| None`	Tag value or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_tags #

get_tags() -> dict[str, tag.Tag]

Returns all tags attached to a feature view.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get tags
list_tags = feature_view.get_tags()

RETURNS	DESCRIPTION
`dict[str, tag.Tag]`	The dictionary of tags.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

get_train_test_split #

get_train_test_split(
    training_dataset_version: int,
    read_options: dict[Any, Any] | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    training_helper_columns: bool = False,
    dataframe_type: str | None = "default",
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes | None,
    TrainingDatasetDataFrameTypes | None,
]

Get training data created by feature_view.create_train_test_split or feature_view.train_test_split.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training data
X_train, X_test, y_train, y_test = feature_view.get_train_test_split(training_dataset_version=1)

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`read_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. For python engine: * key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}` Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`primary_key`	whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`training_helper_columns`	whether to include training helper columns or not. Training helper columns are a list of feature names in the feature view, defined during its creation, that are not the part of the model schema itself but can be used during training as a helper for extra information. If training helper columns were not defined in the feature view or during materializing training dataset in the file system then`training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	str, optional. The type of the returned dataframe. Possible values are `"default"`, `"spark"`,`"pandas"`, `"polars"`, `"numpy"` or `"python"`. Defaults to "default", which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `str \| None` DEFAULT: `'default'`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`(X_train, X_test, y_train, y_test)`	Tuple of dataframe of features and labels

get_train_validation_test_split #

get_train_validation_test_split(
    training_dataset_version: int,
    read_options: dict[str, Any] | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    training_helper_columns: bool = False,
    dataframe_type: str = "default",
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes | None,
    TrainingDatasetDataFrameTypes | None,
    TrainingDatasetDataFrameTypes | None,
]

Get training data created by feature_view.create_train_validation_test_split or feature_view.train_validation_test_split.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training data
X_train, X_val, X_test, y_train, y_val, y_test = feature_view.get_train_validation_test_splits(training_dataset_version=1)

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`read_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. For python engine: * key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}` Defaults to `{}`. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`primary_key`	whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`training_helper_columns`	whether to include training helper columns or not. Training helper columns are a list of feature names in the feature view, defined during its creation, that are not the part of the model schema itself but can be used during training as a helper for extra information. If training helper columns were not defined in the feature view or during materializing training dataset in the file system then`training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	str, optional. The type of the returned dataframe. Possible values are `"default"`, `"spark"`,`"pandas"`, `"polars"`, `"numpy"` or `"python"`. Defaults to "default", which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `str` DEFAULT: `'default'`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`(X_train, X_val, X_test, y_train, y_val, y_test)`	Tuple of dataframe of features and labels

get_training_data #

get_training_data(
    training_dataset_version: int,
    read_options: dict[str, Any] | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    training_helper_columns: bool = False,
    dataframe_type: str | None = "default",
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes | None,
]

Get training data created by feature_view.create_training_data or feature_view.training_data.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training data
features_df, labels_df = feature_view.get_training_data(training_dataset_version=1)

External Storage Support

Reading training data that was written to external storage using a Storage Connector other than S3 can currently not be read using HSFS APIs with Python as Engine, instead you will have to use the storage's native client.

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`read_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. For python engine: * key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}` Defaults to `{}`. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`primary_key`	whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`training_helper_columns`	whether to include training helper columns or not. Training helper columns are a list of feature names in the feature view, defined during its creation, that are not the part of the model schema itself but can be used during training as a helper for extra information. If training helper columns were not defined in the feature view or during materializing training dataset in the file system then`training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	str, optional. The type of the returned dataframe. Possible values are `"default"`, `"spark"`,`"pandas"`, `"polars"`, `"numpy"` or `"python"`. Defaults to "default", which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `str \| None` DEFAULT: `'default'`

RETURNS	DESCRIPTION
`(X, y)`	Tuple of dataframe of features and labels

get_training_dataset_schema #

get_training_dataset_schema(
    training_dataset_version: int | None = None,
) -> list[training_dataset_feature.TrainingDatasetFeature]

Function that returns the schema of the training dataset that is generated from a feature view.

It provides the schema of the features after all transformation functions have been applied.

PARAMETER	DESCRIPTION
`training_dataset_version`	Specifies the version of the training dataset for which the schema should be generated. By default, this is set to None. However, if the `one_hot_encoder` transformation function is used, the training dataset version must be provided. This is because the schema will then depend on the statistics of the training data used. TYPE: `int \| None` DEFAULT: `None`

Example

schema = feature_view.get_training_dataset_schema(training_dataset_version=1)

RETURNS	DESCRIPTION
`list[training_dataset_feature.TrainingDatasetFeature]`	`List[training_dataset_feature.TrainingDatasetFeature]`: List of training dataset features objects.

get_training_dataset_statistics #

get_training_dataset_statistics(
    training_dataset_version: int,
    before_transformation: bool = False,
    feature_names: list[str] | None = None,
) -> Statistics

Get statistics of a training dataset.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training dataset statistics
statistics = feature_view.get_training_dataset_statistics(training_dataset_version=1)

PARAMETER	DESCRIPTION
`training_dataset_version`	Training dataset version TYPE: `int`
`before_transformation`	Whether the statistics were computed before transformation functions or not. TYPE: `bool` DEFAULT: `False`
`feature_names`	List of feature names of which statistics are retrieved. TYPE: `list[str] \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Statistics`	`Statistics`

get_training_dataset_tag #

get_training_dataset_tag(
    training_dataset_version: int, name: str
) -> tag.Tag | None

Get the tags of a training dataset.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get a training dataset tag
tag_str = feature_view.get_training_dataset_tag(
    training_dataset_version=1,
     name="tag_schema"
)

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`name`	Name of the tag to get. TYPE: `str`

RETURNS	DESCRIPTION
`tag.Tag \| None`	tag value or `None` if it does not exist.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

get_training_dataset_tags #

get_training_dataset_tags(
    training_dataset_version: int,
) -> dict[str, tag.Tag]

Returns all tags attached to a training dataset.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get a training dataset tags
list_tags = feature_view.get_training_dataset_tags(
    training_dataset_version=1
)

RETURNS	DESCRIPTION
`dict[str, tag.Tag]`	`Dict[str, obj]` of tags.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

get_training_datasets #

get_training_datasets() -> list[
    training_dataset.TrainingDatasetBase
]

Returns the metadata of all training datasets created with this feature view.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get all training dataset metadata
list_tds_meta = feature_view.get_training_datasets()

RETURNS	DESCRIPTION
`list[training_dataset.TrainingDatasetBase]`	`List[TrainingDatasetBase]` List of training datasets metadata.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

init_batch_scoring #

init_batch_scoring(
    training_dataset_version: int | None = None,
) -> None

Initialise feature view to retrieve feature vector from offline feature store.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# initialise feature view to retrieve feature vector from offline feature store
feature_view.init_batch_scoring(training_dataset_version=1)

# get batch data
batch_data = feature_view.get_batch_data(...)

PARAMETER	DESCRIPTION
`training_dataset_version`	Transformation statistics are fetched from training dataset and applied to the feature vector. TYPE: `int \| None` DEFAULT: `None`

init_feature_logger #

init_feature_logger(feature_logger: FeatureLogger) -> None

Initialize the feature logger.

PARAMETER	DESCRIPTION
`feature_logger`	The logger to be used for logging features. TYPE: `FeatureLogger`

init_serving #

init_serving(
    training_dataset_version: int | None = None,
    external: bool | None = None,
    options: dict[str, Any] | None = None,
    init_sql_client: bool | None = None,
    init_rest_client: bool = False,
    reset_rest_client: bool = False,
    config_rest_client: dict[str, Any] | None = None,
    default_client: Literal["sql", "rest"] | None = None,
    feature_logger: FeatureLogger | None = None,
    **kwargs,
) -> None

Initialise feature view to retrieve feature vector from online and offline feature store.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# initialise feature view to retrieve a feature vector
feature_view.init_serving(training_dataset_version=1)

PARAMETER	DESCRIPTION
`training_dataset_version`	Transformation statistics are fetched from training dataset and applied to the feature vector. Defaults to 1 for online feature store. TYPE: `int \| None` DEFAULT: `None`
`external`	If set to `True`, the connection to the online feature store is established using the same host as for the `host` parameter in the `hopsworks.login()` method. If set to `False`, the online feature store storage connector is used which relies on the private IP. Defaults to `True` if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to `False`. TYPE: `bool \| None` DEFAULT: `None`
`init_sql_client`	If set to `True`, this ensure the online store sql client is initialised, otherwise if init_rest_client is set to true it will skip initialising the sql client. By default the sql client is initialised if no client is specified to match legacy behaviour. TYPE: `bool \| None` DEFAULT: `None`
`init_rest_client`	If set to `True`, this ensure the online store rest client is initialised. Pass additional configuration options via the rest_config parameter. Set reset_rest_client to `True` to reset the rest client. By default the rest client is not initialised. TYPE: `bool` DEFAULT: `False`
`default_client`	Which client to default to if both are initialised. TYPE: `Literal['sql', 'rest'] \| None` DEFAULT: `None`
`options`	Additional options as key/value pairs for configuring online serving engine. key: kwargs of SqlAlchemy engine creation (See: https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine). For example: `{"pool_size": 10}`. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`reset_rest_client`	If set to `True`, the rest client will be reset and reinitialised with provided configuration. TYPE: `bool` DEFAULT: `False`
`config_rest_client`	Additional configuration options for the rest client. If the client is already initialised, this will be ignored. Options include: `host`: string, optional. The host of the online store. Dynamically set if not provided. `port`: int, optional. The port of the online store. Defaults to 4406. `verify_certs`: boolean, optional. Verify the certificates of the online store server. Defaults to True. `api_key`: string, optional. The API key to authenticate with the online store. The API key must be provided if initialising the rest client in an internal environment. `timeout`: int, optional. The timeout for the rest client in seconds. Defaults to 2. `use_ssl`: boolean, optional. Use SSL to connect to the online store. Defaults to True. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`feature_logger`	Custom feature logger which `feature_view.log()` uses to log feature vectors. If provided, feature vectors will not be inserted to logging feature group automatically when `feature_view.log()` is called. TYPE: `FeatureLogger \| None` DEFAULT: `None`

json #

json() -> str

Convert class into its json serialized form.

RETURNS	DESCRIPTION
`str`	`str`: Json serialized object.

log #

log(
    logging_data: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    untransformed_features: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame")
    | None = None,
    predictions: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | None = None,
    transformed_features: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    inference_helper_columns: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    request_parameters: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    event_time: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    serving_keys: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    extra_logging_features: pd.DataFrame
    | pl.DataFrame
    | list[list[Any]]
    | list[dict[str, Any]]
    | np.ndarray
    | TypeVar("pyspark.sql.DataFrame") = None,
    request_id: str | list[str] = None,
    write_options: dict[str, Any] | None = None,
    training_dataset_version: int | None = None,
    model: Model = None,
    model_name: str | None = None,
    model_version: int | None = None,
) -> list[Job] | None

Log features and optionally predictions for the current feature view. The logged features are written periodically to the offline store. If you need it to be available immediately, call materialize_log.

If features is a pyspark.Dataframe, prediction needs to be provided as columns in the dataframe,

values in predictions will be ignored.

PARAMETER	DESCRIPTION
`logging_dataframe`	The features to be logged, this can contain both transformed features, untransfored features and predictions. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray.
`untransformed_features`	The untransformed features to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame') \| None` DEFAULT: `None`
`prediction`	The predictions to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list, a list of lists, or a numpy ndarray.
`transformed_features`	The transformed features to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame')` DEFAULT: `None`
`inference_helper_columns`	The inference helper columns to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame')` DEFAULT: `None`
`request_parameters`	The request parameters to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame')` DEFAULT: `None`
`event_time`	The event time to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame')` DEFAULT: `None`
`serving_keys`	The serving keys to be logged. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame')` DEFAULT: `None`
`extra_logging_features`	Extra features to be logged. The features must be specified when enabled logging or while creating the feature view. Can be a pandas DataFrame, polar DataFrame, or spark DataFrame, a list of lists, a list of dictionaries or a numpy ndarray. TYPE: `pd.DataFrame \| pl.DataFrame \| list[list[Any]] \| list[dict[str, Any]] \| np.ndarray \| TypeVar('pyspark.sql.DataFrame')` DEFAULT: `None`
`request_id`	The request ID that can be used to identify an online inference request. TYPE: `str \| list[str]` DEFAULT: `None`
`write_options`	Options for writing the log. Defaults to None. TYPE: `dict[str, Any] \| None` DEFAULT: `None`
`training_dataset_version`	Version of the training dataset. If training dataset version is definied in `init_serving` or `init_batch_scoring`, or model has training dataset version, or training dataset version was cached, then the version will be used, otherwise defaults to None. TYPE: `int \| None` DEFAULT: `None`
`model`	`Union[str, hsml.model.Model]` Hopsworks model associated with the log. Defaults to None. TYPE: `Model` DEFAULT: `None`
`model_name`	`Optional[str]`. Name of the model to be associated with the log. If `model` is provided, this parameter will be ignored. TYPE: `str \| None` DEFAULT: `None`
`model_version`	`Optional[int]`. Version of the model to be associated with the log. If `model` is provided, this parameter will be ignored. TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[Job] \| None`	`list[Job]` job information for feature insertion if python engine is used

Implicitly Logging Batch Data and Predictions with all Logging metadata

df = fv.get_batch_data(logging_data=True)
predictions = model.predict(df)

# log passed features
feature_view.log(df, predictions=predictions)

Implicitly Logging Feature Vectors and Predictions with all Logging metadata

feature_vector = fv.get_feature_vector({"pk": 1}, logging_data=True)
predictions = model.predict(feature_vector)

# log passed features
feature_view.log(feature_vector, predictions=predictions)

Logging DataFrames with Predictions

df = fv.get_batch_data()
predictions = model.predict(df)

# log passed features
feature_view.log(df, predictions=predictions)

Explicit Logging of untransformed and transformed Features

serving_keys = [{"pk": 1}]
untransformed_feature_vector = fv.get_feature_vectors({"pk": 1})
transformed_feature_vector = fv.transform(untransformed_feature_vector)
predictions = model.predict(transformed_feature_vector)

# log both untransformed and transformed features
feature_view.log(
    untransformed_features=untransformed_feature_vector,
    transformed_features=transformed_feature_vector,
    servings_keys=serving_keys,
    predictions=predictions
)

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to log features.

materialize_log #

materialize_log(
    wait: bool = False, transformed: bool | None = None
) -> list[Job]

Materialize the log for the current feature view.

PARAMETER	DESCRIPTION
`wait`	Whether to wait for the materialization to complete. Defaults to False. TYPE: `bool` DEFAULT: `False`
`transformed`	Whether to materialize transformed or untrasformed logs. Defaults to None, in which case the returned list contains a job for materialization of transformed features and then a job for untransformed features. Otherwise the list contains only transformed jobs if transformed is True and untransformed jobs if it is False. TYPE: `bool \| None` DEFAULT: `None`

Example

# materialize log
materialization_result = feature_view.materialize_log(wait=True)

RETURNS	DESCRIPTION
`list[Job]`	List[`Job`] Job information for the materialization jobs of transformed and untransformed features.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to materialize the log.

pause_logging #

pause_logging() -> None

Pause scheduled materialization job for the current feature view.

Example

# pause logging
feature_view.pause_logging()

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to pause feature logging.

purge_all_training_data #

purge_all_training_data() -> None

Delete all training datasets (data only).

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# purge all training data
feature_view.purge_all_training_data()

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

purge_training_data #

purge_training_data(training_dataset_version: int) -> None

Delete a training dataset (data only).

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# purge training data
feature_view.purge_training_data(training_dataset_version=1)

PARAMETER	DESCRIPTION
`training_dataset_version`	Version of the training dataset to be removed. TYPE: `int`

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

read_log #

read_log(
    start_time: str
    | int
    | datetime
    | datetime.date
    | None = None,
    end_time: str
    | int
    | datetime
    | datetime.date
    | None = None,
    filter: Filter | Logic | None = None,
    transformed: bool | None = False,
    training_dataset_version: int | None = None,
    model: Model = None,
    model_name: str | None = None,
    model_version: int | None = None,
) -> (
    TypeVar("pyspark.sql.DataFrame")
    | pd.DataFrame
    | pl.DataFrame
)

Read the log entries for the current feature view.

Optionally, filter can be applied to start/end time, training dataset version, hsml model, and custom filter.

PARAMETER	DESCRIPTION
`start_time`	Start time for the log entries. Can be a string, integer, datetime, or date. Defaults to None. TYPE: `str \| int \| datetime \| datetime.date \| None` DEFAULT: `None`
`end_time`	End time for the log entries. Can be a string, integer, datetime, or date. Defaults to None. TYPE: `str \| int \| datetime \| datetime.date \| None` DEFAULT: `None`
`filter`	Filter to apply on the log entries. Can be a Filter or Logic object. Defaults to None. TYPE: `Filter \| Logic \| None` DEFAULT: `None`
`transformed`	Whether to include transformed logs. Defaults to False. TYPE: `bool \| None` DEFAULT: `False`
`training_dataset_version`	Version of the training dataset. Defaults to None. TYPE: `int \| None` DEFAULT: `None`
`model`	HSML model associated with the log. Defaults to None. TYPE: `Model` DEFAULT: `None`
`model_name`	`Optional[str]`. Name of the model to filter the log entries. If `model` is provided, this parameter will be ignored. TYPE: `str \| None` DEFAULT: `None`
`model_version`	`Optional[int]`. Version of the model to filter the log entries. If `model` is provided, this parameter will be ignored. TYPE: `int \| None` DEFAULT: `None`

Example

# read all log entries
log_entries = feature_view.read_log()
# read log entries within time ranges
log_entries = feature_view.read_log(start_time="2022-01-01", end_time="2022-01-31")
# read log entries of a specific training dataset version
log_entries = feature_view.read_log(training_dataset_version=1)
# read log entries of a specific hopsworks model
log_entries = feature_view.read_log(model=Model(1, "dummy", version=1))
# read log entries by applying filter on features of feature group `fg` in the feature view
log_entries = feature_view.read_log(filter=fg.feature1 > 10)

RETURNS	DESCRIPTION
`TypeVar('pyspark.sql.DataFrame') \| pd.DataFrame \| pl.DataFrame`	`DataFrame`: The spark dataframe containing the feature data.
`TypeVar('pyspark.sql.DataFrame') \| pd.DataFrame \| pl.DataFrame`	`pyspark.DataFrame`. A Spark DataFrame.
`TypeVar('pyspark.sql.DataFrame') \| pd.DataFrame \| pl.DataFrame`	`pandas.DataFrame`. A Pandas DataFrame.
`TypeVar('pyspark.sql.DataFrame') \| pd.DataFrame \| pl.DataFrame`	`polars.DataFrame`. A Polars DataFrame.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to read the log entries.

recreate_training_dataset #

recreate_training_dataset(
    training_dataset_version: int,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    write_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    transformation_context: dict[str, Any] = None,
) -> job.Job

Recreate a training dataset.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# recreate a training dataset that has been deleted
feature_view.recreate_training_dataset(training_dataset_version=1)

Info

If a materialised training data has deleted. Use recreate_training_dataset() to recreate the training data.

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`training_dataset_version`	training dataset version TYPE: `int`
`statistics_config`	A configuration object, or a dictionary with keys "`enabled`" to generally enable descriptive statistics computation for this feature group, `"correlations`" to turn on feature correlation computation and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`write_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. When using the `python` engine, write_options can contain the following entries: * key `use_spark` and value `True` to materialize training dataset with Spark instead of Hopsworks Feature Query Service. * key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. * key `wait_for_job` and value `True` or `False` to configure whether or not to the save call should return only after the Hopsworks Job has finished. By default it waits. Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`job.Job`	`Job`: When using the `python` engine, it returns the Hopsworks Job that was launched to create the training dataset.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request

resume_logging #

resume_logging() -> None

Resume scheduled materialization job for the current feature view.

Example

# resume logging
feature_view.resume_logging()

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	in case the backend fails to pause feature logging.

to_dict #

to_dict() -> dict[str, Any]

Convert class into a dictionary.

RETURNS	DESCRIPTION
`dict[str, Any]`	`Dict`: Dictionary that contains all data required to json serialize the object.

train_test_split #

train_test_split(
    test_size: float | None = None,
    train_start: str | int | datetime | date | None = "",
    train_end: str | int | datetime | date | None = "",
    test_start: str | int | datetime | date | None = "",
    test_end: str | int | datetime | date | None = "",
    description: str | None = "",
    extra_filter: filter.Filter
    | filter.Logic
    | None = None,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    read_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    training_helper_columns: bool = False,
    dataframe_type: str | None = "default",
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes | None,
    TrainingDatasetDataFrameTypes | None,
]

Create the metadata for a training dataset and get the corresponding training data from the offline feature store.

This returns the training data in memory and does not materialise data in storage. The training data is split into train and test set at random or according to time ranges. The training data can be recreated by calling feature_view.get_train_test_split with the metadata created.

Create random train/test splits

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training data
X_train, X_test, y_train, y_test = feature_view.train_test_split(
    test_size=0.2
)

Create time-series train/test splits

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
train_start = "2022-05-01 00:00:00"
train_end = "2022-06-04 23:59:59"
test_start = "2022-07-01 00:00:00"
test_end= "2022-08-04 23:59:59"
# you can also pass dates as datetime objects

# get training data
X_train, X_test, y_train, y_test = feature_view.train_test_split(
    train_start=train_start,
    train_end=train_end,
    test_start=test_start,
    test_end=test_end,
    description='Description of a dataset'
)

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`test_size`	size of test set. Should be between 0 and 1. TYPE: `float \| None` DEFAULT: `None`
`train_start`	Start event time for the train split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`train_end`	End event time for the train split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_start`	Start event time for the test split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_end`	End event time for the test split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`description`	A string describing the contents of the training dataset to improve discoverability for Data Scientists, defaults to empty string `""`. TYPE: `str \| None` DEFAULT: `''`
`extra_filter`	Additional filters to be attached to the training dataset. The filters will be also applied in `get_batch_data`. TYPE: `filter.Filter \| filter.Logic \| None` DEFAULT: `None`
`statistics_config`	A configuration object, or a dictionary with keys "`enabled`" to generally enable descriptive statistics computation for this feature group, `"correlations`" to turn on feature correlation computation and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`read_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. When using the `python` engine, read_options can contain the following entries: * key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}` * key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`primary_key`	whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`training_helper_columns`	whether to include training helper columns or not. Training helper columns are a list of feature names in the feature view, defined during its creation, that are not the part of the model schema itself but can be used during training as a helper for extra information. If training helper columns were not defined in the feature view then`training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	str, optional. The type of the returned dataframe. Possible values are `"default"`, `"spark"`,`"pandas"`, `"polars"`, `"numpy"` or `"python"`. Defaults to "default", which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `str \| None` DEFAULT: `'default'`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`(X_train, X_test, y_train, y_test)`	Tuple of dataframe of features and labels

train_validation_test_split #

train_validation_test_split(
    validation_size: float | None = None,
    test_size: float | None = None,
    train_start: str | int | datetime | date | None = "",
    train_end: str | int | datetime | date | None = "",
    validation_start: str
    | int
    | datetime
    | date
    | None = "",
    validation_end: str | int | datetime | date | None = "",
    test_start: str | int | datetime | date | None = "",
    test_end: str | int | datetime | date | None = "",
    description: str | None = "",
    extra_filter: filter.Filter
    | filter.Logic
    | None = None,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    read_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    training_helper_columns: bool = False,
    dataframe_type: str | None = "default",
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes | None,
    TrainingDatasetDataFrameTypes | None,
    TrainingDatasetDataFrameTypes | None,
]

Create the metadata for a training dataset and get the corresponding training data from the offline feature store.

This returns the training data in memory and does not materialise data in storage. The training data is split into train, validation, and test set at random or according to time ranges. The training data can be recreated by calling feature_view.get_train_validation_test_split with the metadata created.

Example

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training data
X_train, X_val, X_test, y_train, y_val, y_test = feature_view.train_validation_test_split(
    validation_size=0.3,
    test_size=0.2
)

Time Series split

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up dates
start_time_train = '2017-01-01 00:00:01'
end_time_train = '2018-02-01 23:59:59'

start_time_val = '2018-02-02 23:59:59'
end_time_val = '2019-02-01 23:59:59'

start_time_test = '2019-02-02 23:59:59'
end_time_test = '2020-02-01 23:59:59'
# you can also pass dates as datetime objects

# get training data
X_train, X_val, X_test, y_train, y_val, y_test = feature_view.train_validation_test_split(
    train_start=start_time_train,
    train_end=end_time_train,
    validation_start=start_time_val,
    validation_end=end_time_val,
    test_start=start_time_test,
    test_end=end_time_test
)

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`validation_size`	size of validation set. Should be between 0 and 1. TYPE: `float \| None` DEFAULT: `None`
`test_size`	size of test set. Should be between 0 and 1. TYPE: `float \| None` DEFAULT: `None`
`train_start`	Start event time for the train split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`train_end`	End event time for the train split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`validation_start`	Start event time for the validation split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`validation_end`	End event time for the validation split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_start`	Start event time for the test split query, inclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`test_end`	End event time for the test split query, exclusive. Strings should be formatted in one of the following formats `%Y-%m-%d`, `%Y-%m-%d %H`, `%Y-%m-%d %H:%M`, `%Y-%m-%d %H:%M:%S`, or `%Y-%m-%d %H:%M:%S.%f`. Int, i.e Unix Epoch should be in seconds. TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `''`
`description`	A string describing the contents of the training dataset to improve discoverability for Data Scientists, defaults to empty string `""`. TYPE: `str \| None` DEFAULT: `''`
`extra_filter`	Additional filters to be attached to the training dataset. The filters will be also applied in `get_batch_data`. TYPE: `filter.Filter \| filter.Logic \| None` DEFAULT: `None`
`statistics_config`	A configuration object, or a dictionary with keys "`enabled`" to generally enable descriptive statistics computation for this feature group, `"correlations`" to turn on feature correlation computation and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`read_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. When using the `python` engine, read_options can contain the following entries: * key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}` * key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`primary_key`	whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`training_helper_columns`	whether to include training helper columns or not. Training helper columns are a list of feature names in the feature view, defined during its creation, that are not the part of the model schema itself but can be used during training as a helper for extra information. If training helper columns were not defined in the feature view then`training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	str, optional. The type of the returned dataframe. Possible values are `"default"`, `"spark"`,`"pandas"`, `"polars"`, `"numpy"` or `"python"`. Defaults to "default", which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `str \| None` DEFAULT: `'default'`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`(X_train, X_val, X_test, y_train, y_val, y_test)`	Tuple of dataframe of features and labels

training_data #

training_data(
    start_time: str | int | datetime | date | None = None,
    end_time: str | int | datetime | date | None = None,
    description: str | None = "",
    extra_filter: filter.Filter
    | filter.Logic
    | None = None,
    statistics_config: StatisticsConfig
    | bool
    | dict
    | None = None,
    read_options: dict[Any, Any] | None = None,
    spine: SplineDataFrameTypes | None = None,
    primary_key: bool = False,
    event_time: bool = False,
    training_helper_columns: bool = False,
    dataframe_type: str | None = "default",
    transformation_context: dict[str, Any] = None,
    **kwargs,
) -> tuple[
    TrainingDatasetDataFrameTypes,
    TrainingDatasetDataFrameTypes | None,
]

Create the metadata for a training dataset and get the corresponding training data from the offline feature store.

This returns the training data in memory and does not materialise data in storage. The training data can be recreated by calling feature_view.get_training_data with the metadata created.

Create random splits

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# get training data
features_df, labels_df  = feature_view.training_data(
    description='Descriprion of a dataset',
)

Create time-series based splits

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

# set up a date
start_time = "2022-05-01 00:00:00"
end_time = "2022-06-04 23:59:59"
# you can also pass dates as datetime objects

# get training data
features_df, labels_df = feature_view.training_data(
    start_time=start_time,
    end_time=end_time,
    description='Description of a dataset'
)

Spine Groups/Dataframes

Spine groups and dataframes are currently only supported with the Spark engine and Spark dataframes.

PARAMETER	DESCRIPTION
`start_time`	Start event time for the training dataset query, inclusive. Strings should TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `None`
`end_time`	End event time for the training dataset query, exclusive. Strings should be TYPE: `str \| int \| datetime \| date \| None` DEFAULT: `None`
`description`	A string describing the contents of the training dataset to improve discoverability for Data Scientists, defaults to empty string `""`. TYPE: `str \| None` DEFAULT: `''`
`extra_filter`	Additional filters to be attached to the training dataset. The filters will be also applied in `get_batch_data`. TYPE: `filter.Filter \| filter.Logic \| None` DEFAULT: `None`
`statistics_config`	A configuration object, or a dictionary with keys "`enabled`" to generally enable descriptive statistics computation for this feature group, `"correlations`" to turn on feature correlation computation and `"histograms"` to compute feature value frequencies. The values should be booleans indicating the setting. To fully turn off statistics computation pass `statistics_config=False`. Defaults to `None` and will compute only descriptive statistics. TYPE: `StatisticsConfig \| bool \| dict \| None` DEFAULT: `None`
`read_options`	Additional options as key/value pairs to pass to the execution engine. For spark engine: Dictionary of read options for Spark. When using the `python` engine, read_options can contain the following entries: * key `"arrow_flight_config"` to pass a dictionary of arrow flight configurations. For example: `{"arrow_flight_config": {"timeout": 900}}`. * key `spark` and value an object of type hsfs.core.job_configuration.JobConfiguration to configure the Hopsworks Job used to compute the training dataset. Defaults to `{}`. TYPE: `dict[Any, Any] \| None` DEFAULT: `None`
`spine`	Spine dataframe with primary key, event time and label column to use for point in time join when fetching features. Defaults to `None` and is only required when feature view was created with spine group in the feature query. It is possible to directly pass a spine group instead of a dataframe to overwrite the left side of the feature join, however, the same features as in the original feature group that is being replaced need to be available in the spine group. TYPE: `SplineDataFrameTypes \| None` DEFAULT: `None`
`primary_key`	whether to include primary key features or not. Defaults to `False`, no primary key features. TYPE: `bool` DEFAULT: `False`
`event_time`	whether to include event time feature or not. Defaults to `False`, no event time feature. TYPE: `bool` DEFAULT: `False`
`training_helper_columns`	whether to include training helper columns or not. Training helper columns are a list of feature names in the feature view, defined during its creation, that are not the part of the model schema itself but can be used during training as a helper for extra information. If training helper columns were not defined in the feature view then`training_helper_columns=True` will not have any effect. Defaults to `False`, no training helper columns. TYPE: `bool` DEFAULT: `False`
`dataframe_type`	str, optional. The type of the returned dataframe. Possible values are `"default"`, `"spark"`,`"pandas"`, `"polars"`, `"numpy"` or `"python"`. Defaults to "default", which maps to Spark dataframe for the Spark Engine and Pandas dataframe for the Python engine. TYPE: `str \| None` DEFAULT: `'default'`
`transformation_context`	`Dict[str, Any]` A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to `None`. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
`(X, y)`	Tuple of dataframe of features and labels. If there are no labels, y returns `None`.

transform #

transform(
    feature_vector: list[Any]
    | list[list[Any]]
    | pd.DataFrame
    | pl.DataFrame,
    external: bool | None = None,
    transformation_context: dict[str, Any] = None,
    return_type: Literal[
        "list", "numpy", "pandas", "polars"
    ] = None,
)

Transform the input feature vector by applying Model-dependent transformations attached to the feature view.

List input must match the schema of the feature view

If features are provided as a List to the transform function. Make sure that the input are ordered to match the schema in the feature view.

Parameters: feature_vector: Union[List[Any], List[List[Any]], pd.DataFrame, pl.DataFrame]. The feature vector to be transformed. external: boolean, optional. If set to True, the connection to the online feature store is established using the same host as for the host parameter in the hopsworks.login() method. If set to False, the online feature store storage connector is used which relies on the private IP. Defaults to True if connection to Hopsworks is established from external environment (e.g AWS Sagemaker or Google Colab), otherwise to False. transformation_context: Dict[str, Any] A dictionary mapping variable names to objects that will be provided as contextual information to the transformation function at runtime. These variables must be explicitly defined as parameters in the transformation function to be accessible during execution. If no context variables are provided, this parameter defaults to None. return_type: "list", "pandas", "polars" or "numpy". Defaults to the same type as the input feature vector.

RETURNS	DESCRIPTION
	`Union[List[Any], List[List[Any]], pd.DataFrame, pl.DataFrame]`: The transformed feature vector obtained by applying Model-Dependent Transformations.

update #

update() -> FeatureView

Update the description of the feature view.

Update the feature view with a new description

# get feature store instance
fs = ...

# get feature view instance
feature_view = fs.get_feature_view(...)

feature_view.description = "new description"
feature_view.update()

# Description is updated in the metadata. Below should return "new description".
fs.get_feature_view("feature_view_name", 1).description

RETURNS	DESCRIPTION
`FeatureView`	Updated feature view.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.RestAPIError`	If the backend encounters an error when handling the request.

update_from_response_json #

update_from_response_json(
    json_dict: dict[str, Any],
) -> FeatureView

Function that updates the class object from its json serialization.

PARAMETER	DESCRIPTION
`json_dict`	`Dict[str, Any]`. Json serialized dictionary for the class. TYPE: `dict[str, Any]`

RETURNS	DESCRIPTION
`FeatureView`	`TransformationFunction`: Json deserialized class object.

update_last_accessed_training_dataset #

update_last_accessed_training_dataset(version)

Update the cached last accessed training dataset version.

Feature View#

FeatureView #

description property writable #

feature_logging property #

feature_store_name property #

features property #

featurestore_id property writable #

id property writable #

inference_helper_columns property writable #

labels property writable #

logging_enabled property writable #

model_dependent_transformations property #

name property writable #

on_demand_transformations property #

primary_keys property #

query property writable #

request_parameters property #

schema property writable #

serving_keys property writable #

training_helper_columns property writable #

transformation_functions property writable #

version property writable #

add_tag #

add_training_dataset_tag #

clean staticmethod #

compute_on_demand_features #

create_alert #

create_feature_logger #

create_feature_monitoring #

create_statistics_monitoring #

create_train_test_split #

create_train_validation_test_split #

create_training_data #

delete #

delete_all_training_datasets #

delete_log #

delete_tag #

delete_training_dataset #

delete_training_dataset_tag #

enable_logging #

find_neighbors #

from_response_json classmethod #

get_alert #

get_alerts #

get_batch_data #

get_batch_query #

get_feature_monitoring_configs #

get_feature_monitoring_history #

get_feature_vector #

get_feature_vectors #

get_inference_helper #

get_inference_helpers #

get_last_accessed_training_dataset #

get_log_timeline #

get_models #

get_models_provenance #

get_newest_model #

get_parent_feature_groups #

get_tag #

get_tags #

get_train_test_split #

get_train_validation_test_split #

get_training_data #

get_training_dataset_schema #

get_training_dataset_statistics #

get_training_dataset_tag #

get_training_dataset_tags #

get_training_datasets #

init_batch_scoring #

init_feature_logger #

init_serving #

json #

log #

materialize_log #

pause_logging #

purge_all_training_data #

purge_training_data #

read_log #

recreate_training_dataset #

resume_logging #

description `property` `writable` #

feature_logging `property` #

feature_store_name `property` #

features `property` #

featurestore_id `property` `writable` #

id `property` `writable` #

inference_helper_columns `property` `writable` #

labels `property` `writable` #

logging_enabled `property` `writable` #

model_dependent_transformations `property` #

name `property` `writable` #

on_demand_transformations `property` #

primary_keys `property` #

query `property` `writable` #

request_parameters `property` #

schema `property` `writable` #

serving_keys `property` `writable` #

training_helper_columns `property` `writable` #

transformation_functions `property` `writable` #

version `property` `writable` #

clean `staticmethod` #

from_response_json `classmethod` #