HopsworksUDF#

HopsworksUdf #

Metadata for user-defined functions.

Stores the metadata required to execute the user-defined function in both the Spark and Python engines. The class uses this metadata to dynamically generate the user-defined function for the engine it is executed in.

PARAMETER	DESCRIPTION
`func`	The transformation function object or the source code of the transformation function. TYPE: `Callable \| str`
`return_types`	A python type or a list of python types that denotes the data types of the columns output from the transformation functions. TYPE: `list[type] \| type \| list[str] \| str`
`name`	Name of the transformation function. TYPE: `str \| None` DEFAULT: `None`
`transformation_features`	A list of `TransformationFeature` objects that map each feature used in the transformation to its corresponding statistics argument name, if any. TYPE: `list[TransformationFeature] \| None` DEFAULT: `None`
`transformation_function_argument_names`	The argument names of the transformation function. TYPE: `list[str] \| None` DEFAULT: `None`
`dropped_argument_names`	The arguments to be dropped from the final DataFrame after the transformation functions are applied. TYPE: `list[str] \| None` DEFAULT: `None`
`dropped_feature_names`	The feature names corresponding to the argument names that are dropped. TYPE: `list[str] \| None` DEFAULT: `None`
`feature_name_prefix`	Prefix, if any, used in the feature view. TYPE: `str \| None` DEFAULT: `None`
`output_column_names`	The names of the output columns returned from the transformation function. TYPE: `str \| None` DEFAULT: `None`
`generate_output_col_names`	Generate default output column names for the transformation function. TYPE: `bool` DEFAULT: `True`

dropped_features `property` `writable` #

dropped_features: list[str]

List of features that will be dropped after the UDF is applied.

feature_name_prefix `property` #

feature_name_prefix: str | None

The feature name prefix that needs to be added to the feature names.

function_name `property` #

function_name: str

Get the function name of the UDF.

output_column_names `property` `writable` #

output_column_names: list[str]

Output column names of the transformation function.

return_types `property` #

return_types: list[str]

Get the output types of the UDF.

statistics_features `property` #

statistics_features: list[str]

List of feature names that require statistics.

statistics_required `property` #

statistics_required: bool

Whether the UDF requires statistics for any feature.

transformation_context `property` `writable` #

transformation_context: dict[str, Any]

Dictionary that contains the context variables required for the UDF.

These context variables are passed to the UDF during execution.

transformation_features `property` #

transformation_features: list[str]

List of feature names to be used in the User Defined Function.

transformation_statistics `property` `writable` #

transformation_statistics: TransformationStatistics | None

Feature statistics required for the defined UDF.

unprefixed_transformation_features `property` #

unprefixed_transformation_features: list[str]

List of feature names used in the transformation function, without the feature name prefix.
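To make the relationship between `feature_name_prefix`, `transformation_features`, and `unprefixed_transformation_features` concrete, here is a minimal sketch. It assumes the prefix is simply prepended to each feature name; `apply_prefix` is a hypothetical helper for illustration and is not part of the Hopsworks API.

```python
from typing import Optional


def apply_prefix(feature_names: list[str], prefix: Optional[str]) -> list[str]:
    """Prepend the prefix to every feature name (assumed behavior, not the library's code)."""
    if not prefix:
        # Without a prefix the names are returned unchanged.
        return list(feature_names)
    return [f"{prefix}{name}" for name in feature_names]


# Names as written in the transformation function (no prefix).
unprefixed = ["amount", "country"]

# Names as they would appear with a feature view prefix applied.
prefixed = apply_prefix(unprefixed, "fg1_")
```

Under this assumption, `unprefixed` plays the role of `unprefixed_transformation_features` and `prefixed` the role of `transformation_features`.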

__call__ #

__call__(*features: list[str]) -> HopsworksUdf

Set features to be passed as arguments to the user defined functions.

PARAMETER	DESCRIPTION
`features`	Name of features to be passed to the User Defined function. TYPE: `list[str]` DEFAULT: `()`

RETURNS	DESCRIPTION
`HopsworksUdf`	Meta data class for the user defined function.

RAISES	DESCRIPTION
`hopsworks.client.exceptions.FeatureStoreException`	If the number of provided features does not match the number of arguments in the defined UDF, or if the provided feature names are not strings.
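The two checks described above (feature names must be strings, and their count must match the function's arguments) can be sketched in plain Python. This is an illustration only: `bind_features` and `FeatureBindingError` are hypothetical stand-ins, and the sketch ignores the statistics and context arguments that the real class may treat separately.

```python
import inspect


class FeatureBindingError(Exception):
    """Hypothetical stand-in for hopsworks.client.exceptions.FeatureStoreException."""


def bind_features(func, *features):
    """Map the function's argument names to the given feature names."""
    # Every feature name must be a string.
    if not all(isinstance(feature, str) for feature in features):
        raise FeatureBindingError("Feature names must be strings.")

    arg_names = list(inspect.signature(func).parameters)

    # The number of features must equal the number of function arguments.
    if len(features) != len(arg_names):
        raise FeatureBindingError(
            f"Expected {len(arg_names)} feature(s), got {len(features)}."
        )
    return dict(zip(arg_names, features))


def add_one(amount):
    return amount + 1


# Bind the single argument `amount` to the feature `transaction_amount`.
binding = bind_features(add_one, "transaction_amount")
```

Passing two feature names to `bind_features(add_one, ...)`, or a non-string name, raises `FeatureBindingError` in this sketch.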

alias #

alias(*args: str)

Set the names of the transformed features output by the UDF.

from_response_json `classmethod` #

from_response_json(
    json_dict: dict[str, Any],
) -> HopsworksUdf

Function that constructs the class object from its json serialization.

PARAMETER	DESCRIPTION
`json_dict`	JSON serialized dictionary for the class. TYPE: `dict[str, Any]`

RETURNS	DESCRIPTION
`HopsworksUdf`	JSON deserialized class object.

get_udf #

get_udf(online: bool = False) -> Callable

Function that checks the current engine type, execution type and returns the appropriate UDF.

If the execution mode is "default":

In the Spark engine: during inference a Spark UDF is returned; otherwise a Spark pandas_udf is returned.
In the Python engine: during inference a Python UDF is returned; otherwise a Pandas UDF is returned.

If the execution mode is "pandas":

In the spark engine: Always returns a spark pandas udf.
In the python engine: Always returns a pandas udf.

If the execution mode is "python":

In the spark engine: Always returns a spark udf.
In the python engine: Always returns a python udf.

PARAMETER	DESCRIPTION
`online`	Specify whether the UDF is required for online inference. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`Callable`	A Pandas UDF in the Spark engine; otherwise a Python function for the UDF.
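The selection rules listed above can be summarized as a small decision function. This is a sketch of the documented rules, not the library's implementation: `select_udf_kind` is a hypothetical name, the return values are descriptive labels rather than real UDF objects, and it assumes that "during inference" corresponds to `online=True`.

```python
def select_udf_kind(engine: str, execution_mode: str, online: bool = False) -> str:
    """Return a label for the kind of UDF chosen under the documented rules."""
    if engine not in ("spark", "python"):
        raise ValueError(f"Unknown engine: {engine!r}")
    if execution_mode not in ("default", "pandas", "python"):
        raise ValueError(f"Unknown execution mode: {execution_mode!r}")

    if execution_mode == "default":
        # Default mode: row-wise UDF for inference, vectorized UDF otherwise.
        row_wise = online
    else:
        # Explicit modes ignore the `online` flag.
        row_wise = execution_mode == "python"

    if engine == "spark":
        return "spark udf" if row_wise else "spark pandas_udf"
    return "python udf" if row_wise else "pandas udf"


# Default mode depends on `online`; explicit modes do not.
default_offline = select_udf_kind("spark", "default", online=False)
default_online = select_udf_kind("spark", "default", online=True)
forced_pandas = select_udf_kind("python", "pandas", online=True)
```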

json #

json() -> str

Convert class into its json serialized form.

RETURNS	DESCRIPTION
`str`	JSON serialized object.

pandas_udf_wrapper #

pandas_udf_wrapper() -> Callable

Function that creates a dynamic wrapper function for the defined udf that renames the columns output by the UDF into specified column names.

The renaming is done so that the column names match the schema expected by Spark when multiple columns are returned from a Pandas UDF. The wrapper function would be available in the main scope of the program.

RETURNS	DESCRIPTION
`Callable`	A wrapper function that renames outputs of the User defined function into specified output column names.

python_udf_wrapper #

python_udf_wrapper(rename_outputs) -> Callable

Function that creates a dynamic wrapper function for the defined udf.

The wrapper function would be used to specify column names in Spark engines and to localize timezones.

The renaming is done so that the column names match the schema expected by Spark when multiple columns are returned from a Spark UDF. The wrapper function would be available in the main scope of the program.

RETURNS	DESCRIPTION
`Callable`	A wrapper function that renames outputs of the User defined function into specified output column names.
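As a rough illustration of the renaming idea, the sketch below wraps a row-wise function so that its outputs are keyed by the configured output column names. It is plain Python under stated assumptions: `rename_outputs_wrapper` is a hypothetical helper, it returns a dictionary rather than a Spark row, and it does not attempt the timezone localization mentioned above.

```python
from functools import wraps


def rename_outputs_wrapper(func, output_column_names):
    """Wrap `func` so its outputs are labeled with the given column names."""

    @wraps(func)
    def wrapper(*args):
        result = func(*args)
        # A single return value is treated as a one-element tuple.
        values = result if isinstance(result, tuple) else (result,)
        if len(values) != len(output_column_names):
            raise ValueError(
                f"Expected {len(output_column_names)} output(s), got {len(values)}."
            )
        return dict(zip(output_column_names, values))

    return wrapper


def split_name(full_name):
    """Example transformation that returns two output columns."""
    first, _, last = full_name.partition(" ")
    return first, last


wrapped = rename_outputs_wrapper(split_name, ["first_name", "last_name"])
row = wrapped("Ada Lovelace")
```

Here `row` maps `first_name` and `last_name` to the two values returned by `split_name`, which mirrors how named outputs let a multi-column result line up with an expected schema.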

to_dict #

to_dict() -> dict[str, Any]

Convert class into a dictionary.

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary that contains all data required to json serialize the object.
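The relationship between `to_dict`, `json`, and `from_response_json` can be sketched as a serialization round trip. The class below is a hypothetical stand-in: the field names and JSON keys are assumptions chosen for illustration and do not reflect the actual Hopsworks schema.

```python
import json
from typing import Any


class UdfMetadata:
    """Hypothetical stand-in showing the to_dict / json / from_response_json pattern."""

    def __init__(self, name: str, output_types: list[str]):
        self.name = name
        self.output_types = output_types

    def to_dict(self) -> dict[str, Any]:
        # Collect everything needed to serialize the object.
        return {"name": self.name, "outputTypes": self.output_types}

    def json(self) -> str:
        # Serialize the dictionary form to a JSON string.
        return json.dumps(self.to_dict())

    @classmethod
    def from_response_json(cls, json_dict: dict[str, Any]) -> "UdfMetadata":
        # Rebuild the object from an already-parsed JSON dictionary.
        return cls(name=json_dict["name"], output_types=json_dict["outputTypes"])


original = UdfMetadata("add_one", ["double"])
restored = UdfMetadata.from_response_json(json.loads(original.json()))
```

In this sketch `restored` carries the same name and output types as `original`, which is the property a round trip through the JSON form is meant to preserve.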

TransformationFeature `dataclass` #

Mapping of feature names to their corresponding statistics argument names in the code.

The statistic_argument_name for a feature name would be None if the feature does not need statistics.

PARAMETER	DESCRIPTION
`feature_name`	Name of the feature. TYPE: `str`
`statistic_argument_name`	Name of the statistics argument in the code for the feature specified in the feature name. TYPE: `str \| None`
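A minimal sketch of this mapping, using a stand-in dataclass with the two documented fields. `TransformationFeatureSketch` is a hypothetical name, and deriving the statistics-related values by filtering on `None` is an assumption about how such a mapping could be used, not a description of the library's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransformationFeatureSketch:
    """Stand-in with the two documented fields."""

    feature_name: str
    statistic_argument_name: Optional[str]


features = [
    # `amount` needs statistics, passed through a statistics argument.
    TransformationFeatureSketch("amount", "statistics_amount"),
    # `country` needs no statistics, so the argument name is None.
    TransformationFeatureSketch("country", None),
]

# Features that require statistics are those with a statistics argument name.
statistics_features = [
    feature.feature_name
    for feature in features
    if feature.statistic_argument_name is not None
]
statistics_required = len(statistics_features) > 0
```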
