Feature Descriptive Statistics#

FeatureDescriptiveStatistics #

approx_num_distinct_values property #

approx_num_distinct_values: int | None

Approximate number of distinct values.

completeness property #

completeness: float | None

Fraction of non-null values in a column.
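As an illustration of the definition, the sketch below computes completeness over a plain Python list. It assumes that `None` stands for a null value and that the denominator is the number of all values, nulls included. The helper name `completeness_of` is hypothetical and not part of this class.

```python
def completeness_of(values):
    """Fraction of non-null values; None is assumed to represent null."""
    if not values:
        return None  # undefined for an empty column
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


# Three of the four values are non-null.
print(completeness_of([1.0, None, 3.0, 4.0]))  # 0.75
```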

count property #

count: int

Number of values.

distinctness property #

distinctness: float | None

Fraction of distinct values of a feature over the total number of its values. A distinct value is any value that occurs at least once.

Example

[a, a, b] contains two distinct values a and b, so distinctness is 2/3.
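The same calculation can be sketched in plain Python. This is a minimal illustration of the definition, not the library's implementation. The helper name `distinctness_of` is hypothetical, and null handling is left out.

```python
def distinctness_of(values):
    """Number of distinct values divided by the number of all values."""
    if not values:
        return None  # undefined for an empty column
    return len(set(values)) / len(values)


# "a" and "b" are the two distinct values among three values.
print(distinctness_of(["a", "a", "b"]))
```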

entropy property #

entropy: float | None

Entropy is a measure of the level of information contained in an event (feature value) when considering all possible events (all feature values).

Entropy is estimated from the observed value counts as the negative sum, over all distinct values, of (value_count/total_count) * log(value_count/total_count), where log is the natural logarithm.

Example

[a, b, b, c, c] has three distinct values with counts [1, 2, 2].

Entropy is then -(1/5*log(1/5) + 2/5*log(2/5) + 2/5*log(2/5)) ≈ 1.055.
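The estimate can be reproduced with a short Python sketch. It assumes the natural logarithm, which is what yields 1.055 for this example. The helper name `estimate_entropy` is hypothetical and not part of this class.

```python
import math
from collections import Counter


def estimate_entropy(values):
    """Negative sum of p * ln(p) over the observed value frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return None  # undefined for an empty column
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Counts are a: 1, b: 2, c: 2 out of 5 values.
print(round(estimate_entropy(["a", "b", "b", "c", "c"]), 3))  # 1.055
```

With this definition, a column holding a single repeated value has entropy 0, and entropy grows as the values become more evenly spread.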

exact_num_distinct_values property #

exact_num_distinct_values: int | None

Exact number of distinct values.

extended_statistics property #

extended_statistics: dict | None

Additional statistics computed on the feature values such as histograms and correlations.

feature_name property #

feature_name: str

Name of the feature.

feature_type property #

feature_type: str

Data type of the feature. It can be one of Boolean, Fractional, Integral, or String.

id property #

id: int | None

ID of the feature descriptive statistics object.

max property #

max: float | None

Maximum value.

mean property #

mean: float | None

Mean value.

min property #

min: float | None

Minimum value.

num_non_null_values property #

num_non_null_values: int | None

Number of non-null values.

num_null_values property #

num_null_values: int | None

Number of null values.

percentiles property #

percentiles: Mapping[str, float] | None

Percentiles.

stddev property #

stddev: float | None

Standard deviation of the feature values.

sum property #

sum: float | None

Sum of all feature values.

uniqueness property #

uniqueness: float | None

Fraction of unique values over the number of all values of a column. Unique values occur exactly once.

Example

[a, a, b] contains one unique value b, so uniqueness is 1/3.
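A minimal Python sketch of this definition follows. It shows how uniqueness differs from distinctness: only values that occur exactly once are counted. The helper name `uniqueness_of` is hypothetical, and null handling is left out.

```python
from collections import Counter


def uniqueness_of(values):
    """Number of values occurring exactly once divided by the number of all values."""
    if not values:
        return None  # undefined for an empty column
    counts = Counter(values)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(values)


# Only "b" occurs exactly once among three values.
print(uniqueness_of(["a", "a", "b"]))
```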