Version: 0.14.13

How to add input validation and type checking for a Custom Expectation

Prerequisites: This how-to guide assumes you have:

Expectations will typically be configured using input parameters. These parameters are required to provide your Custom Expectation with the context it needs to Validate your data. Ensuring that these requirements are fulfilled is the purpose of type checking and validating your input parameters.

For example, we might expect the fraction of null values to be mostly=.05. Because mostly is a fraction of a single whole, any value above 1 is impossible and should throw an error. As another example, we might want to indicate that the mean of a row adheres to a minimum value bound, such as min_value=5. In this case, attempting to pass in a non-numerical value should clearly throw an error!
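The two examples above can be sketched in plain Python, independent of Great Expectations. The helper names check_mostly, check_min_value, and describe are hypothetical and exist only for illustration; they are not part of the library.

```python
def check_mostly(mostly):
    """Hypothetical helper: accept only a fraction between 0 and 1."""
    if not isinstance(mostly, (int, float)):
        raise TypeError("mostly must be a number")
    if not 0 <= mostly <= 1:
        raise ValueError("mostly must be between 0 and 1")
    return mostly


def check_min_value(min_value):
    """Hypothetical helper: accept None or a numeric lower bound."""
    if min_value is not None and not isinstance(min_value, (int, float)):
        raise TypeError("min_value must be a number")
    return min_value


def describe(check, value):
    """Return "ok" if the check passes, else the name of the error raised."""
    try:
        check(value)
    except (TypeError, ValueError) as err:
        return type(err).__name__
    return "ok"


print(describe(check_mostly, 0.05))       # a valid fraction
print(describe(check_mostly, 1.5))        # more than a single whole
print(describe(check_min_value, "five"))  # a non-numerical bound
```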

This guide will walk you through the process of adding validation and Type Checking to the input parameters of the Custom Expectation built in the guide for how to create a Custom Column Aggregate Expectation. When you have completed this guide, you will have implemented a method to validate that the input parameters provided to this Custom Expectation satisfy the requirements necessary for them to be used as intended by the Custom Expectation's code.

Steps

1. Deciding what to validate

As a general rule, we want to validate any of our input parameters and success keys that are explicitly used by our Expectation class. In the case of our example Expectation expect_column_max_to_be_between_custom, we've defined four parameters to validate:

  • min_value: An integer or float defining the lowest acceptable bound for our column max
  • max_value: An integer or float defining the highest acceptable bound for our column max
  • strict_min: A boolean value defining whether our column max is (strict_min=False) or is not (strict_min=True) allowed to equal the min_value
  • strict_max: A boolean value defining whether our column max is (strict_max=False) or is not (strict_max=True) allowed to equal the max_value
What don't we need to validate?
You may have noticed we're not validating whether the column parameter has been set. Great Expectations implicitly handles the validation of certain parameters universal to each class of Expectation, so you don't have to!
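To make the meaning of the four parameters concrete, here is a minimal sketch of how strict_min and strict_max change the comparison against the bounds. The function max_within_bounds is a hypothetical illustration and is not how the Expectation itself is implemented.

```python
def max_within_bounds(column_max, min_value=None, max_value=None,
                      strict_min=False, strict_max=False):
    """Hypothetical helper: check a column max against optional bounds."""
    above_min = True
    if min_value is not None:
        # strict_min=True means the column max may not equal min_value.
        above_min = column_max > min_value if strict_min else column_max >= min_value

    below_max = True
    if max_value is not None:
        # strict_max=True means the column max may not equal max_value.
        below_max = column_max < max_value if strict_max else column_max <= max_value

    return above_min and below_max


print(max_within_bounds(5, min_value=5))                   # True
print(max_within_bounds(5, min_value=5, strict_min=True))  # False
```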

2. Defining the validation method

We define the validate_configuration(...) method of our Custom Expectation class to ensure that the input parameters constitute a valid configuration and do not contain illogical or incorrect values. For example, if min_value is greater than max_value, max_value=True, or strict_min="Joe", we want to throw an exception. To do this, we're going to write a series of assert statements to catch invalid values for our parameters.

To begin with, we want to create our validate_configuration(...) method and ensure that a configuration is set:

def validate_configuration(
    self, configuration: Optional[ExpectationConfiguration]
) -> None:
    """
    Validates that a configuration has been set, and sets a configuration if it has yet to be set. Ensures that
    necessary configuration arguments have been provided for the validation of the expectation.
    Args:
        configuration (Optional[ExpectationConfiguration]): \
            An optional Expectation Configuration entry that will be used to configure the expectation
    Returns:
        None. Raises InvalidExpectationConfigurationError if the config is not validated successfully
    """

    # Setting up a configuration
    super().validate_configuration(configuration)
    if configuration is None:
        configuration = self.configuration

Next, we're going to implement the logic for validating the four parameters we identified above.

3. Accessing parameters and writing assertions

First we need to access the parameters to be evaluated:

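# .get() returns None for any parameter that was not provided,
# which the assertions that follow treat as "not set"
min_value = configuration.kwargs.get("min_value")
max_value = configuration.kwargs.get("max_value")
strict_min = configuration.kwargs.get("strict_min")
strict_max = configuration.kwargs.get("strict_max")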

Now we can begin writing the assertions to validate these parameters.

We're going to ensure that at least one of min_value or max_value is set:

try:
    assert (
        min_value is not None or max_value is not None
    ), "min_value and max_value cannot both be none"

Check that min_value and max_value are of the correct type:

    assert min_value is None or isinstance(
        min_value, (float, int)
    ), "Provided min threshold must be a number"
    assert max_value is None or isinstance(
        max_value, (float, int)
    ), "Provided max threshold must be a number"
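One caveat with these isinstance(..., (float, int)) checks: in Python, bool is a subclass of int, so a value such as max_value=True passes them. If you want to reject booleans as bounds, one option is a stricter check along the following lines. The helper name is_real_number is hypothetical and not part of Great Expectations.

```python
def is_real_number(value):
    """Hypothetical helper: True for int or float, but False for bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(isinstance(True, (float, int)))  # True, because bool subclasses int
print(is_real_number(True))            # False
print(is_real_number(5.0))             # True
```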

Verify that, if both min_value and max_value are set, min_value does not exceed max_value:

    # Compare against None explicitly so that a bound of 0 is still checked
    if min_value is not None and max_value is not None:
        assert (
            min_value <= max_value
        ), "Provided min threshold must be less than or equal to max threshold"

And assert that strict_min and strict_max, if provided, are of the correct type:

    assert strict_min is None or isinstance(
        strict_min, bool
    ), "strict_min must be a boolean value"
    assert strict_max is None or isinstance(
        strict_max, bool
    ), "strict_max must be a boolean value"

If any of these fail, we raise an exception:

except AssertionError as e:
    raise InvalidExpectationConfigurationError(str(e))
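Python skips assert statements when it is run with the -O flag, so some projects prefer explicit checks that raise directly. The following is a minimal standalone sketch of that alternative, not the approach used in this guide. InvalidConfigError is a hypothetical stand-in for InvalidExpectationConfigurationError, and validate_bounds and error_message are illustrative names that take a plain dict rather than an ExpectationConfiguration.

```python
class InvalidConfigError(Exception):
    """Hypothetical stand-in for InvalidExpectationConfigurationError."""


def validate_bounds(kwargs):
    """Raise InvalidConfigError unless the four parameters are consistent."""
    min_value = kwargs.get("min_value")
    max_value = kwargs.get("max_value")

    if min_value is None and max_value is None:
        raise InvalidConfigError("min_value and max_value cannot both be None")

    for name, value in (("min_value", min_value), ("max_value", max_value)):
        if value is not None and not isinstance(value, (float, int)):
            raise InvalidConfigError(f"{name} must be a number")

    if min_value is not None and max_value is not None and min_value > max_value:
        raise InvalidConfigError("min_value must be less than or equal to max_value")

    for name in ("strict_min", "strict_max"):
        value = kwargs.get(name)
        if value is not None and not isinstance(value, bool):
            raise InvalidConfigError(f"{name} must be a boolean value")


def error_message(kwargs):
    """Return the validation error message, or None if kwargs are valid."""
    try:
        validate_bounds(kwargs)
    except InvalidConfigError as err:
        return str(err)
    return None


print(error_message({"min_value": 1, "max_value": 5}))
print(error_message({"min_value": 1, "strict_min": "Joe"}))
```

The first call passes validation, while the second is rejected because strict_min is not a boolean.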

Putting this all together, our validate_configuration(...) method looks like:

# Imports required at the top of the module
from typing import Optional

from great_expectations.core.expectation_configuration import ExpectationConfiguration
from great_expectations.exceptions import InvalidExpectationConfigurationError

# Method of the Custom Expectation class
def validate_configuration(
    self, configuration: Optional[ExpectationConfiguration]
) -> None:
    """
    Validates that a configuration has been set, and sets a configuration if it has yet to be set. Ensures that
    necessary configuration arguments have been provided for the validation of the expectation.
    Args:
        configuration (Optional[ExpectationConfiguration]): \
            An optional Expectation Configuration entry that will be used to configure the expectation
    Returns:
        None. Raises InvalidExpectationConfigurationError if the config is not validated successfully
    """

    # Setting up a configuration
    super().validate_configuration(configuration)
    if configuration is None:
        configuration = self.configuration

    # .get() returns None for any parameter that was not provided
    min_value = configuration.kwargs.get("min_value")
    max_value = configuration.kwargs.get("max_value")
    strict_min = configuration.kwargs.get("strict_min")
    strict_max = configuration.kwargs.get("strict_max")

    # Validating that min_value, max_value, strict_min, and strict_max are of the proper format and type
    try:
        assert (
            min_value is not None or max_value is not None
        ), "min_value and max_value cannot both be none"
        assert min_value is None or isinstance(
            min_value, (float, int)
        ), "Provided min threshold must be a number"
        assert max_value is None or isinstance(
            max_value, (float, int)
        ), "Provided max threshold must be a number"
        # Compare against None explicitly so that a bound of 0 is still checked
        if min_value is not None and max_value is not None:
            assert (
                min_value <= max_value
            ), "Provided min threshold must be less than or equal to max threshold"
        assert strict_min is None or isinstance(
            strict_min, bool
        ), "strict_min must be a boolean value"
        assert strict_max is None or isinstance(
            strict_max, bool
        ), "strict_max must be a boolean value"
    except AssertionError as e:
        raise InvalidExpectationConfigurationError(str(e))

4. Verifying our method

If you now run your file, print_diagnostic_checklist() will attempt to execute the validate_configuration(...) method using the input provided in your Example Cases.

If your input is successfully validated, and the rest of the logic in your Custom Expectation is already complete, you will see the following in your Diagnostic Checklist:

✔ Has basic input validation and type checking

Congratulations!
🎉 You've successfully added input validation & type checking to a Custom Expectation! 🎉

5. Contribution (Optional)

The method implemented in this guide is an optional feature for Experimental Expectations, and a requirement for contribution back to Great Expectations at Beta and Production levels.

If you would like to contribute your Custom Expectation to the Great Expectations codebase, please submit a Pull Request.

note

For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.

To view the full script used in this page, see it on GitHub: