How to add validations data or suites to a Checkpoint
This guide will help you add validation data or Expectation SuitesA collection of verifiable assertions about data. to an existing CheckpointThe primary means for validating data in a production deployment of Great Expectations.. This is useful if you want to aggregate individual validations (across Expectation Suites or DatasourcesProvides a standard API for accessing and interacting with data from a wide variety of source systems.) into a single Checkpoint.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- Have a working installation of Great Expectations
- Configured a Data Context
- Configured an Expectations Suite
- Configured a Checkpoint
Steps
1. Open your existing Checkpoint in a text editor
It will look similar to this:
name: my_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%m-foo-bar-template-$VAR"
validations:
- batch_request:
datasource_name: my_datasource
data_connector_name: my_data_connector
data_asset_name: users
data_connector_query:
index: -1
expectation_suite_name: users.warning
action_list:
- name: store_validation_result
action:
class_name: StoreValidationResultAction
- name: store_evaluation_params
action:
class_name: StoreEvaluationParametersAction
- name: update_data_docs
action:
class_name: UpdateDataDocsAction
evaluation_parameters:
param1: "$MY_PARAM"
param2: 1 + "$OLD_PARAM"
runtime_configuration:
result_format:
result_format: BASIC
partial_unexpected_count: 20
2. Edit the existing Checkpoint configuration to add an Expectation Suite
To add a second Expectation Suite (in this example we add users.error
) to your Checkpoint configuration, modify the file to add an additional batch_request
key and corresponding information, including evaluation_parameters
, action_list
, runtime_configuration
, and expectation_suite_name
. In fact, the simplest way to run a different Expectation Suite on the same BatchA selection of records from a Data Asset. of data is to make a copy of the original batch_request
entry and then edit the expectation_suite_name
value to correspond to a different Expectation Suite. The resulting configuration will look like this:
name: my_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%m-foo-bar-template-$VAR"
validations:
- batch_request:
datasource_name: my_datasource
data_connector_name: my_data_connector
data_asset_name: users
data_connector_query:
index: -1
expectation_suite_name: users.warning
action_list:
- name: store_validation_result
action:
class_name: StoreValidationResultAction
- name: store_evaluation_params
action:
class_name: StoreEvaluationParametersAction
- name: update_data_docs
action:
class_name: UpdateDataDocsAction
evaluation_parameters:
param1: "$MY_PARAM"
param2: 1 + "$OLD_PARAM"
runtime_configuration:
result_format:
result_format: BASIC
partial_unexpected_count: 20
- batch_request:
datasource_name: my_datasource
data_connector_name: my_data_connector
data_asset_name: users
data_connector_query:
index: -1
expectation_suite_name: users.error
action_list:
- name: store_validation_result
action:
class_name: StoreValidationResultAction
- name: store_evaluation_params
action:
class_name: StoreEvaluationParametersAction
- name: update_data_docs
action:
class_name: UpdateDataDocsAction
evaluation_parameters:
param1: "$MY_PARAM"
param2: 1 + "$OLD_PARAM"
runtime_configuration:
result_format:
result_format: BASIC
partial_unexpected_count: 20
3. Edit the existing Checkpoint configuration to add new validation data
In the above example, the entry we added with our Expectation Suite was paired with the same Batch of data as the original Expectation Suite. However, you may also specify different Batch RequestsProvided to a Datasource in order to create a Batch. (and thus different Batches of data) when you add an Expectation Suite. The flexibility of easily adding multiple Validations of Batches of data with different Expectation Suites and specific ActionsA Python class with a run method that takes a Validation Result and does something with it can be demonstrated using the following example of a Checkpoint configuration file:
name: my_fancy_checkpoint
config_version: 1
class_name: Checkpoint
run_name_template: "%Y-%m-foo-bar-template-$VAR"
expectation_suite_name: users.delivery
action_list:
- name: store_validation_result
action:
class_name: StoreValidationResultAction
- name: store_evaluation_params
action:
class_name: StoreEvaluationParametersAction
- name: update_data_docs
action:
class_name: UpdateDataDocsAction
validations:
- batch_request:
datasource_name: my_datasource
data_connector_name: my_data_connector
data_asset_name: users
data_connector_query:
index: 0
expectation_suite_name: users.warning
- batch_request:
datasource_name: my_datasource
data_connector_name: my_special_data_connector
data_asset_name: users
data_connector_query:
index: -1
expectation_suite_name: users.error
- batch_request:
datasource_name: my_datasource
data_connector_name: my_other_data_connector
data_asset_name: users
data_connector_query:
batch_filter_parameters:
name: Titanic
action_list:
- name: quarantine_failed_data
action:
class_name: CreateQuarantineData
- name: advance_passed_data
action:
class_name: CreateQuarantineData
evaluation_parameters:
param1: "$MY_PARAM"
param2: 1 + "$OLD_PARAM"
runtime_configuration:
result_format:
result_format: BASIC
partial_unexpected_count: 20
According to this configuration, the locally-specified Expectation Suite users.warning
is run against the batch_request
that employs my_data_connector
with the results processed by the Actions specified in the top-level action_list
. Similarly, the locally-specified Expectation Suite users.error
is run against the batch_request
that employs my_special_data_connector
with the results also processed by the actions specified in the top-level action_list
. In addition, the top-level Expectation Suite users.delivery
is run against the batch_request
that employs my_other_data_connector
with the results processed by the union of actions in the locally-specified action_list
and in the top-level action_list
.
Please see How to configure a new Checkpoint using test_yaml_config for additional Checkpoint configuration examples (including the convenient templating mechanism).
Additional notes
This is a good way to aggregate Validations in a complex pipeline. You could use this feature to ValidateThe act of applying an Expectation Suite to a Batch. multiple source files before and after their ingestion into your data lake.