Version: 0.16.16

Conditional Expectations

note

Conditional Expectations are experimental, and they are available for Pandas, Spark, and SQLAlchemy backends.

You can create an Expectation for an entire dataset or for a subset of it. Restricting an Expectation to a subset is useful when some variables depend on the values of other variables. For example, you might require that a country-of-origin column is not null only for people of foreign descent.

Great Expectations lets you express Conditional Expectations with a row_condition argument that can be passed to Dataset Expectations (see Scope and limitations for the exceptions). The row_condition argument must be a boolean expression string. You must also provide the condition_parser argument, which defines the syntax of the condition. When implementing Conditional Expectations with Pandas, set condition_parser to "pandas". When implementing Conditional Expectations with Spark or SQLAlchemy, set it to "great_expectations__experimental__".

note

In Pandas the row_condition value is passed to pandas.DataFrame.query() before Expectation Validation. See pandas.DataFrame.query.

In Spark and SQLAlchemy, the row_condition value is parsed as a data filter or a query before Expectation Validation.
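As a rough illustration of the Pandas case, the sketch below applies a row_condition string with pandas.DataFrame.query() and then checks only the filtered rows. It mimics the filtering step and is not Great Expectations' internal code; the sample DataFrame and the variable names are made up for the example, with column names borrowed from the examples on this page.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame(
    {
        "Sex": ["male", "female", "male", "female"],
        "SexCode": [0, 1, 0, 1],
    }
)

row_condition = "SexCode==0"

# Keep only the rows that satisfy the condition.
subset = df.query(row_condition)

# Check the remaining rows against the value set, as the Expectation would.
unexpected = subset[~subset["Sex"].isin(["male"])]

print(len(subset), len(unexpected))
```

Rows that do not satisfy the condition never reach the check, so they cannot count as unexpected values.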

Examples

To test if different encodings of identical pieces of information are consistent with each other, run a command similar to this example:

validator.expect_column_values_to_be_in_set(
    column='Sex',
    value_set=['male'],
    condition_parser='pandas',
    row_condition='SexCode==0'
)

This returns:

{
    "success": true,
    "result": {
        "element_count": 851,
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "unexpected_percent_nonmissing": 0.0,
        "partial_unexpected_list": []
    }
}

note

It is possible to add multiple Expectations of the same type to the Expectation Suite for a single column. One Expectation can be unconditional while an arbitrary number of Expectations (each with a different condition) can be conditional. For example:

validator.expect_column_values_to_be_in_set(
    column='Survived',
    value_set=[0, 1]
)
validator.expect_column_values_to_be_in_set(
    column='Survived',
    value_set=[1],
    condition_parser='pandas',
    row_condition='PClass=="1st"'
)
# The second Expectation fails, but we want to include it in the output:
validator.get_expectation_suite(
    discard_failed_expectations=False
)

This results in the following Expectation Suite:

{
    "expectation_suite_name": "default",
    "expectations": [
        {
            "meta": {},
            "kwargs": {
                "column": "Survived",
                "value_set": [0, 1]
            },
            "expectation_type": "expect_column_values_to_be_in_set"
        },
        {
            "meta": {},
            "kwargs": {
                "column": "Survived",
                "value_set": [1],
                "row_condition": "PClass==\"1st\"",
                "condition_parser": "pandas"
            },
            "expectation_type": "expect_column_values_to_be_in_set"
        }
    ],
    "data_asset_type": "Dataset"
}
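In the suite shown above, the conditional Expectation differs from the unconditional one only by its row_condition and condition_parser kwargs, so the two kinds can be told apart when inspecting a suite. The sketch below works on a plain dictionary copied from that output; split_by_condition is a hypothetical helper written for this example, not part of Great Expectations.

```python
# Plain-dict copy of the Expectation Suite shown above.
suite = {
    "expectation_suite_name": "default",
    "expectations": [
        {
            "meta": {},
            "kwargs": {"column": "Survived", "value_set": [0, 1]},
            "expectation_type": "expect_column_values_to_be_in_set",
        },
        {
            "meta": {},
            "kwargs": {
                "column": "Survived",
                "value_set": [1],
                "row_condition": 'PClass=="1st"',
                "condition_parser": "pandas",
            },
            "expectation_type": "expect_column_values_to_be_in_set",
        },
    ],
    "data_asset_type": "Dataset",
}


def split_by_condition(suite_dict):
    """Hypothetical helper: return (unconditional, conditional) expectation configs."""
    unconditional, conditional = [], []
    for expectation in suite_dict["expectations"]:
        if "row_condition" in expectation["kwargs"]:
            conditional.append(expectation)
        else:
            unconditional.append(expectation)
    return unconditional, conditional


unconditional, conditional = split_by_condition(suite)
for expectation in conditional:
    kwargs = expectation["kwargs"]
    print(kwargs["column"], "if", kwargs["row_condition"])
```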

Format of row_conditions values

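Do not use single quotes or \n (newline) characters inside the row_condition string. Instead, wrap the whole condition in single quotes and use double quotes inside it, as in row_condition='PClass=="1st"'. The following examples show formats to avoid:

row_condition="PClass=='1st'"  # never use single quotes inside the condition
row_condition="""
PClass=="1st"
"""  # never use \n inside the condition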

Data Docs and Conditional Expectations

Conditional Expectations are displayed differently from standard Expectations in the Data Docs. Each Conditional Expectation is qualified with if 'row_condition_string', then values must be... as shown in the following image:

Image

If 'row_condition_string' is a complex expression, it is split into several components to improve readability.

Scope and limitations

While conditions can be attached to most Expectations, the following Expectations cannot be conditioned and do not take the row_condition argument:

  • expect_column_to_exist
  • expect_table_columns_to_match_ordered_list
  • expect_table_column_count_to_be_between
  • expect_table_column_count_to_equal
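When Expectations are generated programmatically, a simple guard can avoid passing row_condition to the Expectations listed above. This is a sketch under that assumption; UNCONDITIONABLE and accepts_row_condition are hypothetical names, and the set contains only the four Expectations named in this section.

```python
# Expectations that, according to the list above, do not take row_condition.
UNCONDITIONABLE = frozenset(
    {
        "expect_column_to_exist",
        "expect_table_columns_to_match_ordered_list",
        "expect_table_column_count_to_be_between",
        "expect_table_column_count_to_equal",
    }
)


def accepts_row_condition(expectation_type: str) -> bool:
    """Hypothetical guard: False for the Expectations listed as unconditionable."""
    return expectation_type not in UNCONDITIONABLE


print(accepts_row_condition("expect_column_values_to_be_in_set"))
print(accepts_row_condition("expect_column_to_exist"))
```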

For more information, see the Data Docs.