Version: 0.16.16

How to create and edit Expectations based on domain knowledge, without inspecting data directly

This guide shows how to create an Expectation Suite (a collection of verifiable assertions about data) without a sample Batch (a selection of records from a Data Asset).

You might want to do this for any of the following reasons:

  • You don't have a sample.
  • You don't currently have access to the data to make a sample.
  • You know exactly how you want your Expectations (verifiable assertions about data) to be configured.
  • You want to create Expectations parametrically (you can also do this in interactive mode).
  • You don't want to spend the time to validate against a sample.

If you have a use case we have not considered, please contact us on Slack.

Does this process edit my data?

No. Creating and editing Expectations, whether interactively or as described in this guide, does not edit or alter the Batch data.

Prerequisites

  • Great Expectations installed in a Python environment
  • A Filesystem Data Context for your Expectations
  • A Datasource from which to request a Batch of data for introspection

Steps

1. Import the Great Expectations module and instantiate a Data Context

For this guide we will be working with Python code in a Jupyter Notebook. Jupyter is included with GX and lets us easily edit code and immediately see the results of our changes.

Run the following code to import Great Expectations and instantiate a Data Context:

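import great_expectations as gx

# full_path_to_project_directory is a placeholder: set it to the full path of your project directory.
context = gx.data_context.FileDataContext.create(full_path_to_project_directory)

Data Contexts and persisting data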

If you're using an Ephemeral Data Context, your configurations will not persist beyond the current Python session. However, if you're using a Filesystem or Cloud Data Context, they do persist. The get_context() method returns the first Cloud or Filesystem Data Context it can find. If a Cloud or Filesystem Data Context has not been configured or cannot be found, it provides an Ephemeral Data Context. For more information about the get_context() method, see How to quickly instantiate a Data Context.

2. Create an ExpectationSuite

We will use the add_expectation_suite() method to create an empty ExpectationSuite.

suite = context.add_expectation_suite(expectation_suite_name="my_suite")

3. Create Expectation Configurations

In this step you add Expectation configurations to the suite. Since there is no sample Batch of data, no Validation (the act of applying an Expectation Suite to a Batch) happens during this process. To illustrate how to do this, consider a hypothetical example. Suppose that you have a table with the columns account_id, user_id, transaction_id, transaction_type, and transaction_amt_usd. Then the following code snippet adds an Expectation that the columns of the actual table will appear in the order specified above:

from great_expectations.core.expectation_configuration import ExpectationConfiguration

# Create an Expectation
expectation_configuration_1 = ExpectationConfiguration(
    # Name of expectation type being added
    expectation_type="expect_table_columns_to_match_ordered_list",
    # These are the arguments of the expectation
    # The keys allowed in the dictionary are Parameters and
    # Keyword Arguments of this Expectation Type
    kwargs={
        "column_list": [
            "account_id",
            "user_id",
            "transaction_id",
            "transaction_type",
            "transaction_amt_usd",
        ]
    },
    # This is how you can optionally add a comment about this expectation.
    # It will be rendered in Data Docs.
    # See this guide for details:
    # `How to add comments to Expectations and display them in Data Docs`.
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
# Add the Expectation to the suite
suite.add_expectation(expectation_configuration=expectation_configuration_1)

Here are a few more example expectations for this dataset:

expectation_configuration_2 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_be_in_set",
    kwargs={
        "column": "transaction_type",
        "value_set": ["purchase", "refund", "upgrade"],
    },
    # Note optional comments omitted
)
suite.add_expectation(expectation_configuration=expectation_configuration_2)

expectation_configuration_3 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "account_id",
        "mostly": 1.0,
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_3)

expectation_configuration_4 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "user_id",
        "mostly": 0.75,
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_4)
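
The last two configurations differ only in their mostly value. As a rough illustration of what that argument means, the sketch below re-implements the idea in plain Python: the check passes when the fraction of non-null values is at least mostly. The helper passes_not_null and the sample values are hypothetical and are not part of Great Expectations; treating an empty column as passing is also an assumption made for this sketch.

```python
from typing import Optional, Sequence


def passes_not_null(values: Sequence[Optional[object]], mostly: float = 1.0) -> bool:
    """Return True when the fraction of non-null values is at least `mostly`."""
    if not values:
        # Assumption for this sketch: an empty column trivially passes.
        return True
    non_null = sum(value is not None for value in values)
    return non_null / len(values) >= mostly


# Hypothetical user_id column in which 3 of 4 values are present (75%).
user_ids = ["u1", None, "u3", "u4"]

print(passes_not_null(user_ids, mostly=0.75))  # True: 0.75 >= 0.75
print(passes_not_null(user_ids, mostly=1.0))   # False: 0.75 < 1.0
```

With mostly set to 1.0, as for account_id, a single null value is enough to fail the check; with 0.75, as for user_id, up to a quarter of the values may be null.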

You can see all the available Expectations in the Expectation Gallery.
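
Because no data is inspected, Expectation configurations can also be generated parametrically from domain knowledge. The sketch below uses plain dictionaries so that it runs without Great Expectations installed; the helper not_null_configs and the column-to-threshold mapping are hypothetical names introduced for illustration.

```python
from typing import Dict, List


def not_null_configs(mostly_by_column: Dict[str, float]) -> List[dict]:
    """Build one not-null configuration dictionary per column."""
    return [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": column, "mostly": mostly},
        }
        for column, mostly in mostly_by_column.items()
    ]


# Domain knowledge expressed as data: column name -> required non-null fraction.
configs = not_null_configs({"account_id": 1.0, "user_id": 0.75})

for config in configs:
    print(config)
```

The dictionary keys mirror the expectation_type and kwargs arguments used in the examples above, so in the version of Great Expectations this guide targets, each dictionary could presumably be passed as ExpectationConfiguration(**config) and added with suite.add_expectation().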

4. Save your Expectations for future use

To keep your Expectations for future use, you save them to your Data Context. A Filesystem or Cloud Data Context persists outside the current Python session, so saving the Expectation Suite in your Data Context's Expectations Store ensures you can access it in the future:

context.save_expectation_suite(expectation_suite=suite)

Ephemeral Data Contexts and persistence

Ephemeral Data Contexts don't persist beyond the current Python session. If you're working with an Ephemeral Data Context, you'll need to convert it to a Filesystem Data Context using the Data Context's convert_to_file_context() method. Otherwise, your saved configurations won't be available in future Python sessions as the Data Context itself is no longer available.

Next steps

Now that you have created and saved an Expectation Suite, you can Validate your data.