Version: 0.16.16

How to create and edit Expectations based on domain knowledge, without inspecting data directly

This guide shows how to create an Expectation Suite (a collection of verifiable assertions about data) without a sample Batch (a selection of records from a Data Asset).

You might want to do this for any of the following reasons:

You don't have a sample.
You don't currently have access to the data to make a sample.
You know exactly how you want your Expectations (verifiable assertions about data) to be configured.
You want to create Expectations parametrically (you can also do this in interactive mode).
You don't want to spend the time to validate against a sample.

If you have a use case we have not considered, please contact us on Slack.

Does this process edit my data?

No. The method used in this guide to create and edit Expectations does not edit or alter the Batch data.

Prerequisites

Great Expectations installed in a Python environment
A Filesystem Data Context for your Expectations
A Datasource from which to request a Batch of data for introspection

If you haven't set up Great Expectations

See one of the following guides:

If you haven't initialized your Data Context

See one of the following guides:

Quickstart Data Context

How to quickly instantiate a Data Context

Filesystem Data Contexts

If you haven't created a Datasource

See one of the following guides:

Connecting GX to filesystem source data

Local Filesystems

Google Cloud Storage

Azure Blob Storage

Amazon Web Services

Connecting GX to in-memory source data

How to connect to in-memory data using Pandas

Connecting GX to SQL source data

General SQL Datasources

How to connect to SQL data

Specific SQL dialects

Steps

1. Import the Great Expectations module and instantiate a Data Context

For this guide we will be working with Python code in a Jupyter Notebook. Jupyter is included with GX and lets us easily edit code and immediately see the results of our changes.

Run the following code to import Great Expectations and instantiate a Data Context:

import great_expectations as gx

# full_path_to_project_directory is a placeholder: set it to the path of your project directory.
context = gx.data_context.FileDataContext.create(full_path_to_project_directory)

Data Contexts and persisting data

If you're using an Ephemeral Data Context, your configurations will not persist beyond the current Python session. However, if you're using a Filesystem or Cloud Data Context, they do persist. The get_context() method returns the first Cloud or Filesystem Data Context it can find. If a Cloud or Filesystem Data Context has not been configured or cannot be found, it provides an Ephemeral Data Context. For more information about the get_context() method, see How to quickly instantiate a Data Context.

2. Create an ExpectationSuite

We will use the add_expectation_suite() method to create an empty ExpectationSuite.

suite = context.add_expectation_suite(expectation_suite_name="my_suite")

3. Create Expectation Configurations

In this step, you add Expectation Configurations to the suite. Since there is no sample Batch of data, no Validation (the act of applying an Expectation Suite to a Batch) happens during this process. To illustrate how to do this, consider a hypothetical example. Suppose that you have a table with the columns account_id, user_id, transaction_id, transaction_type, and transaction_amt_usd. The following code snippet adds an Expectation that the columns of the actual table appear in the order specified above:

from great_expectations.core.expectation_configuration import ExpectationConfiguration

# Create an Expectation
expectation_configuration_1 = ExpectationConfiguration(
    # Name of expectation type being added
    expectation_type="expect_table_columns_to_match_ordered_list",
    # These are the arguments of the expectation
    # The keys allowed in the dictionary are Parameters and
    # Keyword Arguments of this Expectation Type
    kwargs={
        "column_list": [
            "account_id",
            "user_id",
            "transaction_id",
            "transaction_type",
            "transaction_amt_usd",
        ]
    },
    # This is how you can optionally add a comment about this expectation.
    # It will be rendered in Data Docs.
    # See this guide for details:
    # `How to add comments to Expectations and display them in Data Docs`.
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
# Add the Expectation to the suite
suite.add_expectation(expectation_configuration=expectation_configuration_1)
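
To make the meaning of this Expectation concrete, the following is a minimal plain-Python sketch of the condition it describes: the observed column names must equal the expected list, in the same order. The function name columns_match_ordered_list is hypothetical and is not part of Great Expectations; this illustrates the rule only and is not how the library implements it.

```python
from typing import List


def columns_match_ordered_list(observed: List[str], expected: List[str]) -> bool:
    """Return True only if both lists hold the same names in the same order."""
    return list(observed) == list(expected)


expected_columns = [
    "account_id",
    "user_id",
    "transaction_id",
    "transaction_type",
    "transaction_amt_usd",
]

# Same names, same order: the condition holds.
same_order = columns_match_ordered_list(list(expected_columns), expected_columns)

# Same names, but the first two columns are swapped: the condition fails.
swapped_columns = [
    "user_id",
    "account_id",
    "transaction_id",
    "transaction_type",
    "transaction_amt_usd",
]
swapped = columns_match_ordered_list(swapped_columns, expected_columns)
```

Because the comparison is order-sensitive, a table with the right columns in a different order would not meet this Expectation.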

Here are a few more example Expectations for this dataset:

expectation_configuration_2 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_be_in_set",
    kwargs={
        "column": "transaction_type",
        "value_set": ["purchase", "refund", "upgrade"],
    },
    # Note optional comments omitted
)
suite.add_expectation(expectation_configuration=expectation_configuration_2)

expectation_configuration_3 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "account_id",
        "mostly": 1.0,
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_3)

expectation_configuration_4 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "user_id",
        "mostly": 0.75,
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_4)
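
The mostly values above (1.0 for account_id, 0.75 for user_id) set the minimum fraction of rows that must satisfy the Expectation. As a rough illustration, the plain-Python sketch below applies that rule to a small in-memory list. The function name not_null_passes and the sample values are hypothetical, and treating an empty column as passing is an assumption of this sketch, not a statement about the library.

```python
from typing import Optional, Sequence


def not_null_passes(values: Sequence[Optional[object]], mostly: float = 1.0) -> bool:
    """Return True if the fraction of non-null values is at least `mostly`."""
    if len(values) == 0:
        # Assumption of this sketch: an empty column is treated as passing.
        return True
    non_null_count = sum(1 for value in values if value is not None)
    return non_null_count / len(values) >= mostly


# Hypothetical user_id column with one missing value out of four.
user_ids = ["u1", None, "u3", "u4"]

passes_at_075 = not_null_passes(user_ids, mostly=0.75)  # 3 of 4 values are present
passes_at_100 = not_null_passes(user_ids, mostly=1.0)   # every value must be present
```

With mostly set to 0.75, one null in four rows is tolerated; with mostly set to 1.0, a single null is enough to fail.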

You can see all the available Expectations in the Expectation Gallery.
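
One of the reasons listed at the top of this guide is creating Expectations parametrically. A minimal sketch of that idea, assuming you keep your domain knowledge in an ordinary dictionary, is to generate the expectation_type and kwargs for each column in a loop. The helper build_not_null_configs and the mostly_by_column mapping are hypothetical names introduced for illustration; the code builds plain dictionaries and does not import Great Expectations.

```python
from typing import Any, Dict, List


def build_not_null_configs(mostly_by_column: Dict[str, float]) -> List[Dict[str, Any]]:
    """Build one not-null configuration dictionary per column."""
    configs: List[Dict[str, Any]] = []
    for column, mostly in mostly_by_column.items():
        configs.append(
            {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": column, "mostly": mostly},
            }
        )
    return configs


# Domain knowledge: account_id is always present, user_id is present most of the time.
configs = build_not_null_configs({"account_id": 1.0, "user_id": 0.75})
```

Each dictionary uses the same expectation_type and kwargs keys as the ExpectationConfiguration examples in this step, so it should be possible to pass one as ExpectationConfiguration(**config) and add the result to the suite with suite.add_expectation().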

4. Save your Expectations for future use

To keep your Expectations for future use, you save them to your Data Context. A Filesystem or Cloud Data Context persists outside the current Python session, so saving the Expectation Suite in your Data Context's Expectations Store ensures you can access it in the future:

context.save_expectation_suite(expectation_suite=suite)

Ephemeral Data Contexts and persistence

Ephemeral Data Contexts don't persist beyond the current Python session. If you're working with an Ephemeral Data Context, you'll need to convert it to a Filesystem Data Context using the Data Context's convert_to_file_context() method. Otherwise, your saved configurations won't be available in future Python sessions as the Data Context itself is no longer available.

Next steps

Now that you have created and saved an Expectation Suite, you can Validate your data.
