Skip to main content
Version: 0.14.13

How to configure DataContext components using test_yaml_config

test_yaml_config is a convenience method for configuring the moving parts of a Great Expectations deployment. It allows you to quickly test out configs for DatasourcesProvides a standard API for accessing and interacting with data from a wide variety of source systems. and StoresA connector to store and retrieve information about metadata in Great Expectations.. For many deployments of Great Expectations, these components (plus ExpectationsA verifiable assertion about data.) are the only ones you'll need.

Prerequisites: This how-to guide assumes you have:

test_yaml_config is primarily intended for use within a notebook, where you can iterate through an edit-run-check loop in seconds.

Steps

  1. Instantiate a DataContext

    Create a new Jupyter Notebook and instantiate a DataContext by running the following lines:

    import great_expectations as ge
    context = ge.get_context()
  2. Create or copy a yaml config

    You can create your own, or copy an example. For this example, we'll demonstrate using a Datasource that connects to PostgreSQL.

    config = """
    class_name: SimpleSqlalchemyDatasource
    credentials:
    drivername: postgresql
    username: postgres
    password: ""
    host: localhost
    port: 5432
    database: test_ci

    introspection:
    whole_table: {}
    """
  3. Run context.test_yaml_config.

    context.test_yaml_config(
    name="my_postgresql_datasource",
    yaml_config=config
    )

    When executed, test_yaml_config will instantiate the component and run through a self_check procedure to verify that the component works as expected.

    In the case of a Datasource, this means

    1. confirming that the connection works,
    2. gathering a list of available DataAssets (e.g. tables in SQL; files or folders in a filesystem), and
    3. verifying that it can successfully fetch at least one BatchA selection of records from a Data Asset. from the source.

    The output will look something like this:

    Attempting to instantiate class from config...
    Instantiating as a Datasource, since class_name is SimpleSqlalchemyDatasource
    Successfully instantiated SimpleSqlalchemyDatasource

    Execution engine: SqlAlchemyExecutionEngine
    Data connectors:
    whole_table : InferredAssetSqlDataConnector

    Available data_asset_names (3 of 14440):
    expect_table_row_count_to_equal_other_table_data_1 (1 of 1): [{}]
    expect_table_row_count_to_equal_other_table_data_2 (1 of 1): [{}]
    expect_table_row_count_to_equal_other_table_data_3 (1 of 1): [{}]

    Unmatched data_references (0 of 0): []

    Choosing an example data reference...
    Reference chosen: {}

    Fetching batch data...

    Showing 5 rows
    c1 c2 c3 c4
    0 4 a None 4.0
    1 5 b None 3.0
    2 6 c None 3.5
    3 7 d None 1.2

    <great_expectations.datasource.simple_sqlalchemy_datasource.SimpleSqlalchemyDatasource at 0x12c1e4d50>

    If something about your configuration wasn't set up correctly, test_yaml_config will raise an error. Whenever possible, test_yaml_config provides helpful warnings and error messages. It can't solve every problem, but it can solve many.

    Attempting to instantiate class from config...
    Instantiating as a Datasource, since class_name is SimpleSqlalchemyDatasource
    ---------------------------------------------------------------------------
    OperationalError Traceback (most recent call last)
    ~/anaconda2/envs/py3/lib/python3.7/site-packages/sqlalchemy/engine/base.py in _wrap_pool_connect(self, fn, connection)
    2338 try:
    -> 2339 return fn()
    2340 except dialect.dbapi.Error as e:

    ...

    OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
    Is the server running on host "localhost" (::1) and accepting
    TCP/IP connections on port 5433?
    could not connect to server: Connection refused
    Is the server running on host "localhost" (127.0.0.1) and accepting
    TCP/IP connections on port 5433?

    (Background on this error at: http://sqlalche.me/e/13/e3q8)
  4. Iterate as necessary.

    From here, iterate by editing your config and re-running test_yaml_config, adding config blocks for additional introspection, Data AssetsA collection of records within a Datasource which is usually named based on the underlying data system and sliced to correspond to a desired specification., sampling, etc.

  5. (Optional:) Test additional methods.

    Note that when test_yaml_config runs successfully, it adds the specified Datasource to your DataContext. This means that you can also test other methods, such as context.get_validator, right within your notebook:

    validator = context.get_validator(
    datasource_name="my_datasource",
    data_connector_name="whole_table",
    data_asset_name="my_table",
    create_expectation_suite_with_name="my_expectation_suite",
    )
    validator.expect_column_values_to_be_in_set("c1", [4,5,6])
  6. Save the config.

    Once you are satisfied with your config, you can make it a permanent part of your Great Expectations setup by copying it into the appropriate section of your great_expectations/great_expectations.yml file.

    datasources:
    my_datasource:
    class_name: SimpleSqlalchemyDatasource
    credentials:
    drivername: postgresql
    username: postgres
    password: ""
    host: localhost
    port: 5432
    database: test_ci

    introspection:
    whole_table: {}
  7. Check your modified config.

    In a fresh notebook, test your edited config file by re-instantiating your DataContext:

    context = ge.get_context()

    validator = context.get_validator(
    datasource_name="my_datasource",
    data_connector_name="whole_table",
    data_asset_name="my_table",
    create_expectation_suite_with_name="my_expectation_suite",
    )
    validator.expect_column_values_to_be_in_set("c1", [4,5,6])