Version: 0.16.16

How to configure an Expectation Store to use GCS

By default, newly Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. Use the information provided here to configure a new storage location for Expectations in Google Cloud Storage (GCS).

To view all the code used in this topic, see how_to_configure_an_expectation_store_in_gcs.py.

Prerequisites

1. Configure your GCP credentials

Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored. This includes the following:

  • A GCP service account.
  • Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
  • Verifying authentication by running a Google Cloud Storage client library script.

For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
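
The credential checks listed above can be sketched in Python. This is a minimal sketch, assuming the google-cloud-storage package is installed; the helper names credentials_path and can_list_buckets are illustrative and are not part of Great Expectations or the Google client library.

```python
import os
from pathlib import Path


def credentials_path(environ=None):
    """Return the key file path stored in GOOGLE_APPLICATION_CREDENTIALS.

    Raises RuntimeError if the variable is unset or does not point to a file.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path


def can_list_buckets():
    """Make one authenticated call (needs network access and google-cloud-storage)."""
    # Imported lazily so credentials_path() works without the package installed.
    from google.cloud import storage

    client = storage.Client()
    # Listing buckets forces a round trip that fails if authentication is
    # broken or the account lacks permission to list buckets.
    return [bucket.name for bucket in client.list_buckets()]


# Example usage:
#   print("Using credentials at", credentials_path())
#   print("Visible buckets:", can_list_buckets())
```

If credentials_path() raises, fix the environment variable before continuing; if can_list_buckets() raises, check the service account and its permissions.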

2. Identify your Data Context Expectations Store

The configuration for your Expectations Store is available in your Data Context. Open great_expectations.yml and find the following entry:

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

expectations_store_name: expectations_store

This configuration tells Great Expectations to look for Expectations in the expectations_store Store. The default base_directory for expectations_store is expectations/.

3. Update your configuration file to include a new store for Expectations

In the following example, expectations_store_name is set to expectations_GCS_store, but you can use any name as long as it matches the key defined under stores. You also need to change the store_backend settings. The class_name is TupleGCSStoreBackend, project is your GCP project, bucket is the name of your GCS bucket, and prefix is the folder on GCS where Expectations are stored.

stores:
  expectations_GCS_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>

expectations_store_name: expectations_GCS_store

Danger: If you are also storing Validations in GCS or Data Docs in GCS, make sure that the prefix values are disjoint and one is not a substring of the other.
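
The prefix rule above can be checked mechanically before you edit great_expectations.yml. The following is a minimal sketch, assuming all the stores share one bucket; the function names and the store names in the usage comment are hypothetical.

```python
def prefixes_conflict(a, b):
    """Return True if either prefix is a substring of the other.

    Leading and trailing slashes are ignored. An empty prefix conflicts
    with every other prefix.
    """
    a, b = a.strip("/"), b.strip("/")
    return a in b or b in a


def find_conflicts(prefixes):
    """Return pairs of store names whose prefixes overlap.

    `prefixes` maps a store name to its configured GCS prefix.
    """
    names = sorted(prefixes)
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if prefixes_conflict(prefixes[first], prefixes[second])
    ]


# Example usage with hypothetical store names and prefixes:
#   find_conflicts({
#       "expectations_GCS_store": "ge/expectations",
#       "validations_GCS_store": "ge/validations",
#   })
```

An empty result means no two prefixes overlap.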

4. Copy existing Expectation JSON files to the GCS bucket (Optional)

Use the gsutil cp command to copy Expectations into GCS. For example, the following command copies the Expectation Suite my_expectation_suite from a local folder into a GCS bucket:

gsutil cp expectations/my_expectation_suite.json gs://<your>/<your>/my_expectation_suite.json

The following confirmation message is returned:

Operation completed over 1 objects

Additional methods for copying Expectations into GCS are available. See Upload objects from a filesystem.
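
As one alternative to gsutil, the copy can be sketched with the google-cloud-storage Python client. This is a minimal sketch under stated assumptions: the package is installed, credentials are configured, and the helper names plan_uploads and upload_suites, as well as the project, bucket, and prefix values in the usage comment, are hypothetical.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix):
    """Map each local suite file to the object name it should have in GCS."""
    root = Path(local_dir)
    prefix = prefix.strip("/")
    plan = []
    # Only JSON files are considered; subfolders are preserved in the object name.
    for path in sorted(root.rglob("*.json")):
        relative = path.relative_to(root).as_posix()
        object_name = f"{prefix}/{relative}" if prefix else relative
        plan.append((path, object_name))
    return plan


def upload_suites(bucket, local_dir, prefix):
    """Upload every planned file through a google.cloud.storage Bucket object."""
    uploaded = []
    for path, object_name in plan_uploads(local_dir, prefix):
        bucket.blob(object_name).upload_from_filename(str(path))
        uploaded.append(object_name)
    return uploaded


# Example usage (requires google-cloud-storage and valid credentials):
#   from google.cloud import storage
#   bucket = storage.Client(project="my-project").bucket("my-bucket")
#   print(upload_suites(bucket, "expectations", "my-prefix"))
```

Calling plan_uploads on its own lets you review the target object names before anything is written to the bucket.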

5. Confirm that the new Expectation Suites have been added

If you copied your existing Expectation Suites to GCS, run the following Python command to confirm that Great Expectations can find them:

import great_expectations as gx

context = gx.get_context()
# Print the result so the names are also shown when this runs as a script.
print(context.list_expectation_suite_names())

A list of Expectation Suites you copied to GCS is returned. Expectation Suites that weren't copied to the new Store aren't listed.
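
To see which local suites are not yet in the new Store, you can compare the local files with the returned names. This is a minimal sketch with hypothetical helper names. It assumes that a local file saved as sub/dir/name.json corresponds to the suite name sub.dir.name; verify that against your own expectations/ folder.

```python
from pathlib import Path


def local_suite_names(local_dir):
    """List suite names implied by the JSON files under a local folder.

    Assumption: sub/dir/name.json corresponds to the suite "sub.dir.name".
    """
    root = Path(local_dir)
    return sorted(
        ".".join(path.relative_to(root).with_suffix("").parts)
        for path in root.rglob("*.json")
    )


def missing_from_store(local_names, store_names):
    """Return the local suite names that the Store did not report."""
    return sorted(set(local_names) - set(store_names))


# Example usage:
#   import great_expectations as gx
#   context = gx.get_context()
#   print(missing_from_store(
#       local_suite_names("great_expectations/expectations"),
#       context.list_expectation_suite_names(),
#   ))
```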

6. Confirm that Expectations can be accessed from GCS

Run the following command to confirm your Expectations were copied to GCS:

great_expectations suite list

If your Expectations were not copied to GCS, a message indicating no Expectations were found is returned.