How to configure an Expectation Store to use GCS
By default, newly Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. This guide will help you configure Great Expectations to store them in a Google Cloud Storage (GCS) bucket.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial.
- Set up a working installation of Great Expectations.
- Configured a Data Context.
- Configured an Expectation Suite.
- Configured a Google Cloud Platform (GCP) service account with credentials that can access the appropriate GCP resources, which include Storage Objects.
- Identified the GCP project, GCS bucket, and prefix where Expectations will be stored.
Steps
1. Configure your GCP credentials
Check that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored.
The Google Cloud Platform documentation describes how to verify your authentication for the Google Cloud API, which includes:
- Creating a Google Cloud Platform (GCP) service account,
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable,
- Verifying authentication by running a simple Google Cloud Storage client library script.
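Before contacting GCS at all, it can help to confirm that the environment variable points at a usable key file. The following is a minimal sketch using only the Python standard library; the function name check_service_account_env is hypothetical, and the check assumes the variable refers to a service account JSON key (a file whose "type" field is "service_account"), as in the prerequisites above. It does not verify that the credentials are accepted by Google Cloud.

```python
import json
import os


def check_service_account_env(environ=None):
    """Return (ok, message) describing the GOOGLE_APPLICATION_CREDENTIALS setup.

    Hypothetical helper: it only inspects the local key file and never
    contacts Google Cloud.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, f"key file not found: {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as error:
        return False, f"key file is not readable JSON: {error}"
    # Assumption: the guide expects a service account key, not another credential type.
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, "key file is not a service account key"
    return True, f"service account: {key.get('client_email', '<unknown>')}"


if __name__ == "__main__":
    ok, message = check_service_account_env()
    print("OK" if ok else "PROBLEM", "-", message)
```

A passing result only means the file is present and well formed; running a real client library call, as the Google Cloud documentation describes, is still the way to confirm access to the bucket.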
2. Identify your Data Context Expectations Store
In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Expectations in a Store called expectations_store. The base_directory for expectations_store is set to expectations/ by default.
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

expectations_store_name: expectations_store
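The relationship between the two keys can be illustrated with a short sketch: expectations_store_name selects one entry under stores, and that entry's store_backend decides where Suites are kept. The dictionary below mirrors the default configuration shown above; the helper name active_expectations_store is hypothetical, and in practice you would obtain the dictionary by parsing great_expectations.yml with a YAML library of your choice.

```python
def active_expectations_store(config):
    """Return (name, store_config) for the store named by expectations_store_name."""
    name = config["expectations_store_name"]
    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"expectations_store_name points at an undefined store: {name}")
    return name, stores[name]


# Dictionary mirroring the default great_expectations.yml fragment.
default_config = {
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
    "expectations_store_name": "expectations_store",
}

name, store = active_expectations_store(default_config)
print(name, store["store_backend"]["class_name"])
```

The same lookup fails with a KeyError if expectations_store_name refers to a store that is not defined, which is the mistake to watch for when renaming stores in the next step.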
3. Update your configuration file to include a new store for Expectations on GCS
In our case, the name is set to expectations_GCS_store, but it can be any name you like. We also need to make some changes to the store_backend settings. The class_name will be set to TupleGCSStoreBackend, project will be set to your GCP project, bucket will be set to the name of your GCS bucket, and prefix will be set to the folder on GCS where Expectation files will be located.
If you are also storing Validations in GCS or Data Docs in GCS, please ensure that the prefix values are disjoint and one is not a substring of the other.
stores:
  expectations_GCS_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <YOUR GCP PROJECT NAME>
      bucket: <YOUR GCS BUCKET NAME>
      prefix: <YOUR GCS PREFIX NAME>

expectations_store_name: expectations_GCS_store
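The requirement that prefix values be disjoint, with none a substring of another, is easy to check mechanically when several stores share a bucket. The sketch below is one way to do so; the function names and the example store names and prefixes (other than expectations_GCS_store) are hypothetical, and the check is a plain substring test rather than anything Great Expectations performs itself.

```python
def prefixes_conflict(first, second):
    """True when one prefix is contained in the other (or they are equal)."""
    a = first.strip("/")
    b = second.strip("/")
    return a in b or b in a


def find_conflicts(prefixes):
    """Return pairs of store names whose prefixes conflict."""
    names = sorted(prefixes)
    return [
        (left, right)
        for i, left in enumerate(names)
        for right in names[i + 1:]
        if prefixes_conflict(prefixes[left], prefixes[right])
    ]


# Hypothetical prefixes for three stores sharing one bucket.
example = {
    "expectations_GCS_store": "expectations",
    "validations_GCS_store": "validations",
    "data_docs_site": "expectations_docs",
}
print(find_conflicts(example))
```

Here "expectations" is a substring of "expectations_docs", so that pair is reported; renaming one of them (for example to "docs") would clear the conflict. Note that an empty prefix is treated as conflicting with every other prefix.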
4. Copy existing Expectation JSON files to the GCS bucket (This step is optional)
One way to copy Expectations into GCS is by using the gsutil cp command, which is part of the Google Cloud SDK. The following example will copy one Expectation Suite, my_expectation_suite, from a local folder to the GCS bucket. Information on other ways to copy Expectation JSON files, like the Cloud Storage browser in the Google Cloud Console, can be found in the Documentation for Google Cloud.
gsutil cp expectations/my_expectation_suite.json gs://<YOUR GCS BUCKET NAME>/<YOUR GCS PREFIX NAME>/my_expectation_suite.json
Operation completed over 1 objects
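When there are many Suites, it may be convenient to generate one gsutil cp command per file instead of typing them by hand. The sketch below only builds and prints the commands; it does not run them. The helper name plan_uploads and the placeholder values "my-bucket" and "my-prefix" are hypothetical, and the sketch assumes that every .json file under the local expectations/ folder is a Suite whose relative path should be preserved under the prefix.

```python
from pathlib import Path


def plan_uploads(expectations_dir, bucket, prefix):
    """Map each local suite JSON file to its destination gs:// URI."""
    root = Path(expectations_dir)
    if not root.is_dir():
        return []
    # Drop stray slashes so the URI never contains an empty path segment.
    base = "/".join(part for part in (bucket, prefix.strip("/")) if part)
    plan = []
    for path in sorted(root.rglob("*.json")):
        relative = path.relative_to(root).as_posix()
        plan.append((str(path), f"gs://{base}/{relative}"))
    return plan


if __name__ == "__main__":
    # "my-bucket" and "my-prefix" are placeholders for your own values.
    for source, destination in plan_uploads("expectations", "my-bucket", "my-prefix"):
        print("gsutil", "cp", source, destination)
```

Reviewing the printed commands before running them is a simple way to confirm that the destination prefix matches the prefix configured in the store_backend.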
5. Confirm that the new Expectations store has been added
Run the following:
great_expectations store list
Only the active Stores will be listed. As long as expectations_store_name is set to expectations_GCS_store, Great Expectations will look for Expectations in GCS, so the configuration for expectations_store can be removed if you like.
- name: expectations_GCS_store
  class_name: ExpectationsStore
  store_backend:
    class_name: TupleGCSStoreBackend
    project: <YOUR GCP PROJECT NAME>
    bucket: <YOUR GCS BUCKET NAME>
    prefix: <YOUR GCS PREFIX NAME>
6. Confirm that Expectations can be accessed from GCS
To do this, run the following:
great_expectations suite list
If you followed Step 4, the output should include the Expectation Suite we copied to GCS: my_expectation_suite. If you did not copy Expectations to the new Store, you will see a message saying no Expectations were found.
1 Expectation Suite found:
- my_expectation_suite
Additional Notes
To view the full script used in this page, see it on GitHub: