How to configure an Expectation Store to use GCS
By default, newly Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. Use the information provided here to configure a new storage location for Expectations in Google Cloud Storage (GCS).
To view all the code used in this topic, see how_to_configure_an_expectation_store_in_gcs.py.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A GCP service account with credentials that allow access to GCP resources such as Storage Objects.
- A GCP project, GCS bucket, and prefix to store Expectations.
1. Configure your GCP credentials
Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored. This includes the following:
- A GCP service account.
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Verifying authentication by running a Google Cloud Storage client library script.
For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
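Before running a client library script, it can help to check the local side of the setup. The following is a minimal sketch, assuming GOOGLE_APPLICATION_CREDENTIALS points to a service account key file in JSON format. The function name check_gcp_credentials_env is hypothetical, and the check does not contact GCS, so it only confirms that the variable and key file look plausible.

```python
import json
import os
from pathlib import Path


def check_gcp_credentials_env(env=None):
    """Return (ok, message) describing the GOOGLE_APPLICATION_CREDENTIALS setup.

    Hypothetical helper: it inspects the environment and the key file only,
    and never makes a network call.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"

    path = Path(key_path)
    if not path.is_file():
        return False, f"key file not found: {path}"

    try:
        info = json.loads(path.read_text())
    except (OSError, ValueError) as exc:
        return False, f"key file is not readable JSON: {exc}"

    # Assumption: the guide uses a service account, whose key files
    # carry "type": "service_account" and a "client_email" field.
    if not isinstance(info, dict) or info.get("type") != "service_account":
        return False, "key file is not a service account key"

    return True, f"service account: {info.get('client_email', '<unknown>')}"


if __name__ == "__main__":
    ok, message = check_gcp_credentials_env()
    print("OK" if ok else "PROBLEM", "-", message)
```

A passing result here does not prove that the service account can reach your bucket. That still requires running a script against GCS itself.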
2. Identify your Data Context Expectations Store
The configuration for your Expectations Store is available in your Data Context. Open great_expectations.yml and find the following entry:
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

expectations_store_name: expectations_store
This configuration tells Great Expectations to look for Expectations in the expectations_store Store. The default base_directory for expectations_store is expectations/.
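The relationship between the two keys can be sketched in a few lines: expectations_store_name selects one entry under stores, and that entry's store_backend decides where Expectations are kept. This is an illustrative sketch only. The plain dictionary stands in for the parsed great_expectations.yml, and active_expectations_backend is a hypothetical helper, not part of Great Expectations.

```python
def active_expectations_backend(config):
    """Return the store_backend settings of the active Expectations Store.

    `config` is a plain dict mirroring great_expectations.yml
    (an assumption made to keep this sketch self-contained).
    """
    store_name = config["expectations_store_name"]
    stores = config.get("stores", {})
    if store_name not in stores:
        raise KeyError(
            f"expectations_store_name is {store_name!r}, "
            f"but stores only defines {sorted(stores)}"
        )
    return stores[store_name]["store_backend"]


# The default configuration shown above, written as a dictionary.
default_config = {
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
    "expectations_store_name": "expectations_store",
}

backend = active_expectations_backend(default_config)
print(backend["class_name"], backend["base_directory"])
```

The KeyError branch covers a common slip when editing the file by hand: renaming the store under stores without updating expectations_store_name, or the reverse.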
3. Update your configuration file to include a new store for Expectations
In the following example, expectations_store_name
is set to expectations_GCS_store
, but it can be personalized. You also need to change the store_backend
settings. The class_name
is TupleGCSStoreBackend
, project
is your GCP project, bucket
is the address of your GCS bucket, and prefix
is the folder on GCS where Expectations are stored.
stores:
  expectations_GCS_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>

expectations_store_name: expectations_GCS_store
If you are also storing Validations in GCS or DataDocs in GCS, make sure that the prefix values are disjoint and one is not a substring of the other.
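The prefix rule is easy to check mechanically before you edit the configuration file. The following is a minimal sketch, assuming you collect the prefixes you plan to use into a dictionary. The function name overlapping_prefixes and the example prefix values are hypothetical.

```python
def overlapping_prefixes(prefixes):
    """Return pairs of store names whose GCS prefixes overlap.

    Two prefixes overlap when, after trimming leading and trailing
    slashes, they are equal or one is a substring of the other.
    """
    cleaned = {name: prefix.strip("/") for name, prefix in prefixes.items()}
    names = sorted(cleaned)
    clashes = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = cleaned[first], cleaned[second]
            if a in b or b in a:
                clashes.append((first, second))
    return clashes


# Hypothetical prefixes for three stores that share one bucket.
planned = {
    "expectations": "ge/expectations",
    "validations": "ge/validations",
    "data_docs": "ge/docs",
}
print(overlapping_prefixes(planned))
```

An empty list means no pair of prefixes overlaps. A prefix such as ge alongside ge/validations would be reported, because the first is a substring of the second.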
4. Copy existing Expectation JSON files to the GCS bucket (Optional)
Use the gsutil cp command to copy Expectations into GCS. For example, the following command copies the Expectation Suite my_expectation_suite from a local folder into a GCS bucket:
gsutil cp expectations/my_expectation_suite.json gs://<your>/<your>/my_expectation_suite.json
The following confirmation message is returned:
Operation completed over 1 objects
Additional methods for copying Expectations into GCS are available. See Upload objects from a filesystem.
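One such alternative is the google-cloud-storage Python client library. The following sketch mirrors the gsutil cp example, assuming the library is installed and your credentials are configured. The helper names gcs_object_name and upload_expectation_suite are hypothetical, and the bucket and prefix arguments are the same values you set in store_backend.

```python
from pathlib import Path


def gcs_object_name(prefix, local_file):
    """Build the object name <prefix>/<file name>, as in the gsutil example."""
    file_name = Path(local_file).name
    prefix = prefix.strip("/")
    return f"{prefix}/{file_name}" if prefix else file_name


def upload_expectation_suite(bucket_name, prefix, local_file, project=None):
    """Upload one Expectation Suite JSON file and return its gs:// URL."""
    # Imported here so gcs_object_name works without the client library.
    from google.cloud import storage

    client = storage.Client(project=project)
    blob = client.bucket(bucket_name).blob(gcs_object_name(prefix, local_file))
    blob.upload_from_filename(str(local_file))
    return f"gs://{bucket_name}/{blob.name}"


# Shows the object name the upload would use, without contacting GCS.
print(gcs_object_name("my_prefix", "expectations/my_expectation_suite.json"))
```

To perform the upload, call upload_expectation_suite with your own bucket name, prefix, and local file path. The call raises an exception from the client library if authentication fails or the bucket is not accessible.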
5. Confirm that the new Expectation Suites have been added
If you copied your existing Expectation Suites to GCS, run the following Python command to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
context.list_expectation_suite_names()
A list of Expectation Suites you copied to GCS is returned. Expectation Suites that weren't copied to the new Store aren't listed.
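To see which local suites are missing from the new Store, you can compare the file names in your local expectations/ folder with the returned list. This is a rough sketch, assuming each suite is stored as <suite_name>.json directly inside that folder, as in the gsutil example. The helper names local_suite_names and missing_from_store are hypothetical.

```python
from pathlib import Path


def local_suite_names(expectations_dir):
    """Suite names implied by the top-level .json files in a local folder."""
    return sorted(path.stem for path in Path(expectations_dir).glob("*.json"))


def missing_from_store(expected_names, listed_names):
    """Names that exist locally but were not listed by the Data Context."""
    return sorted(set(expected_names) - set(listed_names))


# Example with hard-coded names; in practice the second argument would be
# the result of context.list_expectation_suite_names().
print(missing_from_store(["suite_a", "suite_b"], ["suite_a"]))
```

With a Data Context available, the comparison becomes missing_from_store(local_suite_names("great_expectations/expectations"), context.list_expectation_suite_names()), where the folder path is an assumption about your project layout.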
6. Confirm that Expectations can be accessed from GCS
Run the following command to confirm your Expectations were copied to GCS:
great_expectations suite list
If your Expectations were not copied to GCS, a message indicating no Expectations were found is returned.