Version: 0.15.50

How to configure an Expectation Store to use Amazon S3

By default, newly Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. This guide will help you configure Great Expectations to store them in an Amazon S3 bucket.

Prerequisites: This how-to guide assumes you have:

Steps

1. Install boto3 with pip

Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Therefore, although you will not need to use boto3 directly, you will need to have it installed into your virtual environment.

You can do this with the pip command:

Terminal command
python -m pip install boto3

or

Terminal command
python3 -m pip install boto3

For more detailed instructions on how to set up boto3 with AWS, and information on how you can use boto3 from within Python, please reference boto3's documentation site.
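As a quick sanity check after installation, you can confirm that boto3 is importable from the same Python environment that Great Expectations uses. The snippet below is a minimal sketch; the helper name is_installed is illustrative and not part of boto3 or Great Expectations.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Run this with the interpreter from your virtual environment.
print("boto3 installed:", is_installed("boto3"))
```

If this prints False, the pip command above was most likely run against a different Python interpreter than the one in your virtual environment.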

2. Verify your AWS credentials are properly configured

If you have installed the AWS CLI, you can verify that your AWS credentials are properly configured by running the command:

Terminal command
aws sts get-caller-identity

If your credentials are properly configured, this will output your UserId, Account and Arn. If your credentials are not configured correctly, this will throw an error.

If an error is thrown, or if you were unable to use the AWS CLI to verify your credentials configuration, you can find additional guidance on configuring your AWS credentials by referencing Amazon's documentation on configuring the AWS CLI.
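If the AWS CLI is not installed, a similar check can be made from Python, since boto3 exposes the same STS operation. This is a sketch that assumes your credentials are discoverable by boto3 (for example through environment variables or ~/.aws/credentials); the function name describe_caller is hypothetical.

```python
def describe_caller(sts_client=None):
    """Return the UserId, Account and Arn for the active AWS credentials."""
    if sts_client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        import boto3

        sts_client = boto3.client("sts")
    identity = sts_client.get_caller_identity()
    return {key: identity[key] for key in ("UserId", "Account", "Arn")}


# Example usage (requires valid AWS credentials and network access):
# print(describe_caller())
```

As with the CLI command, misconfigured credentials will cause the call to raise an error instead of returning the three fields.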

3. Identify your Data Context Expectations Store

You can find your Expectations Store's configuration within your Data Context.

In your great_expectations.yml file, look for the following lines:

File contents: great_expectations.yml
expectations_store_name: expectations_store

stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

This configuration tells Great Expectations to look for Expectations in a store called expectations_store. The base_directory for expectations_store is set to expectations/ by default.
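The relationship between expectations_store_name and the stores section can be illustrated with a short lookup. This is only a sketch of how the two keys relate, not how Great Expectations itself loads its configuration; the function names are hypothetical, and load_config assumes PyYAML is installed and that the path points at your great_expectations.yml file.

```python
def active_expectations_store(config):
    """Return (store_name, store_config) for the store named by expectations_store_name."""
    name = config["expectations_store_name"]
    return name, config["stores"][name]


def load_config(path="great_expectations/great_expectations.yml"):
    """Read the Data Context configuration file into a dictionary."""
    import yaml  # PyYAML; an assumption, install it separately if it is missing

    with open(path) as handle:
        return yaml.safe_load(handle)


# Example usage:
# name, store = active_expectations_store(load_config())
# print(name, store["store_backend"]["class_name"])
```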

4. Update your configuration file to include a new Store for Expectations on S3

You can manually add an Expectations Store by adding the configuration shown below into the stores section of your great_expectations.yml file.

File contents: great_expectations.yml
stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'

To make the Store work with S3, you will need to change the default store_backend settings, as in the example above. Set class_name to TupleS3StoreBackend, bucket to the name of your S3 bucket, and prefix to the folder in your S3 bucket where Expectation files will be located.

Additional options are available for a more fine-grained customization of the TupleS3StoreBackend.

File contents: great_expectations.yml
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'

For the above example, please also note that the new Store's name is set to expectations_S3_store. This value can be any name you like as long as you also update the value of the expectations_store_name key to match the new Store's name.

File contents: great_expectations.yml
expectations_store_name: expectations_S3_store

This update to the value of the expectations_store_name key will tell Great Expectations to use the new Store for Expectations.

caution

If you are also storing Validations in S3 or DataDocs in S3, please ensure that the prefix values are disjoint and one is not a substring of the other.
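The prefix rule above can be checked mechanically before you commit your configuration. The sketch below flags any pair of prefixes where one is equal to, or a substring of, the other; the function name and the example prefixes are illustrative assumptions.

```python
from itertools import combinations


def conflicting_prefixes(prefixes):
    """Return pairs of store names whose prefixes are equal or contain one another.

    `prefixes` maps a store name to the prefix configured for that store.
    """
    conflicts = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(prefixes.items(), 2):
        if prefix_a in prefix_b or prefix_b in prefix_a:
            conflicts.append((name_a, name_b))
    return conflicts


# Example with hypothetical prefixes: "ge" is a substring of "ge/validations".
print(conflicting_prefixes({"expectations": "ge", "validations": "ge/validations"}))
```

An empty list means no pair of prefixes overlaps. Note that an empty prefix is treated as a conflict with every other prefix, because the empty string is a substring of any string.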

5. Confirm that the new Expectations Store has been added

You can verify that your Stores are properly configured by running the command:

Terminal command
great_expectations store list

This will list the currently configured Stores that Great Expectations has access to. If you added a new S3 Expectations Store, the output should include the following ExpectationsStore entry:

Terminal output
- name: expectations_S3_store
  class_name: ExpectationsStore
  store_backend:
    class_name: TupleS3StoreBackend
    bucket: '<your_s3_bucket_name>'
    prefix: '<your_s3_bucket_folder_name>'

Notice that the output contains only one Expectations Store. Your configuration contains both the original expectations_store on the local filesystem and the expectations_S3_store you just configured, but the great_expectations store list command lists only your active Stores. For your Expectations Store, this is the one set as the value of the expectations_store_name key in the configuration file: expectations_S3_store.

6. Copy existing Expectation JSON files to the S3 bucket (This step is optional)

If you are converting an existing local Great Expectations deployment to one that works in AWS you may already have Expectations saved that you wish to keep and transfer to your S3 bucket.

One way to copy Expectations into Amazon S3 is by using the aws s3 sync command. As mentioned earlier, the base_directory is set to expectations/ by default.

Terminal command
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'

In the example below, two Expectation files, exp1.json and exp2.json, are copied to Amazon S3. This results in the following output:

Terminal output
upload: ./exp1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp1.json
upload: ./exp2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp2.json

If you have Expectations to copy into S3, your output should look similar.
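If you would rather copy the files from Python than from the AWS CLI, boto3's S3 client provides an upload_file method. The sketch below is not a full replacement for aws s3 sync: it uploads every .json file it finds and does not skip files that are already up to date. The function names are hypothetical, and the bucket and prefix are the same placeholders used above.

```python
from pathlib import Path


def plan_uploads(base_directory, prefix):
    """Return (local_path, s3_key) pairs for every .json file under base_directory."""
    base = Path(base_directory)
    clean_prefix = prefix.strip("/")
    plan = []
    for path in sorted(base.rglob("*.json")):
        relative = path.relative_to(base).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        plan.append((str(path), key))
    return plan


def upload_expectations(s3_client, bucket, base_directory, prefix):
    """Upload each planned file with the given S3 client and return the uploaded keys."""
    uploaded = []
    for filename, key in plan_uploads(base_directory, prefix):
        s3_client.upload_file(filename, bucket, key)
        uploaded.append(key)
    return uploaded


# Example usage (requires boto3, valid credentials and an existing bucket):
# import boto3
# upload_expectations(boto3.client("s3"), "<your_s3_bucket_name>",
#                     "great_expectations/expectations", "<your_s3_bucket_folder_name>")
```

Separating the planning step from the upload makes it possible to print and review the list of keys before anything is written to the bucket.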

7. Confirm that Expectations can be accessed from Amazon S3 by running great_expectations suite list

If you followed the optional step to copy your existing Expectations to the S3 bucket, you can confirm that Great Expectations can find them by running the command:

Terminal command
great_expectations suite list

Your output should include the Expectations you copied to Amazon S3. In the example, these Expectations were stored in Expectation Suites named exp1 and exp2. This would result in the following output from the above command:

Terminal output
2 Expectation Suites found:
- exp1
- exp2

Your output should look similar, with the names of your Expectation Suites replacing the names from the example.

If you did not copy Expectations to the new Store, you will see a message saying no Expectations were found.
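If the suite list is empty and you expected otherwise, it can help to look at the bucket directly and confirm that the JSON files are present under the configured prefix. The following sketch uses boto3's list_objects_v2 paginator; the function name list_suite_keys is hypothetical, and the bucket and prefix are the placeholders from your configuration.

```python
def list_suite_keys(s3_client, bucket, prefix):
    """Return the keys of all .json objects stored under the given prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from the response when no objects match.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".json"):
                keys.append(obj["Key"])
    return keys


# Example usage (requires boto3 and valid credentials):
# import boto3
# print(list_suite_keys(boto3.client("s3"), "<your_s3_bucket_name>",
#                       "<your_s3_bucket_folder_name>"))
```

If the keys are listed here but great_expectations suite list finds nothing, the bucket or prefix in great_expectations.yml probably does not match the location the files were copied to.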