How to configure an Expectation Store to use Amazon S3
By default, new Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. Use the information provided here to configure a new storage location for Expectations in an Amazon S3 bucket.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- Permissions to install boto3 in your local environment.
- An S3 bucket and prefix to store Expectations.
1. Install boto3 with pip
Python interacts with AWS through the boto3 library. Great Expectations uses this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.
Run one of the following pip commands to install boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
To set up boto3 with AWS, and use boto3 from within Python, see the Boto3 documentation.
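To confirm that the installation landed in the environment you are actually running, a quick import check can help. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of Great Expectations or boto3.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints False if pip installed boto3 into a different interpreter.
    print("boto3 installed:", is_installed("boto3"))
```

If this prints False right after a successful pip install, the python on your PATH is probably not the interpreter of your virtual environment.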
2. Verify your AWS credentials are properly configured
Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears.
If an error message appears, or if you couldn't use the AWS CLI to verify your credentials configuration, see Configuring the AWS CLI.
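If the AWS CLI is not available, the same check can be approximated from Python, because boto3 exposes the equivalent STS call. This is a sketch under the assumption that boto3 is installed and picks up the same credentials as the CLI; the helper name caller_identity is hypothetical.

```python
def caller_identity(sts_client) -> dict:
    """Return the UserId, Account, and Arn that STS reports for the active credentials."""
    response = sts_client.get_caller_identity()
    # Drop response metadata and keep only the three fields the CLI check reports.
    return {key: response[key] for key in ("UserId", "Account", "Arn")}


# Usage (requires boto3 and configured credentials, so it is left commented out):
#   import boto3
#   print(caller_identity(boto3.client("sts")))
```

Passing the client in as an argument keeps the helper easy to exercise without network access.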
3. Identify your Data Context Expectations Store
Your Expectation Store configuration is in your Data Context.
The following section in your Data Context great_expectations.yml file tells Great Expectations to look for Expectations in a Store named expectations_store:
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
expectations_store_name: expectations_store
The default base_directory for expectations_store is expectations/.
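The relationship between the stores section and the expectations_store_name key can be illustrated with a small lookup. The sketch below works on a plain dictionary that mirrors the YAML above; it does not read great_expectations.yml, and the function name active_expectations_store is hypothetical.

```python
def active_expectations_store(config: dict) -> dict:
    """Return the Store definition that expectations_store_name points to."""
    name = config["expectations_store_name"]
    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"expectations_store_name {name!r} is not defined under 'stores'")
    return stores[name]


# A dictionary mirroring the default configuration shown above.
config = {
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
    "expectations_store_name": "expectations_store",
}

backend = active_expectations_store(config)["store_backend"]
print(backend["class_name"], backend["base_directory"])
```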
4. Update your configuration file to include a new Store for Expectations
To manually add an Expectations Store to your configuration, add the following configuration to the stores section of your great_expectations.yml file:
stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'
expectations_store_name: expectations_S3_store
As shown in the previous example, you need to change the default store_backend settings to make the Store work with S3. The class_name is set to TupleS3StoreBackend, bucket is the name of your S3 bucket, and prefix is the folder in your S3 bucket where Expectations are located.
The following example shows the additional options that are available to customize TupleS3StoreBackend:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'
In the previous example, the Store name is expectations_S3_store. If you use a personalized Store name, you must also update the value of the expectations_store_name key to match the Store name. For example:
expectations_store_name: expectations_S3_store
When you update the expectations_store_name key value, Great Expectations uses the new Store for Expectations.
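One way to keep the Store name and the expectations_store_name key in step is to generate both from a single value. The following sketch builds a dictionary shaped like the YAML in this step; it is an illustration only, the function name s3_store_fragment is hypothetical, and nothing here calls Great Expectations.

```python
def s3_store_fragment(store_name: str, bucket: str, prefix: str) -> dict:
    """Build a config fragment whose Store key and expectations_store_name always match."""
    if not store_name or not bucket:
        raise ValueError("store_name and bucket must be non-empty")
    return {
        "stores": {
            store_name: {
                "class_name": "ExpectationsStore",
                "store_backend": {
                    "class_name": "TupleS3StoreBackend",
                    "bucket": bucket,
                    "prefix": prefix,
                },
            }
        },
        # Derived from the same argument, so the two names cannot drift apart.
        "expectations_store_name": store_name,
    }


fragment = s3_store_fragment("my_s3_store", "example-bucket", "expectations")
print(fragment["expectations_store_name"] in fragment["stores"])
```

The bucket and Store names in the usage line are placeholders chosen for the example.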
Add the following code to great_expectations.yml to configure the IAM user:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    aws_access_key_id: ${AWS_ACCESS_KEY_ID} # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY} # Uses the AWS_SECRET_ACCESS_KEY environment variable.
    aws_session_token: ${AWS_SESSION_TOKEN} # Uses the AWS_SESSION_TOKEN environment variable.
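Because these values are substituted from the environment, an unset variable is an easy mistake to make. The sketch below scans configuration text for ${NAME} references and reports which ones are missing or empty. It assumes the values come only from environment variables and does not consider other substitution sources; the function name unset_variables is hypothetical.

```python
import os
import re


def unset_variables(config_text: str, environ=os.environ) -> list:
    """Return names referenced as ${NAME} in config_text that are missing or empty in environ."""
    names = re.findall(r"\$\{(\w+)\}", config_text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if not environ.get(name)]


# Usage against a real file (path is an assumption about your project layout):
#   with open("great_expectations/great_expectations.yml") as handle:
#       print(unset_variables(handle.read()))
```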
Add the following code to great_expectations.yml to configure the IAM Assume Role:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    assume_role_arn: '<your_role_to_assume>'
    region_name: '<your_aws_region_name>'
    assume_role_duration: session_duration_in_seconds
If you also store Validation Results or Data Docs in S3, make sure that the prefix values are disjoint and that neither is a substring of the other.
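The substring rule is simple to check mechanically before you commit a configuration. This is a minimal sketch of that check; the function name prefixes_conflict is hypothetical, and stripping the surrounding slashes before comparing is an assumption made for the example.

```python
def prefixes_conflict(first: str, second: str) -> bool:
    """Return True if either prefix is a substring of the other."""
    a, b = first.strip("/"), second.strip("/")
    return a in b or b in a


print(prefixes_conflict("expectations", "validations"))           # False
print(prefixes_conflict("expectations", "expectations/archive"))  # True
```

Note that an empty prefix conflicts with every other prefix under this rule, because the empty string is a substring of any string.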
5. Copy existing Expectation JSON files to the S3 bucket (Optional)
If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have Expectations saved that you want to transfer to your S3 bucket.
To copy Expectations into Amazon S3, use the aws s3 sync command as shown in the following example:
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'
The base_directory is set to expectations/ by default.
In the following example, the Expectations exp1 and exp2 are copied to Amazon S3 and a confirmation message is returned:
upload: ./exp1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp1.json
upload: ./exp2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp2.json
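If you prefer to do the copy from Python, the mapping from local files to S3 keys can be sketched as follows. This is not a replacement for aws s3 sync: it only considers .json files and always uploads, whereas sync copies all files and skips unchanged ones. The function names plan_uploads and upload_all are hypothetical; upload_file is the standard boto3 S3 client method.

```python
from pathlib import Path


def plan_uploads(base_directory: str, prefix: str) -> list:
    """Pair each local .json file with the S3 key it would be stored under."""
    base = Path(base_directory)
    plan = []
    for path in sorted(base.rglob("*.json")):
        # Keep the path relative to base_directory so subfolders are preserved in the key.
        key = f"{prefix.strip('/')}/{path.relative_to(base).as_posix()}"
        plan.append((str(path), key))
    return plan


def upload_all(s3_client, bucket: str, plan: list) -> None:
    """Upload every (local file, key) pair using a boto3 S3 client."""
    for filename, key in plan:
        s3_client.upload_file(filename, bucket, key)


# Usage (requires boto3 and credentials, so it is left commented out):
#   import boto3
#   upload_all(boto3.client("s3"), "<your_s3_bucket_name>",
#              plan_uploads("expectations/", "<your_s3_bucket_folder_name>"))
```

Printing the plan before uploading gives you a dry run of what would be copied.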
6. Confirm Expectation Suite availability
If you copied your existing Expectation Suites to the S3 bucket, run the following Python code to confirm that Great Expectations can find them:
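import great_expectations as gx

# Load the Data Context from the project's great_expectations/ folder.
context = gx.get_context()
# Print the names so they are also visible when this runs as a script.
print(context.list_expectation_suite_names())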
The Expectation Suites you copied to S3 are returned as a list. Expectation Suites that weren't copied to the new Store aren't listed.
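To spot anything that was left behind, you can compare the returned list with the files in your local base_directory. The sketch below assumes that each top-level .json file is named after the Expectation Suite it contains, as with exp1.json and exp2.json in the earlier example; the function name suites_not_listed is hypothetical.

```python
from pathlib import Path


def suites_not_listed(base_directory: str, listed_names) -> list:
    """Return local suite names (top-level .json file stems) absent from listed_names."""
    local = {path.stem for path in Path(base_directory).glob("*.json")}
    return sorted(local - set(listed_names))


# Usage with a Data Context (the directory path is an assumption about your layout):
#   import great_expectations as gx
#   context = gx.get_context()
#   print(suites_not_listed("great_expectations/expectations",
#                           context.list_expectation_suite_names()))
```

An empty list means every local suite found this way also appears in the new Store.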