How to configure a Validation Result Store in Azure Blob Storage
By default, Validation Results (generated when data is validated against an Expectation or Expectation Suite) are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Since Validation Results may include examples of data (which could be sensitive or regulated), they should not be committed to a source control system. This guide will help you configure a new storage location for Validation Results in Azure Blob Storage.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Configured a Data Context.
- Configured an Expectation Suite.
- Configured a Checkpoint.
- Configured an Azure Storage account and obtained its connection string.
- Created the Azure Blob container. If you also wish to host and share Data Docs on Azure Blob Storage, you may set that up first and then use the existing $web container to store your Validation Results.
- Identified the prefix (folder) where Validation Results will be stored. You don't need to create the folder; the prefix is just part of the blob name.
Steps
1. Configure the config_variables.yml file with your Azure Storage credentials
We recommend storing Azure Storage credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control. The following line adds Azure Storage credentials under the key AZURE_STORAGE_CONNECTION_STRING. Additional options for configuring the config_variables.yml file, or for using environment variables instead, are described in the Great Expectations documentation.
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
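A malformed connection string is an easy mistake to make when pasting credentials. The following is a minimal sketch, not part of Great Expectations, that splits a connection string of the form shown above into its key/value pairs and reports which expected keys are missing. The helper names and the choice of required keys (AccountName and AccountKey, as used in this guide) are assumptions for illustration; connection strings that use other authentication methods would need a different check.

```python
# Hypothetical helpers for illustration; not a Great Expectations or Azure API.
REQUIRED_KEYS = ("AccountName", "AccountKey")  # assumption: account-key style string


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' segments into a dict."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so base64 padding ('==') in the key survives.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        parts[key.strip()] = value
    return parts


def missing_keys(conn_str: str) -> list:
    """Return the required keys that are absent or empty."""
    parts = parse_connection_string(conn_str)
    return [key for key in REQUIRED_KEYS if not parts.get(key)]


example = (
    "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;"
    "AccountName=myaccount;AccountKey=c2VjcmV0=="
)
print(missing_keys(example))
```

Running a check like this before starting Great Expectations can surface a truncated or mis-pasted value early, without sending the credentials anywhere.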
2. Identify your Validation Results Store
A Store is a connector to store and retrieve information about metadata in Great Expectations; the Validation Results Store holds the objects generated when data is validated against an Expectation Suite. As with all Stores, you can find its configuration through your Data Context, the primary entry point for a Great Expectations deployment. In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Validation Results in a Store called validations_store. The base_directory for validations_store is set to uncommitted/validations/ by default.
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
3. Update your configuration file to include a new Store for Validation Results in your Azure Storage account
In our case, the name is set to validations_AZ_store, but it can be any name you like. We also need to make some changes to the store_backend settings. The class_name will be set to TupleAzureBlobStoreBackend, container will be set to the name of the blob container (the Azure equivalent of an S3 bucket) in which you wish to store your Validation Results, prefix will be set to the folder in the container where Validation Result files will be located, and connection_string will be set to ${AZURE_STORAGE_CONNECTION_STRING}, which references the corresponding key in the config_variables.yml file.
validations_store_name: validations_AZ_store

stores:
  validations_AZ_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: <blob-container>
      prefix: validations
      connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
If the container is called $web (for hosting and sharing Data Docs on Azure Blob Storage), set container: \$web. The backslash escapes the $ so that it is not read as the start of a variable substitution, which allows Great Expectations to reach the $web container.
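To see why the escape matters, here is a simplified model of variable substitution written for this guide. It is not the Great Expectations implementation; the function name, the regular expression, and the error type are assumptions chosen only to illustrate the behavior described above: ${NAME} and $NAME are replaced from a variables mapping, while \$ yields a literal dollar sign.

```python
import re

# Simplified model for illustration only; the real substitution logic in
# Great Expectations may differ in details.
# Alternation order matters: an escaped dollar is matched before a variable.
_TOKEN = re.compile(r"\\\$|\$\{(\w+)\}|\$(\w+)")


def substitute(value: str, variables: dict) -> str:
    """Replace ${NAME} or $NAME from `variables`; turn an escaped \\$ into a literal $."""

    def replace(match):
        if match.group(0) == "\\$":
            return "$"  # escaped dollar: keep it literally
        name = match.group(1) or match.group(2)
        if name not in variables:
            raise KeyError(f"no config variable named {name!r}")
        return variables[name]

    return _TOKEN.sub(replace, value)


variables = {"AZURE_STORAGE_CONNECTION_STRING": "<connection-string>"}

print(substitute("${AZURE_STORAGE_CONNECTION_STRING}", variables))
print(substitute(r"\$web", variables))
```

In this model, an unescaped $web would be looked up as a variable named web and fail, which mirrors why the container name needs the backslash in great_expectations.yml.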
4. Copy existing Validation Results JSON files to the Azure blob (optional)
One way to copy Validation Results into Azure Blob Storage is by using the az storage blob upload command, which is part of the Azure CLI. The following example copies one Validation Result from a local folder to the Azure blob. Information on other ways to copy Validation Result JSON files, such as the Azure Storage browser in the Azure Portal, can be found in the Azure documentation.
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
az storage blob upload -f <local/path/to/validation.json> -c <GREAT-EXPECTATION-DEDICATED-AZURE-BLOB-CONTAINER-NAME> -n <PREFIX>/<validation.json>
Example with a Validation Result for the exp1 Expectation Suite:
az storage blob upload -f great_expectations/uncommitted/validations/exp1/20210306T104406.877327Z/20210306T104406.877327Z/8313fb37ca59375eb843adf388d4f882.json -c <blob-container> -n validations/exp1/20210306T104406.877327Z/20210306T104406.877327Z/8313fb37ca59375eb843adf388d4f882.json
Finished[#############################################################] 100.0000%
{
  "etag": "\"0x8D8E09F894650C7\"",
  "lastModified": "2021-03-06T12:58:28+00:00"
}
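When there are many Validation Results, the blob name for each file can be derived from its path relative to the local validations folder, as in the example above. The sketch below shows one way to compute that mapping and to assemble the corresponding az storage blob upload command line. The helper names are hypothetical, and the script only builds command strings; it does not call Azure.

```python
import shlex
from pathlib import Path, PurePosixPath

# Hypothetical helpers for illustration; they only build strings.


def blob_name_for(local_file, validations_dir, prefix: str) -> str:
    """Map a local Validation Result path to '<prefix>/<path relative to validations_dir>'."""
    relative = Path(local_file).relative_to(validations_dir)
    # Blob names use forward slashes regardless of the local operating system.
    return str(PurePosixPath(prefix, *relative.parts))


def az_upload_command(local_file, container: str, blob_name: str) -> str:
    """Build the command line using the -f, -c and -n flags shown in this guide."""
    args = ["az", "storage", "blob", "upload",
            "-f", str(local_file), "-c", container, "-n", blob_name]
    return shlex.join(args)  # quotes arguments containing shell-special characters


validations_dir = Path("great_expectations/uncommitted/validations")
local_file = validations_dir / "exp1" / "run" / "run" / "result.json"  # placeholder path

name = blob_name_for(local_file, validations_dir, "validations")
print(name)
print(az_upload_command(local_file, "my-container", name))  # container name is a placeholder
```

To cover a whole folder, the same two functions could be applied to every path returned by validations_dir.rglob("*.json"). Because shlex.join quotes shell-special characters, a container named $web would be passed through without the shell expanding it.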
5. Confirm that the new Validation Results Store has been added by running great_expectations store list
Notice that the output contains two Validation Results Stores: the original validations_store on the local filesystem and the validations_AZ_store we just configured. This is fine: Great Expectations will look for Validation Results in Azure Blob Storage as long as validations_store_name is set to validations_AZ_store. The configuration for validations_store can be removed if you like.
great_expectations store list
- name: validations_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleFilesystemStoreBackend
    base_directory: uncommitted/validations/

- name: validations_AZ_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleAzureBlobStoreBackend
    connection_string: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
    container: <blob-container>
    prefix: validations
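The relationship between validations_store_name and the entries under stores can be pictured as a simple lookup. The sketch below models the configuration from this guide as a plain Python dictionary (an assumption made for illustration; it does not load great_expectations.yml or use any Great Expectations API) and shows that only the Store named by validations_store_name is selected, even though both are defined.

```python
# Plain-dict model of the YAML in this guide; the function name is hypothetical.
def active_validations_store(config: dict) -> dict:
    """Return the store entry that validations_store_name points at."""
    name = config["validations_store_name"]
    stores = config["stores"]
    if name not in stores:
        raise KeyError(f"validations_store_name is {name!r}, but no such store is defined")
    return stores[name]


config = {
    "validations_store_name": "validations_AZ_store",
    "stores": {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "uncommitted/validations/",
            },
        },
        "validations_AZ_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleAzureBlobStoreBackend",
                "container": "<blob-container>",
                "prefix": "validations",
                "connection_string": "${AZURE_STORAGE_CONNECTION_STRING}",
            },
        },
    },
}

backend = active_validations_store(config)["store_backend"]["class_name"]
print(backend)  # TupleAzureBlobStoreBackend
```

Leaving the unused validations_store entry in place is harmless in this model, which matches the note above that it can be removed or kept.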
6. Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store on Azure Blob Storage, then visualize the results by rebuilding Data Docs.