Version: 0.14.13

How to create a new Expectation Suite by profiling from a jsonschema file

The JsonSchemaProfiler helps you quickly create Expectation Suites from jsonschema files.

Prerequisites: This how-to guide assumes you have:
Danger: This implementation does not traverse any levels of nesting. Only the top-level properties of the schema are profiled.
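The sketch below shows what the nesting limitation means in practice. It uses a hypothetical schema and a hypothetical helper, `top_level_and_nested`, which is not part of Great Expectations. The helper lists the top-level property names and the property paths one level below them. Those nested paths are the ones a non-traversing profiler would not look inside.

```python
# Hypothetical schema: "address" is a nested object whose own
# properties ("city", "zip") sit one level below the top level.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "zip": {"type": "string"},
            },
        },
    },
}


def top_level_and_nested(schema):
    """Return (top-level property names, property paths one level down).

    Hypothetical helper for illustration; it only inspects a single
    level of nesting.
    """
    properties = schema.get("properties", {})
    top_level = list(properties)
    nested = []
    for name, details in properties.items():
        for child in details.get("properties", {}):
            nested.append(f"{name}.{child}")
    return top_level, nested


top, skipped = top_level_and_nested(schema)
print("top-level:", top)
print("not traversed:", skipped)
```

Here, `address` itself is a top-level property. Its children `city` and `zip` are never examined on their own.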

Steps

1. Set a filename and a suite name

jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"
suite_name = "YOUR_SUITE_NAME"
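If you want to try the steps before pointing them at a real schema, you can write a small example file first. The schema, file name, and suite name below are hypothetical placeholders. The file is written to the system temp directory only so that the sketch runs anywhere; in practice, set `jsonschema_file` to the path of your own file.

```python
import json
import os
import tempfile

# Hypothetical example schema: a top-level object with a few simple properties.
example_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "status": {"type": "string", "enum": ["active", "inactive"]},
        "email": {"type": ["string", "null"], "maxLength": 254},
    },
    "required": ["id", "status"],
}

# Placeholder location and suite name; replace with your own.
jsonschema_file = os.path.join(tempfile.gettempdir(), "example_schema.json")
suite_name = "example_suite"

# Write the schema as JSON so the later steps can load it.
with open(jsonschema_file, "w") as f:
    json.dump(example_schema, f, indent=2)
```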

2. Load a DataContext

import great_expectations as ge

context = ge.data_context.DataContext()
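Called with no arguments, `DataContext()` has to locate your project configuration on its own. If you can pass the project path directly, `DataContext(context_root_dir=...)` avoids relying on the current directory. The sketch below is a simplified, hypothetical approximation of that lookup, written as the helper `find_context_root`, which is not a Great Expectations function. It assumes the project is a `great_expectations/` folder containing `great_expectations.yml`, and it walks up from a starting directory until it finds one. Use it only as a troubleshooting aid. The library's actual search rules may differ.

```python
from pathlib import Path
from typing import Optional


def find_context_root(start: Path) -> Optional[Path]:
    """Walk up from `start` looking for great_expectations/great_expectations.yml.

    Hypothetical approximation of how a project directory might be located;
    returns the directory that contains great_expectations.yml, or None.
    """
    for directory in [start, *start.parents]:
        candidate = directory / "great_expectations" / "great_expectations.yml"
        if candidate.is_file():
            return candidate.parent
    return None


# Prints the detected project directory, or None if there is none above cwd.
print(find_context_root(Path.cwd().resolve()))
```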

3. Load the jsonschema file

import json

with open(jsonschema_file, "r") as f:
    schema = json.load(f)
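Because only the top level of the schema is profiled, it can help to check the loaded schema before going further. The helper `check_top_level_object` below is hypothetical and not part of Great Expectations. It assumes the profiler expects a top-level `"type": "object"` with a `"properties"` mapping, and it returns the property names that would become columns.

```python
import json


def check_top_level_object(schema):
    """Hypothetical pre-flight check on a loaded jsonschema.

    Confirms the schema is a dict whose top level is an object with a
    "properties" mapping, then returns the sorted property names.
    """
    if not isinstance(schema, dict):
        raise TypeError("schema must be a JSON object (a Python dict)")
    if schema.get("type") != "object":
        raise TypeError('top-level "type" must be "object"')
    if not isinstance(schema.get("properties"), dict):
        raise KeyError('schema needs a top-level "properties" mapping')
    return sorted(schema["properties"])


schema = json.loads('{"type": "object", "properties": {"b": {}, "a": {}}}')
print(check_top_level_object(schema))  # ['a', 'b']
```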

4. Instantiate the profiler

from great_expectations.profile.json_schema_profiler import JsonSchemaProfiler

profiler = JsonSchemaProfiler()

5. Create the suite

suite = profiler.profile(schema, suite_name)
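It helps to know roughly what kinds of Expectations to look for after this step. The sketch below is a deliberately simplified, hypothetical mapping from common jsonschema keywords (`enum`, `minimum`/`maximum`, `minLength`/`maxLength`, `required`) to existing Great Expectations expectation types. The function `sketch_expectations` is invented for illustration. It is not JsonSchemaProfiler's actual logic, whose rules may differ in coverage and detail.

```python
def sketch_expectations(schema):
    """Map each top-level property to (expectation_type, kwargs) pairs.

    Hypothetical simplification for illustration only; the real
    JsonSchemaProfiler's rules may differ.
    """
    required = set(schema.get("required", []))
    planned = []
    for column, details in schema["properties"].items():
        planned.append(("expect_column_to_exist", {"column": column}))
        if "enum" in details:
            planned.append(("expect_column_values_to_be_in_set",
                            {"column": column, "value_set": details["enum"]}))
        if "minimum" in details or "maximum" in details:
            planned.append(("expect_column_values_to_be_between",
                            {"column": column,
                             "min_value": details.get("minimum"),
                             "max_value": details.get("maximum")}))
        if "minLength" in details or "maxLength" in details:
            planned.append(("expect_column_value_lengths_to_be_between",
                            {"column": column,
                             "min_value": details.get("minLength"),
                             "max_value": details.get("maxLength")}))
        if column in required:
            planned.append(("expect_column_values_to_not_be_null",
                            {"column": column}))
    return planned


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "status": {"type": "string", "enum": ["active", "inactive"]},
    },
    "required": ["id"],
}
for expectation_type, kwargs in sketch_expectations(schema):
    print(expectation_type, kwargs)
```

Compare a listing like this against the generated suite to spot keywords you expected to become Expectations but did not.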

6. Save the suite

context.save_expectation_suite(suite)
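With the default filesystem Expectation Store, the saved suite is a JSON file. Assuming the usual project layout, it is written to `great_expectations/expectations/YOUR_SUITE_NAME.json`. Adjust that path if your store is configured differently. The hypothetical helpers `load_suite` and `summarize_suite` below read that file and count its expectation types. They assume each entry in the suite's `expectations` list carries an `expectation_type` key.

```python
import json
from collections import Counter
from pathlib import Path


def load_suite(path: Path) -> dict:
    """Read a saved Expectation Suite JSON file into a dict."""
    return json.loads(path.read_text())


def summarize_suite(suite: dict) -> Counter:
    """Count how many Expectations of each type the suite contains."""
    return Counter(e["expectation_type"] for e in suite.get("expectations", []))


# Assumed location under the default filesystem store; adjust for your project.
suite_path = Path("great_expectations/expectations/YOUR_SUITE_NAME.json")
if suite_path.is_file():
    print(summarize_suite(load_suite(suite_path)))
```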

7. (Optional) Generate Data Docs and review the results

Data Docs provide a concise and useful way to review the Expectation Suite that has been created.
context.build_data_docs()

You can also review and update the Expectations created by the Profiler with great_expectations suite edit until the Expectation Suite is what you want.

Additional notes

Info: JsonSchemaProfiler generates Expectation Suites using column map Expectations, which assume a tabular data structure, because Great Expectations does not currently support nested data structures.
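A column map Expectation checks values one per row within a named column, so it needs data shaped as rows and columns. The sketch below mimics that evaluation with plain Python. It is a simplified illustration, not Great Expectations code, and `column_map_check` is a hypothetical name. A nested object would arrive as a single dict value in one column, so its inner fields have no column of their own to check.

```python
def column_map_check(rows, column, predicate):
    """Apply `predicate` to each row's value in `column` and return the
    values that fail, one check per row as in a column map Expectation.

    Simplified, hypothetical illustration; not Great Expectations code.
    """
    return [row.get(column) for row in rows if not predicate(row.get(column))]


rows = [
    {"status": "active"},
    {"status": "inactive"},
    {"status": "archived"},
]
allowed = {"active", "inactive"}
print(column_map_check(rows, "status", lambda v: v in allowed))  # ['archived']
```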

The full example script is here:

import json
import great_expectations as ge
from great_expectations.profile.json_schema_profiler import JsonSchemaProfiler

jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"
suite_name = "YOUR_SUITE_NAME"

context = ge.data_context.DataContext()

with open(jsonschema_file, "r") as f:
    raw_json = f.read()
    schema = json.loads(raw_json)

print("Generating suite...")
profiler = JsonSchemaProfiler()
suite = profiler.profile(schema, suite_name)
context.save_expectation_suite(suite)