How to request data from a Data Asset
This guide demonstrates how to request data from a Datasource that has been defined with the context.sources.add_* method.
Prerequisites
- An installation of GX
- A Datasource with a configured Data Asset
Steps
1. Import GX and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Retrieve your Data Asset
If you already have an instance of your Data Asset stored in a Python variable, you do not need to retrieve it again. If you do not, you can instantiate a previously defined Datasource with your Data Context's get_datasource(...) method. Likewise, a Datasource's get_asset(...) method will instantiate a previously defined Data Asset.
In this example we will use a previously defined Datasource named my_datasource and a previously defined Data Asset named my_asset.
my_asset = context.get_datasource("my_datasource").get_asset("my_asset")
3. (Optional) Build an options dictionary for your Batch Request
An options dictionary can be used to limit the Batches returned by a Batch Request. Omitting the options dictionary will result in all available Batches being returned.
The structure of the options dictionary depends on the type of Data Asset being used. The valid keys for the options dictionary can be found by checking the Data Asset's batch_request_options property.
print(my_asset.batch_request_options)
The batch_request_options property is a tuple that contains all the valid keys that can be used to limit the Batches returned in a Batch Request.
To restrict a Batch Request, create a dictionary whose keys are taken from the batch_request_options tuple and whose values identify the Batch or Batches you want returned, then pass this dictionary as the options parameter when you build your Batch Request.
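As a minimal sketch of that step, the helper below builds an options dictionary and rejects any key that is not in the tuple of valid keys. The helper name build_options and the keys year and month are hypothetical, chosen only for illustration; in real use the tuple would come from my_asset.batch_request_options and the keys depend on your Data Asset.

```python
def build_options(valid_keys, **requested):
    """Return an options dict, raising if any key is not a valid option key."""
    unknown = set(requested) - set(valid_keys)
    if unknown:
        raise KeyError(f"Unsupported option keys: {sorted(unknown)}")
    return dict(requested)


# Hypothetical keys; replace with my_asset.batch_request_options in real use.
valid_keys = ("year", "month")
options = build_options(valid_keys, year="2019", month="02")
print(options)
```

Checking the keys up front is a design choice of this sketch: it surfaces a misspelled key before the Batch Request is built.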
4. Build your Batch Request
We will use the build_batch_request(...) method of our Data Asset to generate a Batch Request.
my_batch_request = my_asset.build_batch_request()
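The call above returns every available Batch. To limit the result, the options dictionary from step 3 is passed through the options parameter. The sketch below wraps both cases in a hypothetical helper, build_request, which works with any object exposing a build_batch_request method that accepts options; the key year in the usage comment is likewise an assumption and must match one of your asset's batch_request_options.

```python
def build_request(asset, options=None):
    """Build a Batch Request from an asset exposing build_batch_request.

    With options=None the request is unrestricted, matching the plain call.
    """
    if options is None:
        return asset.build_batch_request()
    return asset.build_batch_request(options=options)


# Usage with the names from this guide ("year" is a hypothetical key):
# my_batch_request = build_request(my_asset, {"year": "2019"})
```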
For dataframe Data Assets, the dataframe is always supplied as the dataframe argument of exactly one API method, build_batch_request(...):
my_batch_request = my_asset.build_batch_request(dataframe=dataframe)
5. Verify that the correct Batches were returned
The get_batch_list_from_batch_request(...) method will return a list of the Batches a given Batch Request refers to.
batches = my_asset.get_batch_list_from_batch_request(my_batch_request)
Because Batch definitions are quite verbose, it is easiest to determine what data the Batch Request will return by printing just the batch_spec of each Batch.
for batch in batches:
    print(batch.batch_spec)
Next steps
Now that you have retrieved data from a Data Asset, you may be interested in creating Expectations about your data: