Version: 0.16.16

How to request data from a Data Asset

This guide demonstrates how to request data from a Data Asset that belongs to a Datasource defined with one of the context.sources.add_* methods.

Prerequisites

  • An installation of GX
  • A Datasource with a configured Data Asset

Steps

1. Import GX and instantiate a Data Context

The code to import Great Expectations and instantiate a Data Context is:

import great_expectations as gx

context = gx.get_context()

2. Retrieve your Data Asset

If you already have an instance of your Data Asset stored in a Python variable, you do not need to retrieve it again. If you do not, you can instantiate a previously defined Datasource with your Data Context's get_datasource(...) method. Likewise, a Datasource's get_asset(...) method will instantiate a previously defined Data Asset.

In this example we will use a previously defined Datasource named my_datasource and a previously defined Data Asset named my_asset.

my_asset = context.get_datasource("my_datasource").get_asset("my_asset")

3. (Optional) Build an options dictionary for your Batch Request

An options dictionary can be used to limit the Batches returned by a Batch Request. Omitting the options dictionary will result in all available Batches being returned.

The structure of the options dictionary will depend on the type of Data Asset being used. The valid keys for the options dictionary can be found by checking the Data Asset's batch_request_options property.

print(my_asset.batch_request_options)

The batch_request_options property is a tuple that contains all the valid keys that can be used to limit the Batches returned in a Batch Request.

To build the options dictionary, take keys from the batch_request_options tuple and set each one to the value that identifies the Batch or Batches your Batch Request should return. Then pass this dictionary as the options parameter when you build your Batch Request.
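As a minimal sketch of this step, the helper below checks requested keys against a tuple of valid keys before returning an options dictionary. The helper name build_options and the keys "year" and "month" are hypothetical and chosen only for illustration; the real keys depend on your Data Asset and should be read from its batch_request_options property.

```python
from typing import Any, Dict, Tuple


def build_options(valid_keys: Tuple[str, ...], **requested: Any) -> Dict[str, Any]:
    """Return an options dictionary containing only keys the asset accepts.

    Raises ValueError if any requested key is not in valid_keys.
    """
    unknown = sorted(set(requested) - set(valid_keys))
    if unknown:
        raise ValueError(
            f"Unsupported option keys: {unknown}; valid keys are {valid_keys}"
        )
    return dict(requested)


# Hypothetical keys used so the sketch runs without a GX project.
# In practice, pass my_asset.batch_request_options instead.
valid_keys = ("year", "month")
options = build_options(valid_keys, year="2020", month="04")
print(options)

# With a real Data Asset (requires a configured Datasource):
# options = build_options(my_asset.batch_request_options, year="2020")
# my_batch_request = my_asset.build_batch_request(options=options)
```

Validating keys up front is a design choice of this sketch, not something the guide requires; it simply surfaces a mistyped key before the Batch Request is built.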

4. Build your Batch Request

We will use the build_batch_request(...) method of our Data Asset to generate a Batch Request.

my_batch_request = my_asset.build_batch_request()

For dataframe Data Assets, the dataframe is always supplied in exactly one place, as the dataframe argument of build_batch_request(...):

my_batch_request = my_asset.build_batch_request(dataframe=dataframe)

5. Verify that the correct Batches were returned

The get_batch_list_from_batch_request(...) method will return a list of the Batches a given Batch Request refers to.

batches = my_asset.get_batch_list_from_batch_request(my_batch_request)

Because Batch definitions are quite verbose, it is easiest to determine what data the Batch Request will return by printing just the batch_spec of each Batch.

for batch in batches:
    print(batch.batch_spec)
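If you would rather collect the specs than print them, for example to count the Batches or compare them against what you expected, a small helper along these lines may be useful. This is a sketch that assumes each batch_spec is dictionary-like; collect_batch_specs is a hypothetical name, and the stand-in objects and file paths below are not real GX Batches and exist only so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def collect_batch_specs(batches: Iterable[Any]) -> List[Dict[str, Any]]:
    """Copy each Batch's batch_spec into a plain dict for easy inspection.

    Assumes batch.batch_spec is dictionary-like.
    """
    return [dict(batch.batch_spec) for batch in batches]


# Stand-in objects (NOT real GX Batches) with made-up paths.
fake_batches = [
    SimpleNamespace(batch_spec={"path": "data/2020-01.csv"}),
    SimpleNamespace(batch_spec={"path": "data/2020-02.csv"}),
]

specs = collect_batch_specs(fake_batches)
print(len(specs), "batch(es) returned")
for spec in specs:
    print(spec)

# With real Batches from the step above:
# specs = collect_batch_specs(batches)
```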

Next steps

Now that you have retrieved data from a Data Asset, you may be interested in creating Expectations about your data.