Version: 0.16.16

How to quickly connect to a single file using Pandas

This guide demonstrates how to use Pandas to connect to data stored in a single file. The example connects to data in .csv format, but GX supports most of the read methods available through Pandas.

Prerequisites

Steps

1. Import the Great Expectations module and instantiate a Data Context

The code to import Great Expectations and instantiate a Data Context is:

import great_expectations as gx

context = gx.get_context()

2. Specify a file to read into a Data Asset

Great Expectations supports reading the data in individual files directly into a Validator using Pandas. To do this, we will run the code:

validator = context.sources.pandas_default.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)
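
The same call can also point at a file on the local filesystem. The following is a minimal sketch rather than an official recipe: the helper names (write_sample_csv, read_local_csv), the sample rows, and the "sample_trips" asset name are all hypothetical, and it assumes that extra keyword arguments such as sep are accepted in the same way as by pandas.read_csv. The Great Expectations import is deferred so that the CSV helper can run on its own.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(directory: str) -> Path:
    """Write a tiny, made-up CSV file and return its path."""
    path = Path(directory) / "sample_trips.csv"
    rows = [
        {"vendor_id": 1, "passenger_count": 2, "fare_amount": 7.5},
        {"vendor_id": 2, "passenger_count": 1, "fare_amount": 12.0},
    ]
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_local_csv(path: Path):
    """Read a local CSV into a Validator (assumes great_expectations 0.16.x)."""
    # Imported lazily so that write_sample_csv works without GX installed.
    import great_expectations as gx

    context = gx.get_context()
    # "sample_trips" is a hypothetical label; sep is assumed to be handled
    # like the pandas.read_csv parameter of the same name.
    return context.sources.pandas_default.read_csv(
        str(path),
        asset_name="sample_trips",
        sep=",",
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = write_sample_csv(tmp_dir)
    print(csv_path.read_text())
    # With GX 0.16.x installed, the file could then be read with:
    # validator = read_local_csv(csv_path)
```

Keeping the file creation separate from the GX call makes it easy to swap in a path to your own data.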

Using Pandas to connect to different file types

In this example, we are connecting to a .csv file. However, Great Expectations supports connecting to most of the file types for which Pandas has read_* methods.

Because you will be using Pandas to connect to these files, the specific read_* methods, and the corresponding add_*_asset methods, that are available to you are determined by your currently installed version of Pandas.

For more information on which Pandas read_* methods are available to you as add_*_asset methods, please reference the official Pandas Input/Output documentation for the version of Pandas that you have installed.
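
As a quick check, the read_* functions exposed by your installed Pandas can be listed directly. This is a small sketch, and list_read_methods is a hypothetical helper, not part of GX or Pandas. Since GX supports most, not necessarily all, of these methods, treat the output as a list of candidates rather than a guarantee.

```python
import importlib.util
from typing import List


def list_read_methods(module) -> List[str]:
    """Return the sorted names of callables on `module` that start with 'read_'."""
    return sorted(
        name
        for name in dir(module)
        if name.startswith("read_") and callable(getattr(module, name))
    )


if importlib.util.find_spec("pandas") is not None:
    import pandas as pd

    print(f"pandas {pd.__version__} provides:")
    for name in list_read_methods(pd):
        print(f"  {name}")
else:
    print("pandas is not installed in this environment")
```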

In the GX Python API, add_*_asset methods require the same parameters as the corresponding Pandas read_* method, with one caveat: in Great Expectations, you must also provide a value for an asset_name parameter.

Next steps

Now that you have a Validator, you can immediately move on to creating Expectations. For more information, please see: