Integrating Great Expectations (GE) for the BigQuery Data page
Prerequisites
Ensure you have an Unravel instance up and running.
Have a GE test suite ready. If you do not have one, set it up by following the instructions from the Great Expectations team. For BigQuery, see "How to connect to a BigQuery database" or "Use Great Expectations with Google Cloud Platform and BigQuery" in the Great Expectations documentation.
Install the Unravel contrib package for Great Expectations.
Have a working Python installation.
Integrating Great Expectations (BigQuery)
Complete the following steps to integrate GE for the BigQuery Data page.
Set up the Python environment.
Download the .whl file for the Unravel contrib package.
Open a terminal window and navigate to the directory where the .whl file is located.
Run the following command to install the Unravel contrib package:
pip install <file_name.whl>
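After the install completes, it can help to confirm that the package is importable from the same Python environment that runs your checkpoint. The following is a minimal sketch using only the standard library. The module name unravelaction is taken from the checkpoint configuration in this guide, and the helper name is_module_installed is hypothetical.

```python
import importlib.util


def is_module_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name is on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Check both the Unravel contrib module and Great Expectations itself.
    for name in ("unravelaction", "great_expectations"):
        status = "found" if is_module_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If unravelaction is reported as missing, the .whl file was most likely installed into a different Python environment than the one running this check.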
Configure UnravelAction.
UnravelAction extracts events from the expectation failures generated by the GE test suite.
In the Python file where the checkpoint is defined, import unravelaction:
import unravelaction
Add UnravelAction to the action_list section of the GE checkpoint YAML configuration:
action_list:
  - name: UnravelAction
    action:
      class_name: UnravelAction
      module_name: unravelaction
      lr_url: "{lr_url}"
      lr_version: "v2"
      index: "events_bq_t1-"
The configurable parameters are as follows:
name
Action name for storing validation results.
class_name
Set it to UnravelAction.
module_name
Set it to unravelaction.
lr_url
URL to the log receiver.
lr_version
The version of the log receiver.
index
The name of the index where the GE event is stored.
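If the checkpoint is defined in Python rather than in YAML, the same action_list entry can be expressed as a dictionary. The following is a sketch under that assumption. The helper build_unravel_action and the example URL are hypothetical, and the default values for lr_version and index are copied from the YAML example in this guide.

```python
from typing import Any, Dict


def build_unravel_action(
    lr_url: str,
    lr_version: str = "v2",
    index: str = "events_bq_t1-",
) -> Dict[str, Any]:
    """Build one action_list entry that mirrors the YAML example."""
    if not lr_url:
        raise ValueError("lr_url (the log receiver URL) must not be empty")
    return {
        "name": "UnravelAction",
        "action": {
            "class_name": "UnravelAction",
            "module_name": "unravelaction",
            "lr_url": lr_url,
            "lr_version": lr_version,
            "index": index,
        },
    }


# Placeholder URL: replace it with the address of your log receiver.
action_list = [build_unravel_action("https://log-receiver.example.com")]
```

Keeping the entry in a small function means the log receiver URL is supplied in one place and an empty value is rejected before the checkpoint runs.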
Run the Great Expectations checkpoint.
To validate the data and send the expectation failures to Unravel, run the checkpoint. See the Great Expectations guide for the different ways to run a checkpoint.
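One way to run a checkpoint from Python is sketched below. It assumes a Great Expectations release older than 1.0, where the data context exposes run_checkpoint. The wrapper function and its context parameter are hypothetical conveniences, so check the guide for your installed version before relying on it.

```python
from typing import Any, Optional


def run_checkpoint(checkpoint_name: str, context: Optional[Any] = None) -> bool:
    """Run the named checkpoint and return True if every validation passed."""
    if context is None:
        # Imported lazily so that this module loads even without GE installed.
        import great_expectations as gx

        context = gx.get_context()
    # The actions in the checkpoint's action_list, including UnravelAction,
    # are executed as part of this call.
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    return bool(result["success"])
```

For example, run_checkpoint("my_bigquery_checkpoint") runs a checkpoint with that name, where the name is a placeholder for your own. A return value of False means at least one expectation failed.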