dataculpa-snowflake

Snowflake connector for Data Culpa that monitors ongoing quality and consistency metrics in Snowflake tables and views.

This connector conforms to the Data Culpa connector template.

Pipeline Instances

Clone the repo (or just download sfdatalake.py).
Install the Python dependencies (Python 3):

pip install python-dotenv snowflake-connector-python dataculpa-client

Create a .env file with the following keys:

# API key to access the storage.
DC_CONTROLLER_SECRET = secret-here   # Create a new API secret in the Data Culpa Validator UI
SNOWFLAKE_PASSWORD = secret-here

Run sfdatalake.py --init example.yaml to generate a template YAML file, then fill in the connection coordinates. Secrets always stay in the .env file and never in the YAML, so the YAML file is safe to check into source control or distribute within your organization.
Once you have edited the YAML file, run sfdatalake.py --test example.yaml to test the connections to the database and the Data Culpa Validator controller.
(You can also run sfdatalake.py --discover example.yaml to see which tables the connector can discover and walk. Snowflake permissions may affect what is visible here.)
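
Before running --test, it can help to confirm that both secrets from the .env file are actually set. The following is a minimal sketch and not part of sfdatalake.py: `missing_secrets` is a hypothetical helper, and treating the template value `secret-here` as unset is an assumption made for illustration.

```python
from typing import List, Mapping, Optional

# Keys the .env file is expected to define (taken from the template above).
REQUIRED_KEYS = ("DC_CONTROLLER_SECRET", "SNOWFLAKE_PASSWORD")

# Assumption: a value still equal to the template placeholder counts as unset.
PLACEHOLDER = "secret-here"


def missing_secrets(env: Mapping[str, Optional[str]]) -> List[str]:
    """Return the required keys that are absent, empty, or still the placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = (env.get(key) or "").strip()
        if not value or value == PLACEHOLDER:
            missing.append(key)
    return missing
```

Passing `os.environ` checks the current process environment. With python-dotenv installed, the mapping returned by `dotenv_values(".env")` can be passed in the same way.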

Invocation

The sfdatalake.py script is intended to be invoked from cron or another orchestration system. You can run it as frequently as you wish, and you can run separate instances, each with its own YAML configuration file, to isolate collections or different databases. To reduce the impact on production environments, you can also ingest from a replica, snapshot, or backup of the data.
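
One way to drive several configurations from a single cron entry is a small wrapper that invokes the script once per YAML file. This is a sketch under assumptions: `build_commands` and `run_all` are hypothetical helpers that do not exist in this repository. Only the `--test` flag documented above is used as the default mode, because the flag for a regular ingest run is not stated here; pass whichever mode you need.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def build_commands(config_paths: Iterable[Path], mode: str = "--test",
                   script: str = "sfdatalake.py") -> List[List[str]]:
    """Build one argv list per YAML file, sorted for a stable run order."""
    return [[sys.executable, script, mode, str(path)]
            for path in sorted(config_paths)]


def run_all(commands: List[List[str]]) -> int:
    """Run each command in turn and return the number that failed."""
    failures = 0
    for argv in commands:
        # A non-zero exit code is counted as a failure; the loop keeps going
        # so that one bad configuration does not block the others.
        result = subprocess.run(argv)
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, `run_all(build_commands(Path("configs").glob("*.yaml")))` would check every configuration in a hypothetical `configs` directory, and a cron entry can call the wrapper on whatever schedule suits you.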

Future Improvements

There are many improvements we are considering for this module. You can get in touch by writing to hello@dataculpa.com or opening issues in this repository.

SaaS deployment

Our hosted SaaS includes Snowflake and other connectors and a GUI for configuration. If you'd like to try it out, drop a line to hello@dataculpa.com.
