Showing 204 changed files with 21,198 additions and 2,414 deletions.
@@ -0,0 +1,16 @@

.. _contributing:

Contributing
==================

.. toctree::
   :maxdepth: 2

Can I contribute?
-----------------

Absolutely. Yes, please. Start
`here <https://github.com/great-expectations/great_expectations/blob/develop/CONTRIBUTING.md>`__,
and don't be shy with questions!
@@ -0,0 +1,14 @@

.. _core_concepts:

Core Concepts
==================

.. toctree::
   :maxdepth: 2

   /core_concepts/expectations
   /core_concepts/validation
   /core_concepts/data_context
   /core_concepts/datasource
   /core_concepts/custom_expectations
   /glossary
File renamed without changes.
@@ -0,0 +1,37 @@

.. _data_context:

Data Context
===================

A DataContext represents a Great Expectations project. It organizes storage and access for
expectation suites, datasources, notification settings, and data fixtures.

The DataContext is configured via a yml file stored in a directory called ``great_expectations``;
the configuration file, as well as managed expectation suites, should be stored in version control.

DataContexts use data sources you're already familiar with. Generators help introspect data stores
and data execution frameworks (such as Airflow, NiFi, dbt, or Dagster) to describe and produce
batches of data ready for analysis. This enables fetching, validation, profiling, and documentation
of your data in a way that is meaningful within your existing infrastructure and work environment.
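For instance, a minimal sketch of loading a project's DataContext is shown below. It assumes the
project has already been initialized (for example with ``great_expectations init``) so that a
configuration file exists under the ``great_expectations`` directory; the exact constructor
arguments have varied across Great Expectations releases.

.. code-block:: python

    from great_expectations.data_context import DataContext

    # Load the project configuration from the great_expectations/ directory.
    # With no argument, the directory is located relative to the current
    # working directory.
    context = DataContext()

    # The context now knows about the configured datasources and the
    # expectation suites stored in the project.
    print(context.list_datasources())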
DataContexts use a datasource-based namespace, where each accessible type of data has a three-part
normalized *data_asset_name*, consisting of *datasource/generator/generator_asset*.

- The datasource actually connects to a source of materialized data and returns Great Expectations
  DataAssets connected to a compute environment and ready for validation.

- The generator knows how to introspect datasources and produce identifying "batch_kwargs" that
  define particular slices of data.

- The generator_asset is a specific name -- often a table name or other name familiar to users --
  that generators can slice into batches.
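As an illustration, a fully qualified name and batch request might look like the sketch below. The
names ``my_postgres_db``, ``default``, ``events``, and ``warning`` are hypothetical, and the exact
``get_batch`` signature has changed between Great Expectations versions.

.. code-block:: python

    # datasource / generator / generator_asset
    data_asset_name = "my_postgres_db/default/events"

    # Ask the DataContext loaded above for a batch of that asset, evaluated
    # against the expectation suite named "warning". The context resolves the
    # normalized name and asks the datasource for the underlying data.
    batch = context.get_batch(data_asset_name, expectation_suite_name="warning")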
An expectation suite is a collection of expectations ready to be applied to a batch of data. Since
in many projects it is useful to evaluate different expectations in different contexts -- profiling
vs. testing; warning vs. error; high vs. low compute; ML model or dashboard -- suites provide a
namespace option for selecting which expectations a DataContext returns.

In many simple projects, the datasource or generator name may be omitted, and the DataContext will
infer the correct name when there is no ambiguity.

Similarly, if no expectation suite name is provided, the DataContext will assume the name "default".
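Under those defaults, the fully qualified request above can shrink to a one-argument call, as in
this sketch (again using the hypothetical asset name ``events``; the shorthand only resolves when
exactly one datasource and generator match, and behavior varies by version):

.. code-block:: python

    # The DataContext infers the datasource and generator, and evaluates the
    # expectation suite named "default".
    batch = context.get_batch("events")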
@@ -0,0 +1,29 @@

.. _datasource:

Datasources
============

Datasources are responsible for connecting to data infrastructure. Each Datasource is a source
of materialized data, such as a SQL database, S3 bucket, or local file directory.

Each Datasource also provides access to Great Expectations data assets that are connected to
a specific compute environment, such as a SQL database, a Spark cluster, or a local in-memory
Pandas DataFrame.
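A datasource is registered with the DataContext, either by editing the project yml or
programmatically. A minimal sketch of the programmatic route is below; the name ``my_local_files``
and the keyword arguments are hypothetical, and the exact ``add_datasource`` signature has changed
across Great Expectations releases, so treat this as illustrative rather than definitive.

.. code-block:: python

    # Register a Pandas-backed datasource that reads files from a local directory.
    context.add_datasource(
        "my_local_files",          # name used in data_asset_names
        "pandas",                  # datasource type / backing compute environment
        base_directory="../data",  # where the generator should look for files
    )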
To bridge the gap between those worlds, Datasources interact closely with *generators*, which
are aware of a source of data and can produce identifying information, called "batch_kwargs",
that datasources can use to get individual batches of data. They add flexibility in how to obtain
data, such as with time-based partitioning, downsampling, or other techniques appropriate for the
datasource.

For example, a generator could produce a SQL query that logically represents "rows in the Events
table with a timestamp on February 7, 2012," which a SqlAlchemyDatasource could use to materialize
a SqlAlchemyDataset corresponding to that batch of data and ready for validation.
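In code, such batch_kwargs are just a dictionary describing the slice of data. A hedged sketch for
the example above follows; the table and column names are hypothetical, and the exact keys expected
by a SqlAlchemyDatasource, as well as the ``get_batch`` signature, depend on the Great Expectations
version.

.. code-block:: python

    # Identifying information for one batch: the generator produced a query
    # that selects a single day of events.
    batch_kwargs = {
        "query": "SELECT * FROM events WHERE event_date = '2012-02-07'"
    }

    # The datasource can materialize that slice as a SqlAlchemyDataset
    # ready for validation.
    batch = context.get_batch(
        "my_postgres_db/default/events",
        expectation_suite_name="warning",
        batch_kwargs=batch_kwargs,
    )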
Opinionated DAG managers such as Airflow, dbt, prefect.io, or Dagster can also act as datasources
and/or generators for a more generic datasource.

See :ref:`batch_generator` for more detail about how batch generators interact with datasources
and DAG runners.

See the datasource module docs, :ref:`datasource_module`, for more detail about available datasources.