Releases: great-expectations/great_expectations
v0.7.9
- Add an S3 generator, which will introspect a configured bucket and generate batch_kwargs from identified objects
- Add support to PandasDatasource and SparkDFDatasource for reading directly from S3
- Enhance the Site Index page in documentation so that validation results are sorted and display the newest items first when using the default run-id scheme
- Add a new utility method, build_continuous_partition_object, which will build partition objects using the dataset API and so supports any GE backend (see the sketch after this list)
- Fix an issue where columns with spaces in their names caused failures in some SqlAlchemyDataset and SparkDFDataset expectations
- Fix an issue where generated queries including null checks failed on MSSQL (#695)
- Fix an issue where evaluation parameters passed in as a set instead of a list could cause JSON serialization problems for the result object (#699)
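As a quick illustration of the new utility and the direct-from-S3 reading, a minimal sketch (the bucket, path, and column name are hypothetical, and the bins/n_bins arguments are assumptions; direct S3 reads also assume s3fs is installed):

    import great_expectations as ge
    from great_expectations.dataset.util import build_continuous_partition_object

    # ge.read_csv wraps pandas.read_csv and returns a PandasDataset;
    # pandas handles the s3:// scheme when s3fs is available.
    df = ge.read_csv("s3://my-bucket/my_data.csv")

    # Build a partition object (bin edges plus weights) for a numeric column.
    # Because it goes through the dataset API, the same call works on any GE backend.
    partition = build_continuous_partition_object(df, "price", bins="uniform", n_bins=10)

A partition object built this way can then feed distributional expectations such as expect_column_kl_divergence_to_be_less_than.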
v0.7.8
- BREAKING: slack webhook URL now must be in the profiles.yml file (treat as a secret)
- Profiler improvements:
  - Display candidate profiling data assets in alphabetical order
  - Add columns to the expectation_suite meta during profiling to support human-readable description information
- Improve handling of optional dependencies during CLI init
- Improve documentation for create_expectations notebook
- Fix several anachronistic documentation and docstring phrases (#659, #660, #668, #681; thanks @StevenMMortimer)
- Fix data docs rendering issues
v0.7.7
- Standardize the way that plugin module loading works. DataContext will begin to use the new-style class and plugin identification moving forward; yml configs should specify class_name and module_name (with module_name optional for GE types). For now, it is possible to use the "type" parameter in configuration (as before).
- Add support for custom data_asset_type to all datasources
- Add support for strict_min and strict_max to inequality-based expectations to allow strict inequality checks (thanks @RoyalTS!); see the sketch after this list
- Add support for reader_method = "delta" to SparkDFDatasource
- Fix databricks generator (thanks @sspitz3!)
- Improve performance of DataContext loading by moving optional import
- Fix several memory and performance issues in SparkDFDataset.
- Use only distinct value count instead of bringing values to driver
- Migrate away from UDF for set membership, nullity, and regex expectations
- Fix several UI issues in the data_documentation:
  - Move prescriptive dataset expectations to Overview section
  - Fix broken link on Home breadcrumb
  - Scroll follows navigation properly
  - Improved flow for long items in value_set
  - Improved testing for ValidationRenderer
  - Clarify dependencies introduced in documentation sites
  - Improve testing and documentation for site_builder, including run_id filter
  - Fix missing header in Index page and cut-off tooltip
  - Add run_id to path for validation files
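A minimal sketch of the new strict inequality flags (column name and bounds are hypothetical):

    from great_expectations.dataset import PandasDataset

    df = PandasDataset({"age": [1, 50, 99]})

    # With strict_min/strict_max, the boundary values themselves fail:
    # the check becomes 0 < value < 100 rather than 0 <= value <= 100.
    result = df.expect_column_values_to_be_between(
        "age", min_value=0, max_value=100, strict_min=True, strict_max=True
    )
    print(result["success"])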
v0.7.6
- New Validation Renderer! Supports turning validation results into HTML and displays differences between the expected and the observed attributes of a dataset.
- Data Documentation sites are now fully configurable; a data context can be configured to generate multiple sites built with different GE objects to support a variety of data documentation use cases. See the data documentation guide for more detail.
- CLI now has a new top-level command, build-documentation, that can support rendering documentation for specified sites and even named data assets in a specific site.
- Introduced DotDict and LooselyTypedDotDict classes that make it possible to enforce typing of dictionaries (see the sketch after this list).
- Bug fixes: improved internal logic of rendering data documentation, Slack notification, and the CLI profile command when the datasource argument was not provided.
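For illustration, a generic sketch of the dot-access pattern these classes implement (the underlying idea only, not GE's actual implementation):

    class DotDict(dict):
        # A dict whose keys can also be read and written as attributes.
        def __getattr__(self, key):
            try:
                return self[key]
            except KeyError:
                raise AttributeError(key)

        __setattr__ = dict.__setitem__

    config = DotDict({"class_name": "PandasDatasource"})
    assert config.class_name == "PandasDatasource"
    config.module_name = "great_expectations.datasource"

A typed variant such as LooselyTypedDotDict would additionally restrict which keys are allowed and what types their values may take.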
v0.7.5
v0.7.4
- Fix numerous rendering bugs and formatting issues for rendering documentation.
- Add support for pandas extension dtypes in the pandas backend of expect_column_values_to_be_of_type and expect_column_values_to_be_in_type_list, and fix a bug affecting some dtype-based checks (see the sketch after this list).
- Add datetime and boolean column-type detection in BasicDatasetProfiler.
- Improve BasicDatasetProfiler performance by disabling interactive evaluation when output of expectation is not immediately used for determining next expectations in profile.
- Add support for rendering expectation_suite and expectation_level notes from meta in docs.
- Fix minor formatting issue in readthedocs documentation.
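The extension-dtype support noted above can be exercised roughly like this (a sketch; the exact set of accepted type strings is an assumption):

    import pandas as pd
    from great_expectations.dataset import PandasDataset

    # "Int64" (capital I) is a pandas extension dtype: a nullable integer column.
    df = PandasDataset({"id": pd.Series([1, 2, None], dtype="Int64")})

    result = df.expect_column_values_to_be_of_type("id", "Int64")
    print(result["success"])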
v0.7.3
- BREAKING: Harmonize expect_column_values_to_be_of_type and expect_column_values_to_be_in_type_list semantics in Pandas with other backends, including support for None type and type_list parameters to support profiling. These type expectations now rely exclusively on native python or numpy type names (see the sketch after this list).
- Add configurable support for Custom DataAsset modules to DataContext
- Improve support for setting and inheriting custom data_asset_type names
- Add tooltips with expectations backing data elements to rendered documentation
- Allow better selective disabling of tests (thanks @RoyalTS!)
- Fix documentation build errors causing missing code blocks on readthedocs
- Update the parameter naming system in DataContext to reflect data_asset_name and expectation_suite_name
- Change scary warning about discarding expectations to be clearer, less scary, and only in log
- Improve profiler support for boolean types, value_counts, and type detection
- Allow user to specify data_assets to profile via CLI
- Support CLI rendering of expectation_suite and EVR-based documentation
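Under the harmonized semantics, type checks are phrased with native python or numpy type names on every backend; a minimal sketch (column data and type strings are illustrative):

    from great_expectations.dataset import PandasDataset

    df = PandasDataset({"count": [1, 2, 3], "label": ["a", "b", "c"]})

    # A native numpy dtype name rather than a backend-specific alias.
    df.expect_column_values_to_be_of_type("count", "int64")

    # type_list passes when the column's type matches any of the listed names.
    df.expect_column_values_to_be_in_type_list("label", ["str", "object"])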
v0.7.2
- Improved error detection and handling in CLI "add datasource" feature
- Fixes in rendering of profiling results (descriptive renderer of validation results)
- Query Generator of SQLAlchemy datasource adds tables in non-default schemas to the data asset namespace
- Added convenience methods to display HTML renderers of sections in Jupyter notebooks
- Implemented prescriptive rendering of expectations for most expectation types
v0.7.1
- Added documentation/tutorials/videos for onboarding and new profiling and documentation features
- Added prescriptive documentation built from expectation suites
- Improved index, layout, and navigation of data context HTML documentation site
- Bug fix: non-Python files were not included in the package
- Improved the rendering logic to gracefully deal with failed expectations
- Improved the basic dataset profiler to be more resilient
- Implemented expect_column_values_to_be_of_type and expect_column_values_to_be_in_type_list for SparkDFDataset
- Updated CLI with a new documentation command and improved profile and render commands
- Expectation suites and validation results within a data context are saved in a more readable form (with indentation)
- Improved compatibility between SparkDatasource and InMemoryGenerator
- Optimization for Pandas column type checking
- Optimization for Spark duplicate value expectation (thanks @orenovadia!)
- Default run_id format no longer includes ":" and specifies UTC time
- Other internal improvements and bug fixes
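For illustration, the shape of the new default run ids, a UTC timestamp with no ":" characters so it stays safe in file paths (the exact format string here is an assumption):

    import datetime

    run_id = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    print(run_id)  # e.g. 20190611T185500.123456Z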
v0.7.0
Version 0.7 of Great Expectations is HUGE. It introduces several major new features
and a large number of improvements, including breaking API changes.
The core vocabulary of expectations remains consistent. Upgrading to
the new version of GE will primarily require changes to code that
uses data contexts; existing expectation suites will require only changes
to top-level names.
- Major update of Data Contexts. Data Contexts now offer significantly more support for building and maintaining expectation suites and interacting with existing pipeline systems, including providing a namespace for objects. They can handle integrating, registering, and storing validation results, and provide a namespace for data assets, making batches first-class citizens in GE. Read more: :ref:`data_context` or :py:mod:`great_expectations.data_context`
- Major refactor of autoinspect. Autoinspect is now built around a module called "profile" which provides a class-based structure for building expectation suites. There is no longer a default "autoinspect_func" -- calling autoinspect requires explicitly passing the desired profiler. See :ref:`profiling`
- New "Compile to Docs" feature produces beautiful documentation from expectations and expectation validation reports, helping keep teams on the same page.
- Name clarifications: we've stopped using the overloaded terms "expectations config" and "config" and instead use "expectation suite" to refer to a collection (or suite!) of expectations that can be used for validating a data asset.
- Expectation Suites include several top level keys that are useful for organizing content in a data context: data_asset_name, expectation_suite_name, and data_asset_type. When a data_asset is validated, those keys will be placed in the meta key of the validation result.
- Major enhancement to the CLI tool including init, render and more flexibility with validate
- Added helper notebooks to make it easy to get started. Each notebook acts as a combination of tutorial and code scaffolding, to help you quickly learn best practices by applying them to your own data.
- Relaxed constraints on expectation parameter values, making it possible to declare many column aggregate expectations in a way that is always "vacuously" true, such as expect_column_values_to_be_between with None and None. This makes it possible to progressively tighten expectations while using them as the basis for profiling results and documentation (see the sketch after this list).
- Enabled caching on dataset objects by default.
- Bugfixes and improvements:
  - New expectations:
    - expect_column_quantile_values_to_be_between
    - expect_column_distinct_values_to_be_in_set
  - Added support for head method on all current backends, returning a PandasDataset
  - More implemented expectations for SparkDF Dataset with optimizations:
    - expect_column_values_to_be_between
    - expect_column_median_to_be_between
    - expect_column_value_lengths_to_be_between
  - Optimized histogram fetching for SqlalchemyDataset and SparkDFDataset
  - Added cross-platform internal partition method, paving the path for improved profiling
  - Fixed bug with output_strftime_format not being honored in PandasDataset
  - Fixed series naming for column value counts
  - Standardized naming for expect_column_values_to_be_of_type
  - Standardized and made explicit use of sample normalization in stdev calculation
  - Added from_dataset helper
  - Internal testing improvements
  - Documentation reorganization and improvements
  - Introduced custom exceptions for more detailed error logs
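To illustrate the relaxed parameter constraints described above, a small sketch (column and values are hypothetical):

    from great_expectations.dataset import PandasDataset

    df = PandasDataset({"price": [10.0, 12.5, 99.9]})

    # With both bounds None, the expectation is vacuously true. It can be declared
    # up front, then progressively tightened as profiling reveals the data's shape.
    result = df.expect_column_values_to_be_between("price", min_value=None, max_value=None)
    print(result["success"])  # True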