All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Relax arrow-pd-parser version requirement
- Update dependencies
- Updates pypi action for Trusted Publishers
- Allows python 3.11 and beyond
- Updated parquet validator to read schemas from S3
- Added parquet validator for validating parquet schemas
- Resolved mojap package dependency conflicts.
- Updated `bin_pack_config` to use a multithreaded function for fetching file content lengths from S3
- Add optional `pandas_kwargs` param to the pandas validator, to allow passing of pandas keyword arguments
- fixed issue with release version numbering
- all files went to fail regardless
- temp fs was never deleted causing issues on failed runs
- fixed issue with type category not being present in metadata when required
- fixed case sensitivity for metadata with capitalised col names
- removed `only-test-columns-in-metadata` from config and replaced it with two options (`allow-unexpected-data` and `allow-missing-cols`) that allow some misalignment between the metadata and the data; if there is no commonality between the metadata and the data, an error is raised regardless of the two new options
- when an exception is raised during linting, data continues to be processed so that all errors are apparent at the end of linting
- updated dependencies to be in line with other mojap packages
- updated CI/CD tests
- updated schema and config parsing to allow underscores or hyphens in parameter names
- fixed issue #164 where col validation function wrapper was misclassifying str cols as not strings.
- fixed issue where capitalisation in the header of source (csv) data prevented timestamps from being cast correctly, by using the new pandas parser from arrow-pd-parser
- enum set removed from response dict for enum test to stop logs from being filled with potentially long enum sets
- temporary storage is now removed at the beginning of a run as leftover files would remain after a failed run and potentially cause errors
- Added automatic pypi release GH workflow
- Fixed typo in test that wasn't running
- Fixed typo on package version number
- Dropping git repo references for pypi release
- Updated github repo dependencies in `pyproject.toml`
- Created Pandas Validator (now replaces frictionless as the default validator) (issue #120, issue #98)
- Enabled parallelisation of validators (#122)
- Migrated to the new metadata schemas (#140)
- Single process validator can run locally (#121)
- Full log now writes to its own folder (#130)
- Renamed default branch of repo from master -> main
[ALL CLOSED ISSUES]
- issue #140
- issue #139
- issue #133
- issue #132
- issue #131
- issue #130
- issue #129
- issue #128
- issue #125
- issue #122
- issue #121
- issue #120
- issue #110
- issue #100
- issue #98
- issue #87
- issue #70
- Add pandas-kwargs to table params to pass through to Great Expectations (#112)
- Update dependencies
- Split out the codebase to make it easier to add new validators. These have to conform to the validator base class `data_linter/validators/base.py`. (#101)
- Added a great expectations validator. (#103)
- Surprise! Reverted back to frictionless (from the previous revert in v3 to goodtables) (#102)
- Improved logging: logs up to level INFO are now written to standard out, and level DEBUG logs to the S3 log.
- Added ability for the user to define how data is written to S3, specifically whether or not it is partitioned by timestamp.
- Dropping the `v` from our releases.
- Revert back to goodtables
- Fix log_path being called before assignment
- Upgrade from `goodtables` to `frictionless` package (#57)
- (Hopefully) address aws read timeout issue (#79)
- Review printing and logging (#86)
- Separating out functionality so that users can provide a config stored in memory rather than a file (#84)
- Add option for defining the timestamp partition name (#85)
- Minor logic change when all_must_pass is set to True
- improved logging
- Extend read_timeout to hopefully avoid more timeouts
- Fix typo in get_out_path function
- Added some more print statements (#75)
- Fixed testing suite (#76)
- Fixed dependency tests (#47)
- Add print statements to supplement logging, so you can see it working in real time
- Fix logic of main script (#52)
- Add support for upper case headers (#53)
- Better handling of missing values for jsonl (#61)
- Fix command line tool (#45)
- Add flake8 linting Github Action (#58)
- Actually compresses data when `compress-data = true` (#35)
- Initial release, repurposed repo to use Goodtables