Changelog

Version Change
2023.9.0 Added Table.match operation.
2023.8.0 Nim backend for the CSV importer.
Improved Excel importer.
Improved slicing consistency.
Logical cores re-enabled on *nix-based systems.
Filter is now type-safe.
Added merge utility.
Various bugfixes.
2023.6.5 Fixed get_headers falling back to text reading when asked to read 0 lines of an Excel file. Fixed an issue where reading an Excel file would ignore the file count. The Excel file reader now has parity for linecount selection.
2023.6.4 Fixed a logic bug in get_headers that caused it to return one more line than requested.
2023.6.3 Updated the way reference counting works. Tablite now tracks references to used pages and cleans them up based on the number of references to those pages in the current process. This allows deep table clones to be handled when tables are sent between processes (pickling/unpickling); the previous implementation would corrupt all tables using the same pages, because its reference counting assumed that all tables were shallow copies of the same object.
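The idea behind per-process page reference counting can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not tablite's implementation: the `PageRefCounter` class, its methods and the string page ids are hypothetical. The point it shows is that a page is only cleaned up once the last table in this process that refers to it has released it, so an unpickled deep clone simply adds its own reference.

```python
from typing import Callable, Dict


class PageRefCounter:
    """Hypothetical sketch, not tablite's implementation.

    Tables call acquire() for each page they use and release() when they
    are dropped; a page is freed only when no table in this process uses it.
    """

    def __init__(self, on_free: Callable[[str], None]) -> None:
        self._refs: Dict[str, int] = {}
        self._on_free = on_free  # cleanup hook, e.g. deleting the page file

    def acquire(self, page_id: str) -> None:
        self._refs[page_id] = self._refs.get(page_id, 0) + 1

    def release(self, page_id: str) -> None:
        count = self._refs.get(page_id, 0)
        if count == 0:
            raise KeyError(f"page {page_id!r} is not referenced")
        if count == 1:
            del self._refs[page_id]
            self._on_free(page_id)  # last reference gone: clean up
        else:
            self._refs[page_id] = count - 1

    def count(self, page_id: str) -> int:
        return self._refs.get(page_id, 0)


freed = []
refs = PageRefCounter(freed.append)
refs.acquire("page-1")  # original table
refs.acquire("page-1")  # deep clone unpickled in the same process
refs.release("page-1")  # clone dropped: the page must survive
refs.release("page-1")  # original dropped: the page is freed
print(freed)  # ['page-1']
```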
2023.6.2 Updated the mplite dependency and changed it to a soft version requirement, to prevent pipeline freezes caused by small bugfixes in mplite.
2023.6.1 Major change to the backend processes, with a speed-up of ~6x. For more, see the release notes.
2022.11.19 Fixed some memory leaks.
2022.11.18 The copy, filter, sort, any and all methods now properly respect the table subclass.
Filtering a table with fewer than SINGLE_PROCESSING_LIMIT rows now runs in the same process to reduce overhead.
Errors within child processes now properly propagate to the parent.
Table.reset_storage(include_imports=True) now lets the user reset the storage while excluding any imported files, by setting include_imports=False during Table.reset(...).
Bugfix: a column with 1,None,2 would be written to CSV and TSV as "1,None,2". It is now written as "1,,2", where the empty field means None (absent).
Fixed mp join producing mismatched column lengths when input tables of different lengths were used, or when the join product was longer than the input table.
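Python's standard `csv` module follows the same convention as the fixed export: `csv.writer` writes `None` as an empty field. The sketch below contrasts that with naive string joining, which produces the old "1,None,2" output. The helper names `write_row` and `read_int_row` are hypothetical and this is not tablite's exporter. One caveat of the convention: an empty string and `None` look identical once written.

```python
import csv
import io


# Hypothetical helpers illustrating the convention, not tablite's exporter.
def write_row(values, delimiter=","):
    """Serialise one row; csv.writer writes None as an empty field."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(values)
    return buf.getvalue()


def read_int_row(text, delimiter=","):
    """Parse one row back, mapping empty fields to None (absent)."""
    row = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    return [None if field == "" else int(field) for field in row]


values = [1, None, 2]
print(",".join(str(v) for v in values))  # 1,None,2  (the old, buggy output)
print(write_row(values), end="")         # 1,,2
print(read_int_row(write_row(values)))   # [1, None, 2]
```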
2022.11.17 Table.load now properly subclasses the table instead of always resulting in tablite.Table.
Table.from_* methods now respect subclasses; some from_* methods that were instance methods instead of class methods have been fixed.
Fixed Table.from_dict accepting only list and tuple but not tablite.Column, which is an equally valid type.
Fixed lookup parity between single-process and multi-process outputs.
Fixed an issue with multiprocess lookup where no matches would raise an exception instead of producing None.
Fixed an issue with filtering an empty table.
2022.11.16 Changed join to process 1M rows per task to avoid potential OOM errors on lower-memory systems.
Added mp_merge_columns to MemoryManager, which merges column pages into a single column.
Fixed join parity between single-process and multi-process outputs.
Fixed an issue with multiprocess join where no matches would raise an exception instead of producing None.
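Capping each task at a fixed number of rows bounds how much memory any single worker needs, whatever the total table size. A minimal sketch of splitting a row range into fixed-size tasks; `task_ranges` and `ROWS_PER_TASK` are hypothetical names, not tablite's code.

```python
from typing import List, Tuple

# Hypothetical constant mirroring the "1M rows per task" figure for join.
ROWS_PER_TASK = 1_000_000


def task_ranges(n_rows: int, rows_per_task: int = ROWS_PER_TASK) -> List[Tuple[int, int]]:
    """Split rows 0..n_rows into half-open (start, stop) slices, one per task."""
    if rows_per_task <= 0:
        raise ValueError("rows_per_task must be positive")
    return [(start, min(start + rows_per_task, n_rows))
            for start in range(0, n_rows, rows_per_task)]


print(task_ranges(2_500_000))
# [(0, 1000000), (1000000, 2000000), (2000000, 2500000)]
```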
2022.11.15 Bumped mplite to avoid deadlocks when the OS kills the process.
2022.11.14 Improved the locking mechanism to allow retries when opening a file, as the previous solution could cause deadlocks when running multiple threads.
2022.11.13 Fixed an issue with copying empty pages.
2022.11.12 Tablite is now able to create its own temporary directory.
2022.11.11 The text_reader tqdm progress bar now tracks the entire process.
text_reader properly respects free memory on *nix-based systems.
text_reader no longer discriminates against hyperthreaded cores.
2022.11.10 get_headers now uses plain openpyxl instead of the pyexcel wrapper, speeding up fetch times ~10x on certain files.
2022.11.9 get_headers now fails safely on unrecognized characters.
2022.11.8 Fixed a bug with task size calculation on single-core systems.
2022.11.7 Added TABLITE_TMPDIR environment variable for setting tablite work directory.
Characters that the text reader fails to read due to improper encoding are now skipped.
Fixed an issue where single column text files with no column delimiters would be imported as empty tables.
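Resolving a work directory from an environment variable usually means checking the variable first and falling back to a default. A sketch of that pattern for TABLITE_TMPDIR, also creating the directory when missing; the function name `resolve_workdir` and the fallback location (a "tablite" folder in the system temp directory) are assumptions, not tablite's actual logic.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_workdir(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical sketch: prefer TABLITE_TMPDIR, else an assumed default."""
    env = os.environ if environ is None else environ
    override = env.get("TABLITE_TMPDIR")
    if override:
        path = Path(override)
    else:
        # Assumed fallback: a "tablite" folder in the system temp directory.
        path = Path(tempfile.gettempdir()) / "tablite"
    path.mkdir(parents=True, exist_ok=True)  # create it if it is missing
    return path
```

Passing the environment as an argument keeps the function testable without mutating `os.environ`.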
2022.11.6 Fixed date inference.
2022.11.5 Fixed negative slicing issues
2022.11.4 Transpose API changes:
table.transpose(...) was renamed to table.pivot_transpose(...).
New table.transpose() and table.T were added. They act similarly to numpy's .T: when transposing, the column headers are used as the first row of the table.
2022.11.3 Bugfix for non-ASCII encoded strings during t.add_rows(...).
2022.11.2 As UTF-8 is ASCII-compatible, the file reader utils now select UTF-8 instead of ASCII as the default.
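Why UTF-8 is a safe default: every pure-ASCII byte sequence decodes to the same text under UTF-8, while UTF-8 also handles characters that ASCII rejects. The plain-Python illustration below uses a hypothetical `decode_text` helper; it is not tablite's file reader.

```python
# Hypothetical helper for illustration, not tablite's file reader.
def decode_text(raw: bytes) -> str:
    """Decode file contents with UTF-8, which is a superset of ASCII."""
    return raw.decode("utf-8")


ascii_only = b"id,name\n1,abc\n"
# Pure-ASCII bytes decode identically under both codecs.
assert decode_text(ascii_only) == ascii_only.decode("ascii")

non_ascii = "id,name\n2,café\n".encode("utf-8")
print(decode_text(non_ascii).splitlines()[1])  # 2,café
try:
    non_ascii.decode("ascii")
except UnicodeDecodeError:
    print("ASCII cannot decode this file")
```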
2022.11.1 Bugfix in datatypes.infer() where 1 was inferred as int, not float.
2022.11.0 New table features:
Table.diff(other, columns=...),
table.remove_duplicates_rows(),
table.drop_na(*arg),
table.replace(target,replacement),
table.imputation(sources, targets, methods=...),
table.to_pandas() and Table.from_pandas(pd.DataFrame),
table.to_dict(columns, slice),
Table.from_dict(),
table.transpose(columns, keep, ...),
New column features:
Column.count(item),
Column[:] is guaranteed to return a python list.
Column.to_numpy(slice) returns np.ndarray.
New tools library (from tablite import tools) with:
date_range(start,end),
xround(value, multiple, up=None), and
guess as a shortcut for Datatypes.guess(...).
Bugfixes:
__eq__ was updated but __ne__ was missed.
The in operator in filter would crash if datatypes were not strings.
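The changelog gives only the signature xround(value, multiple, up=None). The sketch below shows what rounding to a multiple can look like, under a loudly stated assumption about `up`: None rounds to the nearest multiple (ties round up), True always rounds up, False always rounds down. It is a plain-Python reimplementation for illustration, not the source of tablite's tools module.

```python
import math
from typing import Optional


def xround(value: float, multiple: float, up: Optional[bool] = None) -> float:
    """Hypothetical sketch of rounding to a multiple.

    Assumed semantics (not confirmed by the changelog):
    up=None -> nearest multiple (ties round up), up=True -> round up,
    up=False -> round down.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    ratio = value / multiple
    if up is None:
        steps = math.floor(ratio + 0.5)
    elif up:
        steps = math.ceil(ratio)
    else:
        steps = math.floor(ratio)
    return steps * multiple


print(xround(17, 5))            # 15
print(xround(17, 5, up=True))   # 20
print(xround(17, 5, up=False))  # 15
```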
2022.10.11 filter now accepts any expression (str) that can be compiled by Python's compiler.
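A string filter of this kind can be built from Python's own `compile()` and `eval()`: compile the expression once, then evaluate it per row with the row's columns as local names. A minimal sketch, assuming rows are plain dicts; `filter_rows` is hypothetical and not tablite's filter.

```python
from typing import Any, Dict, List


def filter_rows(rows: List[Dict[str, Any]], expression: str) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep rows where the expression is truthy."""
    # Compile once; a SyntaxError surfaces here, before any row is scanned.
    code = compile(expression, "<filter>", "eval")
    # Emptying __builtins__ limits accidents, but this is NOT a sandbox:
    # only evaluate expressions from trusted sources.
    return [row for row in rows if eval(code, {"__builtins__": {}}, row)]


rows = [{"a": 1, "b": "x"}, {"a": 5, "b": "y"}, {"a": 7, "b": "x"}]
print(filter_rows(rows, "a > 2 and b == 'x'"))  # [{'a': 7, 'b': 'x'}]
```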
2022.10.11 Bugfix for .any and .all. The code now executes much faster.
2022.10.10 Bugfix for Table.import_file: import_as has been removed from keywords.
2022.10.10 All Table functions now have tqdm progressbar.
2022.10.10 More robust calculation for task size for multiprocessing.
2022.10.10 Dependency update: mplite==1.2.0 is now required.
2022.10.9 Bugfix for Table.import_file:
files with duplicate header names would only have the last duplicate name imported.
Now the headers are made unique using name_x, where x is a number.
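One way to make duplicate headers unique with a numeric suffix is shown below. The exact numbering tablite uses is an assumption here: the first occurrence keeps its name and later ones become name_1, name_2, and so on, skipping any suffixed name that already exists in the header row. `unique_headers` is a hypothetical helper.

```python
from typing import List


def unique_headers(headers: List[str]) -> List[str]:
    """Hypothetical sketch: rename duplicates to name_1, name_2, ...

    The numbering scheme is an assumption, not tablite's confirmed behavior.
    """
    original = set(headers)
    taken = set()
    result = []
    for name in headers:
        candidate, n = name, 0
        # Skip names already used, and never invent a name that appears
        # elsewhere in the original header row.
        while candidate in taken or (candidate != name and candidate in original):
            n += 1
            candidate = f"{name}_{n}"
        taken.add(candidate)
        result.append(candidate)
    return result


print(unique_headers(["a", "b", "a", "a"]))  # ['a', 'b', 'a_1', 'a_2']
```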
2022.10.8 Bugfix for groupby:
When keys are empty, an error is now raised, as it should have been.
When there are no functions, the unique key pairs are returned.
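With no aggregate functions, a group-by reduces to collecting the distinct key combinations. A sketch of both behaviors in this entry, assuming rows are plain dicts; `unique_key_groups` is hypothetical, and raising `ValueError` for empty keys is an assumption about the error type.

```python
from typing import Any, Dict, List, Sequence, Tuple


def unique_key_groups(rows: List[Dict[str, Any]], keys: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Hypothetical sketch: group-by without functions returns unique key tuples."""
    if not keys:
        raise ValueError("groupby needs at least one key")  # assumed error type
    # A dict keeps first-seen order while discarding duplicates.
    seen: Dict[Tuple[Any, ...], None] = {}
    for row in rows:
        seen.setdefault(tuple(row[k] for k in keys), None)
    return list(seen)


rows = [{"city": "Oslo", "year": 2021}, {"city": "Oslo", "year": 2021},
        {"city": "Rome", "year": 2021}]
print(unique_key_groups(rows, ["city", "year"]))
# [('Oslo', 2021), ('Rome', 2021)]
```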
2022.10.7 Bugfix for Column.statistics() for an empty column
2022.10.6 Bugfix for __setitem__: tbl['a'] = [] is now seen as tbl.add_column('a')
Bugfix for __getitem__: accessing a missing key raises KeyError.
2022.10.5 Bugfix for summary statistics.
2022.10.4 Bugfix for join shortcut.
2022.10.3 Bugfix for DataTypes where bool was evaluated wrongly
2022.10.0 Added the ability to reindex with table.reindex(index=[0,1...,n,n-1]).
2022.9.0 Added ability to store python objects (example).
Added warning when user iterates over non-rectangular dataset.
2022.8.0 Added table.export(path) which exports tablite Tables to file format given by the file extension. For example my_table.export('example.xlsx').
Supported formats are: json, html, xlsx, xls, csv, tsv, txt, ods and sql.
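Choosing the export format from the file extension amounts to a case-insensitive lookup on the path suffix against the supported list above. A sketch of that dispatch step only; `export_format` and `SUPPORTED_EXPORTS` are hypothetical names, and the real export also writes the file.

```python
from pathlib import Path

# Hypothetical set mirroring the formats listed for table.export(path).
SUPPORTED_EXPORTS = {"json", "html", "xlsx", "xls", "csv", "tsv", "txt", "ods", "sql"}


def export_format(path: str) -> str:
    """Derive the export format from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_EXPORTS:
        raise ValueError(f"unsupported export format: {suffix or '(none)'}")
    return suffix


print(export_format("example.xlsx"))    # xlsx
print(export_format("reports/Q3.CSV"))  # csv
```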
2022.7.8 Added ability to forward tqdm progressbar into Table.import_file(..., tqdm=your_tqdm), so that Jupyter notebook can use it in display-methods.
2022.7.7 Added method Table.to_sql() for export to ANSI-92 SQL engines
Bugfix on to_json for timedelta.
Jupyter notebooks now provide a nice view using Table._repr_html_().
JS-users can use .as_json_serializable where suitable.
2022.7.6 get_headers now takes argument (path, linecount=10)
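For plain text files, "return the first linecount lines" can be expressed with `itertools.islice`, which stops after exactly `linecount` lines and returns nothing for `linecount=0`. The sketch below is a hypothetical text-only analogue (`read_first_lines`), not tablite's get_headers, which also handles Excel files; `errors="replace"` substitutes undecodable bytes instead of failing.

```python
import tempfile
from itertools import islice
from typing import List


def read_first_lines(path: str, linecount: int = 10) -> List[str]:
    """Hypothetical sketch: return at most `linecount` lines of a text file."""
    if linecount < 0:
        raise ValueError("linecount must be non-negative")
    with open(path, encoding="utf-8", errors="replace") as fh:
        # islice stops after exactly linecount lines, so no extra line leaks in.
        return [line.rstrip("\r\n") for line in islice(fh, linecount)]


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
print(read_first_lines(tmp.name, linecount=2))  # ['a,b', '1,2']
```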
2022.7.5 Added helper Table.as_json_serializable for Jupyter kernel compatibility.
2022.7.4 Added helper Table.to_dict and updated Table.to_json.
2022.7.3 table.to_json now takes kwargs: row_count, columns, slice_, start_on
2022.7.2 Documentation update.
2022.7.1 Minor bugfix.
2022.7.0 BREAKING CHANGES
- Tablite now uses HDF5 as backend.
- Has multiprocessing enabled by default.
- Is 20x faster.
- Completely new API.
2022.6.0 DataTypes.guess([list of strings]) returns the best matching python datatype.