Releases: dathere/qsv

0.108.0

25 Jun 17:08
ac27d40

Another big Quicksilver release with lots of new features and improvements!

The two Polars-powered commands - joinp and sqlp - have received significant attention. joinp now supports asof joins and the --try-parsedates option. sqlp now has several Parquet format options, along with a --low-memory option.
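For readers new to asof joins, the general idea can be sketched in plain Python. This is a minimal illustration of a "backward" asof match (each left row takes the last right row whose key is less than or equal to its own). It is not how joinp or Polars implements it, and the function name, sample data, and `_right` suffix are made up for the example.

```python
from bisect import bisect_right


def asof_join_backward(left, right, key):
    """Match each left row with the last right row whose key is <= the left key.

    Both inputs are lists of dicts; `right` must already be sorted by `key`.
    Left rows without a match are kept unchanged.
    """
    right_keys = [row[key] for row in right]
    joined = []
    for lrow in left:
        # Index of the first right key that is strictly greater than the left key.
        pos = bisect_right(right_keys, lrow[key])
        match = right[pos - 1] if pos > 0 else {}
        merged = dict(lrow)
        for col, val in match.items():
            if col != key:
                merged[f"{col}_right"] = val  # hypothetical suffix for this sketch
        joined.append(merged)
    return joined


# Hypothetical sample data: trades matched to the most recent quote.
trades = [{"time": 3, "qty": 10}, {"time": 7, "qty": 5}]
quotes = [{"time": 1, "price": 100}, {"time": 5, "price": 101}]
result = asof_join_backward(trades, quotes, "time")
```

Unlike an exact-key join, the trade at time 3 still finds a partner: the quote at time 1, the nearest one that does not come after it.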

Other new features include:

  • A new cat rowskey --group option that emulates csvkit's csvstack command.
  • SIMD-accelerated UTF-8 validation for the input command.
  • A --field-separator option for the flatten command.
  • The sniff command now uses the excellent file-format crate for mime-type detection on ALL platforms, not just Linux, as was the case when we were using the libmagic library.
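
As a rough picture of what csvstack-style stacking with a grouping column does, here is a small Python sketch. It assumes rows are matched by column name and that each row is tagged with a label for its source; the function name, the `group` column name, and the sample inputs are illustrative only and do not describe qsv's actual behavior or output.

```python
import csv
import io


def stack_with_group(sources, group_column="group"):
    """Stack rows from several CSV texts, tagging each row with its source label.

    `sources` maps a label (hypothetical, e.g. a file stem) to CSV text.
    Columns are matched by name; values missing from a source are left empty.
    """
    fieldnames = [group_column]
    rows = []
    for label, text in sources.items():
        reader = csv.DictReader(io.StringIO(text))
        # Collect the union of all column names, in first-seen order.
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            rows.append({group_column: label, **row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


stacked = stack_with_group({
    "jan": "id,amount\n1,10\n",
    "feb": "id,amount,note\n2,20,late\n",
})
```

The extra column makes it possible to tell which input each stacked row came from.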

Also, Quicksilver now has optimized builds for Apple Silicon. These builds are created using native Apple Silicon self-hosted Action Runners, which means we can enable all qsv features without being constrained by cross-compilation limitations and GitHub’s Action Runner’s disk/memory constraints. Additionally, we compile Apple Silicon builds with M1/M2 chip optimizations enabled to maximize performance.

Finally, qsv startup should be noticeably faster, thanks to @vi’s PR to avoid sysinfo::System::new_all.

Added

  • joinp: added asof join & --try-parsedates option #1059
  • cat: emulate csvkit's csvstack #1067
  • input: SIMD-accelerated utf8 validation 88e1df2
  • sniff: replace magic with file-format crate, enabling mime-type detection on all platforms #1069
  • sqlp: add --low-memory option d95048e
  • sqlp: added parquet format options c179cf4 a861ebf
  • flatten: add --field-separator option #1068
  • Apple Silicon binaries built on native Apple Silicon self-hosted Action Runners, enabling all features and optimized for M1/M2 chips

Changed

  • input: minor improvements 62cff74
  • joinp: align option names with join command #1058
  • sqlp: minor improvements
  • changed all GitHub action workflows to account for the new Apple Silicon builds
  • Bump rust_decimal from 1.29.1 to 1.30.0 by @dependabot in #1049
  • Bump serde_json from 1.0.96 to 1.0.97 by @dependabot in #1051
  • Bump calamine from 0.21.0 to 0.21.1 by @dependabot in #1052
  • Bump strum from 0.24.1 to 0.25.0 by @dependabot in #1055
  • Bump actix-governor from 0.4.0 to 0.4.1 by @dependabot in #1060
  • Bump csvs_convert from 0.8.5 to 0.8.6 by @dependabot in #1061
  • Bump itertools from 0.10.5 to 0.11.0 by @dependabot in #1062
  • Bump serde_json from 1.0.97 to 1.0.99 by @dependabot in #1065
  • Bump indexmap from 1.9.3 to 2.0.0 by @dependabot in #1066
  • Bump calamine from 0.21.1 to 0.21.2 by @dependabot in #1071
  • cargo update bump various indirect dependencies
  • pin Rust nightly to 2023-06-23

Fixed

  • Avoid sysinfo::System::new_all by @vi in #1064
  • correct typos project-wide #1072

Removed

  • removed libmagic dependency from all GitHub action workflows

New Contributors

  • @vi made their first contribution in #1064

Full Changelog: 0.107.0...0.108.0

0.107.0

15 Jun 00:55
d2b55fc

We continue to improve the new sqlp command. It now supports SQL scripts and additional options to fine-tune Polars CSV parsing and formatting behavior.

We also added an _all_generic special value for the rename command which allows you to rename all columns in a CSV with generic names (e.g. _col_1, _col_2, _col_N). This was done to make it easier to prepare CSVs with no headers for use with sqlp.
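
The naming scheme is simple enough to sketch in Python. The snippet below only illustrates the _col_1 ... _col_N pattern described above by prepending such a header row to headerless CSV text; the function name is hypothetical and this is not the rename command's implementation.

```python
import csv
import io


def add_generic_headers(csv_text):
    """Prepend a header row of _col_1 .. _col_N to headerless CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # One generic name per column, numbered from 1.
    header = [f"_col_{i}" for i in range(1, len(rows[0]) + 1)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


named = add_generic_headers("1,alice\n2,bob\n")
```

With named columns in place, a SQL query can refer to `_col_1` and `_col_2` directly.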

This release also features a Windows MSI installer. This is a big step forward for qsv and we hope to make it easier for Windows users to install and use qsv. Thanks @minhajuddin2510 for all the work on pulling this together!

Added

  • sqlp: added script support #1037
  • sqlp: added CSV format options #1048
  • rename: add "_all_generic" special value for headers #1031

Changed

Fixed

New Contributors

Full Changelog: 0.106.0...0.107.0

0.106.0

07 Jun 17:36

This release features the new Polars-powered sqlp command which allows you to run SQL queries against CSVs.

Initial tests show that its performance is competitive with DuckDB and faster than DataFusion on identical SQL queries, and it just runs rings around pandas sql.

It converts Polars SQL (a subset of ANSI SQL) queries to multi-threaded LazyFrame expressions and then executes them. This is a very powerful feature and allows you to do things like joins, aggregations, group bys, etc. on larger-than-memory CSVs. The sqlp command is still experimental and we are looking for feedback on it. Please try it out and let us know what you think.
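
To give a feel for what "SQL against a CSV" means, here is a small Python sketch. Note the swap: it uses the standard library's sqlite3 module rather than Polars, loads everything into memory, and is single-threaded, so it shows only the idea and none of sqlp's lazy, multi-threaded execution. The function name, table name, and sample data are assumptions made for the example.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table and run one query on it."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    # Remaining reader rows are inserted as text values.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


# Hypothetical sample data and query.
data = "city,sales\nOslo,10\nOslo,5\nRome,7\n"
totals = query_csv(
    data,
    "data",
    "SELECT city, SUM(CAST(sales AS INTEGER)) FROM data GROUP BY city ORDER BY city",
)
```

The group-by and aggregation here are the same kind of operations the paragraph above mentions, just on a toy dataset.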

Added

  • sqlp: new command to allow Polars SQL queries against CSVs #1015

Changed

Full Changelog: 0.105.1...0.106.0

0.105.1

31 May 00:44

All "unsafe" code has been removed. By selectively using asserts, we obviate the need to use explicit unchecked logic to skip unnecessary bounds checking.

Changed

  • stats: remove all unsafes 4a4c010
  • fetch & fetchpost: remove unsafe 1826bb3
  • validate: remove unsafe 742ccb3
  • normalize --user-agent option across all of qsv feff90b & 839b3b7
  • bump qsv-dateparser from 0.8.1 to 0.8.2 which also uses chrono 0.4.26
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-05-29

Fixed

  • remove chrono pin to 0.4.24 and upgrade to 0.4.26 which fixed 0.4.25 CI test failures 7636d82

Full Changelog: 0.105.0...0.105.1

0.105.0

30 May 12:08

Added

Changed

  • sniff: if --no-infer is enabled when sniffing a snappy file, just return the snappy mime type #996
  • sniff: now returns filesize and last-modified date in errors. 2162659
  • stats: minor performance tweaks in hot compute loop f61198c
  • qsv binary variants built using older glibc/musl libraries are now published with their respective glibc/musl version suffixes (glibc-2.31/musl-1.1.24) in the filename, instead of just the "older" suffix.
  • pin chrono to 0.4.24 as the new 0.4.25 is breaking CI tests cde3623
  • Bump calamine from 0.19.1 to 0.20.0 ec7e2df
  • Bump actions/setup-python from 4.6.0 to 4.6.1 by @dependabot in #991
  • Bump flexi_logger from 0.25.4 to 0.25.5 by @dependabot in #992
  • Bump regex from 1.8.2 to 1.8.3 by @dependabot in #993
  • Bump csvs_convert from 0.8.3 to 0.8.4 by @dependabot in #994
  • Bump log from 0.4.17 to 0.4.18 by @dependabot in #998
  • Bump polars from 0.29.0 to 0.30.0 by @dependabot in #999
  • Bump tokio from 1.28.1 to 1.28.2 by @dependabot in #1000
  • Bump once_cell from 1.17.1 to 1.17.2 by @dependabot in #1003
  • Bump indicatif from 0.17.3 to 0.17.4 by @dependabot in #1001
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-05-28

Removed

  • excel: removed kludgy --dates-whitelist option #1005

Fixed

  • sniff: fix inconsistent mime type detection #995

Full Changelog: 0.104.1...0.105.0

0.104.1

23 May 13:20

Added

  • added new publishing workflow to build binary variants using older glibc 2.31 instead of glibc 2.35 and musl 1.1.24 instead of musl 1.2.2. This will allow users running on older Linux distros (e.g. Debian, Ubuntu 20.04) to run qsv prebuilt binaries with "older" glibc/musl versions. 1a08b92

Changed

  • sniff: improved usage text d2b32ac
  • sniff: if sniffing a URL, and server does not return content-length or last-modified headers, set filesize and last-modified to "Unknown" d4a64ac
  • frequency: use SIMD accelerated utf8 validation in hot loop 33406a1
  • foreach: use simdutf8 validation df6b4f8
  • apply: use simdutf8 validation in decode operation; also tweak it to avoid panics (however unlikely) adf7052
  • update install & build instructions with magic
  • Bump regex from 1.8.1 to 1.8.2 by @dependabot in #990
  • Bump bumpalo from 3.12.2 to 3.13.0
  • pin Rust nightly to 2023-05-22
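
The "Unknown" fallback for sniffed URLs mentioned above can be sketched in a few lines of Python. This is only an illustration of the described behavior, assuming the response headers are available as a dict; the function name and sample header values are hypothetical.

```python
def remote_file_info(headers):
    """Return (filesize, last_modified), using "Unknown" for missing headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    filesize = normalized.get("content-length", "Unknown")
    last_modified = normalized.get("last-modified", "Unknown")
    return filesize, last_modified


# Hypothetical responses: one with both headers, one with neither.
complete = remote_file_info({
    "Content-Length": "2048",
    "Last-Modified": "Tue, 23 May 2023 13:20:00 GMT",
})
partial = remote_file_info({"Content-Type": "text/csv"})
```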

Removed

  • sniff: disabled --progressbar option on qsvdp binary variant 1a20edb

Fixed

  • updated publishing workflows to properly enable magic feature (for sniff mime type detection) 136211f

Full Changelog: 0.104.0...0.104.1

0.104.0

22 May 13:26

Added

  • sniff: add --no-infer option (only available on Linux). Using this option makes sniff work as a general mime type detector - retrieving detected mime type, file size (content-length when sniffing a URL), and last modified date.
    When sniffing a URL with --no-infer, it only sniffs the first downloaded chunk, making it very fast even for very large remote files. This option was designed to facilitate accelerated harvesting and broken/stale link checking on CKAN. #987
  • excel: add canonical_filename to metadata #985
  • snappy: now accepts url input #986
  • sample: support url input #989
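
Mime type detection from only the first chunk works because many formats start with a fixed byte signature. The Python sketch below shows that idea with a handful of well-known signatures. It is not the logic used by sniff or libmagic, and the table, function name, and fallback type are chosen for illustration.

```python
# A few well-known leading byte signatures ("magic numbers").
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def detect_mime(first_chunk):
    """Guess a mime type from the leading bytes of a file or download."""
    for signature, mime in MAGIC_SIGNATURES:
        if first_chunk.startswith(signature):
            return mime
    # No signature matched; report a generic binary type.
    return "application/octet-stream"


pdf_guess = detect_mime(b"%PDF-1.7\n")
csv_guess = detect_mime(b"a,b\n1,2\n")
```

Because only the leading bytes are inspected, the rest of a large remote file never needs to be downloaded.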

Changed

  • Bump qsv-sniffer from 0.9.2 to 0.9.3 by @dependabot in #979
  • Bump console from 0.15.5 to 0.15.6 by @dependabot in #980
  • Bump jql-runner from 6.0.7 to 6.0.8 by @dependabot in #981
  • Bump console from 0.15.6 to 0.15.7 by @dependabot in #988
  • Bump embedded Luau from 0.576 to 0.577
  • apply select clippy recommendations
  • tweaked emojis used in Available Commands legend - 🗜️ to 🤯 to denote memory-intensive commands that load the entire CSV into memory; 🪗 to 😣 to denote commands that need addl memory proportional to the cardinality of the columns being processed; 🌐 to denote commands that have web-aware options
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-05-21

Fixed

Full Changelog: 0.103.1...0.104.0

0.103.1

17 May 09:01

Changed

  • Bump reqwest from 0.11.17 to 0.11.18 by @dependabot in #978
  • cargo update bump indirect dependencies

Fixed

  • fix cargo install failing as it is trying to fetch cargo environment variables that are only set for cargo build, but not cargo install #977

Full Changelog: 0.103.0...0.103.1

0.103.0

15 May 09:59

Added

  • sniff: On Linux, short-circuit sniffing a remote file when we already know it's not a CSV #976
  • stats: now computes variance for dates e3e6782
  • stats: now automatically invalidates cached stats across qsv releases 6e929dd
  • add magic version to --version option 455c0f2
  • added CKAN-aware (CKAN) legend to List of Available Commands
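
One common way to invalidate a cache across releases is to store the version that wrote it and ignore the cache when the version differs. The Python sketch below illustrates that pattern only. The JSON layout, field names, and function names are assumptions, not the format the stats command actually uses.

```python
import json


def dump_stats(stats, current_version):
    """Serialize stats together with the version that produced them."""
    return json.dumps({"version": current_version, "stats": stats})


def load_cached_stats(cache_text, current_version):
    """Return cached stats only if they were written by the same version."""
    try:
        cache = json.loads(cache_text)
    except json.JSONDecodeError:
        return None  # unreadable cache: treat as missing
    if not isinstance(cache, dict) or cache.get("version") != current_version:
        return None  # stale: produced by a different release
    return cache.get("stats")


# Hypothetical cache written by one release and read by two.
saved = dump_stats({"rows": 3}, "0.102.1")
same_release = load_cached_stats(saved, "0.102.1")
next_release = load_cached_stats(saved, "0.103.0")
```

A `None` result signals the caller to recompute the statistics and rewrite the cache.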

Changed

  • stats: improve usage text
  • stats: use extend_from_slice for readability 23275e2
  • validate: do not panic if the input is not UTF-8 532cd01
  • sniff: simplify getting stdin last_modified property; on Linux, return detected mime type in JSON error response 0197591
  • luau: update embedded Luau from 0.573 to 0.576
  • Update nightly build instructions
  • Bump qsv-sniffer from 0.9.1 to 0.9.2 by @dependabot in #972
  • Bump tokio from 1.28.0 to 1.28.1 by @dependabot in #973
  • Bump serde from 1.0.162 to 1.0.163 by @dependabot in #974
  • cargo update bump several indirect dependencies
  • pin Rust nightly to 2023-05-13

Full Changelog: 0.102.1...0.103.0

0.102.1

09 May 10:02

0.102.1 is a small patch release to fix issues in publishing the pre-built binary variants with magic for sniff when cross-compiling.

Changed

  • stats: refine --infer-boolean option info & update test count de6390b
  • tojsonl: refine boolcheck_first_lower_char() fn 241115e

Fixed

  • tweaked GitHub Actions publishing workflows to enable building magic-enabled sniff on Linux. Disabled magic when cross-compiling for non-x86_64 Linux targets.

Full Changelog: 0.102.0...0.102.1