Releases · OndraZizka/csv-cruncher

03 Dec 00:27

csv-cruncher-2.10.1

31e5e2c

2.10.1 - Fix the `crunch` script to point to the new name of the fat jar

Latest

Changes in how Maven Central accepts artifacts, together with changes in the release plugins, resulted in the fat jar being renamed from ...-single.jar to ...-fatjar.jar. I forgot to reflect that in the `crunch` script in a couple of previous releases, so this one fixes it.

02 Dec 21:05

OndraZizka

csv-cruncher-2.9.0

2ee89fd

2.9.0 - File type detection

Input files without a suffix are now inspected (the first non-blank line for CSV, the first few characters for JSON),
and the type is guessed accordingly.
The detection may not be 100 % reliable, but it should cover most cases.
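
To illustrate the kind of check described above, here is a minimal Python sketch. It is not the code CSV Cruncher uses; the function name `guess_input_type`, the accepted separators, and the choice to return `None` for unrecognized input are assumptions made for this example.

```python
from typing import Optional


def guess_input_type(text: str) -> Optional[str]:
    """Guess "json" or "csv" from the content of a suffix-less input.

    Hypothetical heuristic for illustration only; the actual rules
    used by CSV Cruncher may differ.
    """
    # JSON: inspect the first few characters, ignoring leading whitespace.
    head = text.lstrip()[:5]
    if head.startswith(("{", "[")):
        return "json"

    # CSV: inspect the first non-blank line and require a separator in it.
    # Accepting "," and ";" is an assumption of this sketch.
    for line in text.splitlines():
        if line.strip():
            return "csv" if ("," in line or ";" in line) else None

    # Empty or whitespace-only input: nothing to guess from.
    return None
```

A heuristic like this misclassifies edge cases, for example a CSV file whose first cell starts with `[`, which matches the caveat that the detection is not fully reliable.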

The dist.zip is available in Maven Central.

02 Dec 21:17

OndraZizka

csv-cruncher-2.10.0

96be720

2.10.0 - SQL functions for JSON extraction

Several SQL functions have been added:

jsonSubtree(path, json) - Returns a JSON subtree (as JSON) at a given slash-separated path (foo/bar). Arrays are not supported, but could be added.
jsonLeaf(path, json) - Like above, but expects the node to be a scalar, and returns the raw value rather than the JSON serialization of it.
jsonLeaves(pathToArray LONGVARCHAR, leavesSubpath LONGVARCHAR, json LONGVARCHAR, nullOnNonArray BOOLEAN) - Returns the leaves from an array, extracted from the given subpath (of each item in that array). The result is serialized to JSON, due to limitations of HSQLDB. Expects the leaves to be scalar.
jsonSubtrees(pathToArray, subpath, json) - Not implemented. It would do the same as jsonLeaves(), except it would put the sub-nodes (rather than only scalars) into an array of subtrees. Let me know if you need it.

The reason why jsonSubtrees is missing is that originally, jsonLeaves() was supposed to return a SQL type ARRAY, but that is not supported by HSQLDB.
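
The semantics described above can be sketched in Python as follows. This is an illustration of the slash-separated path lookup, not the actual implementation behind the SQL functions. The Python names, the handling of missing paths (returning `None`, standing in for SQL NULL), and raising an error when `null_on_non_array` is false are all assumptions of this sketch.

```python
import json
from typing import Any, Optional


def _descend(node: Any, path: str) -> Any:
    """Follow a slash-separated path through nested JSON objects.

    Returns None when a segment is missing (a JSON null also ends up as None).
    """
    for key in filter(None, path.split("/")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def json_subtree(path: str, doc: str) -> Optional[str]:
    """Return the subtree at `path`, serialized back to JSON."""
    node = _descend(json.loads(doc), path)
    return None if node is None else json.dumps(node)


def json_leaf(path: str, doc: str) -> Optional[str]:
    """Return the raw scalar at `path` rather than its JSON serialization."""
    node = _descend(json.loads(doc), path)
    if node is None or isinstance(node, (dict, list)):
        return None  # assumption: a non-scalar node yields NULL
    return node if isinstance(node, str) else json.dumps(node)


def json_leaves(path_to_array: str, leaves_subpath: str, doc: str,
                null_on_non_array: bool) -> Optional[str]:
    """Collect one leaf per array item and return them as a JSON array."""
    array = _descend(json.loads(doc), path_to_array)
    if not isinstance(array, list):
        if null_on_non_array:
            return None
        # assumption: without the flag, a non-array is treated as an error
        raise ValueError(f"No array at path: {path_to_array}")
    return json.dumps([_descend(item, leaves_subpath) for item in array])
```

For a document such as `{"foo": {"items": [{"id": 1}, {"id": 2}]}}`, calling `json_leaves("foo/items", "id", doc, True)` in this sketch yields a JSON array string holding the two ids, which mirrors the note that the result is serialized to JSON rather than returned as a SQL ARRAY.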

02 Dec 00:32

OndraZizka

csv-cruncher-2.8.0

370c090

2.8.0 - UX improvements

Several things were improved for user experience.
Mainly, less debug info is printed to stdout.

Full Changelog: csv-cruncher-2.7.1...csv-cruncher-2.8.0

01 Dec 06:26

OndraZizka

csv-cruncher-2.7.1

8fee0dc

2.7.1 - Fixes

Fixes:

#151 Reliably delete the HSQLDB dir on exit.
#152 Backup file is moved into workdir rather than next to the original file.
#153 Output to STDOUT causes a file named - to be created in workdir.
Upgrades of most dependencies and Maven plugins
Kotlin 2.1

The files should also eventually appear in Maven Central. https://mvnrepository.com/artifact/ch.zizka.csvcruncher/csv-cruncher

03 Sep 15:29

OndraZizka

csv-cruncher-2.7.0

261f773

2.7.0 Print the SQL query results to STDOUT, implement --logLevel

For any CLI tool, it's a bit weird not to allow printing the output to the standard output. CSV Cruncher is no exception.

Now, the -out option may take - (minus) as a value, directing the CSV data (SQL query result) to STDOUT:

crunch -in chatgpt-alternatives.csv -out - -sql 'SELECT name FROM chatgpt-alternatives WHERE monthlyPrice < 10'

For now, the output is mixed with the logging output, since CSV Cruncher is still freshly out of prototype phase (after 13 years :) ).

However, that can be avoided by turning off the logging:

crunch --logLevel OFF ...

Setting the log level should have worked since 2.4.0, but somehow, the implementation slipped out of that version.
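
With logging turned off, the STDOUT output can be consumed by another program. The following Python sketch combines the `-out -` and `--logLevel OFF` options shown above. It assumes that `crunch` is on the PATH and that the two options can be combined in this order; the helper names `build_command`, `parse_csv`, and `run_query` are made up for this example.

```python
import csv
import io
import subprocess
from typing import List


def build_command(input_csv: str, sql: str) -> List[str]:
    """Build a crunch invocation that writes only the CSV result to STDOUT."""
    return ["crunch", "--logLevel", "OFF",
            "-in", input_csv, "-out", "-", "-sql", sql]


def parse_csv(text: str) -> List[List[str]]:
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def run_query(input_csv: str, sql: str) -> List[List[str]]:
    """Run crunch and return the query result rows.

    Assumes the `crunch` script is installed and on the PATH.
    """
    result = subprocess.run(build_command(input_csv, sql),
                            capture_output=True, text=True, check=True)
    return parse_csv(result.stdout)
```

Keeping the command construction separate from the parsing makes it possible to check both parts without having CSV Cruncher installed.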

Both features will be improved in future versions, as poor UX has been identified as the main hindrance to user adoption.

29 Jun 00:10

OndraZizka

csv-cruncher-2.6.0

493d286

2.6.0: Custom table names for inputs - `-in ... -as ...`

Before 2.6.0, the tables were named after the input file name.

crunch -in SomeVeryLongName1234567890.csv -out output.csv -sql "SELECT * FROM SomeVeryLongName1234567890"

As of 2.6.0, it is possible to set the table name using -as.

crunch -in SomeVeryLongName1234567890.csv -as data -out output.csv -sql "SELECT * FROM data"

It is also now possible to import a file twice, if that's needed (although that should rather be done by a self-JOIN).

crunch -in data.csv -as data1 -in data.csv -as data2 -out output.csv -sql "SELECT * FROM data1 UNION SELECT * FROM data2"

27 Jun 21:46

OndraZizka

csv-cruncher-2.5.0

8d8afe7

2.5.0: Allow indexes

As of 2.5.0, columns of the imported CSV can be indexed by the underlying HSQLDB. This speeds up joins across large data sets significantly. For example, a join of 10,000 x 10,000 rows took around 30 minutes; with the indexes, it completes within seconds.

Usage example:

invoices.csv

# id, whenSend, totalAmount, ...
1001, ...

invoiceLines.csv

# id, invoice_id, description, unit, qty, unit_price, amount, ...
20002, 1001, ...

./crunch \
   -in invoices.csv -indexed id \
   -in invoiceLines.csv -indexed id,invoice_id \
   -sql "SELECT * FROM invoices AS i LEFT JOIN invoiceLines AS il ON (il.invoice_id = i.id)" \
   -out joined.csv

With the added indexes, such a query will now execute much faster.

For now, the indexes need to be added using -indexed.

Later on, this could happen automatically for the columns appearing in JOIN, WHERE, and GROUP BY clauses.
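
To show why an index on the join column helps so much, here is a small Python sketch that compares a join without an index to one with a lookup structure. It is only an illustration: a Python dict stands in for the database index, and nothing here reflects how HSQLDB actually stores or uses its indexes. The function names and the row format are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]
Joined = List[Tuple[Row, Optional[Row]]]


def left_join_scan(invoices: List[Row], lines: List[Row]) -> Joined:
    """LEFT JOIN without an index: every invoice scans all lines (O(n * m))."""
    result: Joined = []
    for inv in invoices:
        matches = [ln for ln in lines if ln["invoice_id"] == inv["id"]]
        # An invoice without lines still appears once, paired with None.
        result.extend((inv, m) for m in (matches or [None]))
    return result


def left_join_indexed(invoices: List[Row], lines: List[Row]) -> Joined:
    """LEFT JOIN with a lookup on invoice_id: one pass to build, then O(1) probes."""
    index: Dict[int, List[Row]] = defaultdict(list)
    for ln in lines:
        index[ln["invoice_id"]].append(ln)

    result: Joined = []
    for inv in invoices:
        matches = index.get(inv["id"], [])
        result.extend((inv, m) for m in (matches or [None]))
    return result
```

Both functions return the same rows; the scan version compares every pair of rows, which for 10,000 x 10,000 rows means 100 million comparisons, while the indexed version touches each row only a few times.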

This release is also available from Maven Central.
