Commit

minor docs changes
drizk1 committed Aug 1, 2024
1 parent 210f315 commit 569fc2e
Showing 4 changed files with 6 additions and 21 deletions.
2 changes: 1 addition & 1 deletion NEWS.md
@@ -4,7 +4,7 @@
 - adds support for reading from multiple files at once as a vector of paths in `db_table` when using DuckDB
   - ie `db_table(db, ["path1", "path2"])`
 - adds streaming support when using DuckDB with `@collect(stream = true)`
-- allows user to customize file reading via `db_table(db, "read_*(path, args)")` when using DuckDB"
+- allows user to customize file reading via `db_table(db, "read_*(path, args)")` when using DuckDB
 
 ## v0.3.0 - 2024-07-25
 - Introduces package extensions for:
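The new `db_table` behaviors listed in the NEWS entry can be sketched roughly as follows (the parquet paths and the `month` column are hypothetical; the calls follow the signatures quoted in the entry):

```julia
using TidierData   # provides @chain
import TidierDB as DB

# In-memory DuckDB connection.
db = DB.connect(DB.duckdb())

# New in this release: read several files at once by passing a vector of paths.
flights = DB.db_table(db, ["flights_2023.parquet", "flights_2024.parquet"])

# New in this release: stream the result rather than materializing it all at once.
df = @chain flights begin
    DB.@filter(month == 1)
    DB.@collect(stream = true)
end
```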
12 changes: 2 additions & 10 deletions README.md
@@ -70,16 +70,8 @@ From TidierDates.jl:
 Supported aggregate functions (as supported by the backend) with more to come
 - `mean`, `minimum`, `maximum`, `std`, `sum`, `cumsum`, `cor`, `cov`, `var`
 - `@summarize` supports any SQL aggregate function in addition to the list above. Simply write the function as written in SQL syntax and it will work.
-- `copy_to` (for DuckDB, MySQL, SQLite)
 
-With DuckDB, `db_table` supports direct paths for S3 bucket locations, iceberg tables, and delta tables, in addition to csv, parquet, etc.
-
-DuckDB specifically enables `copy_to` to directly read in `.parquet`, `.json`, `.csv`, and `.arrow` files, including https file paths.
-
-```julia
-path = "file_path.parquet"
-copy_to(conn, path, "table_name")
-```
+When using the DuckDB backend, if `db_table` receives a file path (`.parquet`, `.json`, `.csv`, `iceberg`, or `delta`), it does not copy the file into memory. This allows queries on files too big for memory. `db_table` also supports S3 bucket locations via DuckDB.
 
 ## What is the recommended way to use TidierDB?
 
@@ -95,7 +87,7 @@ Even though the code reads similarly to TidierData, note that no computational w
 using TidierData
 import TidierDB as DB
 
-db = DB.connect(duckdb());
+db = DB.connect(DB.duckdb());
 path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
 
 @chain DB.db_table(db, path_or_name) begin
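One plausible completion of the lazy-read pattern the README describes, using the mtcars URL from its example (the `@filter` and `@arrange` steps are illustrative, not part of the original snippet):

```julia
using TidierData   # provides @chain
import TidierDB as DB

db = DB.connect(DB.duckdb())
path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"

# DuckDB scans the CSV lazily; nothing is copied into memory until @collect.
@chain DB.db_table(db, path_or_name) begin
    DB.@filter(mpg > 25)
    DB.@arrange(desc(mpg))
    DB.@collect
end
```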
11 changes: 2 additions & 9 deletions docs/src/index.md
@@ -64,16 +64,9 @@ From TidierDates.jl:
 Supported aggregate functions (as supported by the backend) with more to come
 - `mean`, `minimum`, `maximum`, `std`, `sum`, `cumsum`, `cor`, `cov`, `var`
 - `@summarize` supports any SQL aggregate function in addition to the list above. Simply write the function as written in SQL syntax and it will work
-- `copy_to` (for DuckDB, MySQL, SQLite)
 
-With DuckDB, `db_table` supports direct paths for S3 bucket locations, iceberg tables, and delta tables, in addition to csv, parquet, etc.
+When using the DuckDB backend, if `db_table` receives a file path (`.parquet`, `.json`, `.csv`, `iceberg`, or `delta`), it does not copy the file into memory. This allows queries on files too big for memory. `db_table` also supports S3 bucket locations via DuckDB.
 
-DuckDB specifically enables `copy_to` to directly read in `.parquet`, `.json`, `.csv`, and `.arrow` files, including https file paths.
-
-```julia
-path = "file_path.parquet"
-copy_to(conn, path, "table_name")
-```
 
 ## What is the recommended way to use TidierDB?
 
@@ -89,7 +82,7 @@ Even though the code reads similarly to TidierData, note that no computational w
 using TidierData
 import TidierDB as DB
 
-db = DB.connect(duckdb());
+db = DB.connect(DB.duckdb());
 path_or_name = "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
 
 @chain DB.db_table(db, path_or_name) begin
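The customizable file reading added in this release pairs with the `db_table` behavior described in this file; a sketch (the file name and the `delim`/`header` options, which are DuckDB `read_csv` arguments rather than TidierDB ones, are illustrative):

```julia
import TidierDB as DB

db = DB.connect(DB.duckdb())

# Pass a DuckDB read_* call directly to control how the file is parsed.
t = DB.db_table(db, "read_csv('measurements.csv', delim = ';', header = true)")
```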
2 changes: 1 addition & 1 deletion src/olympics_examples_fromweb.jl
@@ -2,7 +2,7 @@ using TidierData

 con = duckdb_connect("memory"); # opening DuckDB database connection.
 db = duckdb_open("con");
-copy_to(db, "https://raw.githubusercontent.com/rahkum96/Olympics-Dataset-by-sql/main/noc_regions.csv" "noc_regions")
+copy_to(db, "https://raw.githubusercontent.com/rahkum96/Olympics-Dataset-by-sql/main/noc_regions.csv", "noc_regions")
 copy_to(db, "https://raw.githubusercontent.com/rahkum96/Olympics-Dataset-by-sql/main/olympic_event.csv", "athlete_events")


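For comparison with the corrected `copy_to` call in this file, a minimal end-to-end sketch using the `DB.`-prefixed API from the README (assumes network access; `copy_to` materializes the remote CSV as a named table, unlike `db_table`'s lazy file reads):

```julia
using TidierData   # provides @chain
import TidierDB as DB

db = DB.connect(DB.duckdb())

# Materialize the remote CSV as a DuckDB table named "noc_regions".
DB.copy_to(db, "https://raw.githubusercontent.com/rahkum96/Olympics-Dataset-by-sql/main/noc_regions.csv", "noc_regions")

# Query the copied table like any other.
regions = @chain DB.db_table(db, "noc_regions") begin
    DB.@collect
end
```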
