Feature: Implicitly generate null value column when using `COPY` #4305

prrao87 · 2024-09-26T02:50:32Z

API

Other

Description

I have this scenario where I have a CSV/Parquet file with just two columns:

product   price
Laptop 1100.0
Mouse 150.0
Headphones 250.0

In my DDL, I want to add a nullable column historical_price as follows:

CREATE NODE TABLE IF NOT EXISTS Product(name STRING, price DOUBLE, historical_sales INT32, PRIMARY KEY (name));

I want to initialize the node table with all values in the historical_sales columns as nulls, and add them via MERGE at a later time when they become available.

Issue

Because my input file has just 2 columns, and my DDL specifies 3 columns, I cannot use COPY Product FROM 'product.parquet' directly.

I instead have to do this:

COPY Product(name, price) FROM 'data/product.parquet'

Feature request

In the COPY pipeline, because we already know the number of columns in the input prior to importing the data, can we implicitly infer the column names so that we can use the much simpler DDL command below? It would reduce the mental burden on the user, as it's expected that the columns that are absent in the input file would have to be filled with nulls.

COPY Product FROM 'data/product.parquet'

@ray6080 I think this seems like a reasonable feature, but if you think it's infeasible, feel free to close.

The text was updated successfully, but these errors were encountered:

ray6080 · 2024-09-27T15:43:19Z

In the COPY pipeline, because we already know the number of columns in the input prior to importing the data, can we implicitly infer the column names so that we can use the much simpler DDL command below?

I think it can introduce some confusing behaviours if we implicitly infer column names, such as what if column names don't exactly match? or when there are no header information in the source? In that sense, COPY Product(name, price) FROM 'source' is much less prone to confusions I think.
But let's do more survey on this and see how other systems handle such cases before we jump to a conclusion.

prrao87 added the feature New features or missing components of existing features label Sep 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Implicitly generate null value column when using `COPY` #4305

Feature: Implicitly generate null value column when using `COPY` #4305

prrao87 commented Sep 26, 2024

ray6080 commented Sep 27, 2024

Feature: Implicitly generate null value column when using COPY #4305

Feature: Implicitly generate null value column when using COPY #4305

Comments

prrao87 commented Sep 26, 2024

API

Description

Issue

Feature request

ray6080 commented Sep 27, 2024

Feature: Implicitly generate null value column when using `COPY` #4305

Feature: Implicitly generate null value column when using `COPY` #4305