How to describe the table schema of a multi-page spreadsheet? #655
-
What's the expected behaviour of a validator (say, goodtables) when parsing a data package like this?

{
  "name": "my-data",
  "title": "My data",
  "resources": [
    {
      "name": "data",
      "path": "data.xls", // Excel file with multiple pages
      "schema": { /* Table schema */ }
    }
  ]
}

I imagine the tool will validate the first page. Is there a way to define which page to validate? The specific use case I have in mind is defining the table schema for goodtables. (cc @pwalsh)
Replies: 5 comments
-
While we support Excel in the underlying libraries (tabulator), and therefore also in goodtables, Excel is not valid for a Tabular Data Package, so I would not expect this data package to be parsed as a TDP. It might be, though, depending on the implementation, and in that case I am not sure what would happen. While goodtables has the ability to select a sheet, TDP has no such declaration, so one might expect it to select the first sheet.
-
@vitorbaptista First, as @pwalsh says, that is not a valid Tabular Data Package. However, this is a natural case, and we've already been handling it in DataHub; see this post: http://datahub.io/blog/excel-files-on-the-datahub-automated-previews-and-data-extraction Our approach is to allow a schema on Excel resources but add an extra sheet property. @anuveyatsu can probably comment in more detail.
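A minimal sketch of what such a descriptor might look like, assuming a hypothetical `sheet` property on the resource. The exact property name and semantics DataHub uses are not confirmed here, and `resolve_sheet` is an illustrative helper, not part of any library. It falls back to the first sheet when nothing is declared, which matches the default behaviour guessed at earlier in the thread.

```python
import json

# Hypothetical descriptor: "sheet" is an assumed extension property,
# not part of the Tabular Data Package spec.
descriptor = json.loads("""
{
  "name": "my-data",
  "resources": [
    {"name": "data", "path": "data.xls", "sheet": 2, "schema": {"fields": []}}
  ]
}
""")


def resolve_sheet(resource, default=1):
    """Return the sheet a tool should read.

    Uses the resource's "sheet" value when present, otherwise `default`
    (assumed here to mean the first sheet, 1-based).
    """
    return resource.get("sheet", default)


for resource in descriptor["resources"]:
    print(resource["name"], "->", resolve_sheet(resource))
```

Keeping the property optional means existing descriptors without it keep working and simply read the first sheet.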
-
Is there a place for these "extensions"? Although I understand TDP only supports CSV, spreadsheet formats are a very common use case. My vision for goodtables is that the TDP and Table Schema are the way most people will configure validations, and the missing piece is defining the sheet in a spreadsheet. We can go ahead and add a
-
It's possible with Frictionless Framework, and I'm going to work on this as a pattern (Table Dialect Spec) for the specs:
The same approach works for SQL databases and other multi-table formats.
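The original example from this reply is not preserved. As a rough illustration of the idea only, the sketch below assumes a `dialect` object on the resource that names the table to read: a `sheet` key for spreadsheets and a `table` key for SQL sources. These key names, the `TABLE_SELECTOR_KEYS` mapping, the `select_table` helper, and the example connection string are all hypothetical; they are not taken from the Frictionless Framework API or from any published spec.

```python
# Hypothetical mapping from a resource format to the dialect key that
# identifies which table inside a multi-table source should be read.
TABLE_SELECTOR_KEYS = {"xls": "sheet", "xlsx": "sheet", "sql": "table"}


def select_table(resource):
    """Return the declared sheet/table for a resource, or None.

    The format comes from an explicit "format" property when present,
    otherwise from the file extension of "path".
    """
    fmt = resource.get("format") or resource["path"].rsplit(".", 1)[-1].lower()
    key = TABLE_SELECTOR_KEYS.get(fmt)
    if key is None:
        return None  # single-table format such as CSV: nothing to select
    return resource.get("dialect", {}).get(key)


excel_resource = {"path": "data.xls", "dialect": {"sheet": "Sheet2"}}
sql_resource = {
    "path": "postgresql://localhost/db",  # illustrative connection string
    "format": "sql",
    "dialect": {"table": "orders"},
}

print(select_table(excel_resource))
print(select_table(sql_resource))
```

Putting the selector in a dialect rather than in the schema keeps the Table Schema itself format-agnostic, so the same schema can describe a sheet, a SQL table, or a CSV file.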