Filtering Subcells of CSV #17
Comments
@reenberg, a few more thoughts:
I have another feature in mind (one that generates plots from CSV) that shares the same problem as this feature: the syntax seems too complicated, and more like programming than markdown-ish.
Another reason this should be treated specially is that pantable and pantable2csv are designed to be "invertible" (or, strictly speaking, idempotent insofar as pandoc itself is idempotent). A subcell filter is by definition not invertible. I now have a better idea of how to group this: under an "extension" key, with a module named extension in which all such functions reside. These functions would then manipulate the CSV in a certain way (i.e. they are filters of the CSV). Under this scheme, the regex filter and the "column-selection" filter should be 2 separate extension/filter functions. The original code would need refactoring, though: it was designed as a single-file filter, but it has become sophisticated enough that better organization is necessary. See #9
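To make the proposal concrete, here is a minimal sketch of what two separate extension functions could look like. Everything in it is hypothetical: the names `select_columns`, `match_rows` and `run_extensions` are made up for illustration and are not part of pantable, and the table is assumed to be a plain list of rows whose first row is the header.

```python
import re
from typing import Callable, List

Table = List[List[str]]
Extension = Callable[[Table], Table]


def select_columns(indices: List[int]) -> Extension:
    """Hypothetical extension: keep only the columns at the given indices."""
    def apply(table: Table) -> Table:
        return [[row[i] for i in indices] for row in table]
    return apply


def match_rows(pattern: str, column: int) -> Extension:
    """Hypothetical extension: keep body rows whose cell in `column` matches."""
    regex = re.compile(pattern)

    def apply(table: Table) -> Table:
        if not table:
            return table
        header, *body = table  # assumption: the first row is the header
        return [header] + [row for row in body if regex.search(row[column])]
    return apply


def run_extensions(table: Table, extensions: List[Extension]) -> Table:
    """Apply each extension in order; each one is a CSV-to-CSV filter."""
    for extension in extensions:
        table = extension(table)
    return table


table = [
    ["name", "city", "age"],
    ["Ann", "Oslo", "31"],
    ["Bob", "Rome", "45"],
    ["Cid", "Oslo", "27"],
]
result = run_extensions(
    table,
    [match_rows(r"^Oslo$", column=1), select_columns([0, 2])],
)
```

Because each function only maps a table to a table, the regex filter and the column selection stay independent and can be chained in any order. The output is a strict subset of the input, which is also why such a filter cannot be inverted.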
@ickc, I see your point about the reversibility of pantable and pantable2csv. I just thought that the reverse was a pure gimmick, or at least I didn't see any practical use cases for it as a filter. It produces a .csv file as a side effect of running [...] It is true that I could have some part of my build system generate the desired files as intermediates. However, I was not aware of any tool that did what I needed, and adding this feature to the filter myself seemed simpler than writing the boilerplate code and creating a script or something else. I like the idea of extensions, but it seems like a lot of work to implement (different hooking points, etc).
As a matter of fact, without this feature pantable would be useless for me ;) I might have explained why this is useful in the README / pandoc-discuss. I will probably explore the idea of extensions further here. Ideally an extension can be specified without being committed into pantable (just like how pandoc filters work independently of pandoc).
Also see #21.
The reason I reference #21 is that one of the points there addresses the fact that, once this feature is added, performance can inevitably become a big problem, because there's no practical limit on the size of the CSV.
I think the idea of extensions is a great one, and I completely see the point of trying to keep the code base as clean and simple as possible. However, it seems like a big and messy thing to introduce for just a few extensions. Anyway, perhaps something like this could be of interest: https://github.com/tarekziade/extensions It is old and has apparently been moved to GitHub recently.
I will think about that a bit more. To explain why extensions are a good idea, albeit seemingly overkill for now: I have a lot of things in mind that I want to do with pantable. pantable and pantable2csv are designed to make CSV tables "almost" a first-class table syntax in pandoc (pandoc offers 4 syntaxes for tables; there were discussions about adding CSV syntax as an official pandoc table extension, but it doesn't pass the "markdown-ish" criteria for [...]). Anyway, pantable and pantable2csv are designed with this in mind, so more careful thought is given to the syntax and its behavior in order to mimic a pandoc experience (e.g. you almost cannot trigger a Python exception, except perhaps with malformed YAML syntax; warning messages are given and it will try its best to proceed). But there are other things to which less "careful thought" should be given in order to allow for innovation, such as this. If you think about it, this is how pandoc behaves. Official features are almost always well thought out but take tons of time to be implemented. By allowing extensions/filters, a less rigorous process can be permitted while keeping the upstream clean.
In #21, I mentioned a solution to the memory consumption of arbitrarily large input CSVs. The solution is simple: evaluation should be lazy (using iterators rather than turning everything into lists). The downside is that either the code needs to be substantially rewritten (although it probably won't end up too different?), or functionality like this cannot be (or at least is very difficult to make) an external filter, because it has to happen while reading from the CSV.
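As a rough illustration of the lazy approach, the sketch below filters rows while reading, using a generator over the standard library's `csv.reader`, so only one row is held in memory at a time. The function name `lazy_filter` and its parameters are assumptions made for this example, not existing pantable code.

```python
import csv
import io
import re
from typing import Iterable, Iterator, List


def lazy_filter(lines: Iterable[str], pattern: str, column: int) -> Iterator[List[str]]:
    """Yield the header, then each row whose cell in `column` matches `pattern`.

    Hypothetical helper: rows are produced one at a time as the caller asks
    for them, instead of being collected into a list first.
    """
    reader = csv.reader(lines)
    regex = re.compile(pattern)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    yield header
    for row in reader:
        # Skip rows too short to have the requested column.
        if len(row) > column and regex.search(row[column]):
            yield row


# An in-memory stand-in for a CSV file of arbitrary size.
source = io.StringIO("name,city\nAnn,Oslo\nBob,Rome\nCid,Oslo\n")
rows = lazy_filter(source, r"Oslo", column=1)

# Only the rows requested so far have been read from the source.
first_two = [next(rows), next(rows)]
remaining = list(rows)
```

This also shows the trade-off mentioned above: the filtering is fused with the reading step, so it cannot easily live in a separate filter that only sees the finished table.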
See #16
I've seen such feature requests for other pandoc filters dealing with CSV before, but I haven't spent enough time thinking about a good syntax for it (and since I personally don't need this feature yet, I didn't give it priority). And from the example given, the syntax is not very intuitive.
Although pantable is a 3rd party pandoc filter rather than native pandoc syntax (because @jgm doesn't think the CSV format is markdown-ish enough, i.e. able to be read as plain text), I still want it to be as markdown-ish as possible.