-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
columns parameter not working as expected in parse_single_table/parse #14943
Comments
Welcome to Astropy 👋 and thank you for your first issue! A project member will respond to you as soon as possible; in the meantime, please double-check the guidelines for submitting issues and make sure you've provided the requested details. GitHub issues in the Astropy repository are used to track bug reports and feature requests; If your issue poses a question about how to use Astropy, please instead raise your question in the Astropy Discourse user forum and close this issue. If you feel that this issue has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail feedback@astropy.org. |
I cannot tell what was the original intent in the design but this behavior seems intentional in the code. The astropy/astropy/io/votable/tree.py Line 2811 in 196e3c2
that is used here: astropy/astropy/io/votable/tree.py Line 2832 in 196e3c2
that populates this: astropy/astropy/io/votable/tree.py Lines 2867 to 2868 in 196e3c2
I don't see the code attempting to drop any columns. Are you trying to load a larger-than-memory table? If not, maybe consider astropy.table.Table unified I/O interface and then you can manipulate the columns as you could any astropy.table.Table. If unit matters, then consider |
I see. But actually, instead of dropping the columns, I was expecting the code to simply skip them. I want to avoid reading columns that I know I'm not going to use. I may have to use this function to read files of a few GB, so I'll see whether the reading time improves when passing a list of only a few columns to I still find the description in the docstring misleading, though. |
I'll ask PyVO people who use this format a lot. Perhaps they would have better insight. If you have a proposal on how to improve the docstring, please open a pull request. Thanks! |
I wasn't aware of the It does look like the code is skipping parsing and saving the values for those columns, which is about all it could do for read-time efficiency. The individual values still need to be parsed in the XML sense so the reader can get to the next value, but the values are being ignored. However, if it created the output table without the undesired columns, that could free up useful space for output tables (I don't know how much space is saved by not storing any values in those columns). |
I haven't checked in terms of space, but I can confirm that the function runs faster when only passing a few columns. I have an XML file containing a table of 5,000 rows and 26 fields/columns.
took whereas
took |
Ah thanks, that's good to know! With that time savings, it's definitely worth doing. I am a bit concerned with the general slow performance those numbers show, so I'm thinking we should be raising the priority of issue #8191 (votable read slow). |
It seems to be rather consensual that the current behaviour is unexpected. To address @pllim's interrogations, I started studying the |
I am actively maintaining and developing a Rust VOTable parser available on crates.io. (@neutrinoceros, you are learning Rust according to your github page ;) ). Don't hesitate to contact me (or @ManonMarchand which is located in the office next to mine). |
@fxpineau That is awesome, thank you for sharing ! And yes, I'm learning rust, but haven't gone beyond the basics yet, so the project is probably more interesting to me than I am to it as a potential contributor (for now) !
That said, it's still helpful to know about another implementation that's closer to the metal so we can better set our expectations of what to aim for in terms of performance ! |
Ok. Just in case, we have some experience of mixed Rust/Python packages with MOCPy which is an astropy affiliated package available in both pypi and conda, see mocpy feedstock recipe here . It is largely based on the Rust moc crate. Same for cds-healpix-python, which is not astropy affiliated, but is both in pypi and conda, and is based on the rust cdshealpix crate. (We also made the tool get-license-healper on purpose to help writing mixed Python/Rust projects conda recipes.) Also, I have a permanent position at CDS so those Rust codes are going to be supported on the long term. |
Very interesting! Is IVOA involved in this effort? Any plan to integrate with PyVO if possible? |
For the record: Two years ago, we talked with @tomdonaldson an some other VO people about taking advantage of the MIVOT implementation to promote a new VOTable parser (e.g. RUST) in PyVO. In the end, it seemed to us preferable to remain focused on the implementation of MIVOT staying within the existing software environment. |
Description
When using
parse_single_table
to read a VOTable in an XML file, the parametercolumns
doesn't work as expected. The parametercolumns
is internally passed to the functionparse
which describescolumns
as:I'm expecting
parse
to have the same issue. I haven't directly tried it, but I've verified thatcolumns
is being passed toparse
appropriately.Expected behavior
When passing a list of columns to
columns
only the columns in the list should be present in the output. However, all the columns are present in the output, but the ones that are not in the list are shown as a masked constant (NaN) or as an empty array depending on the type declared in the file.I'm expecting to get:
But I'm getting:
How to Reproduce
The contents of
sample.xml
are:Versions
astropy version: '6.0.dev278+g196e3c2d2'
The text was updated successfully, but these errors were encountered: