Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querying too many objects runs into maxQueryStringLength for Sdss.query_region -- fix vectorization of queries #2211

Closed
Nestak2 opened this issue Nov 16, 2021 · 6 comments

Comments

@Nestak2
Copy link

Nestak2 commented Nov 16, 2021

I am trying to extract spectra from SDSS with astroquery. Before I get to this step I encounter a problem - the following throws an error:

from astroquery.sdss import SDSS
from astropy import coordinates as coords

query = "select top 100                        z, ra, dec, bestObjID                      from                        specObj                      where                        class = 'galaxy'                        and programname = 'eboss'"
res = SDSS.query_sql(query)
co = coords.SkyCoord(ra=res['ra'], dec=res['dec'], unit='deg')
result = SDSS.query_region(co, spectro=True)

The error is:

ValueError: Some errors were detected !
    Line #12 (got 3 columns instead of 1)
    Line #32 (got 5 columns instead of 1)

Strangely enough, if in the query top 100 is replaced with top 10 everything is fine. So there is a problem with only a part of the objects. I thought the problem might be just limited to eBoss entries in the database, so tried substituting programname = 'eboss' with instrument='SDSS' and the problem appeared even then. Is there a solution or some kind of a workaround to this? Thank you!

@keflavich keflavich added the sdss label Nov 16, 2021
@bsipocz
Copy link
Member

bsipocz commented Nov 16, 2021

The error coming back from the server is: The length of the query string for this request exceeds the configured maxQueryStringLength value., and indeed the query string is too long for a GET query.

Unfortunately, a workaround, for now, is to loop through much fewer objects at a time (going by ~20 should work I think, the query URL cannot be longer than 2048 characters, and just the SQL part of yours is ~8200 atm).

@weaverba137
Copy link
Member

@keflavich asked the SDSS help desk to take a look at this, but I agree with @bsipocz that this should be addressed by making the error message more obvious. I don't think it is going to be possible for SDSS to change the query string length on the server side. There are security considerations if nothing else.

@bsipocz
Copy link
Member

bsipocz commented Nov 16, 2021

I don't think it is going to be possible for SDSS to change the query string length on the server-side.

Ahh, right, this is server-side and not just the GET/POST limit. In that case, I suppose uploading a user catalogue to the x-match service to be used in the SQL service could possibly be the solution to deal with many objects at once.

@Nestak2 - in your use case, I suppose it is possible to rewrite the original SQL to merge the two queries into one SQL command and use query_sql only rather than feed the results into query_region. (you can use the get_query_payload kwarg to see what is currently being queried. Joining the two queries would help you get rid of listing all the coordinate pairs, thus the length of the full string would be significantly reduced).

@bsipocz bsipocz changed the title "ValueError: Some errors were detected !" for SDSS.query_region for some objects Querying too many objects runs into maxQueryStringLength for Sdss.query_region -- fix vectorization of queries Nov 16, 2021
@Nestak2
Copy link
Author

Nestak2 commented Nov 17, 2021

@bsipocz Thank you for the answer and the useful suggestion! I am now trying to implement your suggestion of merging the 2 queries into 1. Nevertheless, I am very new to astroquerry and I would greatly appreciate it, if you could give me a minimal working example of how my queries could be merged. For now I can't figure it out. My final intention is to add a few more lines to the above code and extract the spectra of ~20k-100k objects.

I tried to do this without query_region from above like this:

query = "select top 1                        z, ra, dec, bestObjID                      from                        specObj                      where                        class = 'galaxy'                        and programname='eboss'"
res = SDSS.query_sql(query)
co = coords.SkyCoord(ra=res['ra'], dec=res['dec'], unit='deg')
table_spec = SDSS.get_spectra(co)

But this then throws the error

HTTPError: HTTP Error 404: Not Found

Any idea how this could be fixed?

Also you wrote that you can see the "error coming back from the server". How is it available, how can I look it up? Tnx

This was referenced Feb 23, 2022
@bsipocz
Copy link
Member

bsipocz commented Feb 23, 2022

The logic in #2305 should work for the other methods, too, however I think the GETs should be rewritten for POST queries, and also the SQL code generating method should be refactored. Also, we should really be using the HTM API rather than generating our own, naive, rectangles, I would expect that would also cut back a lot on the actual length of the query string, even without doing the GET/POST changes (but of course it would not be a solution for querying for 20-100k objects).

@bsipocz
Copy link
Member

bsipocz commented Sep 11, 2022

This issue seems to be fixed by #2477, so I'm closing. Please feel free to reopen if you still run into it.

@bsipocz bsipocz closed this as completed Sep 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants