Return keys of dictionary in crystfel.py #213

Open
DHekstra opened this issue May 15, 2023 · 8 comments

Comments

@DHekstra
Contributor

DHekstra commented May 15, 2023

It would be helpful to optionally return the keys of the dictionary that _parse_stream in io/crystfel.py produces when calling rs.read_crystfel. My use case is working with stream files, produced by a new CrystFEL data processing pipeline at LCLS, that contain data from multiple fixed-target crystals. The crystal ID and per-crystal frame numbers are not encoded in the BATCH record produced by rs.read_crystfel, but they are recoverable from the keys. There's no need to parse the keys any further; having the option to return list(d.keys()) would be enough.

See

d, cell = _parse_stream(streamfile)

@JBGreisman
Member

Is this going to be a new, general feature of the crystfel stream file output? If it is a very customized output for the LCLS processing pipeline (and will not make it more broadly into crystfel), I don't know if it makes sense to support.

For very personal/custom use cases, I think it may be best to put together your own implementation rather than modifying the interface here.

@DHekstra
Contributor Author

My use case is, of course, specific, but the point is general: _parse_stream() already returns a dictionary with keys and values. I am simply asking for the option for read_crystfel() to also return that dictionary, or at least a list of its keys.
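A minimal sketch of what such an option might look like. Everything here is hypothetical: `_parse_stream_stub` stands in for the private `_parse_stream`, `return_keys` is not an existing parameter of `rs.read_crystfel`, and the keys, reflections, and cell values are invented for illustration. The real dictionary layout may differ.

```python
def _parse_stream_stub(streamfile):
    """Hypothetical stand-in for _parse_stream: returns (dict, cell)."""
    # Invented keys and reflection tuples (H, K, L, I); not real stream contents.
    d = {
        "crystal_A//0": [(1, 0, 0, 10.5), (0, 1, 0, 3.2)],
        "crystal_B//0": [(1, 1, 0, 7.1)],
    }
    cell = (79.0, 79.0, 38.0, 90.0, 90.0, 90.0)  # invented cell parameters
    return d, cell


def read_stream(streamfile, return_keys=False):
    """Flatten the parsed dictionary into rows; optionally also return its keys."""
    d, _cell = _parse_stream_stub(streamfile)
    rows = []
    for batch, reflections in enumerate(d.values()):
        for h, k, l, intensity in reflections:
            rows.append({"H": h, "K": k, "L": l, "I": intensity, "BATCH": batch})
    if return_keys:
        # Dictionaries preserve insertion order, so in this sketch
        # keys[i] is the key whose reflections were assigned BATCH == i.
        return rows, list(d.keys())
    return rows


rows, keys = read_stream("example.stream", return_keys=True)
```

Defaulting `return_keys` to `False` would leave existing callers unaffected, since the return type only changes when the flag is set.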

@JBGreisman
Member

Got it -- I'll think over whether that makes sense. In the meantime, the desired functionality can be obtained this way (this is not officially supported and is subject to change):

from reciprocalspaceship.io.crystfel import _parse_stream
d, cell = _parse_stream(streamfile)

@DHekstra
Contributor Author

That can work. In that case, could we turn lines 238-268 in crystfel.py into another function that can be called, e.g. dict_to_dataset() or so? That would allow me to write clean code that can both get the DataSet and the dictionary.
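The proposed split could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in crystfel.py: `parse_stream_stub` is a hypothetical stand-in for `_parse_stream`, `dict_to_dataset` is the name suggested above rather than an existing function, a plain list of tuples stands in for a `DataSet`, and the keys and values are invented.

```python
def parse_stream_stub(streamfile):
    """Hypothetical stand-in for the private _parse_stream; returns (dict, cell)."""
    # Invented keys and reflection tuples (H, K, L, I).
    d = {
        "image_0001": [(1, 0, 0, 10.5), (0, 1, 0, 3.2)],
        "image_0002": [(1, 1, 0, 7.1)],
    }
    cell = (79.0, 79.0, 38.0, 90.0, 90.0, 90.0)  # invented cell parameters
    return d, cell


def dict_to_dataset(d):
    """Hypothetical helper: flatten the parsed dictionary into a table of rows."""
    rows = []
    for batch, reflections in enumerate(d.values()):
        for h, k, l, intensity in reflections:
            rows.append((h, k, l, intensity, batch))
    return rows


def read_crystfel_sketch(streamfile):
    """The public reader becomes a thin composition of the two steps."""
    d, _cell = parse_stream_stub(streamfile)
    return dict_to_dataset(d)


# A caller who needs both the dictionary and the table runs the steps separately.
d, cell = parse_stream_stub("example.stream")
keys = list(d)
table = dict_to_dataset(d)
```

With this layout the stream is parsed only once when both the keys and the table are needed, and the public reader's signature stays unchanged.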

@JBGreisman
Member

The StreamLoader class provided by #216 to speed up the parsing of stream files may provide the info you want in a cleaner way. I'm not exactly sure what data you want, but you will be able to instantiate a StreamLoader object using something like this:

from reciprocalspaceship.io.crystfel import StreamLoader
loader = StreamLoader(streamfile)              # use to find metadata you seek
ds = loader.to_dataset(spacegroup=spacegroup)  # also get a DataSet
ds.set_index(["H", "K", "L"], inplace=True)

This isn't in the main codebase yet, but you can test it out using the faster_stream branch of the repo.

@kmdalton
Member

I will look into extending the StreamLoader class to provide the info @DHekstra wants. In our offline conversation, we discussed that the image filename was required for this.

@JBGreisman
Member

Is this still desirable, or has the update to rs.read_crystfel() addressed this request?

@DHekstra
Contributor Author

Let's keep this open until I get a chance to try that. I expect that #260 does everything I need, but I have not tested it yet.
