Task: Rationalize Storage and Retrieval of Metadata in Falaise #157

Open
drbenmorgan opened this issue Sep 16, 2019 · 1 comment
Labels: documentation, enhancement, falaise (Framework core, including types and applications), task

Comments

@drbenmorgan
Member

This task brings together several issues on metadata into one for coherence's sake. The directly affected issues that are merged into it are:

Indirectly related issues are #90 on Conditions Data access, and the demos of the Art mechanisms in SuperNEMO-DBD/Impressionist#6. This task is also strongly correlated with data cataloguing, as metadata likely forms the basis of that system.

Overview

In Falaise applications, "metadata" is stored in .brio files in a TTree named "GI". Each entry in the tree is an instance of the datatools::properties dictionary class, each of which is intended to form a section of an overall datatools::multi_properties instance. Falaise applications have the option to read/write this multi_properties object to a text file separate from the .brio file holding the events.
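
The layout described above can be pictured with a small model. This is a sketch in plain Python, not the Bayeux/datatools API: `assemble_sections`, the section names and the keys are all hypothetical, and each "GI" entry is represented as a dict holding a section name and its key/value properties.

```python
import json


def assemble_sections(entries):
    """Merge per-entry property dictionaries into one sectioned mapping.

    Each entry stands in for one datatools::properties instance read from
    the "GI" tree; the result stands in for the multi_properties object.
    """
    sections = {}
    for entry in entries:
        name = entry["name"]
        if name in sections:
            raise ValueError(f"duplicate metadata section: {name!r}")
        sections[name] = dict(entry["properties"])
    return sections


# Hypothetical entries: the real section and key names are not documented.
gi_entries = [
    {"name": "application", "properties": {"name": "flsimulate", "version": "x.y.z"}},
    {"name": "geometry", "properties": {"layout": "example"}},
]

metadata = assemble_sections(gi_entries)
# A JSON rendering of the assembled sections, one object per section.
print(json.dumps(metadata, indent=2, sort_keys=True))
```

Rejecting duplicate section names is a choice made for this sketch only; how the real store treats repeated entries is not stated here.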

At present, read/write of data to the "GI" store is handled by the application code, with no way for plugin modules to read/write it. No external programs are available to query the metadata of a .brio file.

There is no documentation on what is stored in the .brio file.

Proposed Improvements

  1. Provide a falaise-file-dumper application to allow query/dumping of the metadata store(s).
    • Remove the ability of flsimulate/flreconstruct themselves to read/write the metadata to a separate file, as a separate file can lead to a loss of coherence between metadata and event data.
    • The effect of a "separate" metadata file can be achieved by running falaise-file-dumper on the brio file after it's generated.
    • Depending on the systems outside of Falaise that consume the metadata, we may want JSON output as well as multi/properties.
  2. Improve what data is always stored. For example, Add and track metadata in input/output files #57 outlines some of the data that should be present from a simulation run.
    • From flsimulate, the full settings should be stored.
    • From flreconstruct, the full pipeline script and configuration should be stored, including any custom variant settings. In addition, all settings from the input file should be stored (directly or indirectly via some primary key).
    • The main idea is so that the complete processing "provenance" can be tracked.
    • For example, when reading a raw or simulated data file into flreconstruct, we must reconstitute the same geometry for the detector the data originates from!
  3. Maybe provide a "metadata service" that modules in flreconstruct can access.
    • Not for things like conditions data!
    • "Maybe" as metadata is really run/process-level info, not event-level.
    • Likely implement as "write once, read many" to avoid edit/overwrite.
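
The "write once, read many" idea in the third item could look roughly like the following. This is a minimal sketch in plain Python under stated assumptions, not a proposed Falaise interface: the `MetadataService` class, its `write`/`read` methods and the `events` key are all hypothetical.

```python
from types import MappingProxyType


class MetadataService:
    """Write-once, read-many store of run/process-level metadata."""

    def __init__(self):
        self._sections = {}

    def write(self, section, properties):
        # A section can be written exactly once; later writes are rejected.
        if section in self._sections:
            raise KeyError(f"section {section!r} is already written")
        # Copy the input, then expose it only through a read-only view.
        self._sections[section] = MappingProxyType(dict(properties))

    def read(self, section):
        # Returns a read-only view, so callers cannot edit stored values.
        return self._sections[section]


service = MetadataService()
service.write("flsimulate", {"events": 1000})  # hypothetical key and value
settings = service.read("flsimulate")
```

In this model the application would perform the single write, while modules would only ever call `read`, so an edit or overwrite fails loudly instead of silently changing provenance.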

Anything else?

Task(s)

The first "file dumper" task is pretty independent of the rest, so can be its own PR. The second and third tasks need some input from you all (the second task replaces #118).

@bmorgan can work on the first, but the second and third will need volunteers, at least for testing and review, given their overlap with Reconstruction/Analysis/Data Quality.

@drbenmorgan
Member Author

See also BxCppDev/Bayeux#53, which requests implementation of an "include" mechanism for the current scripting files. That'll allow much simpler composition and tracking of configuration, e.g. with inclusion, you could write a script:

...
#@include "snemo/reconstruction/default.conf"

[name="ChargedParticleTracker" type="snemo::reconstruction::charged_particle_tracking_module"]
AFD.minimal_delayed_time : real as time = 25 us

i.e. be able to override single or multiple parameters without having to copy/paste the entire script. Since it also means the configuration of flsimulate/flreconstruct ends up in a single multi_properties instance, that'll be much easier to store, track, update and reconstitute.
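
The include-then-override behaviour can be modelled in a few lines. This is only a sketch under stated assumptions, not the Bayeux parser, and it does not show how the requested feature would be implemented: `load_config` is hypothetical, the file contents (including `AFD.other_setting` and the default of 10 us) are invented for illustration, and only section headers, `#@include` lines and simple `key : type = value` lines are handled.

```python
def load_config(path, files):
    """Return {section: {key: value}} with includes expanded in order.

    `files` maps a path to its text, standing in for the file system.
    Later settings override earlier ones, so a script can include a
    default configuration and then change single parameters.
    Include cycles are not detected in this sketch.
    """
    config = {}
    section = None
    for raw in files[path].splitlines():
        line = raw.strip()
        if line.startswith("#@include"):
            included = line.split('"')[1]
            for name, params in load_config(included, files).items():
                config.setdefault(name, {}).update(params)
        elif line.startswith("[name="):
            section = line.split('"')[1]
            config.setdefault(section, {})
        elif "=" in line and not line.startswith("#"):
            key, value = line.split("=", 1)
            # Drop the type annotation after the colon, e.g. "real as time".
            name = key.split(":")[0].strip()
            config.setdefault(section, {})[name] = value.strip()
    return config


# Hypothetical file contents, held in memory instead of on disk.
files = {
    "default.conf": (
        '[name="ChargedParticleTracker" type="example"]\n'
        "AFD.minimal_delayed_time : real as time = 10 us\n"
        "AFD.other_setting : integer = 3\n"
    ),
    "user.conf": (
        '#@include "default.conf"\n'
        '[name="ChargedParticleTracker" type="example"]\n'
        "AFD.minimal_delayed_time : real as time = 25 us\n"
    ),
}

merged = load_config("user.conf", files)
```

Here `merged` holds a single section in which the overridden parameter takes the value from `user.conf` and the untouched parameter keeps its value from `default.conf`, which is the single flattened configuration that would then be stored as metadata.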

drbenmorgan pinned this issue Oct 20, 2019
drbenmorgan added the falaise (Framework core, including types and applications) label Apr 3, 2020
drbenmorgan unpinned this issue Apr 3, 2020
7 participants