MOUSEDataPipeline

MOUSEDataPipeline provides tools for the (automatic) processing of new MOUSE datafiles, offering a structured approach to manage and analyze scientific data generated by the MOUSE instrument.

prerequisites and assumptions

Nomenclature

  • Measurement Date: A rough timestamp indicating when measurements on a specific set of samples began. Each set of samples belonging together is grouped under a unique measurement date in the format YYYYMMDD.

  • Batch: Represents a set of measurements for a single sample. A batch includes all measurements across various configurations for that particular sample.

  • Repetition: Refers to an individual measurement within a specific configuration. This includes the measurement alongside the preceding direct beam and direct-beam-through-sample measurements, which are essential for determining the primary beam flux, beam position, and transmission factor.
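
These three terms combine into the repetition directory names used throughout this README, for example 20250101_21_22. The snippet below is a minimal sketch of splitting such a name back into its parts. The names `RepetitionId` and `parse_repetition_dirname` are hypothetical and not part of the package, and it assumes that batch and repetition are plain integers, as in the examples here.

```python
from typing import NamedTuple


class RepetitionId(NamedTuple):
    ymd: str         # measurement date, YYYYMMDD
    batch: int       # all measurements for one sample (assumed to be an integer)
    repetition: int  # one measurement in one configuration (assumed to be an integer)


def parse_repetition_dirname(name: str) -> RepetitionId:
    """Split a name such as '20250101_21_22' into date, batch and repetition."""
    # Unpacking raises ValueError if the name does not have exactly three parts.
    ymd, batch, repetition = name.split("_")
    if len(ymd) != 8 or not ymd.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {ymd!r}")
    return RepetitionId(ymd, int(batch), int(repetition))


print(parse_repetition_dirname("20250101_21_22"))
# RepetitionId(ymd='20250101', batch=21, repetition=22)
```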

expected directory structure

The data is organized under a predefined directory structure to ensure consistency and facilitate automated processing:

├─── Proposals
│   └─── 2025
└─── Measurements
    ├─── SAXS002
    │   ├─── logbooks
    │   └─── data
    │       ├─── Masks
    │       └─── 2025
    │           └─── 20250101  # (measurement date)
    │               ├─── 20250101_[batch]_[repetition]  # directory with files
    │               │   ├─── eiger_[number]_master.h5
    │               │   ├─── eiger_[number]_data00001.h5
    │               │   ├─── im_craw.nxs
    │               │   ├─── beam_profile
    │               │   │   ├─── eiger_[number]_master.h5
    │               │   │   ├─── eiger_[number]_data00001.h5
    │               │   │   └─── im_craw.nxs
    │               │   └─── beam_profile_through_sample
    │               │       ├─── eiger_[number]_master.h5
    │               │       ├─── eiger_[number]_data00001.h5
    │               │       └─── im_craw.nxs
    │               ├─── 20250101_[batch]_[repetition]
    │               ├─── ...
    │               └─── autoproc  # (processed datafiles)
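
Before processing, it can help to check that a repetition directory actually matches this layout. The following is a sketch only, not part of the pipeline: `missing_files` is a hypothetical helper, and the file patterns are taken directly from the tree above.

```python
from pathlib import Path

# File patterns expected in the repetition directory and in both beam-profile
# subdirectories, as shown in the tree above.
EXPECTED_PATTERNS = (
    "eiger_*_master.h5",
    "eiger_*_data00001.h5",
    "im_craw.nxs",
)
SUBDIRS = ("", "beam_profile", "beam_profile_through_sample")


def missing_files(repetition_dir: Path) -> list[str]:
    """Return the relative patterns that match no file under repetition_dir."""
    missing = []
    for sub in SUBDIRS:
        for pattern in EXPECTED_PATTERNS:
            # glob() yields nothing if the subdirectory itself is absent.
            if not any((repetition_dir / sub).glob(pattern)):
                missing.append(str(Path(sub) / pattern))
    return missing
```

An empty list means every expected file was found. Anything else names the subdirectory and pattern that is missing.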

Some flexibility is possible: the MOUSE_settings.yaml file contains the paths to the relevant sections of the tree, and these can be adapted to point at the corresponding locations in your own structure.

usage example:

To process a directory with a specific configuration and set of steps, execute the following command in your terminal:

python src/directory_processor.py --config MOUSE_settings.yaml --single_dir ~/Documents/BAM/Measurements/newMouseTest/Measurements/SAXS002/data/2025/20250101/20250101_21_22 --steps processstep_translator_step_1 processstep_translator_step_2 processstep_beamanalysis

Alternatively, specify measurement details directly:

python src/directory_processor.py --config MOUSE_settings.yaml --ymd 20250101 --batch 21 --repetition 22 --steps processstep_translator_step_1 processstep_translator_step_2 processstep_beamanalysis
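
Both invocations address the same repetition directory. The sketch below shows how a date, batch and repetition map onto the directory layout described earlier. It is an illustration under assumptions: `repetition_dir` is a hypothetical helper, and the real tool reads its base paths from MOUSE_settings.yaml, which is not reproduced here.

```python
from pathlib import Path


def repetition_dir(data_root: Path, ymd: str, batch: int, repetition: int) -> Path:
    """Build <data_root>/<YYYY>/<YYYYMMDD>/<YYYYMMDD>_<batch>_<repetition>."""
    year = ymd[:4]  # the year directory is the first four digits of the date
    return data_root / year / ymd / f"{ymd}_{batch}_{repetition}"


# Assumed data root; adjust to wherever your "data" directory lives.
data_root = Path("Measurements/SAXS002/data")
print(repetition_dir(data_root, "20250101", 21, 22).as_posix())
# Measurements/SAXS002/data/2025/20250101/20250101_21_22
```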

To run all currently available steps for all repetitions in a batch, omit --repetition and list every step:

python src/directory_processor.py --config MOUSE_settings.yaml \
--ymd 20250101 --batch 21 --parallel --steps \
processstep_translator_step_1 \
processstep_translator_step_2 \
processstep_beamanalysis \
processstep_cleanup_files \
processstep_add_mask_file \
processstep_metadata_update \
processstep_thickness_from_absorption \
processstep_add_background_files \
processstep_stacker
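
When no repetition is given, the repetitions belonging to the batch have to be discovered from the directory names. One possible way to do that is sketched below. `repetitions_in_batch` is a hypothetical helper, and the actual discovery logic in directory_processor may differ.

```python
from pathlib import Path


def repetitions_in_batch(date_dir: Path, ymd: str, batch: int) -> list[int]:
    """Return the repetition numbers found for one batch, sorted ascending."""
    prefix = f"{ymd}_{batch}_"
    found = []
    for entry in date_dir.iterdir():
        suffix = entry.name[len(prefix):]
        # Only directories named <ymd>_<batch>_<digits> count; this skips
        # "autoproc" and directories belonging to other batches.
        if entry.is_dir() and entry.name.startswith(prefix) and suffix.isdigit():
            found.append(int(suffix))
    return sorted(found)
```

Matching on the full `<ymd>_<batch>_` prefix, including the trailing underscore, keeps batch 2 from picking up the directories of batch 21.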

top-level methods:

1. directory_processor

  • Processes all data for a specified measurement date (YYYYMMDD), batch, and repetition, or for a given directory path.
  • Executes the defined processing steps, which should ideally be wrappers around CLI-executable scripts, though this isn't strictly enforced.
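
To illustrate what "wrappers around CLI-executable scripts" could look like, here is a minimal sketch that runs a list of named commands in order using the standard library. It is not the package's implementation: `run_step`, `run_steps` and the demo commands are hypothetical, and stopping at the first failing step is an assumed design choice.

```python
import subprocess
import sys


def run_step(name: str, argv: list[str]) -> bool:
    """Run one CLI command and report whether it exited with status 0."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"{name} failed: {result.stderr.strip()}")
    return result.returncode == 0


def run_steps(steps: list[tuple[str, list[str]]]) -> list[str]:
    """Run steps in order, stopping at the first failure; return completed names."""
    done = []
    for name, argv in steps:
        if not run_step(name, argv):
            break
        done.append(name)
    return done


# Placeholder commands standing in for real step scripts.
demo = [
    ("ok_step", [sys.executable, "-c", "print('done')"]),
    ("failing_step", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("never_reached", [sys.executable, "-c", "print('skipped')"]),
]
```

Calling `run_steps(demo)` returns only the names of the steps that completed before the first failure, which makes it easy to see where a run stopped.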

2. watcher

WIP, not functional yet! This component aims to continuously monitor a measurement date directory for newly completed repetitions, automatically processing them as they become available.
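
Since the watcher is not implemented yet, the following is only a sketch of one way the polling could work, not a description of the planned design. `new_repetitions` is a hypothetical helper, and treating the presence of im_craw.nxs as the sign of a completed repetition is an assumption made purely for illustration.

```python
from pathlib import Path


def new_repetitions(date_dir: Path, seen: set[str], marker: str = "im_craw.nxs") -> list[Path]:
    """Return not-yet-seen repetition directories that contain the marker file."""
    ready = []
    for entry in sorted(date_dir.iterdir()):
        # Assumption: a repetition counts as complete once `marker` exists in it.
        if entry.is_dir() and entry.name not in seen and (entry / marker).exists():
            seen.add(entry.name)  # remember it so it is reported only once
            ready.append(entry)
    return ready
```

A watcher loop would call this function periodically, for instance with `time.sleep` between polls, and hand each returned directory to the processing steps.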

functionality methods:

TBC...