start of work to add groups (values, fields) to deid recipe
Signed-off-by: vsoch <vsochat@stanford.edu>
vsoch committed Mar 6, 2020
1 parent 3b924bb commit 115a9c2
Showing 7 changed files with 172 additions and 8 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ and **Merged pull requests**. Critical items to know are:
Referenced versions in headers are tagged on Github, in parentheses are for pypi.

## [vxx](https://github.com/pydicom/deid/tree/master) (master)
- adding support for tag groups (values, fields) (0.1.4)
- Adding option to provide function to remove (must return boolean) (0.1.38)
- removing matplotlib version requirement (0.1.37)
- Matplotlib dependency >= 2.1.2 (0.1.36)
7 changes: 6 additions & 1 deletion deid/config/standards.py
@@ -26,10 +26,15 @@
formats = ["dicom"]

# Supported Sections
sections = ["header", "labels", "filter"]
sections = ["header", "labels", "filter", "values", "fields"]

# Supported Header Actions
actions = ("ADD", "BLANK", "JITTER", "KEEP", "REPLACE", "REMOVE", "LABEL")

# Supported Group actions
fields_actions = ["FIELD"]
values_actions = ["FIELD", "SPLIT"]

# Valid actions for a filter action
filters = (
"contains",
12 changes: 9 additions & 3 deletions deid/config/utils.py
@@ -30,8 +30,14 @@
from deid.logger import bot
from deid.utils import read_file
from deid.data import data_base
from deid.config.standards import formats, actions, sections, filters

from deid.config.standards import (
formats,
actions,
sections,
filters,
fields_actions,
values_actions,
)
from collections import OrderedDict
import os
import re
@@ -130,7 +136,7 @@ def load_deid(path=None):
config = OrderedDict()
section = None

while len(spec) > 0:
while spec:

# Clean up white trailing/leading space
line = spec.pop(0).strip()
2 changes: 1 addition & 1 deletion deid/version.py
@@ -22,7 +22,7 @@
"""

__version__ = "0.1.38"
__version__ = "0.1.4"
AUTHOR = "Vanessa Sochat"
AUTHOR_EMAIL = "vsochat@stanford.edu"
NAME = "deid"
1 change: 1 addition & 0 deletions docs/_docs/user-docs/index.md
@@ -12,6 +12,7 @@ these pages will help you to use the deid software.

- [Filters]({{ site.baseurl }}/user-docs/recipe-filters/): How to write sections to filter and flag images.
- [Headers]({{ site.baseurl }}/user-docs/recipe-headers/): How to write header actions to update image headers.
- [Groups]({{ site.baseurl }}/user-docs/recipe-groups/): for tags (including fields and values) that can be referenced in headers.
- [Labels]({{ site.baseurl }}/user-docs/recipe-labels/): can be used to add metadata to your recipes.

## Client
150 changes: 150 additions & 0 deletions docs/_docs/user-docs/recipe-groups.md
@@ -0,0 +1,150 @@
---
title: Recipe Groups
category: User Documentation
order: 4
---

The [recipe headers]({{ site.baseurl }}/user-docs/recipe-headers/) page taught you
how to write a recipe that has one or more commands to parse a dicom image header.
For example, we might have:

```
FORMAT dicom
%header
ADD PatientIdentityRemoved Yes
BLANK OrdValue
KEEP Modality
REPLACE id var:entity_id
JITTER StudyDate var:entity_timestamp
REMOVE ReferringPhysicianName
```

But what if we want to optimize our parsing by creating custom groups of tags
that are based on the field names, or the values? This is the intended use
case for groups: a group is a named collection of tags, identified either by
fields or by values, to which an action can be applied. For the examples
below, we will use this sample header provided by [@wetzelj](https://github.com/wetzelj). Thank you!

```
(0008,0050) : SH Len: 10 AccessionNumber Value: [999999999 ]
(0008,0070) : LO Len: 8 Manufacturer Value: [SIEMENS ]
(0008,1090) : LO Len: 22 ManufacturersModelName Value: [SOMATOM Definition AS+]
(0009,0010) : LO Len: 20 PrivateCreator10xx Value: [SIEMENS CT VA1 DUMMY]
(0010,0010) : PN Len: 14 PatientsName Value: [SIMPSON^HOMER^J^]
(0010,0020) : LO Len: 12 PatientID Value: [000991991991 ]
(0010,1000) : LO Len: 8 OtherPatientIDs Value: [E123456]
(0010,1001) : PN Len: 8 OtherPatientNames Value: [E123456]
(0010,21B0) : LT Len: 90 AdditionalPatientHistory Value: [MR SIMPSON LIKES DUFF BEER]
(0019,1091) : DS Len: 6 <Unknown Tag> Value: [E123456]
(0019,1092) : DS Len: 6 <Unknown Tag> Value: [M123456]
```

## Fields

A fields section looks like the following:

```
FORMAT dicom
%fields patient_info
FIELD PatientID
FIELD startswith:OtherPatient
FIELD endswith:Name
```

There are multiple ways to do this (for example, you could use `startswith:Patient` to target both `PatientsName`
and `PatientID`), but generally this will produce a list of fields named `patient_info`. Here is the list
rendered out:

```
patient_info
------------
PatientID
OtherPatientIDs
OtherPatientNames
PatientsName
```
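
As a rough illustration of how the `FIELD` rules above could resolve to field names, here is a minimal Python sketch. It is not deid's implementation: the helper names `expand_field` and `build_fields_group` are hypothetical, only the `startswith` and `endswith` expanders are handled, and matching is assumed to be a plain, case-sensitive prefix or suffix test against the field names of the sample header.

```python
def expand_field(rule, field_names):
    """Return the names in field_names matched by one FIELD rule (hypothetical helper)."""
    if ":" in rule:
        expander, pattern = rule.split(":", 1)
        if expander == "startswith":
            return [name for name in field_names if name.startswith(pattern)]
        if expander == "endswith":
            return [name for name in field_names if name.endswith(pattern)]
        raise ValueError("unsupported expander: " + expander)
    # No expander: the rule is a full field name
    return [name for name in field_names if name == rule]


def build_fields_group(rules, field_names):
    """Collect the fields matched by each rule, in rule order, without duplicates."""
    group = []
    for rule in rules:
        for name in expand_field(rule, field_names):
            if name not in group:
                group.append(name)
    return group


# Named fields from the sample header above
field_names = [
    "AccessionNumber",
    "Manufacturer",
    "ManufacturersModelName",
    "PrivateCreator10xx",
    "PatientsName",
    "PatientID",
    "OtherPatientIDs",
    "OtherPatientNames",
    "AdditionalPatientHistory",
]

rules = ["PatientID", "startswith:OtherPatient", "endswith:Name"]
patient_info = build_fields_group(rules, field_names)
print(patient_info)
```

Under the plain suffix test assumed here, `endswith:Name` also picks up `ManufacturersModelName` from the sample header, so it is worth checking what a broad expander matches before applying an action to the group.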

We can then use this in recipe header sections where we want to apply an action to one or more fields
as follows:

```
%header
REPLACE fields:patient_info func:generate_uid
```

And this reads nicely as "Replace fields defined in patient_info to be the variable
I'm defining with the function generate_uid" (which should be added to each item
after lookup).
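
To make the reading above concrete, here is a minimal Python sketch of applying a `REPLACE` with a function to every field in a group. It assumes the header is a plain dictionary and that the group has already been resolved to a list of field names. The names `replace_fields` and `generate_uid`, and the `(field, value)` signature, are illustrative assumptions and not deid's API.

```python
import uuid


def generate_uid(field, value):
    """Hypothetical replacement function: ignore the old value, return a random identifier."""
    return str(uuid.uuid4())


def replace_fields(header, group, func):
    """Return a copy of header where each field in group is replaced by func(field, value)."""
    updated = dict(header)
    for name in group:
        if name in updated:
            updated[name] = func(name, updated[name])
    return updated


header = {
    "PatientID": "000991991991",
    "OtherPatientIDs": "E123456",
    "Manufacturer": "SIEMENS",
}
patient_info = ["PatientID", "OtherPatientIDs", "OtherPatientNames"]

updated = replace_fields(header, patient_info, generate_uid)
print(updated)
```

Fields listed in the group but absent from the header (here `OtherPatientNames`) are skipped, and fields outside the group (here `Manufacturer`) are left untouched.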

This of course means that the actions supported for the `%fields` section include:

- **FIELD** reference to a full name of a field, or any parsing of any [expander]({{ site.baseurl }}/examples/header-expanders/).


## Values

It could be that you want to generate a list of _values_ extracted from the dicom
to use as flags for checking other fields. For example, if I know that the patient's name
is in `PatientsName`, I would want to extract the patient's name from that field,
and then search across fields looking for any instance of a first or last name.
This is the purpose of the `%values` group. Instead of defining rules to create
a list of fields, we write rules to extract values. Let's take a look at an
example:

```
%values patient_info
SPLIT PatientsName splitval='^';minlength='4'
FIELD PatientID
FIELD OtherPatientIDs
```

You'll notice that we have `FIELD` again, but since this is in a `%values`
section, this is saying "Find the fields Patient ID and Other Patient IDs, and whatever
_values_ you find there, add to the list `patient_info`." You'll also
notice that the first line uses a new action `SPLIT`:

```
SPLIT PatientsName splitval='^';minlength='4'
```

This action says to start with the field `PatientsName`, split based on the `^`
character, and keep results that have a length greater than or equal to 4.
Let's talk about these actions in detail. Field is the same, but we also have split:

- **FIELD** refers to the full name of a field, or any parsing of any [expander]({{ site.baseurl }}/examples/header-expanders/). Instead of including these field names, we grab the values from them, and add to our list.
- **SPLIT** indicates that we want to apply a split operation to a field (or an expansion of fields): each value is split by a character (defaults to a space), and only the pieces that meet a minimum length (defaults to 1) are kept.

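The values contributed by the `SPLIT` line above might look like this (the two `FIELD` lines would add the contents of `PatientID` and `OtherPatientIDs` as well). Remember that this is a list of values.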

```
patient_info
------------
HOMER
SIMPSON
```
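
The extraction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not deid's code: the helpers `parse_params`, `split_values` and `build_values_group` are hypothetical, rules are given as pre-parsed `(action, field, options)` tuples, field names are matched exactly (no expanders), and the header is a plain dictionary.

```python
def parse_params(text):
    """Parse an options string such as splitval='^';minlength='4' into a dict of strings."""
    params = {}
    for pair in text.split(";"):
        if not pair:
            continue
        key, _, raw = pair.partition("=")
        params[key.strip()] = raw.strip().strip("'\"")
    return params


def split_values(value, splitval=" ", minlength=1):
    """Split value on splitval and keep the pieces with at least minlength characters."""
    return [piece for piece in value.split(splitval) if len(piece) >= minlength]


def build_values_group(rules, header):
    """Collect values for a group from (action, field, options) rules, without duplicates."""
    values = []
    for action, field, options in rules:
        if field not in header:
            continue
        raw = header[field].strip()
        if action == "SPLIT":
            params = parse_params(options)
            found = split_values(
                raw, params.get("splitval", " "), int(params.get("minlength", 1))
            )
        elif action == "FIELD":
            found = [raw]
        else:
            raise ValueError("unsupported action: " + action)
        for value in found:
            if value and value not in values:
                values.append(value)
    return values


header = {
    "PatientsName": "SIMPSON^HOMER^J^",
    "PatientID": "000991991991 ",
    "OtherPatientIDs": "E123456",
}
rules = [
    ("SPLIT", "PatientsName", "splitval='^';minlength='4'"),
    ("FIELD", "PatientID", ""),
    ("FIELD", "OtherPatientIDs", ""),
]

patient_info = build_values_group(rules, header)
print(patient_info)
```

In this sketch the `J` piece and the empty trailing piece of `PatientsName` are dropped by `minlength='4'`, while the two `FIELD` rules contribute their whole (whitespace-stripped) values.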

You could then reference these values for some header action. For example, let's say
we want to remove any field that contains these identifiers:

```
%header
REMOVE values:patient_info
```

The implication of the above is that we are checking all fields for these values.
This would be functionally equivalent:

```
%header
REMOVE ALL values:patient_info
```

Or you could choose some other field name, or field expander, if you want to limit
the removal to some subset.
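
One way to picture `REMOVE values:patient_info` is the Python sketch below. It assumes "contains" means a case-sensitive substring test on the string form of each field's value, and that the header is a plain dictionary. The helper `remove_by_values` and its optional `fields` argument (standing in for a field name or expander that limits the check) are hypothetical and not part of deid.

```python
def remove_by_values(header, values, fields=None):
    """Return a copy of header without the fields whose value contains any of values.

    fields limits the check to a subset of field names; None means check all fields.
    """
    candidates = list(header) if fields is None else fields
    to_remove = {
        name
        for name in candidates
        if name in header and any(value in str(header[name]) for value in values)
    }
    return {name: value for name, value in header.items() if name not in to_remove}


header = {
    "Manufacturer": "SIEMENS",
    "PatientsName": "SIMPSON^HOMER^J^",
    "PatientID": "000991991991",
    "AdditionalPatientHistory": "MR SIMPSON LIKES DUFF BEER",
}
patient_info = ["HOMER", "SIMPSON"]

# Check every field (the REMOVE ALL reading)
cleaned = remove_by_values(header, patient_info)
print(sorted(cleaned))

# Limit the check to one field
limited = remove_by_values(header, patient_info, fields=["PatientsName"])
print(sorted(limited))
```

With all fields checked, both `PatientsName` and the free-text `AdditionalPatientHistory` are dropped because each contains `SIMPSON`. Limiting the check to `PatientsName` leaves the history field in place.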

If you haven't yet, take a look at how to generate a basic [get]({{ sitebase.url }}/getting-started/dicom-get/),
which will get a set of fields and values from your dicom files.
7 changes: 4 additions & 3 deletions docs/_docs/user-docs/recipe-headers.md
@@ -362,7 +362,8 @@ REPLACE InstanceSOPUID var:source_id
```

Now that you know how configuration works, you have a few options.
If you want to write a text file and get going with cleaning your files, you should
look at some examples for generating a basic [get]({{ sitebase.url }}/getting-started/dicom-get/),
which is will get a set of fields and values from your dicom files. For a full walk through
You can learn how to define groups of tags based on fields or values in [groups]({{ site.baseurl }}/user-docs/recipe-groups/),
or if you want to write a text file and get going with cleaning your files, you should
look at some examples for generating a basic [get]({{ sitebase.url }}/getting-started/dicom-get/).
This is the action to get a set of fields and values from your dicom files. For a full walk through
example with a recipe, see the [recipe example]({{ sitebase.url }}/examples/recipe)
