Anonymise DICOM pixel data using OCR #252
Comments
That sounds great! Let me know what I can do to support you for that.
We (working on Microsoft Presidio) currently have this capability in beta: https://microsoft.github.io/presidio/image-redactor/
Thanks very much for your contribution, @omri374!
Hi @howff, Presidio is very customizable and allows you to plug in multiple tools. Currently we are using Tesseract, but we are working on a next version which would allow you to plug in any OCR engine easily: microsoft/presidio#1049. As this is still in design, we'd be very happy to get your feedback based on your experience with DICOM de-identification, and we are open to contributions of all sorts. For NER, we support multiple NLP tools such as Hugging Face and Flair. In our demo, you can experiment with two BERT-based approaches and a Flair approach: https://huggingface.co/spaces/presidio/presidio_demo. I agree that NER won't necessarily be accurate on OCR output, so we use hints from the DICOM metadata, and can customize the detection of PHI using other approaches such as rule-based patterns and deny-lists.
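To make the metadata-hint idea above concrete, here is a minimal sketch of a deny-list built from DICOM metadata values, combined with one rule-based date pattern. It is not Presidio's implementation: the function names `build_deny_list` and `looks_like_phi`, the chosen attribute keywords, the minimum token length, and the date regex are all assumptions made for illustration, and the metadata is a plain dict rather than a real DICOM dataset.

```python
import re

# Assumption: these attribute keywords are the ones worth using as hints.
DEFAULT_KEYS = ("PatientName", "PatientID", "PatientBirthDate")

# Assumption: a deliberately simple date rule (e.g. 12/03/2019 or 20190312).
DATE_PATTERN = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b\d{8}\b")


def build_deny_list(metadata, keys=DEFAULT_KEYS, min_length=3):
    """Collect lower-cased tokens from selected metadata values."""
    terms = set()
    for key in keys:
        value = str(metadata.get(key, ""))
        # DICOM person names separate components with '^'.
        for token in re.split(r"[\^\s,]+", value):
            if len(token) >= min_length:
                terms.add(token.lower())
    return terms


def looks_like_phi(text, deny_list):
    """Flag OCR text that contains a deny-list term or matches the date rule."""
    lowered = text.lower()
    if any(term in lowered for term in deny_list):
        return True
    return bool(DATE_PATTERN.search(text))


# Hypothetical metadata and OCR strings.
metadata = {"PatientName": "DOE^JOHN", "PatientID": "A12345"}
deny_list = build_deny_list(metadata)
flagged = [s for s in ["Pt: John Doe", "LEFT LATERAL"] if looks_like_phi(s, deny_list)]
```

Substring matching like this will over-flag (a short surname can appear inside an ordinary word), so a real pipeline would likely want word-boundary or fuzzy matching to cope with OCR errors.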
cc @niwilso
That's exactly the same approach I've taken here (see ocrengine.py and nerengine.py): https://github.com/SMI/dicompixelanon
Yeah! I’m happy to help however I can.
I notice you have a request for anonymising pixel data using OCR. I have been working on this, but in a separate code base, not as modifications to deid. It turns out that the hardest part is the evaluation, not the actual OCR. What I can report right now is that easyocr (a Python library) gives excellent results. There are still a few things to watch out for, but I think it would be quite easy to integrate easyocr into deid.
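As a rough illustration of what the integration step might involve, the sketch below blanks out rectangular regions of a pixel array given OCR detections. It assumes detections in the shape that easyocr's `Reader.readtext` returns, a list of `(bounding_box, text, confidence)` tuples where the bounding box is four `[x, y]` corner points. The function name `redact_regions`, the confidence threshold, and the padding are hypothetical, and the image is a plain nested list standing in for DICOM pixel data; this is not code from deid or dicompixelanon.

```python
def redact_regions(pixels, detections, min_confidence=0.3, padding=1, fill=0):
    """Return a copy of `pixels` with each detected text box filled with `fill`."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    redacted = [row[:] for row in pixels]  # leave the input untouched
    for box, _text, confidence in detections:
        if confidence < min_confidence:
            continue  # skip low-confidence detections (threshold is an assumption)
        xs = [int(point[0]) for point in box]
        ys = [int(point[1]) for point in box]
        # Axis-aligned rectangle around the four corners, padded and clipped.
        x0 = max(min(xs) - padding, 0)
        x1 = min(max(xs) + padding, width - 1)
        y0 = max(min(ys) - padding, 0)
        y1 = min(max(ys) + padding, height - 1)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                redacted[y][x] = fill
    return redacted


# Hypothetical 8x6 image and one detection in easyocr's result shape.
image = [[255] * 8 for _ in range(6)]
detections = [([[2, 2], [4, 2], [4, 3], [2, 3]], "DOE^JOHN", 0.92)]
cleaned = redact_regions(image, detections)
```

With real data, the detections would come from something like `easyocr.Reader(['en']).readtext(array)`, and the redacted array would then be written back into the DICOM file; both of those steps are left out here.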