Justin A. Gould (gould29@purdue.edu)
This repository will house code, documentation, and other information on the infrastructure and tools for data annotation at Purdue University. Our annotation infrastructure leverages the Prodigy annotation framework.
Prodigy is "a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration":
Today’s transfer learning technologies mean you can train production-quality models with very few examples. With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection. You'll move faster, be more independent and ship far more successful projects.
As of January 2021, Prodigy supports the following features:
- Named Entity Recognition
- Dependencies & Relations
- Audio and Video Classification
- Audio and Video Transcription
- Text Classification (multi-class and binary!)
- Computer Vision
- Image Annotation
- Image Classification
- Image Options
- Image Captioning
- A/B Evaluation
Students at Purdue University can contact Justin Gould within the Data Mine at gould29@purdue.edu to learn more.
Currently, we are exploring the use of Prodigy via its JupyterLab extension.
For installation information, please see the extension's GitHub page and check out Prodigy's documentation.
Each branch will serve a different purpose:
- main: General overview and pertinent information for all users
- setup_and_installation: Specific information, files, environment settings, etc. for setting up and installing Prodigy on your machine