Matthias Schildwächter edited this page Sep 4, 2017 · 6 revisions

anonml-recognition-ml

This module annotates the tokenized judgment using machine learning (GermaNER) and converts the resulting annotations into anonymizations. Once a human has reviewed and corrected all detected anonymizations, training data is built from them and appended to the existing training data so that the model can be trained on it later. GermaNER is also used to retrain the model.
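The "build and append training data" step can be pictured with a minimal sketch. It assumes a CoNLL-style layout (one token per line, token and label separated by a tab, sentences separated by a blank line) and BIO-style labels such as `B-PER`; check the GermaNER documentation for the exact format. `LabeledToken`, `toTrainingBlock` and `append` are hypothetical names used for illustration only and are not taken from the module's code.

```java
import java.util.List;

class TrainingDataSketch {

    // Hypothetical token/label pair; the module's real Anonymization class is not shown in this wiki.
    static final class LabeledToken {
        final String token;
        final String label;

        LabeledToken(String token, String label) {
            this.token = token;
            this.label = label;
        }
    }

    // Renders one reviewed sentence as "token<TAB>label" lines, closed by a blank line.
    // The tab-separated layout is an assumption about the GermaNER training format.
    static String toTrainingBlock(List<LabeledToken> sentence) {
        StringBuilder sb = new StringBuilder();
        for (LabeledToken t : sentence) {
            sb.append(t.token).append('\t').append(t.label).append('\n');
        }
        return sb.append('\n').toString();
    }

    // Appends the new block to the training data that is already stored.
    static String append(String existingTrainingData, List<LabeledToken> sentence) {
        return existingTrainingData + toTrainingBlock(sentence);
    }

    public static void main(String[] args) {
        List<LabeledToken> reviewed = List.of(
                new LabeledToken("Herr", "O"),
                new LabeledToken("Schmidt", "B-PER"),
                new LabeledToken("klagt", "O"));
        System.out.print(append("", reviewed));
    }
}
```

Keeping the rendering step separate from the append step means the same block could either extend the stored data or replace it, which mirrors the append/overwrite choice the API offers.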

API

| Method | Path | Result | Comment |
|--------|------|--------|---------|
| GET | /ml/get/evaluation/data/ | evaluation data (F_1, Precision, Recall) | |
| GET | /ml/get/training/data/ | training data (as String) | in GermaNER format |
| GET | /ml/retrain/ | true if retraining was successful | starts the GermaNER retraining process with the saved training data |
| GET | /ml/retrain/status | the time the training started (as String) | |
| POST | /ml/annotate/{id} | a list of Anonymization objects | expected parameter is the id of the actual document |
| POST | /ml/update/training/data/{id} | true if appending the training data was successful | expected parameter is the id of the reviewed and saved document |
| POST | /ml/calculate/f/one/{id} | true if the calculation worked | expected parameters are the id of the reviewed and saved document and a list of correct Anonymization objects |
| POST | /ml/post/training/data/{resetOld}/ | true if the sent training data was appended | expected parameters are a string with training data in GermaNER training data format and a boolean stating whether the data should overwrite the old data |
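As a rough illustration of how a client might address these endpoints, the following sketch builds requests with Java's built-in `java.net.http` API (Java 11+). The base URL `http://localhost:8080` is an assumption, since this page does not state host or port. Sending the training data as a plain-text request body is also an assumption. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class AnonmlMlRequests {

    // Assumed base URL; adjust to wherever the service is actually deployed.
    static final String BASE_URL = "http://localhost:8080";

    // POST /ml/annotate/{id}: asks for the Anonymization objects of one document.
    static HttpRequest annotate(String baseUrl, String documentId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ml/annotate/" + documentId))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // GET /ml/retrain/: starts retraining with the saved training data.
    static HttpRequest retrain(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ml/retrain/")).GET().build();
    }

    // GET /ml/retrain/status: returns the time the training started.
    static HttpRequest retrainStatus(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ml/retrain/status")).GET().build();
    }

    // POST /ml/post/training/data/{resetOld}/: uploads training data.
    // The plain-text body is an assumption about how the service reads the string.
    static HttpRequest postTrainingData(String baseUrl, boolean resetOld, String trainingData) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/ml/post/training/data/" + resetOld + "/"))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(trainingData))
                .build();
    }

    public static void main(String[] args) {
        // Only builds and prints the request; nothing is sent over the network.
        HttpRequest request = annotate(BASE_URL, "42");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send a request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` once the service is running.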

Set Up

Clone the Project

  1. Clone anonml-recognition-ml from https://github.com/anon-ml/anonml-recognition-ml.git
  2. Execute the following steps

GermaNER

  1. Clone https://github.com/tudarmstadt-lt/GermaNER
  2. Run "mvn clean install -Drat.skip=true" to install GermaNER in your local Maven repository

Cleartk (adjusted version)

  1. Clone https://github.com/seyyaw/cleartk
  2. Run "mvn clean install -Dmaven.test.skip=true" to install Cleartk in your local Maven repository

Feature file

  1. Download the feature file from https://github.com/tudarmstadt-lt/GermaNER/releases/download/germaNER0.9.1/data.zip
  2. Place it in ./src/main/resources/GermaNER of the service module of the cloned anonml-recognition-ml project

Retrain

The training file must be named "trainingsFile.txt" and located in the resources folder. https://docs.docker.com/engine/userguide/storagedriver/selectadriver/#other-considerations
