This repository contains the source code for the EMNLP 2019 paper "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text" (Paper).
git clone https://github.com/twjiang/MIMO_CFE.git
- The dumped MIMO models can be found here.
- The word embeddings we use can be found here.
- The pre-trained language model we use can be found here.

Put these files into the ./resources folder.
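As an optional sanity check, the short Python snippet below (not part of the repo) just confirms that the downloads landed in the expected folder; nothing is assumed beyond the ./resources path from the step above.

```python
# Optional sanity check (not part of the repo): confirm the downloaded
# resources landed in ./resources. No specific filenames are assumed.
import os

resource_dir = "./resources"
if not os.path.isdir(resource_dir):
    raise SystemExit(f"{resource_dir} is missing -- create it and add the downloads above")
print(sorted(os.listdir(resource_dir)))
```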
This repo is tested on Python 3.6 and PyTorch 1.2.0.
Create Environment (Optional): Ideally, you should create an environment for the project.
conda create -n mimo python=3.6
conda activate mimo
pip install -r requirments.txt
cd MIMO_service
python mimo_server.py #Start a MIMO service
python client.py #Run the demo client against the service
The output of the demo is shown below.
{
'statements': {
'stmt 1': {
'text': 'Histone deacetylase inhibitor valproic acid ( VPA ) has been used to increase the reprogramming efficiency of induced pluripotent stem cell ( iPSC ) from somatic cells , yet the specific molecular mechanisms underlying this effect is unknown .',
'fact tuples': [
['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
['VPA', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming'],
['specific molecular mechanisms', 'NIL', 'is unknown', 'NIL', 'NIL']
],
'condition tuples': [
['iPSC', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
['induced pluripotent stem cell', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
['specific molecular mechanisms', 'NIL', 'underlying', 'NIL', 'effect']
],
'concept_indx': [0, 1, 2, 3, 4, 6, 17, 18, 19, 20, 22, 25, 26, 30, 31, 32],
'attr_indx': [14, 15, 35],
'predicate_indx': [8, 9, 10, 11, 12, 24, 33, 36, 37]
}
}
}
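For reference, here is a minimal sketch (not part of the repo) of how the structure above can be consumed. All field names are taken verbatim from the demo output; each tuple has five slots with 'NIL' marking an empty slot, and the *_indx lists appear to index into the whitespace-tokenized text (e.g., concept_indx 0-4 picks out "Histone deacetylase inhibitor valproic acid").

```python
# Minimal consumer for the result structure shown above (a sketch, not part
# of the repo). Field names are verbatim from the demo output; each tuple
# has five slots and 'NIL' marks an empty slot. The *_indx lists appear to
# index into the whitespace tokens of 'text'.
def summarize(result: dict) -> None:
    for stmt_id, stmt in result["statements"].items():
        tokens = stmt["text"].split()
        print(stmt_id)
        for fact in stmt["fact tuples"]:
            print("  fact:     ", [s for s in fact if s != "NIL"])
        for cond in stmt["condition tuples"]:
            print("  condition:", [s for s in cond if s != "NIL"])
        print("  concepts:  ", [tokens[i] for i in stmt["concept_indx"]])
        print("  attributes:", [tokens[i] for i in stmt["attr_indx"]])
        print("  predicates:", [tokens[i] for i in stmt["predicate_indx"]])
```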
Example commands for pre-training:
(all gates for LM, pretrain)
python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM --pretrain
(all gates for POS, pretrain)
python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM --pretrain
(all gates for LM and POS, pretrain)
python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM --pretrain
Example commands with multi-output:
(all gates for LM with multi-output)
python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM
(all gates for POS with multi-output)
python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM
(all gates for LM and POS, with multi-output)
python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM
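The --config argument is a 9-digit bit string of gate switches. Judging from the examples above, the first three digits control the LM gates and the next three the POS gates; the role of the last three digits is not shown here, so the sketch below (an illustration, not code from train.py) labels them generically -- see train.py for the authoritative parsing.

```python
# Hypothetical illustration of how the 9-digit --config string decomposes,
# based only on the example commands above: digits 0-2 gate the LM input,
# digits 3-5 gate the POS input, and the role of digits 6-8 is an
# assumption here. See train.py for the authoritative mapping.
def parse_gate_config(config: str) -> dict:
    assert len(config) == 9 and set(config) <= {"0", "1"}, "expected 9 binary digits"
    bits = [c == "1" for c in config]
    return {"lm": bits[0:3], "pos": bits[3:6], "other": bits[6:9]}

print(parse_gate_config("111111000"))  # all LM and POS gates on
```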
If you use this code in your work, please cite:

@inproceedings{jiang-mimo,
title = "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text",
author = "Jiang, Tianwen and Zhao, Tong and Qin, Bing and Liu, Ting and Chawla, Nitesh V. and Jiang, Meng",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2019",
address = "Hong Kong, China",
publisher = "Association for Computational Linguistics",
}