CODER

CODER: Knowledge infused cross-lingual medical term embedding for term normalization. Paper

CODER++: Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations. Paper

Use the model with transformers

The models have been uploaded to the Hugging Face model hub and can be loaded with transformers.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("GanjinZero/UMLSBert_ENG")
model = AutoModel.from_pretrained("GanjinZero/UMLSBert_ENG")

English checkpoint: GanjinZero/coder_eng or GanjinZero/UMLSBert_ENG (old name)

English checkpoint CODER++: GanjinZero/coder_eng_pp (with hard negative sampling)

Multilingual checkpoint: GanjinZero/coder_all (the old name GanjinZero/UMLSBert_ALL has been discarded)
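
As a rough illustration of how a loaded checkpoint could be used to compare terms, the sketch below embeds a list of terms and scores pairs by cosine similarity. It assumes the [CLS] vector of the last hidden layer serves as the term embedding and that terms are truncated to 32 tokens. Both choices, along with the helper names embed_terms and cosine_similarity, are assumptions made for this example and are not taken from the repository's evaluation scripts.

```python
import math


def embed_terms(terms, model_name="GanjinZero/coder_eng"):
    """Return one embedding (a list of floats) per input term."""
    # Heavy dependencies are imported here so that the similarity helper
    # below can be used without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # max_length=32 is an assumed cap; medical terms are usually short.
    batch = tokenizer(
        terms, padding=True, truncation=True, max_length=32, return_tensors="pt"
    )
    with torch.no_grad():
        output = model(**batch)

    # Assumption: use the [CLS] position of the last hidden layer.
    cls_vectors = output.last_hidden_state[:, 0, :]
    return cls_vectors.tolist()


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

For example, vecs = embed_terms(["headache", "cephalalgia", "fever"]) followed by cosine_similarity(vecs[0], vecs[1]) gives a similarity score for the first two terms. The first call to embed_terms downloads the checkpoint.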

Train your model

cd pretrain
python train.py --umls_dir your_umls_dir --model_name_or_path monologg/biobert_v1.1_pubmed

your_umls_dir should contain MRCONSO.RRF, MRREL.RRF and MRSTY.RRF. These files can be downloaded from UMLS.

A small tool for loading UMLS RRF files

from pretrain.load_umls import UMLS
umls = UMLS(your_umls_dir)
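
To give a sense of what the RRF files contain, here is a minimal stand-alone sketch that groups term strings by concept. It is not the UMLS class from pretrain/load_umls.py, and the function name group_terms_by_cui is hypothetical. It relies on the standard MRCONSO.RRF layout: pipe-delimited rows with the concept identifier (CUI) in the first column, the language (LAT) in the second, and the term string (STR) in the fifteenth. The sample rows are made-up toy values.

```python
from collections import defaultdict


def group_terms_by_cui(lines, language="ENG"):
    """Map each CUI to the set of its term strings in one language."""
    cui_to_terms = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip malformed rows
        cui, lat, term = fields[0], fields[1], fields[14]
        if lat == language:
            cui_to_terms[cui].add(term)
    return dict(cui_to_terms)


# Toy rows in MRCONSO layout; identifiers and sources are invented.
sample = [
    "C0000001|ENG|P|L1|PF|S1|Y|A1||||SRC|PT|X1|headache|0|N||",
    "C0000001|ENG|S|L2|PF|S2|Y|A2||||SRC|SY|X1|cephalalgia|0|N||",
    "C0000001|SPA|P|L3|PF|S3|Y|A3||||SRC|PT|X1|cefalea|0|N||",
    "C0000002|ENG|P|L4|PF|S4|Y|A4||||SRC|PT|X2|fever|0|N||",
]

synonyms = group_terms_by_cui(sample)
```

With a real download, the same function can be fed an open file handle, for instance open("your_umls_dir/MRCONSO.RRF", encoding="utf-8"). Terms that share a CUI are synonyms of one concept.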

Test CODER or other embeddings

CADEC

cd test
python cadec/cadec_eval.py bert_model_name_or_path
python cadec/cadec_eval.py word_embedding_path

MANTRA GSC

Download the Mantra GSC, unzip the XML files into /test/mantra/dataset, then run:

cd test/mantra
python test.py

MCSM

cd test/embeddings_reimplement
python mcsm.py

DDBRC

Only sampled data is provided.

cd test/diseasedb
python train.py your_embedding embedding_type freeze_or_not gpu_id

embedding_type must be one of bert, word, or cui.
freeze_or_not must be T or F: T freezes the embedding, and F fine-tunes it.
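
The difference between freezing and fine-tuning can be sketched as follows. This is a generic PyTorch-style illustration, not the logic in test/diseasedb/train.py; the function names set_trainable and apply_freeze_flag are hypothetical. It assumes the embedding is an object exposing parameters(), as torch.nn.Module does, and that each parameter has a requires_grad attribute.

```python
def set_trainable(module, trainable):
    """Enable or disable gradient updates for every parameter of a module."""
    for param in module.parameters():
        param.requires_grad = trainable


def apply_freeze_flag(module, freeze_or_not):
    """Interpret a T/F flag: T freezes the module, F leaves it trainable."""
    if freeze_or_not not in ("T", "F"):
        raise ValueError("freeze_or_not must be 'T' or 'F'")
    set_trainable(module, trainable=(freeze_or_not == "F"))
```

Frozen parameters receive no gradient, so only the layers stacked on top of the embedding are updated during training. With F, the embedding itself is updated as well.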

Citation

@article{YUAN2022103983,
title = {CODER: Knowledge-infused cross-lingual medical term embedding for term normalization},
journal = {Journal of Biomedical Informatics},
pages = {103983},
year = {2022},
issn = {1532-0464},
doi = {10.1016/j.jbi.2021.103983},
url = {https://www.sciencedirect.com/science/article/pii/S1532046421003129},
author = {Zheng Yuan and Zhengyun Zhao and Haixia Sun and Jiao Li and Fei Wang and Sheng Yu},
keywords = {medical term normalization, cross-lingual, medical term representation, knowledge graph embedding, contrastive learning}
}

@misc{https://doi.org/10.48550/arxiv.2204.00391,
  doi = {10.48550/ARXIV.2204.00391},
  url = {https://arxiv.org/abs/2204.00391},
  author = {Zeng, Sihang and Yuan, Zheng and Yu, Sheng},
  title = {Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations},
  publisher = {arXiv},
  year = {2022}
}
