bigNN: an open-source big data toolkit focused on biomedical sentence classification

bircatmcri/bigNN

Every day, medical data sources such as scientific literature, medical web pages, health-related social media posts, clinical notes, and drug reviews generate a large amount of text. Processing this data efficiently is a daunting task without suitable computational strategies, which makes text classification an essential operation in big data text analytics. In this contribution, we developed bigNN, an open-source software package for big data text classification. It implements a word2vec neural network model on Apache Spark to classify sentences at big data scale in a timely fashion. The software offers a graphical user interface, and it facilitates reproducible research in sentence analysis by allowing users to configure different sets of Apache Spark and word2vec neural network parameters. Furthermore, we introduce an application of bigNN in the medical informatics domain. bigNN is fully documented and is publicly and freely available at https://github.com/bircatmcri/bigNN.
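
As an illustration of the configurable parameter sets mentioned above, here is a minimal Java sketch of an immutable settings holder. It is not the API of edu.mfldclin.mcrf.bignn.setting: the class name `RunSettingsSketch`, the key names, and the values are all assumptions chosen for illustration. The idea is that each run derives its settings from a known baseline and can print them, so a result can be traced back to the exact parameters that produced it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical settings holder for illustration only; not part of bigNN.
final class RunSettingsSketch {
    // Sorted, read-only view so describe() always prints keys in the same order.
    private final Map<String, String> values;

    private RunSettingsSketch(Map<String, String> values) {
        this.values = Collections.unmodifiableMap(new TreeMap<>(values));
    }

    // Baseline settings. Keys and values are illustrative assumptions.
    static RunSettingsSketch defaults() {
        Map<String, String> m = new TreeMap<>();
        m.put("spark.master", "local[*]");
        m.put("spark.executor.memory", "2g");
        m.put("word2vec.vectorSize", "100");
        m.put("word2vec.windowSize", "5");
        m.put("word2vec.minCount", "5");
        return new RunSettingsSketch(m);
    }

    // Returns a copy with one setting overridden; unknown keys are rejected
    // so that a typo cannot silently create a new, unused parameter.
    RunSettingsSketch with(String key, String value) {
        if (!values.containsKey(key)) {
            throw new IllegalArgumentException("Unknown setting: " + key);
        }
        Map<String, String> copy = new TreeMap<>(values);
        copy.put(key, value);
        return new RunSettingsSketch(copy);
    }

    String get(String key) {
        String v = values.get(key);
        if (v == null) {
            throw new IllegalArgumentException("Unknown setting: " + key);
        }
        return v;
    }

    int getInt(String key) {
        return Integer.parseInt(get(key));
    }

    // One "key=value" line per setting, suitable for logging next to results.
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

A run would then start from `RunSettingsSketch.defaults()`, apply overrides such as `.with("word2vec.vectorSize", "200")`, and store the output of `describe()` alongside its results.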

bigNN includes the following packages:

| Package Name | Description |
| --- | --- |
| edu.mfldclin.mcrf.bignn.gui | Implementation of the graphical user interface |
| edu.mfldclin.mcrf.bignn.setting | Implementation of the pre-defined and user-defined settings required by the system |
| edu.mfldclin.mcrf.bignn.learning | Implementation of text pre-processing and the neural network learning model |
| edu.mfldclin.mcrf.bignn.evaluation | Evaluation of the neural network predictive model |
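
To make the role of the evaluation package more concrete, the following Java sketch computes common metrics for a binary sentence classifier from true and predicted labels. This is an assumption about what such an evaluation might involve, not code taken from edu.mfldclin.mcrf.bignn.evaluation: the class name `BinaryMetricsSketch`, the 0/1 label encoding, and the choice of metrics are all hypothetical.

```java
// Hypothetical evaluation helper for illustration only; not part of bigNN.
// Labels are assumed to be 1 (positive class) or 0 (negative class).
final class BinaryMetricsSketch {
    final int tp; // predicted 1, actual 1
    final int fp; // predicted 1, actual 0
    final int tn; // predicted 0, actual 0
    final int fn; // predicted 0, actual 1

    BinaryMetricsSketch(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("Label arrays differ in length");
        }
        int tpCount = 0, fpCount = 0, tnCount = 0, fnCount = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1) {
                if (actual[i] == 1) tpCount++; else fpCount++;
            } else {
                if (actual[i] == 1) fnCount++; else tnCount++;
            }
        }
        tp = tpCount;
        fp = fpCount;
        tn = tnCount;
        fn = fnCount;
    }

    // Fraction of predicted positives that are correct; 0 if none were predicted.
    double precision() {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Fraction of actual positives that were found; 0 if there are none.
    double recall() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Harmonic mean of precision and recall; 0 if both are 0.
    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    // Fraction of all sentences labeled correctly; 0 for empty input.
    double accuracy() {
        int total = tp + fp + tn + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }
}
```

Guarding the zero-denominator cases keeps the metrics defined even when a model predicts no positives, which can happen on small or heavily imbalanced test sets.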

Requirements:

  • Apache Spark 2.10
  • Java SE 8

bigNN software architectural model:

The bigNN software architectural model is shown in the following figure.

[Figure: bigNN software architectural model]


Collaborators:

  1. Ahmad P. Tafti (Marshfield Clinic Research Institute)
  2. Ehsun Behravesh (IEEE Member)
  3. Mehdi Assefi (University of Georgia)
  4. Eric LaRose (Marshfield Clinic Research Institute)
  5. Jonathan Badger (Marshfield Clinic Research Institute)
  6. John Mayer (Marshfield Clinic Research Institute)
  7. AnHai Doan (University of Wisconsin-Madison)
  8. David Page (University of Wisconsin-Madison)
  9. Peggy Peissig (Marshfield Clinic Research Institute)

Acknowledgment:

The project described was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), grant UL1TR000427. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Publications:

The workflow and architectural model of bigNN are fully explained in [1]. Publications that use bigNN are encouraged to cite the following two papers. Thanks!

[1] Tafti, A.P., Behravesh, E., Assefi, M., LaRose, E., Badger, J., Mayer, J., Doan, A., Page, D. and Peissig, P., 2017. bigNN: an open-source big data toolkit focused on biomedical sentence classification. IEEE BIG DATA 2017.

[2] Tafti, A.P., Badger, J., LaRose, E., Shirzadi, E., Mahnke, A., Mayer, J., Ye, Z., Page, D. and Peissig, P., 2017. Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Medical Informatics, 5(4), p.e51.
