bigNN: an open-source big data toolkit focused on biomedical sentence classification

Every day, a large amount of text data is generated by medical data sources such as scientific literature, medical web pages, health-related social media posts, clinical notes, and drug reviews. Processing this data efficiently is a daunting task without suitable computational strategies, which makes text classification an essential operation in big data text analytics. In this contribution, we developed bigNN, an open-source software package for big data text classification. It implements a word2vec neural network model over Apache Spark to classify sentences at big data scale in a timely fashion. The software offers a graphical user interface, and it facilitates reproducible research in sentence analysis by allowing users to configure different sets of Apache Spark and word2vec neural network parameters. Furthermore, we introduce an application of bigNN in the medical informatics domain. bigNN is fully documented and is publicly and freely available at https://github.com/bircatmcri/bigNN.
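To make the idea of a configurable, reproducible parameter set concrete, the following is a minimal sketch of how one run's Apache Spark and word2vec settings could be captured and logged next to its results. The class `RunSettings` and the `word2vec.*` key names are hypothetical and are not taken from the bigNN code base. The field names only mirror parameters commonly exposed by word2vec implementations (vector size, window size, minimum word count, iterations, seed). `local[*]` is the standard Spark master URL for running on all local cores.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical holder for one experiment's settings; not part of bigNN's API. */
final class RunSettings {
    final String sparkMaster;  // Spark master URL, e.g. "local[*]"
    final int vectorSize;      // dimensionality of the word vectors
    final int windowSize;      // context window, in words
    final int minCount;        // words seen fewer times than this are ignored
    final int numIterations;   // training passes over the corpus
    final long seed;           // fixed seed so a run can be repeated

    RunSettings(String sparkMaster, int vectorSize, int windowSize,
                int minCount, int numIterations, long seed) {
        if (vectorSize <= 0 || windowSize <= 0 || minCount < 0 || numIterations <= 0) {
            throw new IllegalArgumentException("word2vec parameters out of range");
        }
        this.sparkMaster = Objects.requireNonNull(sparkMaster, "sparkMaster");
        this.vectorSize = vectorSize;
        this.windowSize = windowSize;
        this.minCount = minCount;
        this.numIterations = numIterations;
        this.seed = seed;
    }

    /** Flattens the settings into ordered key/value pairs that can be stored with the results. */
    Map<String, String> asMap() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("spark.master", sparkMaster);
        m.put("word2vec.vectorSize", Integer.toString(vectorSize));
        m.put("word2vec.windowSize", Integer.toString(windowSize));
        m.put("word2vec.minCount", Integer.toString(minCount));
        m.put("word2vec.numIterations", Integer.toString(numIterations));
        m.put("word2vec.seed", Long.toString(seed));
        return m;
    }

    public static void main(String[] args) {
        // Example values only; they are not bigNN's defaults.
        RunSettings settings = new RunSettings("local[*]", 100, 5, 5, 1, 42L);
        for (Map.Entry<String, String> e : settings.asMap().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Keeping the settings immutable and writing them out with every result is one simple way to let a later run be repeated with the same configuration.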

bigNN includes the following packages:

Package Name	Description
edu.mfldclin.mcrf.bignn.gui	Implementation of the graphical user interface
edu.mfldclin.mcrf.bignn.setting	Implementation of pre-defined and user-defined settings required by the system
edu.mfldclin.mcrf.bignn.learning	Implementation of text pre-processing and neural network learning model
edu.mfldclin.mcrf.bignn.evaluation	Evaluation of the neural network predictive model
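As an illustration of what evaluating a sentence classifier typically involves, here is a minimal sketch that computes precision, recall, F1, and accuracy for a binary task from true and predicted labels. The class `BinaryMetrics` is hypothetical and is not the code in `edu.mfldclin.mcrf.bignn.evaluation`. Which metrics bigNN actually reports is not stated here, so treat this as a generic example.

```java
/** Hypothetical helper for binary classification metrics; not part of bigNN's API. */
final class BinaryMetrics {
    final int tp; // positive sentences predicted positive
    final int fp; // negative sentences predicted positive
    final int fn; // positive sentences predicted negative
    final int tn; // negative sentences predicted negative

    private BinaryMetrics(int tp, int fp, int fn, int tn) {
        this.tp = tp;
        this.fp = fp;
        this.fn = fn;
        this.tn = tn;
    }

    /** Counts the four confusion-matrix cells from parallel label arrays. */
    static BinaryMetrics of(boolean[] actual, boolean[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i]) tp++;
            else if (!actual[i] && predicted[i]) fp++;
            else if (actual[i]) fn++;
            else tn++;
        }
        return new BinaryMetrics(tp, fp, fn, tn);
    }

    // Each ratio returns 0.0 when its denominator is zero, to avoid NaN.
    double precision() { return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp); }

    double recall() { return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn); }

    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    double accuracy() {
        int total = tp + fp + fn + tn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    public static void main(String[] args) {
        // Toy labels: true marks a sentence of the positive class.
        boolean[] actual = {true, true, false, false, true};
        boolean[] predicted = {true, false, false, true, true};
        BinaryMetrics m = BinaryMetrics.of(actual, predicted);
        System.out.printf("precision=%.3f recall=%.3f f1=%.3f accuracy=%.3f%n",
                m.precision(), m.recall(), m.f1(), m.accuracy());
    }
}
```

Returning 0.0 for an undefined ratio is a design choice made for this sketch. Other conventions, such as reporting the value as undefined, are equally valid.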

Requirements:

Apache Spark 2.10
Java SE 8

bigNN software architectural model:

The bigNN software architectural model is shown in the following figure (see bigNN architecture.png in the repository).

Collaborators:

Ahmad P. Tafti (Marshfield Clinic Research Institute)
Ehsun Behravesh (IEEE Member)
Mehdi Assefi (University of Georgia)
Eric LaRose (Marshfield Clinic Research Institute)
Jonathan Badger (Marshfield Clinic Research Institute)
John Mayer (Marshfield Clinic Research Institute)
AnHai Doan (University of Wisconsin-Madison)
David Page (University of Wisconsin-Madison)
Peggy Peissig (Marshfield Clinic Research Institute)

Acknowledgment:

The project described was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), grant UL1TR000427. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Publications:

The workflow and architectural model of bigNN are fully explained in [1]. Publications that use bigNN are encouraged to cite the following two papers. Thanks!

[1] Tafti, A.P., Behravesh, E., Assefi, M., LaRose, E., Badger, J., Mayer, J., Doan, A., Page, D. and Peissig, P., 2017. bigNN: an open-source big data toolkit focused on biomedical sentence classification. IEEE Big Data 2017.

[2] Tafti, A.P., Badger, J., LaRose, E., Shirzadi, E., Mahnke, A., Mayer, J., Ye, Z., Page, D. and Peissig, P., 2017. Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Medical Informatics, 5(4), p.e51.
