# citeseer-umd-collection

Here is 1 public repository matching this topic...

samujjwaal / CiteSeer-Text-Processing

Tokenizing text in the CiteSeer document corpus and determining the word frequencies for all the words in the collection

python data-science information-retrieval text-mining regex jupyter-notebook ranking nltk preprocess text-processing tokenization count-vectorizer porter-stemmer citeseer corpus-documents citeseer-umd-collection vocabulary-size

Updated Mar 28, 2020
Jupyter Notebook
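
The description above mentions tokenizing a corpus and counting word frequencies. A minimal sketch of that idea follows, using only the Python standard library. It is not taken from the repository: the sample documents, the `tokenize` and `word_frequencies` helpers, and the letters-only token pattern are all illustrative assumptions, and the repository's tags suggest it may rely on NLTK, a count vectorizer, and Porter stemming instead.

```python
import re
from collections import Counter

# Hypothetical stand-in for the corpus; the actual CiteSeer files are not used here.
documents = [
    "Information retrieval ranks documents by relevance.",
    "Ranking documents requires tokenizing the text of each document.",
]

# Assumption: a token is a run of ASCII letters after lowercasing.
TOKEN_PATTERN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens in order."""
    return TOKEN_PATTERN.findall(text.lower())


def word_frequencies(docs):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


frequencies = word_frequencies(documents)
vocabulary_size = len(frequencies)  # number of distinct tokens in the collection

print("Vocabulary size:", vocabulary_size)
print("Most frequent tokens:", frequencies.most_common(3))
```

No stemming is applied in this sketch, so "documents" and "document" count as separate vocabulary entries; a stemmer would typically merge such forms and shrink the vocabulary.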
