Financial-patent-document-classification

This project processes 882 financial patent documents with natural language processing (NLP) and classifies them into 9 categories, reaching 96% accuracy with the best model. The project has three main steps, listed below.

First, data processing and feature engineering use NLP and text topic modeling techniques such as TF-IDF, part-of-speech (POS) tagging, and n-gram language models. From the 882 patent documents, 17,538 features are generated: the length of each document before text cleaning, the length of each document after text cleaning, the number of punctuation marks, the numbers of adjectives, nouns, and verbs in the document, the frequency of every unique 1-gram, the frequencies of the top 20 2-grams, and the frequencies of the top 10 3-grams. A random forest is then used for feature selection, and the 300 most important features are kept for the document classification step.
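
The feature families listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual code: the regex tokenizer stands in for the unspecified text-cleaning step, the function names (`tokenize`, `ngrams`, `featurize`) and the feature-name prefixes are hypothetical, and the POS-based counts (adjectives, nouns, verbs) are omitted because they need a tagger such as the one in NLTK or spaCy.

```python
import re
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (assumed cleaning step)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n, k):
    """Return the k most frequent n-grams across the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(tokenize(doc), n))
    return [gram for gram, _ in counts.most_common(k)]


def featurize(docs, top_bigrams=20, top_trigrams=10):
    """Build one feature dict per document with the same keys in every row."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    bigrams = top_ngrams(docs, 2, top_bigrams)
    trigrams = top_ngrams(docs, 3, top_trigrams)

    rows = []
    for doc in docs:
        tokens = tokenize(doc)
        uni = Counter(tokens)
        bi = Counter(ngrams(tokens, 2))
        tri = Counter(ngrams(tokens, 3))

        row = {
            # Document length before and after cleaning, in characters.
            "raw_length": len(doc),
            "clean_length": len(" ".join(tokens)),
            # Number of punctuation characters in the raw text.
            "n_punct": sum(ch in string.punctuation for ch in doc),
        }
        # Frequency of every unique 1-gram in the corpus vocabulary.
        for word in vocab:
            row["1g:" + word] = uni[word]
        # Frequencies of the corpus-wide top 2-grams and 3-grams.
        for gram in bigrams:
            row["2g:" + " ".join(gram)] = bi[gram]
        for gram in trigrams:
            row["3g:" + " ".join(gram)] = tri[gram]
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample_docs = [
        "A payment system, and a payment method.",
        "Payment system for secure trading!",
    ]
    for features in featurize(sample_docs):
        print(features)
```

Every row shares the same keys, so the list of dicts can be turned into a fixed-width matrix before feature selection. Raw counts are used here for simplicity; a TF-IDF weighting could replace them without changing the overall shape.
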
Second, 7 traditional machine learning classifiers are applied with parameter tuning in the document classification step: random forest, gradient boosting, linear SVM, logistic regression, Gaussian-kernel SVM, KNN, and naive Bayes. Of these, random forest has the best accuracy, at 88%.
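
A comparison of this kind might look like the following scikit-learn sketch. It is an illustration under several assumptions rather than the project's code: the data is synthetic (`make_classification`), the naive Bayes variant is assumed to be `GaussianNB`, hyperparameters are left near their defaults instead of tuned, and the helper names `select_top_k` and `compare` are hypothetical. Random-forest feature selection is placed inside each pipeline so that it is refit on every cross-validation fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the patent feature matrix (not the real data).
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=12, n_classes=3, random_state=0
)


def select_top_k(k):
    """Keep the k features with the highest random-forest importance."""
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    # threshold=-inf disables the importance cutoff, so exactly k are kept.
    return SelectFromModel(forest, max_features=k, threshold=-np.inf)


models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "linear SVM": LinearSVC(max_iter=5000),
    "logistic regression": LogisticRegression(max_iter=1000),
    "Gaussian-kernel SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),  # assumed variant
}


def compare(models, X, y, k=20, cv=5):
    """Return the mean cross-validated accuracy of each model."""
    scores = {}
    for name, model in models.items():
        pipeline = make_pipeline(select_top_k(k), StandardScaler(), model)
        fold_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
        scores[name] = fold_scores.mean()
    return scores


scores = compare(models, X, y)
for name, accuracy in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:22s} {accuracy:.3f}")
```

The accuracies printed here describe the synthetic data only and say nothing about the patent corpus. For the real task, `k` would be 300 and each model would be wrapped in a parameter search such as `GridSearchCV`.
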
Third, an artificial neural network (ANN), an RNN, and an LSTM are applied in the classification step. After tuning, the ANN has the best accuracy at 96%, followed by the LSTM at 89.4% and the RNN at 88.9%. Please see the figure below.
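
As a rough illustration of the ANN step, the sketch below trains a small feed-forward network on a selected-feature matrix. It uses scikit-learn's `MLPClassifier` as a stand-in; the framework, layer sizes, and training settings of the project's own ANN are not stated here, so all of them are assumptions, and the data is synthetic. The RNN and LSTM models work on token sequences rather than a fixed feature vector and are not covered by this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the selected features (not the real data).
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=10, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two hidden layers; the sizes are illustrative, not the project's settings.
ann = make_pipeline(
    StandardScaler(),  # neural networks train more reliably on scaled inputs
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
)
ann.fit(X_train, y_train)

test_accuracy = ann.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Scoring on a held-out split, as here, keeps the reported accuracy separate from the data used for fitting and tuning.
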

