
Qualitative Data Analysis and Text Mining Classes

Welcome to the Qualitative Data Analysis and Text Mining (Analiza danych jakościowych i Text Mining) classes repo 👋

The main branch contains an NLP project that analyzes English Premier League tweets about the top clubs (word clouds, tokens, documents and visualizations) and classifies them with the following classifiers:

  • Multinomial Logistic Regression
  • Decision Tree
  • Random Forest
  • Gradient Boosting
  • MLP (multilayer perceptron)
  • Bagging
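A minimal sketch of the first classifier in this list, multinomial logistic regression on TF-IDF features, built with scikit-learn. The tweets and club labels below are hypothetical toy data, not the project's dataset, and the pipeline is only an illustration of the technique rather than the project's exact code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy tweets (assumption); the real project uses scraped EPL tweets.
tweets = [
    "gunners win again at the emirates",
    "arsenal gunners top of the league",
    "blues held at stamford bridge",
    "chelsea blues sign a new striker",
    "reds dominate at anfield",
    "liverpool reds score late winner",
]
labels = ["Arsenal", "Arsenal", "Chelsea", "Chelsea", "Liverpool", "Liverpool"]

# TF-IDF features followed by logistic regression. With the default lbfgs solver,
# scikit-learn fits a multinomial (softmax) model when there are more than two classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(tweets, labels)

print(model.predict(["gunners at the emirates", "blues at stamford bridge"]))
```

The same pipeline shape works for the other classifiers: swapping `LogisticRegression` for another scikit-learn estimator changes only the last step.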

Other branches:

🔸 'lab1' Branch - regex (Regular expression operations)
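The kind of regex operations covered in 'lab1' can be sketched with Python's built-in `re` module. The function name `clean_tweet` and the sample tweet are hypothetical, chosen only to show `re.findall` and `re.sub` on tweet-like text.

```python
import re


def clean_tweet(tweet: str) -> dict:
    """Hypothetical helper: extract hashtags/mentions and strip them from a tweet."""
    hashtags = re.findall(r"#(\w+)", tweet)          # words following '#'
    mentions = re.findall(r"@(\w+)", tweet)          # words following '@'
    text = re.sub(r"https?://\S+", "", tweet)        # drop URLs
    text = re.sub(r"[@#]\w+", "", text)              # drop mentions and hashtags
    text = re.sub(r"\s+", " ", text).strip()         # collapse leftover whitespace
    return {"hashtags": hashtags, "mentions": mentions, "text": text}


print(clean_tweet("Great win for @Arsenal today! #COYG #PremierLeague https://t.co/abc123"))
```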

🔸 'lab2' Branch - cleaning text with regex (continued), removing stop words, stemming and lemmatization with the nltk library
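A minimal preprocessing sketch in the spirit of 'lab2': regex cleaning, stop-word removal and stemming with nltk's `PorterStemmer`. The small `STOP_WORDS` set is an illustrative assumption; the full English list is available from `nltk.corpus.stopwords.words("english")` after `nltk.download("stopwords")`. Lemmatization with `WordNetLemmatizer` also needs a corpus download (`nltk.download("wordnet")`), so it is left out here.

```python
import re

from nltk.stem import PorterStemmer

# Illustrative stop-word subset (assumption), not nltk's full English list.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "was", "were",
              "in", "on", "at", "of", "to", "for"}

stemmer = PorterStemmer()


def preprocess(text: str) -> list:
    """Lowercase, strip URLs/mentions/hashtags and non-letters, drop stop words, stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)  # remove URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)              # keep letters and whitespace only
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [stemmer.stem(t) for t in tokens]


print(preprocess("The Reds were running riot at Anfield! #LFC"))
```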

🔸 'lab3' Branch - WordCloud
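A word cloud is driven by word frequencies, so a sketch for 'lab3' can split the work into counting (standard library) and rendering (the `wordcloud` package). The helper names and image size are hypothetical; rendering assumes `pip install wordcloud`, which is why that import sits inside the function.

```python
from collections import Counter


def word_frequencies(tokens, top_n=None):
    """Count tokens; keep the top_n most common (all of them when top_n is None)."""
    return dict(Counter(tokens).most_common(top_n))


def render_word_cloud(frequencies, path="wordcloud.png"):
    """Draw the frequencies as a word cloud image (requires the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)


freqs = word_frequencies(["goal", "goal", "win", "arsenal", "goal", "win"])
print(freqs)
```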

🔸 'lab4' Branch - tokenization and vectorization of text with the scikit-learn library, operations on NumPy arrays, visualizations with matplotlib

🔸 'lab5' Branch - text classification with decision trees, random forests, SVM, AdaBoost and bagging
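A sketch of comparing the 'lab5' classifiers on the same TF-IDF features, using scikit-learn estimators with mostly default settings. The training and test tweets, the `compare_classifiers` helper and the chosen hyperparameters are hypothetical; on a dataset this small the accuracies say nothing about real performance.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(train_texts, train_labels, test_texts, test_labels):
    """Fit each classifier on TF-IDF features and return its test accuracy."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary on training data only
    X_test = vectorizer.transform(test_texts)
    classifiers = {
        "decision tree": DecisionTreeClassifier(random_state=0),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "SVM": SVC(kernel="linear"),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "bagging": BaggingClassifier(random_state=0),  # bags decision trees by default
    }
    scores = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, train_labels)
        scores[name] = clf.score(X_test, test_labels)  # mean accuracy
    return scores


# Hypothetical toy data (assumption).
train_texts = [
    "gunners win at the emirates", "arsenal gunners top the table",
    "arteta praises the gunners", "emirates crowd cheers arsenal",
    "blues win at stamford bridge", "chelsea blues sign a striker",
    "stamford bridge crowd cheers chelsea", "blues boss praises the squad",
]
train_labels = ["Arsenal"] * 4 + ["Chelsea"] * 4
test_texts = ["gunners at the emirates", "blues at stamford bridge"]
test_labels = ["Arsenal", "Chelsea"]

scores = compare_classifiers(train_texts, train_labels, test_texts, test_labels)
print(scores)
```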

🔸 'entity_matching' Branch - calculation of distance and similarity measures: Euclidean similarity, cosine distance, cosine similarity
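The measures listed for 'entity_matching' can be sketched with NumPy. Cosine distance is taken as `1 - cosine similarity`; for Euclidean similarity the sketch assumes the common convention `1 / (1 + distance)`, which may differ from the branch's own definition. scikit-learn also offers `sklearn.metrics.pairwise.cosine_similarity` for whole matrices.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def euclidean_similarity(a, b):
    """Assumed convention: map distance d in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; undefined for zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    """0 for vectors pointing the same way, 1 for orthogonal ones."""
    return 1.0 - cosine_similarity(a, b)


print(euclidean_distance([0, 0], [3, 4]), cosine_similarity([1, 1], [2, 2]))
```

Cosine measures ignore vector length, so two documents with the same word proportions but different lengths still score as identical, while Euclidean distance does not.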

🔸 'project' Branch - merged into the main branch