Latest commit

bb2ec0a · Dec 2, 2020
Data_Linkage_And_Classification

My data linkage and classification project.

Main skills used in this project

  • Data linkage between Google and Amazon stock items, matched on their name, description and price
  • Using libraries such as fuzzywuzzy and textdistance for string matching in the data linkage
  • Using blocking to make the linkage more efficient and more accurate
  • Comparing the accuracy of three classification algorithms: decision tree, k-NN (k = 5) and k-NN (k = 10)
  • Feature engineering and selection
    • Interaction-term pairs and a clustering label
    • Principal Component Analysis
    • Naively choosing the first four features
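The linkage and blocking bullets above can be illustrated with a minimal sketch. It is not the project's code and rests on several assumptions: it uses the standard-library `difflib.SequenceMatcher` in place of fuzzywuzzy/textdistance (fuzzywuzzy's `fuzz.ratio` is a scaled `SequenceMatcher` ratio when python-Levenshtein is not installed), it blocks on the first token of the normalized name (a hypothetical key), it scores names only (the project also used description and price), and the threshold of 80 and the example records are made up.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case the text and keep only alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def blocking_key(name):
    """Hypothetical blocking key: the first token of the normalized name."""
    tokens = normalize(name).split()
    return tokens[0] if tokens else ""


def name_similarity(a, b):
    """Similarity on a 0-100 scale, in the spirit of fuzzywuzzy's fuzz.ratio."""
    return 100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(amazon, google, threshold=80.0):
    """Link each Amazon record to its best-scoring Google record in the same block.

    Records are (id, name) pairs. Only pairs that share a blocking key are
    scored, instead of comparing every Amazon record with every Google record.
    """
    blocks = defaultdict(list)
    for gid, gname in google:
        blocks[blocking_key(gname)].append((gid, gname))

    matches = []
    for aid, aname in amazon:
        best = None
        for gid, gname in blocks.get(blocking_key(aname), []):
            score = name_similarity(aname, gname)
            if score >= threshold and (best is None or score > best[2]):
                best = (aid, gid, score)
        if best is not None:
            matches.append(best)
    return matches


# Made-up example records (not the project's data).
amazon = [("a1", "Adobe Photoshop CS3"),
          ("a2", "Norton AntiVirus 2008"),
          ("a3", "Quicken Deluxe 2008")]
google = [("g1", "adobe photoshop cs3 (win)"),
          ("g2", "norton antivirus 2008 (3 user)"),
          ("g3", "Adobe Acrobat 8 Professional")]

# Prints "a1 g1 90.5" and "a2 g2 85.7"; a3 has no candidate in its block.
for aid, gid, score in link(amazon, google):
    print(aid, gid, round(score, 1))
```

With blocking, only 3 of the 9 possible pairs are scored here, which is where the efficiency gain comes from; a poorly chosen key can, however, put true matches in different blocks and miss them.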
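The k-NN half of the classifier comparison above can be sketched with the standard library alone. This is a hypothetical illustration, not the project's pipeline: the decision tree is left out, the two-blob data and the 75/25 hold-out split are assumptions, and a real comparison would more likely use a library such as scikit-learn.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean).

    Ties are broken in favour of the label seen first, i.e. the nearest
    neighbour's label.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose k-NN prediction equals the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y
                  for x, y in zip(test_X, test_y))
    return correct / len(test_y)


def make_data(n, seed=0):
    """Hypothetical toy data: two Gaussian blobs labelled 0 and 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 2.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)
    return X, y


X, y = make_data(200)
split = int(0.75 * len(X))  # assumed 75/25 hold-out split
train_X, test_X = X[:split], X[split:]
train_y, test_y = y[:split], y[split:]

for k in (5, 10):
    acc = accuracy(train_X, train_y, test_X, test_y, k)
    print(f"k-NN (k = {k}): accuracy = {acc:.3f}")
```

A larger k smooths the decision boundary but can blur small classes; comparing k = 5 and k = 10 on the same split is one simple way to see that trade-off.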
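The feature-engineering bullets above (interaction-term pairs, a clustering label and naive first-four selection) can be sketched as small helper functions. The function names and example values are hypothetical, and the cluster centroids are assumed to have been fitted elsewhere (for example by k-means); the sketch does not fit them.

```python
import math
from itertools import combinations


def add_interaction_terms(row, names):
    """Return the features as a dict, extended with the product of every pair."""
    features = dict(zip(names, row))
    for a, b in combinations(names, 2):
        features[f"{a}*{b}"] = features[a] * features[b]
    return features


def nearest_centroid_label(point, centroids):
    """Cluster-label feature: index of the closest centroid.

    The centroids are assumed to have been fitted beforehand (e.g. by k-means).
    """
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def first_n_features(rows, n=4):
    """Naive selection: keep only the first n columns of every row."""
    return [row[:n] for row in rows]


print(add_interaction_terms([1, 2, 3], ["f1", "f2", "f3"]))
print(nearest_centroid_label((0.9, 1.1), [(0, 0), (1, 1), (5, 5)]))
print(first_n_features([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]))
```

With d original features, pairwise interactions add d(d-1)/2 new columns, so they are usually followed by a selection or reduction step.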
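The Principal Component Analysis bullet above can be illustrated by finding the first principal component with power iteration on the sample covariance matrix. This is a swapped-in, minimal technique chosen to stay dependency-free, not necessarily what the project did (a library PCA, such as scikit-learn's `PCA`, is the usual choice). It assumes the largest eigenvalue is strictly dominant, and the toy data are made up.

```python
import math


def mean_center(X):
    """Subtract each column's mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centred data (divides by n - 1)."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(row[i] * row[j] for row in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(C, iters=200):
    """Power iteration: repeatedly apply C and renormalize.

    Converges to the eigenvector with the largest eigenvalue when that
    eigenvalue is strictly dominant and the start vector is not orthogonal to it.
    """
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # C maps v to zero (e.g. all-zero covariance); stop early
            break
        v = [x / norm for x in w]
    return v


def project(Xc, v):
    """Score of each centred row along the component v."""
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]


X = [[3, 0], [-3, 0], [0, 1], [0, -1]]  # toy data, most variance along axis 0
Xc = mean_center(X)
v = first_component(covariance(Xc))
print(v, project(Xc, v))
```

Further components can be obtained by deflation (subtracting the found component's contribution from C and iterating again), which is how this sketch would extend to keeping more than one component.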