Skip to content

Web Data Processing Application involving Big Data NLP and entity linking(disambiguation)

Notifications You must be signed in to change notification settings

AimVoma/Web_Data_Processing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Web Data Processing Systems Project

Part 1: Python Wrapper For Spark

The basic operation of the PySpark_Parser, is to

  • extract text data from compressed WARC files
  • Pre-process and sanitize the data for further analysis
  • Extract any active entities
  • Perform Entity Linking/Disambiguation with a Knowledge Base(Freebase)
  • Resort only on the baseline ranking system of the Freebase(No time for self-ranking methods)
  • Parallelize the main procedure function by performing SPark map-reduce operations

Part 2: Perform Entity Extraction and Analysis based on New York Times Articles

The basic operation of the Entity Analysis - New York Times.py, is to

  • Perform data extraction from New York Times API
  • Recognise any of the unique entities and their frequency per document
  • Recognise any relational patterns between entities(Co-occurances)
  • Perform Sentiment Analysis on Entity content
  • Perform basic statistics and measure any Influence points among the entities
  • Depict the results in a form of bar chart

About

Web Data Processing Application involving Big Data NLP and entity linking(disambiguation)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages