Data-Mining

The objective of this project is to extract article text from a set of URLs and perform text analysis on it to compute the variables listed below.

Data Extraction

For each article, extract the article text and save it in a text file with the URL_ID as its file name. The program should extract only the article title and the article text. It should not extract the website header, footer, or anything else on the page.
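A minimal sketch of the extraction step, using only the Python standard library. It assumes the article title is the first `<h1>` outside the page header and that the body text sits in `<p>` tags outside `<header>`, `<footer>`, `<nav>`, and `<aside>`. Real sites vary, so these selectors are assumptions, and `ArticleParser`, `fetch_html`, `extract_article`, and `save_article` are hypothetical names, not part of this repository.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen

# Tags whose content is treated as page chrome, not article text (assumption).
SKIP_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class ArticleParser(HTMLParser):
    """Collects the first <h1> as the title and every <p> outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped region
        self.current = None   # "h1" or "p" while collecting text
        self.title = ""
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ("h1", "p") and self.current is None:
            self.current = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == self.current:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buf).split())
            if tag == "h1" and not self.title:
                self.title = text
            elif tag == "p" and text:
                self.paragraphs.append(text)
            self.current = None

    def handle_data(self, data):
        if self.current and self.skip_depth == 0:
            self._buf.append(data)


def fetch_html(url):
    """Download a page as text; undecodable bytes are replaced."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_article(html):
    """Return the title and the article paragraphs, one per line."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return "\n".join([parser.title, *parser.paragraphs]).strip()


def save_article(url_id, html, out_dir="articles"):
    """Write the extracted text to <out_dir>/<url_id>.txt and return the path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{url_id}.txt"
    path.write_text(extract_article(html), encoding="utf-8")
    return path
```

A typical call would be `save_article(url_id, fetch_html(url))` for each URL_ID and URL pair in the input.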

Data Analysis

For each extracted article text, perform textual analysis and compute the variables listed below. Save the output in CSV format.
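The CSV step could look roughly like the sketch below, which writes one row per article with `csv.DictWriter`. The column order, the `URL_ID` column, the `output.csv` file name, and the `write_results` helper are assumptions made for illustration. The variable names are taken from the list in this README.

```python
import csv

# One column per variable in this README, preceded by the article's URL_ID.
COLUMNS = [
    "URL_ID",
    "POSITIVE SCORE",
    "NEGATIVE SCORE",
    "POLARITY SCORE",
    "SUBJECTIVITY SCORE",
    "AVG SENTENCE LENGTH",
    "PERCENTAGE OF COMPLEX WORDS",
    "FOG INDEX",
    "AVG NUMBER OF WORDS PER SENTENCE",
    "COMPLEX WORD COUNT",
    "WORD COUNT",
    "SYLLABLE PER WORD",
    "PERSONAL PRONOUNS",
    "AVG WORD LENGTH",
]


def write_results(rows, csv_path="output.csv"):
    """Write a list of dicts (one per article) to a CSV file with a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Keys missing from a row are written as empty cells, while unexpected keys raise a `ValueError`, which helps catch misspelled variable names early.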

Variables

POSITIVE SCORE

NEGATIVE SCORE

POLARITY SCORE

SUBJECTIVITY SCORE

AVG SENTENCE LENGTH

PERCENTAGE OF COMPLEX WORDS

FOG INDEX

AVG NUMBER OF WORDS PER SENTENCE

COMPLEX WORD COUNT

WORD COUNT

SYLLABLE PER WORD

PERSONAL PRONOUNS

AVG WORD LENGTH
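This README does not define the variables, so the sketch below relies on commonly used definitions, all of which are assumptions: polarity is (positive - negative) / (positive + negative), subjectivity is (positive + negative) / word count, a complex word has more than two syllables, and the fog index follows the Gunning formula 0.4 × (average sentence length + percentage of complex words). Syllables are estimated by counting vowel groups, which is only a rough heuristic. The pronoun list and the `analyze` function are hypothetical, and the positive and negative word lists must be supplied by the caller.

```python
import re

# Assumed pronoun list; the README does not specify one.
PRONOUNS = {"i", "we", "my", "ours", "us"}
EPS = 1e-6  # avoids division by zero in the two sentiment ratios


def count_syllables(word):
    """Rough estimate: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def analyze(text, positive_words, negative_words):
    """Compute the README's variables for one article text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    words = [t.lower() for t in tokens]
    n_words = len(words)
    n_sentences = max(1, len(sentences))
    safe_words = max(1, n_words)

    positive = sum(w in positive_words for w in words)
    negative = sum(w in negative_words for w in words)
    syllables = [count_syllables(w) for w in words]
    complex_count = sum(s > 2 for s in syllables)

    avg_sentence_length = n_words / n_sentences
    pct_complex = 100 * complex_count / safe_words
    # "US" in capitals is treated as the country name, not the pronoun.
    pronouns = sum(t.lower() in PRONOUNS and t != "US" for t in tokens)

    return {
        "POSITIVE SCORE": positive,
        "NEGATIVE SCORE": negative,
        "POLARITY SCORE": (positive - negative) / (positive + negative + EPS),
        "SUBJECTIVITY SCORE": (positive + negative) / (n_words + EPS),
        "AVG SENTENCE LENGTH": avg_sentence_length,
        "PERCENTAGE OF COMPLEX WORDS": pct_complex,
        "FOG INDEX": 0.4 * (avg_sentence_length + pct_complex),
        # Same value as AVG SENTENCE LENGTH under the definitions assumed here.
        "AVG NUMBER OF WORDS PER SENTENCE": avg_sentence_length,
        "COMPLEX WORD COUNT": complex_count,
        "WORD COUNT": n_words,
        "SYLLABLE PER WORD": sum(syllables) / safe_words,
        "PERSONAL PRONOUNS": pronouns,
        "AVG WORD LENGTH": sum(len(w) for w in words) / safe_words,
    }
```

The returned dictionary uses the variable names above as keys, so each result can be written out as one CSV row once the article's URL_ID is added.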