Big Data Platforms

This project uses Hadoop, PySpark, and the University of Chicago’s high-performance computing cluster to run machine learning algorithms and sentiment analysis on a 68GB+ dataset of Amazon product reviews stored as nested JSON.
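
As a rough illustration of what "nested JSON" means for a single review, the sketch below flattens one record in plain Python. It is not taken from the project notebook: the field names (`reviewerID`, `overall`, `helpful`, `reviewText`) and the `flatten` helper are hypothetical, chosen only to show the shape of the problem. At the scale of the full dataset the same idea would be carried out with PySpark rather than a per-record loop.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical review line; the real dataset's schema may differ.
line = (
    '{"reviewerID": "A1", "overall": 5.0, '
    '"helpful": {"up": 3, "total": 4}, "reviewText": "Works well"}'
)

row = flatten(json.loads(line))
print(row)
```

Flattened keys such as `helpful_up` map directly onto columns, which is the form most machine learning and sentiment analysis steps expect.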

Repository files:

AmazonReviews.ipynb
Big Data Group Project.pdf
README.md
Useful Commands.md
_config.yml

eitrheim/Big-Data-Platforms
