Nehal-Pawar/Apache-Spark

https://github.com/Nehal-Pawar/Apache-Spark/tree/master/SparkProject1/src/pawar/nehal/spark

Apache-Spark

Apache Spark (Scala)

• Demonstrated Apache Spark features such as broadcast variables, joins, persist, and cache through data analyses of movie ratings, a friends network by age, minimum and maximum average temperatures, and popular movies

• Explored the architecture and processing model of Apache Spark through research with a faculty member, and deployed the Spark program on AWS EMR using the SBT build tool

• Achieved faster results when finding the most popular movie by broadcasting data with the RDD API instead of using a Dataset

• Implemented breadth-first search (BFS) on Spark to find the degrees of separation in a superhero social network
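
The broadcast approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the object name `BroadcastSketch`, the in-memory sample ratings, and the movie titles are all hypothetical, and it assumes Spark is on the classpath and runs in local mode. The idea is that a small lookup table (movie ID to title) is shipped to each executor once with `sc.broadcast`, so counts can be resolved to titles without a join.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastSketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical lookup table: movie ID -> title.
    val movieNames = Map(1 -> "Movie A", 2 -> "Movie B", 3 -> "Movie C")
    // Ship the map to every executor once instead of joining against it.
    val namesBroadcast = sc.broadcast(movieNames)

    // Hypothetical ratings: (userId, movieId, rating).
    val ratings = sc.parallelize(Seq(
      (10, 1, 5.0), (11, 1, 4.0), (12, 2, 3.0), (10, 3, 2.0), (13, 1, 4.0)
    ))

    // Count how many times each movie was rated.
    val counts = ratings
      .map { case (_, movieId, _) => (movieId, 1) }
      .reduceByKey(_ + _)

    // Resolve IDs to titles on the executors using the broadcast value.
    val named = counts.map { case (movieId, n) =>
      (namesBroadcast.value.getOrElse(movieId, "unknown"), n)
    }

    // Take the movie with the highest rating count.
    val (title, n) = named.sortBy(_._2, ascending = false).first()
    println(s"Most rated: $title ($n ratings)")

    spark.stop()
  }
}
```

Broadcasting suits this case because the lookup table is small enough to fit in memory on every executor; a large table would normally call for a regular join instead.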

Item-Based Collaborative Filtering (Scala)

• Recommended movies by pairing similar user ratings, using Spark features such as self-join, cache, and persist, which keep the RDD in memory for faster performance
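
The self-join and persist steps described above might look something like the sketch below. It is an illustration under stated assumptions rather than the repository's implementation: the object name `SelfJoinSketch`, the helper `cosine`, and the sample ratings are hypothetical, and cosine similarity is an assumed metric, since the description does not say which one the project uses.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object SelfJoinSketch {
  // Assumed metric: cosine similarity over co-rated pairs.
  // Returns (similarity score, number of users who rated both movies).
  def cosine(pairs: Iterable[(Double, Double)]): (Double, Int) = {
    val xx = pairs.map { case (x, _) => x * x }.sum
    val yy = pairs.map { case (_, y) => y * y }.sum
    val xy = pairs.map { case (x, y) => x * y }.sum
    val denominator = math.sqrt(xx) * math.sqrt(yy)
    val score = if (denominator == 0.0) 0.0 else xy / denominator
    (score, pairs.size)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SelfJoinSketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical ratings keyed by user: (userId, (movieId, rating)).
    val ratings = sc.parallelize(Seq(
      (1, (100, 5.0)), (1, (200, 4.0)),
      (2, (100, 4.0)), (2, (200, 5.0)),
      (3, (100, 1.0)), (3, (200, 2.0))
    ))

    // Self-join on userId to get every pair of movies rated by the same user,
    // keeping each unordered pair once (m1 < m2 drops duplicates and self-pairs).
    val moviePairs = ratings.join(ratings)
      .filter { case (_, ((m1, _), (m2, _))) => m1 < m2 }
      .map { case (_, ((m1, r1), (m2, r2))) => ((m1, m2), (r1, r2)) }

    // Keep the similarities in memory because they may be queried repeatedly.
    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY) on an RDD.
    val similarities = moviePairs
      .groupByKey()
      .mapValues(cosine)
      .persist(StorageLevel.MEMORY_ONLY)

    similarities.collect().foreach { case ((m1, m2), (score, n)) =>
      println(s"movies $m1 and $m2: similarity $score from $n co-ratings")
    }

    spark.stop()
  }
}
```

Persisting the similarity RDD matters when several actions reuse it, for example looking up recommendations for more than one movie; without it, Spark would recompute the self-join for each action.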