This project analyzes Yelp data in Spark using PySpark and Spark SQL.
All of the analysis was performed on the Databricks platform, and it can be reproduced in other environments by changing the import statements.
Please find the data here:
Download the .json files. The folder contains business.json and reviews.json, along with other files used for the analysis in this project.
There are 4 notebooks:
- YelpAnalysisSpark.ipynb addresses loading JSON data, working with hierarchical data, and performing geospatial analysis.
- YelpAnalysisSpark.ipynb attempts to combine two files that are part of a relational database.
- YelpAnalysisMongoDB1 attempts to analyze the data in MongoDB using the PyMongo API.
- GeoSpacialAnalysisMongoDB.ipynb aims to prepare data for geospatial analysis in MongoDB.
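
As a rough illustration of two ideas mentioned above, working with hierarchical JSON and combining two related files, here is a minimal sketch in plain Python. It is not code from the notebooks, which use Spark. The sample records and the field names (`business_id`, `attributes`, `stars`, and so on) are assumptions made for illustration, and the `flatten` helper is hypothetical.

```python
import json

# Hypothetical sample records in JSON Lines form (one JSON object per line).
# The field names are assumptions for illustration, not taken from the notebooks.
business_lines = [
    '{"business_id": "b1", "name": "Cafe A", "attributes": {"WiFi": "free", "Parking": {"lot": true}}}',
    '{"business_id": "b2", "name": "Diner B", "attributes": null}',
]
review_lines = [
    '{"review_id": "r1", "business_id": "b1", "stars": 4}',
    '{"review_id": "r2", "business_id": "b1", "stars": 5}',
    '{"review_id": "r3", "business_id": "b2", "stars": 3}',
]


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining key paths with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# Index the flattened business records by their key.
businesses = {}
for line in business_lines:
    row = flatten(json.loads(line))
    businesses[row["business_id"]] = row

# Inner join: attach the matching business fields to each review.
joined = [
    {**businesses[review["business_id"]], **review}
    for review in map(json.loads, review_lines)
    if review["business_id"] in businesses
]

for row in joined:
    print(row["name"], row["stars"], row.get("attributes.WiFi"))
```

In Spark the same steps would typically be expressed with DataFrame operations instead of Python dictionaries; the sketch only shows the shape of the data before and after flattening and joining.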
Please mail me on if you have any queries.