Skip to content

Latest commit

 

History

History
93 lines (70 loc) · 4.11 KB

README.md

File metadata and controls

93 lines (70 loc) · 4.11 KB

IMDB Data Analysis Using SQL

Overview

This repository contains a comprehensive analysis of IMDB movie data using SQL. The analysis is designed to extract meaningful insights to assist in strategic decision-making for a global film production initiative. The project leverages structured data and SQL expertise to derive actionable intelligence from the IMDB dataset.

This work demonstrates expertise in SQL, data analysis, and business intelligence, making it a valuable addition to your pinned projects on GitHub.


Key Features

  • Detailed Data Analysis: Analysis performed on a dataset spanning three years, covering multiple attributes such as genres, directors, actors, production companies, and movie ratings.
  • Actionable Insights: Extracted critical patterns and trends to support data-driven decision-making in the entertainment industry.
  • Structured Process: The analytics process is divided into logical segments, each contributing to a thorough understanding of the dataset.
  • Scalable Design: The project framework and SQL scripts are designed for scalability and adaptability to other datasets.

Dataset Description

The IMDB dataset includes information about movies produced over the past three years. The data spans attributes such as:

  • Movie titles, genres, and production years
  • Ratings and user votes
  • Directors, actors, and production companies
  • Movie durations and release months

Highlights

Insights Derived

  • Most movies were produced in 2017, with March being the most active production month.
  • The Drama genre dominates in terms of the number of movies produced, while Action movies have the longest average duration.
  • Top-performing directors, production companies, and actors were identified based on ratings and other metrics.
  • Key global production insights, including:
    • Dream Warrior Pictures and National Theater Live have produced the most hit movies.
    • Marvel Studios leads in votes received for its productions.

Advanced SQL Techniques Used

  • Complex JOINs for multi-table analysis
  • Use of aggregate functions (e.g., AVG, COUNT, SUM) to summarize data
  • Advanced filtering with subqueries and CTEs (Common Table Expressions)
  • Comprehensive GROUP BY and HAVING clauses for detailed breakdowns
  • Analytical queries for ranking and trend detection

Repository Structure

├── IMDB_dataset_import.sql       # SQL script to create and populate the database
├── IMDB_question.sql             # SQL queries to address specific analytical objectives
├── Solution_IMDB_questions.sql   # Consolidated solutions for the analytical tasks
├── IMDb_movies_Data_and_ERD.xlsx # Dataset and Entity-Relationship Diagram
├── Executive_Summary.pdf         # Business insights and findings
└── README.md                     # Project documentation

Usage

  1. Database Setup:
    • Use IMDB_dataset_import.sql to create and populate the database.
  2. Analysis:
    • Execute IMDB_question.sql or Solution_IMDB_questions.sql to replicate the analysis and derive insights.
  3. Review Results:
    • Refer to Executive_Summary.pdf for a summary of findings.

ERD Overview

The Entity-Relationship Diagram (ERD) provides a clear representation of database relationships, making it easier to understand the dataset's structure and logical flow.


Key Business Insights


Technologies Used

  • SQL: Primary language for data analysis
  • RDBMS: Tested on MySQL, compatible with other relational database systems
  • Excel: For dataset visualization and ERD creation

Future Scope

  • Integration with visualization tools like Tableau or Power BI for advanced analytics dashboards
  • Incorporation of machine learning techniques for predictive modeling
  • Expansion to include audience sentiment analysis and box-office trends

Author

  • Satvik Praveen
    Data Science Enthusiast | LinkedIn