This repository contains a Python script for data analysis and visualization using popular libraries such as Pandas, Matplotlib, and Seaborn. Each line of code is explained below:
Follow these steps to install Jupyter Notebook and set up your environment:
-
Install Python:
- If you don't have Python installed, download and install it from the official Python website: Python Downloads
-
Install Jupyter Notebook:
- Open your command-line interface (CLI).
- Run the following command to install Jupyter Notebook using pip, Python's package manager:
pip install jupyter
-
Verify Installation:
- To ensure that Jupyter Notebook is installed correctly, run the following command in your CLI:
You should see the Jupyter Notebook version displayed.
jupyter --version
- To ensure that Jupyter Notebook is installed correctly, run the following command in your CLI:
-
download the repo:
- To ensure that Jupyter Notebook is installed correctly, run the following command in your CLI:
You should see Download files displayed into your download directory extract that then go to next step.
git clone https://github.com/hackThacker/DataAnalysis.git
- To ensure that Jupyter Notebook is installed correctly, run the following command in your CLI:
-
into files directory :
- Into files directory there are many lots of .csv files for given practise.
- You can used this files for practise also :
-
Now go into report folder:
- To ensure that you installed correctly, run the following command in your CLI:
- Running this command will instruct pip to install all the packages listed in the requirements.txt file
pip install -r requirements.txt
- After the installation process is complete, you can verify that the packages were installed successfully by running
pip list
- you have .csv files into your report folder and rename the files which you want to generate report into html from csv files to analysis dataset.
python dataset.py
-
After report go to analysis folder :
- Modify accoding to your .csv = into jyupter files.
- first read report which is generated and read each things and then what you want to analysis first clear on it.
- after you read report then into analysis folder there is jyupter files run jupyter :
- from your download folder run jupter then .csv files also into that same analysis folder
This will open a new tab in your web browser with the Jupyter Notebook interface.
jupyter notebook
Once Jupyter Notebook is running, follow these steps to open a file:
-
Create or Navigate to a Directory:
- You can create a new Jupyter Notebook by clicking the "New" button and selecting "Python 3" (or another kernel of your choice).
- To open an existing Jupyter Notebook file (.ipynb), navigate to the directory where the file is located.
-
Open a Notebook:
- Click on the notebook file you want to open. This will open the notebook in a new tab where you can edit and run code.
-
Working with Notebooks:
- You can add and edit cells in your notebook to write and execute code.
- To run a cell, select it and press Shift+Enter.
- Save your work regularly by clicking the "Save" button or using the keyboard shortcut (Ctrl+S or Cmd+S on macOS).
-
Import Python Libraries: Import essential Python libraries for data analysis and visualization.
-
Import CSV File: Read a CSV file ('customers-100000.csv') and store it in a Pandas DataFrame ('df').
-
Number of Columns and Rows: Retrieve the dimensions (rows and columns) of the DataFrame using
df.shape
. -
Top 5 Rows: Display the first 5 rows of the DataFrame using
df.head()
. -
DataFrame Info: Provide detailed information about the DataFrame, including data types and missing values, using
df.info()
. -
Drop Unrelated/Blank Columns: Remove the 'Status' and 'unnamed1' columns from the DataFrame using
df.drop()
. -
Check for Null Values: Calculate the sum of null values in each column using
pd.isnull(df).sum()
. -
Drop Null Values: Remove rows with missing values from the DataFrame using
df.dropna()
. -
Change Data Type: Convert the 'Amount' column to an integer data type using
df['Amount'] = df['Amount'].astype('int')
. -
Check Data Type: Check the data type of the 'Email' column using
df['Email'].dtypes
. -
DataFrame of All Columns: Retrieve the list of column names using
df.columns
. -
Rename Column: Rename the 'Marital_Status' column to 'Shaadi' (not applied to the DataFrame) using
df.rename()
. -
Describe Data: Generate summary statistics for numerical columns using
df.describe()
. -
Use Describe for Specific Columns: Generate summary statistics for specific columns using
df[['Email', 'Company', 'Country']].describe()
. -
Plot Bar Chart for Gender and Its Count: Create a bar chart showing the count of each 'Country' value using Seaborn.
-
Plot Bar Chart for Gender vs. Total Amount: Create a bar chart showing the total amount vs. gender using Seaborn.
-
Plot Bar Chart of Gender: Create a bar chart showing the count of each 'Age Group' value, labeled by counts and differentiated by gender.
-
Total Amount vs. Age Group: Create a bar chart showing the total amount vs. 'Age Group'.
-
Total Number of Orders from Top 10 States: Create a bar chart showing the total number of orders from the top 10 states.
-
Total Amount/Sales from Top 10 States: Create a bar chart showing the total amount/sales from the top 10 states.
-
Marital Status: Create a bar chart showing the count of each 'Marital_Status' value, labeled by counts.
-
Marital Status of Gender: Create a bar chart showing the sum of 'Amount' for each combination of 'Marital_Status' and 'Gender'.
-
Occupation: Create a bar chart showing the count of each 'Occupation' value, labeled by counts.
-
Total Amount by Occupation: Create a bar chart showing the total amount for each occupation.
-
Product Category: Create a bar chart showing the count of each 'Product_Category' value, labeled by counts.
-
Total Amount by Product Category: Create a bar chart showing the total amount for the top 10 product categories.
-
Total Orders by Product ID: Create a bar chart showing the total number of orders for the top 10 most sold products.
-
Top 10 Most Sold Products: Create a bar chart showing the top 10 most sold products.
You can used many files for learning into files directory there are many csv files are located
If you would like to contribute to this project or report issues, please follow our Contributing Guidelines.
This project is licensed under the LICENSE NAME - see the LICENSE.md file for details.
Feel free to use and modify this code for your own data analysis and visualization projects. If you have any questions or need further assistance, please don't hesitate to ask.