diff --git a/04_Machine-Learning/05_Supervised_Learning_Algorithms/05. Random Forest.ipynb b/04_Machine-Learning/05_Supervised_Learning_Algorithms/05. Random Forest.ipynb
new file mode 100644
index 0000000..55a55d3
--- /dev/null
+++ b/04_Machine-Learning/05_Supervised_Learning_Algorithms/05. Random Forest.ipynb
@@ -0,0 +1,470 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# **Random Forest**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What is Random forest?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Random Forest is a popular machine learning algorithm that belongs to the ensemble learning method. It involves constructing multiple decision trees during training and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.\n",
+ "\n",
+ "Here are some key points about Random Forest:\n",
+ "\n",
+ "1. **Ensemble Method**: Random Forest is an ensemble of Decision Trees, usually trained with the \"bagging\" method. The general idea of the bagging method is that a combination of learning models increases the overall result.\n",
+ "\n",
+ "2. **Randomness**: To ensure that the model does not overfit the data, randomness is introduced into the model learning process, which creates variation between the trees. This is done in two ways:\n",
+ " - Each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set.\n",
+ " - When splitting a node during the construction of the tree, the split that is chosen is the best split among a random subset of the features.\n",
+ "\n",
+ "3. **Prediction**: For a classification problem, the output of the Random Forest model is the class selected by most trees (majority vote). For a regression problem, it could be the average of the output of each tree.\n",
+ "\n",
+ "Random Forests are a powerful and widely used machine learning algorithm that provide robustness and accuracy in many scenarios. They also handle overfitting well and can work with large datasets and high dimensional spaces."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Real-Life Analogy of Random Forest..."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Imagine you're trying to decide what movie to watch tonight. You have several ways to make this decision:\n",
+ "\n",
+ "1. **Ask a friend**: You could ask a friend who knows your movie preferences well. This is like using a single decision tree. Your friend knows you well (the tree is well-fitted to the training data), but their recommendation might be overly influenced by the movies you've both watched recently (the tree is overfitting).\n",
+ "\n",
+ "2. **Ask a group of friends independently**: You could ask a group of friends independently, and watch the movie that the majority of them recommend. Each friend will make their recommendation based on their understanding of your movie preferences. Some friends may give more weight to your preference for action movies, while others may focus more on the director of the movie or the actors. This is like a Random Forest. Each friend forms a \"tree\" in the \"forest\", and the final decision is made based on the majority vote.\n",
+ "\n",
+ "In this analogy, each friend in the group is a decision tree, and the group of friends is the random forest. Each friend makes a decision based on a subset of your preferences (a subset of the total \"features\" available), and the final decision is a democratic one, based on the majority vote. This process helps to avoid the risk of overfitting (relying too much on one friend's opinion) and underfitting (not considering enough preferences)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Working of Random Forest Algorithm?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The Random Forest algorithm works in the following steps:\n",
+ "\n",
+ "1. **Bootstrap Dataset**: Random Forest starts by picking N random records from the dataset. This sampling is done with replacement, meaning the same row can be chosen multiple times. This sample will be used to build a tree.\n",
+ "\n",
+ "2. **Build Decision Trees**: For each sample, it then constructs a decision tree. But unlike a standard decision tree, each node is split using the best among a subset of predictors randomly chosen at that node. This introduces randomness into the model creation process and helps to prevent overfitting.\n",
+ "\n",
+ "3. **Repeat the Process**: Steps 1 and 2 are repeated to create a forest of decision trees.\n",
+ "\n",
+ "4. **Make Predictions**: For a new input, each tree in the forest gives its prediction. In a classification problem, the class that has the majority of votes becomes the model’s prediction. In a regression problem, the average of all the tree outputs is the final output of the model.\n",
+ "\n",
+ "The key to the success of Random Forest is that the model is not overly reliant on any individual decision tree. By averaging the results of a lot of different trees, it reduces the variance and provides a much more stable and robust prediction.\n",
+ "\n",
+ "Random Forests also have a built-in method of measuring variable importance. This is done by looking at how much the tree nodes that use a particular feature reduce impurity across all trees in the forest, and it is a useful tool for interpretability of the model."
+ ]
+ },
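+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make these steps concrete, here is a minimal from-scratch sketch built on top of scikit-learn's `DecisionTreeClassifier`. The dataset, number of trees, and feature-subset size are arbitrary choices for illustration; in practice you would simply use `RandomForestClassifier`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "\n",
+ "X, y = load_iris(return_X_y=True)\n",
+ "rng = np.random.default_rng(42)\n",
+ "\n",
+ "n_trees = 25\n",
+ "trees = []\n",
+ "for i in range(n_trees):\n",
+ "    # Step 1: bootstrap sample (rows drawn with replacement)\n",
+ "    idx = rng.integers(0, len(X), size=len(X))\n",
+ "    # Step 2: grow a tree; max_features='sqrt' makes each split consider\n",
+ "    # only a random subset of the features\n",
+ "    tree = DecisionTreeClassifier(max_features='sqrt', random_state=i)\n",
+ "    trees.append(tree.fit(X[idx], y[idx]))\n",
+ "\n",
+ "# Step 4: majority vote across the forest\n",
+ "all_preds = np.array([t.predict(X) for t in trees])\n",
+ "forest_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)\n",
+ "print('Training accuracy of the hand-rolled forest:', (forest_pred == y).mean())"
+ ]
+ },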
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Bagging & Boosting?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Random Forest is an ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.\n",
+ "\n",
+ "Random Forest uses the concept of **bagging** (Bootstrap Aggregating), not boosting. Here's how it works:\n",
+ "\n",
+ "1. **Bagging**: In bagging, multiple subsets of the original dataset are created using bootstrap sampling. Then, a decision tree is fitted on each of these subsets. The final prediction is made by averaging the predictions (regression) or taking a majority vote (classification) from all the decision trees. Bagging helps to reduce variance and overfitting.\n",
+ "\n",
+ "2. **Random Subspace Method**: In addition to bagging, Random Forest also uses a method called the random subspace method, where a subset of features is selected randomly to create a split at each node of the decision tree. This introduces further randomness into the model, which helps to reduce variance and overfitting.\n",
+ "\n",
+ "Boosting, on the other hand, is a different ensemble technique where models are trained sequentially, with each new model being trained to correct the errors made by the previous ones. Models are weighted based on their performance, and higher weight is given to the models that perform well. Boosting can reduce bias and variance, but it's not used in Random Forest. Examples of boosting algorithms include AdaBoost and Gradient Boosting."
+ ]
+ },
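+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough illustration of the difference, the sketch below trains a bagging ensemble and a boosting ensemble of shallow decision trees on the same data. The dataset and parameters are arbitrary placeholders, and the `estimator` keyword follows newer scikit-learn releases (older versions call it `base_estimator`)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_breast_cancer\n",
+ "from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "X, y = load_breast_cancer(return_X_y=True)\n",
+ "base = DecisionTreeClassifier(max_depth=2, random_state=0)\n",
+ "\n",
+ "# Bagging: trees are trained independently on bootstrap samples and combined\n",
+ "bagging = BaggingClassifier(estimator=base, n_estimators=50, random_state=0)\n",
+ "\n",
+ "# Boosting: trees are trained sequentially, each focusing on previous errors\n",
+ "boosting = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)\n",
+ "\n",
+ "for name, model in [('bagging', bagging), ('boosting', boosting)]:\n",
+ "    scores = cross_val_score(model, X, y, cv=5)\n",
+ "    print(name, 'mean CV accuracy:', round(scores.mean(), 3))"
+ ]
+ },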
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Steps Involved in Random Forest Algorithm?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here are the steps involved in the Random Forest algorithm:\n",
+ "\n",
+ "1. **Select random samples from a given dataset**: This is done with replacement, meaning the same row can be chosen multiple times. This sample will be used to build a tree.\n",
+ "\n",
+ "2. **Construct a decision tree for each sample and get a prediction result from each decision tree**: Unlike a standard decision tree, each node in the tree is split using the best among a subset of predictors randomly chosen at that node. This introduces randomness into the model creation process and helps to prevent overfitting.\n",
+ "\n",
+ "3. **Perform a vote for each predicted result**: For a new input, each tree in the forest gives its prediction. In a classification problem, the class that has the majority of votes becomes the model’s prediction. In a regression problem, the average of all the tree outputs is the final output of the model.\n",
+ "\n",
+ "4. **Select the prediction result with the most votes as the final prediction**: For classification, the mode of all the predictions is returned. For regression, the mean of all the predictions is returned.\n",
+ "\n",
+ "The key to the success of Random Forest is that the model is not overly reliant on any individual decision tree. By averaging the results of a lot of different trees, it reduces the variance and provides a much more stable and robust prediction."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Important Features of Random Forest?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Random Forest has several important features that make it a popular choice for machine learning tasks:\n",
+ "\n",
+ "1. **Robustness to Overfitting**: Due to the randomness introduced in building the individual trees, Random Forests are less likely to overfit the training data compared to individual decision trees.\n",
+ "\n",
+ "2. **Handling of Large Datasets**: Random Forest can handle large datasets with high dimensionality. It can handle thousands of input variables and identify the most significant ones.\n",
+ "\n",
+ "3. **Versatility**: It can be used for both regression and classification tasks, and it can also handle multi-output problems.\n",
+ "\n",
+ "4. **Feature Importance**: Random Forests provide an importance score for each feature, allowing for feature selection and interpretability.\n",
+ "\n",
+ "5. **Out-of-Bag Error Estimation**: In Random Forest, about one-third of the data is not used to train each tree, and this data (called out-of-bag data) can be used to get an unbiased estimate of the model's performance.\n",
+ "\n",
+ "6. **Parallelizable**: The process of building trees is easily parallelizable as each tree is built independently of the others.\n",
+ "\n",
+ "7. **Missing Values Handling**: Random Forest can handle missing values. When the dataset has missing values, the Random Forest algorithm will learn the best impute value for the missing values based on the reduction in the impurity.\n",
+ "\n",
+ "8. **Non-Parametric**: Random Forest is a non-parametric method, which means that it makes no assumptions about the functional form of the transformation from inputs to output. This is an advantage for datasets where the relationship between inputs and output is complex and non-linear.\n",
+ "\n",
+ "Remember, while Random Forest has these advantages, it also has some disadvantages like being a black box model with limited interpretability compared to a single decision tree, and being slower to train and predict than simpler models like linear models."
+ ]
+ },
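+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Two of these features, out-of-bag error estimation and feature importance, are exposed directly by scikit-learn's `RandomForestClassifier`. A minimal sketch (the dataset and settings are chosen purely for illustration):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "\n",
+ "iris = load_iris()\n",
+ "forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)\n",
+ "forest.fit(iris.data, iris.target)\n",
+ "\n",
+ "# Out-of-bag accuracy: estimated from the samples each tree never saw\n",
+ "print('OOB accuracy:', round(forest.oob_score_, 3))\n",
+ "\n",
+ "# Impurity-based importance of each feature, averaged over all trees\n",
+ "for name, imp in zip(iris.feature_names, forest.feature_importances_):\n",
+ "    print(f'{name}: {imp:.3f}')"
+ ]
+ },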
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Difference Between Decision Tree and Random Forest?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here's a comparison between Decision Trees and Random Forests in a tabular format:\n",
+ "\n",
+ "| Feature | Decision Tree | Random Forest |\n",
+ "| --- | --- | --- |\n",
+ "| **Basic** | Single tree | Ensemble of multiple trees |\n",
+ "| **Overfitting** | Prone to overfitting | Less prone due to averaging of multiple trees |\n",
+ "| **Performance** | Lower performance on complex datasets | Higher performance due to ensemble method |\n",
+ "| **Training Speed** | Faster | Slower due to building multiple trees |\n",
+ "| **Prediction Speed** | Faster | Slower due to aggregating results from multiple trees |\n",
+ "| **Interpretability** | High (easy to visualize and understand) | Lower (hard to visualize many trees) |\n",
+ "| **Feature Selection** | Uses all features for splitting a node | Randomly selects a subset of features for splitting a node |\n",
+ "| **Handling Unseen Data** | Less effective | More effective due to averaging |\n",
+ "| **Variance** | High variance | Low variance due to averaging |\n",
+ "\n",
+ "Remember, the choice between a Decision Tree and Random Forest often depends on the specific problem and the computational resources available. Random Forests generally perform better, but they require more computational resources and are less interpretable."
+ ]
+ },
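+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A quick way to see the variance reduction in practice is to cross-validate a single decision tree against a forest on the same data. This is only an illustrative sketch; the dataset and parameters are arbitrary choices."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_wine\n",
+ "from sklearn.tree import DecisionTreeClassifier\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "\n",
+ "X, y = load_wine(return_X_y=True)\n",
+ "\n",
+ "single_tree = DecisionTreeClassifier(random_state=0)\n",
+ "forest = RandomForestClassifier(n_estimators=100, random_state=0)\n",
+ "\n",
+ "for name, model in [('decision tree', single_tree), ('random forest', forest)]:\n",
+ "    scores = cross_val_score(model, X, y, cv=5)\n",
+ "    # The mean reflects accuracy; the std hints at the variance of each model\n",
+ "    print(f'{name}: mean={scores.mean():.3f}, std={scores.std():.3f}')"
+ ]
+ },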
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Important Hyperparameters in Random Forest?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Random Forest has several important hyperparameters that control its behavior:\n",
+ "\n",
+ "1. **n_estimators**: This is the number of trees you want to build before taking the maximum voting or averages of predictions. Higher values make the predictions stronger and more stable, but also slow down the computation.\n",
+ "\n",
+ "2. **max_features**: These are the maximum number of features Random Forest is allowed to try in individual tree. There are multiple options available such as \"auto\", \"sqrt\", \"log2\", or an integer. Typically, sqrt(number of features) is a good starting point.\n",
+ "\n",
+ "3. **max_depth**: This is the maximum number of levels in each decision tree. You can set it to an integer or leave it as None for unlimited depth. This can be used to control overfitting.\n",
+ "\n",
+ "4. **min_samples_split**: This is the minimum number of data points placed in a node before the node is split. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.\n",
+ "\n",
+ "5. **min_samples_leaf**: This is the minimum number of data points allowed in a leaf node. Higher values reduce overfitting.\n",
+ "\n",
+ "6. **bootstrap**: This is a boolean value indicating whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.\n",
+ "\n",
+ "7. **oob_score**: Also a boolean, it indicates whether to use out-of-bag samples to estimate the generalization accuracy.\n",
+ "\n",
+ "8. **n_jobs**: This indicates the number of jobs to run in parallel for both fit and predict. If set to -1, then the number of jobs is set to the number of cores.\n",
+ "\n",
+ "9. **random_state**: This controls the randomness of the bootstrapping of the samples used when building trees. If the random state is fixed, the model output will be deterministic.\n",
+ "\n",
+ "10. **class_weight**: This parameter allows you to specify weights for the classes. This is useful if your classes are imbalanced.\n",
+ "\n",
+ "Remember, tuning these hyperparameters can significantly improve the performance of the model, but it can also lead to overfitting if not done carefully. It's usually a good idea to use some form of cross-validation, such as GridSearchCV or RandomizedSearchCV, to find the optimal values for these hyperparameters."
+ ]
+ },
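+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Below is a minimal sketch of tuning a few of these hyperparameters with `RandomizedSearchCV`. The search space and iteration count are arbitrary examples, not recommended defaults."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.model_selection import RandomizedSearchCV\n",
+ "\n",
+ "X, y = load_iris(return_X_y=True)\n",
+ "\n",
+ "param_distributions = {\n",
+ "    'n_estimators': [100, 200, 400],\n",
+ "    'max_features': ['sqrt', 'log2', None],\n",
+ "    'max_depth': [None, 4, 8, 16],\n",
+ "    'min_samples_split': [2, 5, 10],\n",
+ "    'min_samples_leaf': [1, 2, 4],\n",
+ "}\n",
+ "\n",
+ "search = RandomizedSearchCV(\n",
+ "    RandomForestClassifier(random_state=0),\n",
+ "    param_distributions,\n",
+ "    n_iter=20,  # number of random parameter combinations to try\n",
+ "    cv=5,\n",
+ "    random_state=0,\n",
+ "    n_jobs=-1,\n",
+ ")\n",
+ "search.fit(X, y)\n",
+ "print('Best parameters:', search.best_params_)\n",
+ "print('Best CV accuracy:', round(search.best_score_, 3))"
+ ]
+ },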
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Coding in Python – Random Forest?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here's a basic example of how to use the Random Forest algorithm for a classification problem in Python using the sklearn library. We'll use the iris dataset, which is a multi-class classification problem, built into sklearn for this example.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accuracy: 1.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.ensemble import RandomForestClassifier\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "# Load the iris dataset\n",
+ "iris = load_iris()\n",
+ "X = iris.data\n",
+ "y = iris.target\n",
+ "\n",
+ "# Split the data into train and test sets\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
+ "\n",
+ "# Create the model with 100 trees\n",
+ "model = RandomForestClassifier(n_estimators=100, \n",
+ " bootstrap = True,\n",
+ " max_features = 'sqrt')\n",
+ "\n",
+ "# Fit on training data\n",
+ "model.fit(X_train, y_train)\n",
+ "\n",
+ "# Predict the test set\n",
+ "predictions = model.predict(X_test)\n",
+ "\n",
+ "# Calculate the accuracy score\n",
+ "accuracy = accuracy_score(y_test, predictions)\n",
+ "print(\"Accuracy: \", accuracy)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "This code first loads the iris dataset, then splits it into a training set and a test set. A Random Forest model is created with 100 trees and fitted on the training data. The model is then used to predict the classes of the test set, and the accuracy of these predictions is printed out."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Coding in R – Random Forest?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here's a basic example of how to use the Random Forest algorithm for a classification problem in R using the randomForest package. We'll use the iris dataset, which is a multi-class classification problem, built into R for this example.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Installing package into ‘/home/blackheart/R/x86_64-pc-linux-gnu-library/4.1’\n",
+ "(as ‘lib’ is unspecified)\n",
+ "\n",
+ "randomForest 4.7-1.1\n",
+ "\n",
+ "Type rfNews() to see new features/changes/bug fixes.\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Call:\n",
+ " randomForest(formula = Species ~ ., data = iris, ntree = 100, importance = TRUE) \n",
+ " Type of random forest: classification\n",
+ " Number of trees: 100\n",
+ "No. of variables tried at each split: 2\n",
+ "\n",
+ " OOB estimate of error rate: 4.67%\n",
+ "Confusion matrix:\n",
+ " setosa versicolor virginica class.error\n",
+ "setosa 50 0 0 0.00\n",
+ "versicolor 0 47 3 0.06\n",
+ "virginica 0 4 46 0.08\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "A matrix: 4 × 5 of type dbl\n",
+ "\n",
+ "\t | setosa | versicolor | virginica | MeanDecreaseAccuracy | MeanDecreaseGini |
\n",
+ "\n",
+ "\n",
+ "\tSepal.Length | 2.313169 | 1.9048166 | 3.275294 | 4.299426 | 9.541714 |
\n",
+ "\tSepal.Width | 1.931173 | -0.6401191 | 2.173373 | 1.834349 | 2.033635 |
\n",
+ "\tPetal.Length | 10.176809 | 16.8581929 | 13.539843 | 16.298141 | 46.032251 |
\n",
+ "\tPetal.Width | 10.139444 | 13.1397106 | 14.113327 | 14.885764 | 41.657799 |
\n",
+ "\n",
+ "
\n"
+ ],
+ "text/latex": [
+ "A matrix: 4 × 5 of type dbl\n",
+ "\\begin{tabular}{r|lllll}\n",
+ " & setosa & versicolor & virginica & MeanDecreaseAccuracy & MeanDecreaseGini\\\\\n",
+ "\\hline\n",
+ "\tSepal.Length & 2.313169 & 1.9048166 & 3.275294 & 4.299426 & 9.541714\\\\\n",
+ "\tSepal.Width & 1.931173 & -0.6401191 & 2.173373 & 1.834349 & 2.033635\\\\\n",
+ "\tPetal.Length & 10.176809 & 16.8581929 & 13.539843 & 16.298141 & 46.032251\\\\\n",
+ "\tPetal.Width & 10.139444 & 13.1397106 & 14.113327 & 14.885764 & 41.657799\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A matrix: 4 × 5 of type dbl\n",
+ "\n",
+ "| | setosa | versicolor | virginica | MeanDecreaseAccuracy | MeanDecreaseGini |\n",
+ "|---|---|---|---|---|---|\n",
+ "| Sepal.Length | 2.313169 | 1.9048166 | 3.275294 | 4.299426 | 9.541714 |\n",
+ "| Sepal.Width | 1.931173 | -0.6401191 | 2.173373 | 1.834349 | 2.033635 |\n",
+ "| Petal.Length | 10.176809 | 16.8581929 | 13.539843 | 16.298141 | 46.032251 |\n",
+ "| Petal.Width | 10.139444 | 13.1397106 | 14.113327 | 14.885764 | 41.657799 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " setosa versicolor virginica MeanDecreaseAccuracy\n",
+ "Sepal.Length 2.313169 1.9048166 3.275294 4.299426 \n",
+ "Sepal.Width 1.931173 -0.6401191 2.173373 1.834349 \n",
+ "Petal.Length 10.176809 16.8581929 13.539843 16.298141 \n",
+ "Petal.Width 10.139444 13.1397106 14.113327 14.885764 \n",
+ " MeanDecreaseGini\n",
+ "Sepal.Length 9.541714 \n",
+ "Sepal.Width 2.033635 \n",
+ "Petal.Length 46.032251 \n",
+ "Petal.Width 41.657799 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[1] \"Accuracy: 1\"\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Install and load the randomForest package\n",
+ "install.packages(\"randomForest\")\n",
+ "library(randomForest)\n",
+ "\n",
+ "# Load the iris dataset\n",
+ "data(iris)\n",
+ "\n",
+ "# Create a random forest model\n",
+ "set.seed(42) # for reproducibility\n",
+ "iris.rf <- randomForest(Species ~ ., data=iris, ntree=100, importance=TRUE)\n",
+ "\n",
+ "# Print the model summary\n",
+ "print(iris.rf)\n",
+ "\n",
+ "# Get importance of each feature\n",
+ "importance(iris.rf)\n",
+ "\n",
+ "# Predict using the model\n",
+ "iris.pred <- predict(iris.rf, iris)\n",
+ "\n",
+ "# Check the accuracy\n",
+ "accuracy <- sum(iris.pred == iris$Species) / nrow(iris)\n",
+ "print(paste(\"Accuracy: \", accuracy))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "This code first loads the iris dataset, then a Random Forest model is created with 100 trees and fitted on the entire dataset. The model summary and feature importance are printed out. The model is then used to predict the classes of the same dataset (this is just for demonstration, in practice you should split your data into training and testing sets), and the accuracy of these predictions is printed out."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# **Thank You!**"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/04_Machine-Learning/05_Supervised_Learning_Algorithms/06. Naive Bayes Classifier.ipynb b/04_Machine-Learning/05_Supervised_Learning_Algorithms/06. Naive Bayes Classifier.ipynb
new file mode 100644
index 0000000..d03d56e
--- /dev/null
+++ b/04_Machine-Learning/05_Supervised_Learning_Algorithms/06. Naive Bayes Classifier.ipynb
@@ -0,0 +1,530 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Naive Bayes Classifier"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Title: Understanding the Naive Bayes Classifier\n",
+ "\n",
+ "---\n",
+ "\n",
+ "#### **Introduction:**\n",
+ "\n",
+ "In the realm of machine learning and data science, the Naive Bayes classifier holds a pivotal role due to its simplicity, efficiency, and surprising power, especially in the field of text analysis. Named after the famous mathematician Thomas Bayes, the Naive Bayes classifier is a probabilistic machine learning model used for classification tasks.\n",
+ "\n",
+ "The Naive Bayes classifier is based on Bayes' theorem, which is an equation describing the relationship of conditional probabilities of statistical quantities. In Bayesian classification, we're interested in finding the probability of a label given some observed features, which we can express as P(L | features). Bayes' theorem tells us how to express this in terms of quantities we can compute more directly.\n",
+ "\n",
+ "A key assumption made by the Naive Bayes classifier, which is actually a simplification, is that the features are conditionally independent given the class. This means that the presence of a particular feature in a class is unrelated to the presence of any other feature in the same class. This is the 'naive' part of the 'Naive Bayes' - it's a naive assumption because it's not often encountered in real-world data, yet the classifier can perform surprisingly well even when this assumption doesn't hold.\n",
+ "\n",
+ "Naive Bayes classifiers are highly scalable and well-suited to high-dimensional datasets. They are often used for text classification, spam filtering, sentiment analysis, and recommendation systems, among other applications.\n",
+ "\n",
+ "In this article, we will delve deeper into the workings of the Naive Bayes classifier, explore its mathematical foundation, discuss its strengths and weaknesses, and see it in action through Python code examples.\n",
+ "\n",
+ "---\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What is Naive Bayes Classifier?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The Naive Bayes Classifier is a type of probabilistic machine learning model used for classification tasks. The classifier is based on applying Bayes' theorem with strong (naive) independence assumptions between the features.\n",
+ "\n",
+ "In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable. For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. A Naive Bayes classifier considers each of these features (red, round, 3 inches in diameter) to contribute independently to the probability that the fruit is an apple, regardless of any possible correlations between color, roundness, and diameter.\n",
+ "\n",
+ "Naive Bayes classifiers are highly scalable and are known to outperform even highly sophisticated classification methods. They are particularly well suited for high-dimensional data sets and are commonly used in text categorization (spam or not spam), sentiment analysis, and document classification. Despite their naive design and oversimplified assumptions, Naive Bayes classifiers often perform very well in many complex real-world situations.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What Is the Naive Bayes Algorithm?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The Naive Bayes algorithm is a classification technique based on applying Bayes' theorem with a strong assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.\n",
+ "\n",
+ "Here's a step-by-step breakdown of how the Naive Bayes algorithm works:\n",
+ "\n",
+ "1. **Convert the data set into a frequency table**\n",
+ "2. **Create a likelihood table by finding the probabilities of given observations**\n",
+ "3. **Now, use Bayes theorem to calculate the posterior probability.**\n",
+ "\n",
+ "The core idea is that the predictors contribute independently to the probability of the class. This independent contribution is where the term 'naive' comes from.\n",
+ "\n",
+ "The Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods. It is widely used in text analytics, spam filtering, recommendation systems, and more.\n",
+ "\n",
+ "Despite its simplicity, the Naive Bayes algorithm often performs well and is widely used because it often outputs a classification model that classifies correctly even when its assumption of the independence of features is violated."
+ ]
+ },
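+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the frequency-table idea concrete, here is a small sketch using a made-up weather/play dataset (the observations are invented purely for illustration):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical observations: weather outlook vs. whether a match was played\n",
+ "data = pd.DataFrame({\n",
+ "    'outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy',\n",
+ "                'sunny', 'overcast', 'rainy', 'sunny', 'overcast'],\n",
+ "    'play': ['no', 'no', 'yes', 'yes', 'no',\n",
+ "             'yes', 'yes', 'yes', 'yes', 'yes'],\n",
+ "})\n",
+ "\n",
+ "# Step 1: frequency table of outlook vs. play\n",
+ "freq = pd.crosstab(data['outlook'], data['play'])\n",
+ "print(freq)\n",
+ "\n",
+ "# Step 2: likelihood P(sunny | play=yes) and prior P(play=yes)\n",
+ "p_sunny_given_yes = freq.loc['sunny', 'yes'] / freq['yes'].sum()\n",
+ "p_yes = freq['yes'].sum() / len(data)\n",
+ "p_sunny = (data['outlook'] == 'sunny').mean()\n",
+ "\n",
+ "# Step 3: Bayes' theorem gives the posterior P(play=yes | sunny)\n",
+ "print('P(play=yes | sunny) =', round(p_sunny_given_yes * p_yes / p_sunny, 3))"
+ ]
+ },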
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Realife Example of Naive Bayes Algorithm\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here are a few real-life examples of where the Naive Bayes algorithm is commonly used:\n",
+ "\n",
+ "1. **Email Spam Filtering**: Naive Bayes is a popular algorithm for email spam filtering. It classifies emails as 'spam' or 'not spam' by examining the frequency of certain words and phrases. For example, emails containing the words 'lottery', 'win', or 'claim' might be classified as spam.\n",
+ "\n",
+ "2. **Sentiment Analysis**: Naive Bayes is often used in sentiment analysis to determine whether a given piece of text (like a product review or a tweet) is positive, negative, or neutral. It does this by looking at the words used in the text and their associated sentiments.\n",
+ "\n",
+ "3. **Document Categorization**: Naive Bayes can be used to categorize documents into different categories based on their content. For example, news articles might be categorized into 'sports', 'politics', 'entertainment', etc.\n",
+ "\n",
+ "4. **Healthcare**: In healthcare, Naive Bayes can be used to predict the likelihood of a patient having a particular disease based on their symptoms.\n",
+ "\n",
+ "5. **Recommendation Systems**: Naive Bayes can be used in recommendation systems to predict a user's interests and recommend products or services. For example, if a user often watches action movies, the system might recommend other action movies for them to watch.\n",
+ "\n",
+ "Remember, the 'naive' assumption of Naive Bayes (that all features are independent given the class) is often violated in real-world data, yet the algorithm often performs surprisingly well."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Bayes' Theorem"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Bayes' theorem is a fundamental principle in the field of statistics and probability. It describes the relationship of conditional probabilities of statistical quantities. In other words, it gives us a way to update our previous beliefs based on new evidence.\n",
+ "\n",
+ "The theorem is named after Thomas Bayes, who first provided an equation that allows new evidence to update beliefs in his \"An Essay towards solving a Problem in the Doctrine of Chances\" (1763). It's articulated as:\n",
+ "\n",
+ "P(A|B) = [P(B|A) * P(A)] / P(B)\n",
+ "\n",
+ "Where:\n",
+ "\n",
+ "- P(A|B) is the posterior probability of A given B. It's what we are trying to estimate.\n",
+ "- P(B|A) is the likelihood, the probability of observing B given A.\n",
+ "- P(A) is the prior probability of A. It's our belief about A before observing B.\n",
+ "- P(B) is the marginal likelihood of B.\n",
+ "\n",
+ "In the context of a classification problem, we can understand it as follows:\n",
+ "\n",
+ "- A is the event that a given data point belongs to a certain class.\n",
+ "- B is the observed features of the data point.\n",
+ "- P(A|B) is then the probability that the data point belongs to that class given its features.\n",
+ "\n",
+ "Bayes' theorem thus provides a way to calculate the probability of a data point belonging to a certain class based on its features, which is the fundamental idea behind the Naive Bayes classifier."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What are the steps involved in training a Naive Bayes classifier?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Training a Naive Bayes classifier involves several steps:\n",
+ "\n",
+ "1. **Data Preprocessing**: The first step in training a Naive Bayes classifier is to preprocess the data. This may involve cleaning the data, handling missing values, encoding categorical variables, and normalizing numerical variables.\n",
+ "\n",
+ "2. **Feature Extraction**: Depending on the type of data, you might need to extract features that the classifier can use. For example, if you're classifying text documents, you might need to convert the documents into a bag-of-words representation or a TF-IDF representation.\n",
+ "\n",
+ "3. **Train-Test Split**: Split your dataset into a training set and a test set. The training set is used to train the model, and the test set is used to evaluate its performance.\n",
+ "\n",
+ "4. **Model Training**: Train the Naive Bayes classifier on the training data. This involves calculating the prior probabilities (the probabilities of each class) and the likelihoods (the probabilities of each feature given each class).\n",
+ "\n",
+ "5. **Prediction**: Use the trained model to make predictions on unseen data. For each data point, the model calculates the posterior probability of each class given the features of the data point, and assigns the data point to the class with the highest posterior probability.\n",
+ "\n",
+ "6. **Evaluation**: Evaluate the performance of the model on the test set. Common evaluation metrics for classification tasks include accuracy, precision, recall, and the F1 score.\n",
+ "\n",
+ "7. **Model Tuning**: Depending on the performance of the model, you might need to go back and adjust the model's parameters, choose different features, or preprocess the data in a different way. This is an iterative process.\n",
+ "\n",
+ "Remember, the 'naive' assumption of Naive Bayes (that all features are independent given the class) is often violated in real-world data, yet the algorithm often performs surprisingly well."
+ ]
+ },
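+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a compact illustration of these steps on numeric data, here is a sketch using `GaussianNB` on the built-in iris dataset (the split ratio and random seed are arbitrary choices):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.naive_bayes import GaussianNB\n",
+ "from sklearn.metrics import accuracy_score, classification_report\n",
+ "\n",
+ "# Steps 1-3: load the (already clean) data and split it\n",
+ "X, y = load_iris(return_X_y=True)\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)\n",
+ "\n",
+ "# Step 4: training estimates the class priors and per-class feature distributions\n",
+ "model = GaussianNB().fit(X_train, y_train)\n",
+ "\n",
+ "# Steps 5-6: predict on unseen data and evaluate\n",
+ "y_pred = model.predict(X_test)\n",
+ "print('Accuracy:', round(accuracy_score(y_test, y_pred), 3))\n",
+ "print(classification_report(y_test, y_pred))"
+ ]
+ },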
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# How Do Naive Bayes Algorithms Work?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Naive Bayes algorithms are based on Bayes' theorem, which is a way of finding a probability when we know certain other probabilities. The 'naive' part comes from the assumption that each input (feature) is independent of the others.\n",
+ "\n",
+ "Let's consider a simple example related to weather conditions. Suppose we are trying to predict whether a person will go for a walk based on the weather conditions. We have the following data:\n",
+ "\n",
+ "- 60% of the days are sunny.\n",
+ "- The person goes for a walk on 70% of the sunny days.\n",
+ "- The person goes for a walk on 40% of the days.\n",
+ "\n",
+ "We want to find out the probability that it is sunny given that the person went for a walk. This is written as P(Sunny|Walk).\n",
+ "\n",
+ "Using Bayes' theorem:\n",
+ "\n",
+ "P(Sunny|Walk) = [P(Walk|Sunny) * P(Sunny)] / P(Walk)\n",
+ "\n",
+ "We can substitute the known values into this equation:\n",
+ "\n",
+ "P(Sunny|Walk) = [(0.7) * (0.6)] / (0.4) = 1.05\n",
+ "\n",
+ "However, a probability cannot be greater than 1. This discrepancy arises because the actual probability of the person going for a walk (P(Walk)) is not independent of the weather conditions. The correct value of P(Walk) should be the total probability of the person going for a walk under all weather conditions, which is calculated as follows:\n",
+ "\n",
+ "P(Walk) = P(Walk and Sunny) + P(Walk and not Sunny)\n",
+ " = P(Walk|Sunny) * P(Sunny) + P(Walk|not Sunny) * P(not Sunny)\n",
+ " = (0.7 * 0.6) + (0.3 * 0.4) = 0.42 + 0.12 = 0.54\n",
+ "\n",
+ "Substituting the correct value of P(Walk) into Bayes' theorem gives:\n",
+ "\n",
+ "P(Sunny|Walk) = [(0.7) * (0.6)] / (0.54) = 0.777...\n",
+ "\n",
+ "So, the updated probability that it is sunny given that the person went for a walk is approximately 0.78, or 78%.\n",
+ "\n",
+ "This example demonstrates how Naive Bayes updates our beliefs based on evidence (in this case, the fact that the person went for a walk). However, it also shows why the 'naive' assumption can lead to errors if the input features are not truly independent."
+ ]
+ },
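+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The same arithmetic can be checked in a few lines of Python, using the assumed 30% walk rate on non-sunny days from above:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "p_sunny = 0.6          # P(Sunny)\n",
+ "p_walk_sunny = 0.7     # P(Walk | Sunny)\n",
+ "p_walk_not_sunny = 0.3 # assumed P(Walk | not Sunny)\n",
+ "\n",
+ "# Law of total probability\n",
+ "p_walk = p_walk_sunny * p_sunny + p_walk_not_sunny * (1 - p_sunny)\n",
+ "\n",
+ "# Bayes' theorem\n",
+ "p_sunny_walk = p_walk_sunny * p_sunny / p_walk\n",
+ "print(f'P(Walk) = {p_walk:.2f}, P(Sunny | Walk) = {p_sunny_walk:.3f}')"
+ ]
+ },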
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Python Code Implementation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here's an example of how you might use the Naive Bayes algorithm to classify emails as spam or not spam. This example uses the `sklearn` library in Python.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " not spam 0.00 0.00 0.00 0.0\n",
+ " spam 0.00 0.00 0.00 1.0\n",
+ "\n",
+ " accuracy 0.00 1.0\n",
+ " macro avg 0.00 0.00 0.00 1.0\n",
+ "weighted avg 0.00 0.00 0.00 1.0\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.feature_extraction.text import CountVectorizer\n",
+ "from sklearn.naive_bayes import MultinomialNB\n",
+ "from sklearn import metrics\n",
+ "from warnings import filterwarnings\n",
+ "\n",
+ "filterwarnings('ignore')\n",
+ "\n",
+ "# Sample data\n",
+ "emails = ['Hey, can we meet tomorrow?', 'Upto 20% discount, exclusive offer just for you', \n",
+ " 'Are you available tomorrow?', 'Win a lottery of $1 Million']\n",
+ "labels = ['not spam', 'spam', 'not spam', 'spam']\n",
+ "\n",
+ "# Convert emails to word count vectors\n",
+ "vectorizer = CountVectorizer()\n",
+ "email_vec = vectorizer.fit_transform(emails)\n",
+ "\n",
+ "# Split data into training and test sets\n",
+ "X_train, X_test, y_train, y_test = train_test_split(email_vec, labels, test_size=0.2, random_state=1)\n",
+ "\n",
+ "# Train a Naive Bayes classifier\n",
+ "nb = MultinomialNB()\n",
+ "nb.fit(X_train, y_train)\n",
+ "\n",
+ "# Make predictions on the test set\n",
+ "predictions = nb.predict(X_test)\n",
+ "\n",
+ "# Evaluate the model\n",
+ "print(metrics.classification_report(y_test, predictions))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "In this example, we first convert the emails into word count vectors using `CountVectorizer`. This transforms the text such that each email becomes a vector in a high-dimensional space, where each dimension corresponds to a unique word in all the emails.\n",
+ "\n",
+ "We then split the data into a training set and a test set using `train_test_split`.\n",
+ "\n",
+ "Next, we create a `MultinomialNB` object, which is a Naive Bayes classifier for multinomial models. We train this classifier on the training data using the `fit` method.\n",
+ "\n",
+ "Finally, we use the trained classifier to make predictions on the test set, and we evaluate the performance of the classifier using `classification_report`, which prints precision, recall, F1-score, and support for each class."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# R Code Implementation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here's an example of how you might use the Naive Bayes classifier in R using the `e1071` package. This example uses the built-in `iris` dataset.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Installing package into ‘/home/blackheart/R/x86_64-pc-linux-gnu-library/4.1’\n",
+ "(as ‘lib’ is unspecified)\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " true\n",
+ "pred setosa versicolor virginica\n",
+ " setosa 14 0 0\n",
+ " versicolor 0 18 0\n",
+ " virginica 0 0 13\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Load the necessary library\n",
+ "install.packages(\"e1071\")\n",
+ "library(e1071)\n",
+ "\n",
+ "# Load the iris dataset\n",
+ "data(iris)\n",
+ "\n",
+ "# Split the data into training and test sets\n",
+ "set.seed(123)\n",
+ "train_indices <- sample(1:nrow(iris), nrow(iris)*0.7)\n",
+ "train_data <- iris[train_indices, ]\n",
+ "test_data <- iris[-train_indices, ]\n",
+ "\n",
+ "# Train a Naive Bayes classifier\n",
+ "model <- naiveBayes(Species ~ ., data = train_data)\n",
+ "\n",
+ "# Make predictions on the test set\n",
+ "predictions <- predict(model, test_data)\n",
+ "\n",
+ "# Print out the confusion matrix to see how well the model did\n",
+ "print(table(pred = predictions, true = test_data$Species))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "In this example, we first load the `e1071` library, which provides the `naiveBayes` function. We then load the `iris` dataset and split it into a training set and a test set.\n",
+ "\n",
+ "Next, we train a Naive Bayes classifier on the training data using the `naiveBayes` function. The formula `Species ~ .` tells the function to use `Species` as the dependent variable and all other variables in the data frame as independent variables.\n",
+ "\n",
+ "Finally, we use the trained classifier to make predictions on the test set, and we print out a confusion matrix to see how well the model did. The confusion matrix shows the number of correct and incorrect predictions made by the classifier, broken down by each class."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What are the different types of Naive Bayes classifiers?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "There are several types of Naive Bayes classifiers, each suited to a different type of input data:\n",
+ "\n",
+ "1. **Gaussian Naive Bayes**: This is the most common type. It assumes that the data for each class is distributed according to a Gaussian (normal) distribution. It's often used when the features are continuous.\n",
+ "\n",
+ "2. **Multinomial Naive Bayes**: This is used when the data are discrete counts, such as the number of times a particular word appears in a document. It's often used in text classification problems.\n",
+ "\n",
+ "3. **Bernoulli Naive Bayes**: This is used when the features are binary (0/1). It's also often used in text classification, where the features might be whether or not a particular word appears in a document.\n",
+ "\n",
+ "4. **Complement Naive Bayes**: This is a variation of Multinomial Naive Bayes that is particularly suited for imbalanced data sets. Instead of modeling the data with respect to each class, it models the data with respect to all classes that are not in the current class.\n",
+ "\n",
+ "5. **Categorical Naive Bayes**: This is used for categorical data. Each feature is assumed to be a categorical variable.\n",
+ "\n",
+ "Each of these types makes a different assumption about the distribution of the data, and the best one to use depends on the specific problem and data set."
+ ]
+ },
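+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "All of these variants share the same scikit-learn interface, so choosing between them is mostly a matter of matching the classifier to the type of features. A minimal sketch with toy data (the numbers are made up only to show the kind of input each variant expects):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB\n",
+ "\n",
+ "y = np.array([0, 0, 1, 1])\n",
+ "\n",
+ "# GaussianNB: continuous features (e.g. physical measurements)\n",
+ "X_cont = np.array([[1.2, 3.4], [0.9, 2.8], [4.5, 7.1], [5.0, 6.3]])\n",
+ "print(GaussianNB().fit(X_cont, y).predict([[1.0, 3.0]]))\n",
+ "\n",
+ "# MultinomialNB: discrete counts (e.g. word counts per document)\n",
+ "X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [1, 3, 3]])\n",
+ "print(MultinomialNB().fit(X_counts, y).predict([[2, 1, 0]]))\n",
+ "\n",
+ "# BernoulliNB: binary features (e.g. word present / absent)\n",
+ "X_bin = (X_counts > 0).astype(int)\n",
+ "print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))"
+ ]
+ },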
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What are the advantages and disadvantages of using the Naive Bayes classifier?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Advantages of Naive Bayes Classifier:\n",
+ "\n",
+ "1. **Efficiency**: Naive Bayes requires a small amount of training data to estimate the necessary parameters. It's fast and easy to predict the class of the test dataset.\n",
+ "\n",
+ "2. **Easy to implement**: Naive Bayes is simple to understand and easy to implement. It's a good choice as a baseline model to compare with more complex models.\n",
+ "\n",
+ "3. **Works well with high dimensions**: Naive Bayes performs well when dealing with many input variables. It's often used for text classification where the number of input variables (words) can be large.\n",
+ "\n",
+ "4. **Handles continuous and discrete data**: Naive Bayes can handle both continuous and discrete data. Different types of Naive Bayes classifiers can be used depending on the distribution of the data (Gaussian, Multinomial, Bernoulli).\n",
+ "\n",
+ "Disadvantages of Naive Bayes Classifier:\n",
+ "\n",
+ "1. **Assumption of independent predictors**: Naive Bayes assumes that all features are independent. In real life, it's almost impossible that we get a set of predictors which are completely independent.\n",
+ "\n",
+ "2. **Zero Frequency**: If the category of any categorical variable is not observed in training data set, then the model will assign a zero probability to that category and then a prediction cannot be made. This is often known as “Zero Frequency”. To solve this, we can use the smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.\n",
+ "\n",
+ "3. **Bad estimator**: While Naive Bayes is a good classifier, it is known to be a bad estimator. So the probability outputs from `predict_proba` are not to be taken too seriously.\n",
+ "\n",
+ "4. **Data scarcity**: For data with a categorical variable, the estimation of probabilities can be a problem if a category has not been observed in the training data set."
+ ]
+ },
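+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The zero-frequency issue mentioned above is handled in scikit-learn by the `alpha` smoothing parameter of the multinomial and Bernoulli variants, where `alpha=1.0` corresponds to Laplace (add-one) smoothing. A small sketch with made-up word counts:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.naive_bayes import MultinomialNB\n",
+ "\n",
+ "# Toy word-count matrix: the third word never occurs in class 0\n",
+ "X = np.array([[3, 1, 0],\n",
+ "              [2, 2, 0],\n",
+ "              [0, 1, 4],\n",
+ "              [1, 0, 5]])\n",
+ "y = np.array([0, 0, 1, 1])\n",
+ "\n",
+ "# alpha=1.0 is Laplace (add-one) smoothing: feature/class pairs that were\n",
+ "# never seen together get a small non-zero probability instead of zero\n",
+ "model = MultinomialNB(alpha=1.0).fit(X, y)\n",
+ "print(np.exp(model.feature_log_prob_))  # smoothed P(word | class)"
+ ]
+ },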
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# What are some common applications of the Naive Bayes classifier in real-world scenarios?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Naive Bayes classifiers have a wide range of applications due to their simplicity, efficiency, and relatively high accuracy. Here are some common real-world applications:\n",
+ "\n",
+ "1. **Email Spam Filtering**: Naive Bayes is one of the most popular algorithms for spam filtering. It classifies emails as 'spam' or 'not spam' by examining the frequency of certain words and phrases.\n",
+ "\n",
+ "2. **Sentiment Analysis**: Naive Bayes is often used in sentiment analysis to determine whether a given piece of text (like a product review or a tweet) is positive, negative, or neutral. It does this by looking at the words used in the text and their associated sentiments.\n",
+ "\n",
+ "3. **Document Categorization**: Naive Bayes can be used to categorize documents into different categories based on their content. For example, news articles might be categorized into 'sports', 'politics', 'entertainment', etc.\n",
+ "\n",
+ "4. **Healthcare**: In healthcare, Naive Bayes can be used to predict the likelihood of a patient having a particular disease based on their symptoms.\n",
+ "\n",
+ "5. **Recommendation Systems**: Naive Bayes can be used in recommendation systems to predict a user's interests and recommend products or services. For example, if a user often watches action movies, the system might recommend other action movies for them to watch.\n",
+ "\n",
+ "6. **Text Classification**: Naive Bayes is widely used in text classification, where the data are typically represented as word vector counts (although tf-idf vectors are also commonly used in text classification).\n",
+ "\n",
+ "7. **Fraud Detection**: In finance, Naive Bayes is used for credit scoring, predicting loan defaults, and fraud detection in transactions."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tips to Improve the Power of the NB Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here are some tips to improve the performance of a Naive Bayes model:\n",
+ "\n",
+ "1. **Feature Selection**: Naive Bayes assumes that all features are independent. If some features are dependent on each other, the prediction might be incorrect. So, it's important to select only the relevant features.\n",
+ "\n",
+ "2. **Avoid Zero Frequency**: If a given class and feature value never occur together in the training data, then the frequency-based probability estimate will be zero. To solve this, we can use a smoothing technique. One of the simplest smoothing techniques is called Laplace estimation.\n",
+ "\n",
+ "3. **Tune the Model**: Use grid search or random search to find the optimal parameters for the Naive Bayes model. For example, in the case of the Gaussian Naive Bayes, you can adjust the `var_smoothing` parameter.\n",
+ "\n",
+ "4. **Preprocess Your Data**: Techniques such as removing outliers, filling missing values, and scaling can help improve the performance of a Naive Bayes model.\n",
+ "\n",
+ "5. **Use the Right Variant of Naive Bayes**: Different variants of Naive Bayes (like Gaussian, Multinomial, Bernoulli) are suitable for different types of data. Choose the one that's most appropriate for your data.\n",
+ "\n",
+ "6. **Ensemble Methods**: Combining the predictions of multiple different models can often result in better performance than any single model. You could consider using a Naive Bayes model as part of an ensemble.\n",
+ "\n",
+ "7. **Update Your Model**: Naive Bayes allows for incremental learning. As new data comes in, you can update your model's probabilities without having to retrain it from scratch.\n",
+ "\n",
+ "Remember, while Naive Bayes is a powerful tool, it's not always the best choice for every problem. Depending on the complexity and nature of your data, other models may yield better results."
+ ]
+ },
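+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For tip 3, here is a minimal sketch of tuning the `var_smoothing` parameter of `GaussianNB` with `GridSearchCV`; the grid values are arbitrary examples:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from sklearn.datasets import load_iris\n",
+ "from sklearn.model_selection import GridSearchCV\n",
+ "from sklearn.naive_bayes import GaussianNB\n",
+ "\n",
+ "X, y = load_iris(return_X_y=True)\n",
+ "\n",
+ "# var_smoothing adds a fraction of the largest feature variance to all\n",
+ "# variances, which stabilises the Gaussian likelihood estimates\n",
+ "param_grid = {'var_smoothing': np.logspace(-12, -3, 10)}\n",
+ "\n",
+ "search = GridSearchCV(GaussianNB(), param_grid, cv=5)\n",
+ "search.fit(X, y)\n",
+ "print('Best var_smoothing:', search.best_params_['var_smoothing'])\n",
+ "print('Best CV accuracy:', round(search.best_score_, 3))"
+ ]
+ },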
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Reference\n",
+ "\n",
+ "1. [analyticsvidya.com](https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# **Thank You!**"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "4.1.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}