GitHub - sonersteiner/Machine-Learning-using-oneAPI: Machine Learning using oneAPI. Explores Intel Extensions for scikit-learn* and NumPy, SciPy, Pandas powered by oneAPI

Title

Machine Learning using oneAPI

Preparation to run on Intel DevCloud

Run the following commands in a DevCloud terminal to set up the environment, clone the repository, and install the requirements:

mkdir MLoneAPI
cd MLoneAPI
source /glob/development-tools/versions/oneapi/2022.2/oneapi/setvars.sh --force
conda activate base
git clone https://github.com/IntelSoftware/Machine-Learning-using-oneAPI.git
cd Machine-Learning-using-oneAPI
pip install -r requirements.txt
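After the install step, a quick check that the expected packages are importable can save time before opening the notebooks. The following is a minimal sketch, not part of the repository: the package list is an assumption (the contents of requirements.txt are not shown here), and `check_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed list of top-level import names used by this training.
# The real requirements.txt may differ; adjust as needed.
EXPECTED_PACKAGES = ["numpy", "pandas", "sklearn", "sklearnex", "dpctl"]


def check_packages(names):
    """Map each top-level import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(EXPECTED_PACKAGES).items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be installed into the active conda environment before launching Jupyter Lab.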

Purpose

The Jupyter Notebooks in this training are intended to give instructors an accessible but challenging introduction to machine learning using oneAPI. They enumerate and describe many commonly used Scikit-learn* algorithms that are applied daily to machine learning challenges. The primary purpose is to accelerate commonly used Scikit-learn algorithms on Intel CPUs and GPUs using Intel Extensions for Scikit-learn*, which is part of the Intel AI Analytics Toolkit powered by oneAPI.
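To make the acceleration idea concrete, here is a minimal sketch of the patching pattern, assuming the `sklearnex` package is installed. `enable_acceleration` and `cluster_demo` are hypothetical helper names, and the data is synthetic. If the extension is missing, the sketch falls back to stock scikit-learn.

```python
def enable_acceleration():
    """Try to patch scikit-learn; return True if the extension was applied."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False  # extension not installed; stock scikit-learn stays in place
    patch_sklearn()  # replaces supported estimators with optimized versions
    return True


accelerated = enable_acceleration()


def cluster_demo(n_samples=300, seed=0):
    """Fit KMeans on three synthetic 2-D blobs and return the cluster labels."""
    import numpy as np
    from sklearn.cluster import KMeans  # imported after patching on purpose

    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])  # made-up data
    points = np.vstack([c + rng.normal(size=(n_samples // 3, 2)) for c in centers])
    model = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(points)
    return model.labels_
```

Calling `cluster_demo()` runs the same scikit-learn code either way; only the backend differs. The order matters: estimators should be imported after `patch_sklearn()` runs. To return to stock behavior, call `unpatch_sklearn()` and re-import the estimators.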

This workshop is designed to be used on the DevCloud and includes details on submitting batch jobs on the DevCloud environment.

License

Code samples are licensed under the MIT license. See License.txt for details. Third-party program licenses can be found here: third-party-programs.txt

Content Details

Pre-requisites

Python* Programming
Calculus
Linear algebra
Statistics

Syllabus

8 Modules (8 hours)
15 Lab Exercises

Folder	Modules	Description	Duration
01_Intel_Extensions_for_Scikit-learn_Patching_CPU	01_01_Intel_Extensions_for_Scikit-learn_Patching_CPU	+ Describe the basics of oneAPI AI Kit components, and where the Intel(R) Extensions for scikit-learn fits in the broader package. + Describe where to download and how to install the oneAPI AI Kit. + Describe the advantages of one specific component of oneAPI AI Kit, Intel(R) Extensions for scikit-learn, invoked via the sklearnex library. + Apply the patch and unpatch functions with varying granularities, in Python scripts and within Jupyter cells: from whole-file applications to more surgical patches applied to a single algorithm. + Enumerate sklearn algorithms which have been optimized.	20 min
01_Intel_Extensions_for_Scikit-learn_Patching_CPU	01_02_Coarse_Patching_Instructions	+ Describe how to import and apply patch_sklearn(). + Describe how to import and apply unpatch_sklearn(). + Describe the method and apply the patch to an entire Python program. + Describe how to surgically unpatch specific optimized functions if needed. + Describe a patching strategy that ensures that the Intel Extensions for scikit-learn runs as fast as or faster than the stock algorithms it replaces. + Apply patch methodology to speed up KNN on the CovType dataset.	20 min
02_Applied_Patching_CPU	02_01_Pairwise_DistanceVectorizedStockSImulationReadPortfolio	+ Describe and apply the correct surgical patching method to patch pairwise_distances. + Describe which distance metrics, such as 'euclidean', 'manhattan', 'cosine', or 'correlation', are optimized by Intel Extensions for Scikit-learn. + Describe the application of pairwise_distances to the problem of finding all time series charts similar to a chosen pattern.	20 min
02_Applied_Patching_CPU	02_02_PatchingKNN_CPU	+ Describe how to surgically unpatch specific optimized functions if needed. + Apply patching to the KNN algorithm. + Describe acceleration for the covtype dataset with KNN classification.	20 min
02_Applied_Patching_CPU	02_03_Patching_Kmeans_CPU	+ Describe the value of Intel® Extension for Scikit-learn methodology in extending scikit-learn optimization capabilities. + Name key imports and function calls to use Intel Extension for Scikit-learn to target Kmeans. + Build a Sklearn implementation of Kmeans targeting CPU using patching. + Apply patching with dynamic versus lexical scope approaches.	20 min
02_Applied_Patching_CPU	02_04_PatchingSVM_CPU	+ Describe how to surgically unpatch specific optimized functions if needed. + Describe differences in patching more globally versus more surgically. + Apply patching to the SVC algorithm. + Describe acceleration for the covtype dataset using SVC.	20 min
03_Applied_to_Image_Clustering_CPU	03_01_Practicum_ImageClustering	+ Explore and interpret the image dataset. + Apply Intel® Extension for Scikit-learn* patches to Principal Components Analysis (PCA), Kmeans, and DBSCAN algorithms. + Synthesize your understanding by searching for ways to patch or unpatch any applicable cells to maximize the performance of each cell.	60 min
04_Applied_to_Galaxy_Classification_CPU	04_01_Practicum_AnalyzeGalaxyBatch	+ Apply Multiple Classification Algorithms with GPU to classify stars belonging to each galaxy within a combined super galaxy to determine the most accurate model. + Apply Intel® Extension for Scikit-learn* patch and SYCL context to compute on available GPU resource. + Synthesize your comprehension by searching for opportunities in each cell to maximize performance. + Investigate adding pairwise distance as a means of finding all the stars within 3 light years.	60 min
05_Introduction_dpctl_for_GPU	05_01_Introduction_simple_gallery_dpctl_for_GPU	+ Apply patching while targeting an Intel GPU. + Apply Intel Extension for Scikit-learn to KNeighborsClassifier on Intel GPU.	30 min
05_Introduction_dpctl_for_GPU	05_02_PatchingKNN_GPU	+ Describe how to apply dpctl compute-follows-data in conjunction with patching. + Apply patching to the KNN algorithm on the covtype dataset.	20 min
05_Introduction_dpctl_for_GPU	05_03_Gallery_of_Functions_on_GPU	+ Apply the patch functions with varying granularities. + Leverage the Compute Follows Data methodology using the Intel DPCTL library to target Intel GPU. + Apply DPCTL and patching to a variety of Scikit-learn algorithms in a simple test harness structure. + For the current hardware configurations on the Intel DevCloud, we are NOT focusing on performance.	30 min
06_Applied_to_Image_Clustering_GPU	06_01_Practicum_ImageClustering	+ Explore and interpret the image dataset. + Apply Intel® Extension for Scikit-learn* patches to Principal Components Analysis (PCA), Kmeans, and DBSCAN algorithms. + Synthesize your understanding by searching for ways to patch or unpatch any applicable cells to maximize the performance of each cell. + Apply a q.sh script to submit a job to another node that has a GPU on Intel DevCloud.	60 min
07_Applied_to_Galaxy_Classification_GPU	07_01_Practicum_AnalyzeGalaxyBatch	+ Apply Multiple Classification Algorithms with GPU to classify stars belonging to each galaxy within a combined super galaxy to determine the most accurate model. + Apply Intel® Extension for Scikit-learn* patch and SYCL context to compute on available GPU resource. + Synthesize your comprehension by searching for opportunities in each cell to maximize performance.	60 min
08_Introduction_to_Numpy_powered_by_oneAPI	08_01_Numpy_How_Fast_Are_Numpy_Ops	+ Describe why inefficient code, such as time-consuming loops, wastes resources and time. + Describe why using Python for highly repetitive small tasks is inefficient. + Describe the additive value of leveraging packages such as Numpy which are powered by oneAPI in a cloud world. + Describe the importance of keeping oneAPI and third-party packages such as Numpy, Scipy, and others up to date. + Enumerate ways in which Numpy accelerates code. + Apply loop replacement methodologies in a variety of scenarios.	60 min
08_Introduction_to_Numpy_powered_by_oneAPI	08_02_PandasPoweredBy_oneAPI	+ Apply Numpy methods to dramatically speed up certain common Pandas bottlenecks. + Apply WHERE or SELECT in Numpy powered by oneAPI. + Avoid iterrows using Numpy techniques. + Achieve better performance by converting numerical columns to Numpy arrays.	20 min
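
The loop-replacement and WHERE/SELECT objectives listed for module 08 can be illustrated with a small sketch. This is not taken from the notebooks: the function names, thresholds, and labels are hypothetical, and only standard NumPy calls (`np.select`, `np.where`) are used.

```python
import numpy as np


def label_loop(values, low, high):
    """Baseline: classify each value with an explicit Python loop."""
    labels = []
    for v in values:
        if v < low:
            labels.append("low")
        elif v < high:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


def label_vectorized(values, low, high):
    """Same classification with np.select; the first matching condition wins."""
    values = np.asarray(values)
    conditions = [values < low, values < high]
    return np.select(conditions, ["low", "mid"], default="high")


def clip_negative(values):
    """np.where chooses element-wise between two alternatives."""
    values = np.asarray(values)
    return np.where(values < 0, 0, values)
```

With a Pandas DataFrame, the same functions can be applied to a numerical column converted with `df["col"].to_numpy()` (where `col` is a placeholder column name), which avoids row-by-row iteration with `iterrows`.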

Content Structure

Each module folder has a Jupyter Notebook file (*.ipynb) that can be opened in Jupyter Lab to view the training content, edit code, and compile/run.

Install Directions

The training content can be accessed locally after installing the necessary tools, or directly on Intel DevCloud without any installation.

Local Installation of JupyterLab and oneAPI Tools

The Jupyter Notebooks can be downloaded to a local computer and accessed as follows:

Install Jupyter Lab on local computer: Installation Guide
Install Intel oneAPI Base Toolkit on local computer: Installation Guide
git clone the repo and access the Notebooks using Jupyter Lab

Access using Intel DevCloud

The Jupyter notebooks are tested and can be run on Intel DevCloud without any installation. Follow these steps to access them on Intel DevCloud:

Register on Intel DevCloud
Login, Get Started and Launch Jupyter Lab
Open a Terminal in Jupyter Lab, git clone the repo, and access the Notebooks

