Machine Learning using oneAPI
Set up the environment, clone the repository, and install the dependencies:
mkdir MLoneAPI
cd MLoneAPI
source /glob/development-tools/versions/oneapi/2022.2/oneapi/setvars.sh --force
conda activate base
git clone https://github.com/IntelSoftware/Machine-Learning-using-oneAPI.git
cd Machine-Learning-using-oneAPI
pip install -r requirements.txt
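After the install step, a quick check can confirm which packages are importable in the active environment. The sketch below uses only the Python standard library. The package list is an assumption made for illustration; requirements.txt in the repository is the authoritative list.

```python
import importlib.util

# Assumed package names for illustration only; requirements.txt in the
# repository is the authoritative list.
PACKAGES = ["sklearn", "sklearnex", "numpy", "pandas", "dpctl"]


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(PACKAGES)
for name, found in status.items():
    print(f"{name:10s} {'found' if found else 'MISSING'}")
```

A package reported as MISSING usually means the wrong conda environment is active or the pip install step did not complete.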
The Jupyter Notebooks in this training are intended to give instructors an accessible but challenging introduction to machine learning using oneAPI. They enumerate and describe many commonly used Scikit-learn* algorithms that are used daily to address machine learning challenges. The primary purpose is to accelerate commonly used Scikit-learn algorithms on Intel CPUs and GPUs using the Intel Extension for Scikit-learn*, which is part of the Intel AI Analytics Toolkit powered by oneAPI.
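As a rough illustration of the acceleration workflow, the sketch below wraps `patch_sklearn()` and `unpatch_sklearn()` from the `sklearnex` package in a context manager, then trains a KNN classifier inside it. The helper names `patched_sklearn` and `knn_accuracy` are hypothetical, the data is synthetic rather than the covtype dataset used in the notebooks, and the code falls back to stock scikit-learn when `sklearnex` is not installed.

```python
from contextlib import contextmanager


@contextmanager
def patched_sklearn():
    """Patch scikit-learn inside a with-block when sklearnex is installed."""
    try:
        from sklearnex import patch_sklearn, unpatch_sklearn
    except ImportError:
        yield False            # sklearnex missing: stock scikit-learn is used
        return
    patch_sklearn()            # swap in the optimized implementations
    try:
        yield True
    finally:
        unpatch_sklearn()      # restore stock scikit-learn afterwards


def knn_accuracy():
    # Import estimators after patching so the patched classes are picked up.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic stand-in data (assumption); the notebooks use covtype.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    return model.score(X_test, y_test)


with patched_sklearn() as accelerated:
    accuracy = knn_accuracy()

print(f"accelerated={accelerated}, accuracy={accuracy:.3f}")
```

Importing the estimator inside the patched region matters: classes imported before `patch_sklearn()` is called remain the stock versions.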
This workshop is designed to be used on the DevCloud and includes details on submitting batch jobs on the DevCloud environment.
Code samples are licensed under the MIT license. See License.txt for details. Third-party program licenses can be found in third-party-programs.txt.
- Python* Programming
- Calculus
- Linear algebra
- Statistics
- 8 Modules (8 hours)
- 15 Lab Exercises
Folder | Modules | Description | Duration |
---|---|---|---|
01_Intel_Extensions_for_Scikit-learn_Patching_CPU | 01_01_Intel_Extensions_for_Scikit-learn_Patching_CPU | + Describe the basics of oneAPI AI Kit components, and where the Intel(R) Extension for Scikit-learn fits in the broader package. + Describe where to download and how to install the oneAPI AI Kit. + Describe the advantages of one specific component of oneAPI AI Kit, Intel(R) Extension for Scikit-learn, invoked via the sklearnex library. + Apply the patch and unpatch functions with varying granularities, in Python scripts and within Jupyter cells: from whole-file application to more surgical patches applied to a single algorithm. + Enumerate sklearn algorithms which have been optimized. | 20 min |
01_Intel_Extensions_for_Scikit-learn_Patching_CPU | 01_02_Coarse_Patching_Instructions | + Describe how to import and apply patch_sklearn(). + Describe how to import and apply unpatch_sklearn(). + Describe the method for patching an entire Python program, and apply it. + Describe how to surgically unpatch specific optimized functions if needed. + Describe a patching strategy that ensures that the Intel Extension for Scikit-learn runs as fast as or faster than the stock algorithms it replaces. + Apply the patch methodology to speed up KNN on the CovType dataset. | 20 min |
02_Applied_Patching_CPU | 02_01_Pairwise_DistanceVectorizedStockSImulationReadPortfolio | + Describe and apply the correct surgical patching method to patch pairwise_distance. + Describe which distance metrics, such as 'euclidean', 'manhattan', 'cosine', or 'correlation', are optimized by Intel Extension for Scikit-learn. + Describe the application of pairwise_distance to the problem of finding all time series charts similar to a chosen pattern. | 20 min |
02_Applied_Patching_CPU | 02_02_PatchingKNN_CPU | + Describe how to surgically unpatch specific optimized functions if needed. + Apply patching to KNN algorithm. + Describe acceleration for the covtype dataset with KNN classification. | 20 min |
02_Applied_Patching_CPU | 02_03_Patching_Kmeans_CPU | + Describe the value of Intel® Extension for Scikit-learn methodology in extending scikit-learn optimization capabilities. + Name key imports and function calls to use Intel Extension for Scikit-learn to target Kmeans. + Build a Sklearn implementation of Kmeans targeting CPU using patching. + Apply patching with dynamic versus lexical scope approaches. | 20 min |
02_Applied_Patching_CPU | 02_04_PatchingSVM_CPU | + Describe how to surgically unpatch specific optimized functions if needed. + Describe differences in patching more globally versus more surgically. + Apply patching to SVC algorithm. + Describe acceleration for the covtype dataset using SVC. | 20 min |
03_Applied_to_Image_Clustering_CPU | 03_01_Practicum_ImageClustering | + Explore and interpret the image dataset. + Apply Intel® Extension for Scikit-learn* patches to Principal Components Analysis (PCA), Kmeans, and DBSCAN algorithms. + Synthesize your understanding by searching for ways to patch or unpatch any applicable cells to maximize the performance of each cell. | 60 min |
04_Applied_to_Galaxy_Classification_CPU | 04_01_Practicum_AnalyzeGalaxyBatch | + Apply multiple classification algorithms with GPU to classify stars belonging to each galaxy within a combined super galaxy and determine the most accurate model. + Apply Intel® Extension for Scikit-learn* patch and SYCL context to compute on available GPU resource. + Synthesize your comprehension by searching for opportunities in each cell to maximize performance. + Investigate adding pairwise distance as a means of finding all the stars within 3 light years. | 60 min |
05_Introduction_dpctl_for_GPU | 05_01_Introduction_simple_gallery_dpctl_for_GPU | + Apply patching while targeting an Intel GPU. + Apply Intel Extension for Scikit-learn to KNeighborsClassifier on Intel GPU. | 30 min |
05_Introduction_dpctl_for_GPU | 05_02_PatchingKNN_GPU | + Describe how to apply the dpctl compute-follows-data approach in conjunction with patching. + Apply patching to KNN algorithm on covtype dataset. | 20 min |
05_Introduction_dpctl_for_GPU | 05_03_Gallery_of_Functions_on_GPU | + Apply the patch functions with varying granularities. + Leverage the Compute Follows Data methodology using the Intel DPCTL library to target Intel GPU. + Apply DPCTL and patching to a variety of Scikit-learn algorithms in a simple test harness structure. + For the current hardware configurations on the Intel DevCloud, we are NOT focusing on performance. | 30 min |
06_Applied_to_Image_Clustering_GPU | 06_01_Practicum_ImageClustering | + Explore and interpret the image dataset. + Apply Intel® Extension for Scikit-learn* patches to Principal Components Analysis (PCA), Kmeans, and DBSCAN algorithms. + Synthesize your understanding by searching for ways to patch or unpatch any applicable cells to maximize the performance of each cell. + Apply a q.sh script to submit a job to another node that has a GPU on Intel DevCloud. | 60 min |
07_Applied_to_Galaxy_Classification_GPU | 07_01_Practicum_AnalyzeGalaxyBatch | + Apply multiple classification algorithms with GPU to classify stars belonging to each galaxy within a combined super galaxy and determine the most accurate model. + Apply Intel® Extension for Scikit-learn* patch and SYCL context to compute on available GPU resource. + Synthesize your comprehension by searching for opportunities in each cell to maximize performance. | 60 min |
08_Introduction_to_Numpy_powered_by_oneAPI | 08_01_Numpy_How_Fast_Are_Numpy_Ops | + Describe why inefficient code, such as time-consuming loops, wastes resources and time. + Describe why using Python for highly repetitive small tasks is inefficient. + Describe the additive value of leveraging packages such as Numpy which are powered by oneAPI in a cloud world. + Describe the importance of keeping oneAPI and third-party packages such as Numpy, Scipy, and others up to date. + Enumerate ways in which Numpy accelerates code. + Apply loop replacement methodologies in a variety of scenarios. | 60 min |
08_Introduction_to_Numpy_powered_by_oneAPI | 08_02_PandasPoweredBy_oneAPI | + Apply Numpy methods to dramatically speed up certain common Pandas bottlenecks. + Apply WHERE or SELECT in Numpy powered by oneAPI. + Avoid iterrows using Numpy techniques. + Achieve better performance by converting numerical columns to Numpy arrays. | 20 min |
Each module folder has a Jupyter Notebook file (*.ipynb), which can be opened in Jupyter Lab to view the training content, edit code, and compile/run.
The training content can be accessed locally after installing the necessary tools, or directly on Intel DevCloud without any installation.
The Jupyter Notebooks can be downloaded to a local computer and accessed as follows:
- Install Jupyter Lab on local computer: Installation Guide
- Install Intel oneAPI Base Toolkit on local computer: Installation Guide
- git clone the repo and access the Notebooks using Jupyter Lab
The Jupyter Notebooks are tested and can be run on Intel DevCloud without any installation. The steps to access them on Intel DevCloud are:
- Register on Intel DevCloud
- Login, Get Started and Launch Jupyter Lab
- Open a terminal in Jupyter Lab, git clone the repo, and access the Notebooks