
CUDA Nsight Systems Profilers and Memory Model (#1)
* firing up the intermediate-level tutorial by refreshing the templates

* starts writing the first chapter

* starts the memory hierarchy section

* adds more text to the registers subsubsection

* adds text to the shared memory subsubsection

* starts the constant memory subsubsection

* adds more text to the shared memory subsubsection

* finishes the constant memory subsubsection

* finishes the texture memory subsubsection

* starts the memory management subsection

* adds text to device memory management subsection

* proceeds with zero-copy memory subsubsection

* starts the unified memory subsubsection

* finishes the first draft of CUDA memory model chapter

* rearranges the lessons and adds some text to the profiling chapter

* adds some text to CLI section

* adds materials regarding Nsight Compute

* finishes the CLI subsection of the Nsight Systems section

* finishes the nsys CLI section

* adds minor changes before starting the GUI section

* adds the memory hierarchy figure

* removes minor typo

* adds figures to CUDA Memory Model chapter
SinaMostafanejad authored Aug 30, 2021
1 parent 11fd308 commit 5a7036d
Showing 12 changed files with 1,037 additions and 49 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -9,4 +9,4 @@ _site
.Rproj.user
.Rhistory
.RData

*.lock
4 changes: 2 additions & 2 deletions _config.yml
@@ -10,7 +10,7 @@
carpentry: "swc"

# Overall title for pages.
title: "Lesson Title"
title: "CUDA C/C++ Programming and GPU Architecture: A Closer Look"

# Life cycle stage of the lesson
# possible values: "pre-alpha", "alpha", "beta", "stable"
@@ -26,7 +26,7 @@ kind: "lesson"
# Magic to make URLs resolve both locally and on GitHub.
# See https://help.github.com/articles/repository-metadata-on-github-pages/.
# Please don't change it: <USERNAME>/<PROJECT> is correct.
repository: MolSSI-Education/undergrad_workshop
repository: MolSSI-Education/gpu_programming_intermediate

# Email address, no mailto:
email: "education@molssi.org"
15 changes: 0 additions & 15 deletions _episodes/01-introduction.md

This file was deleted.

333 changes: 333 additions & 0 deletions _episodes/01-profiling.md

Large diffs are not rendered by default.

505 changes: 505 additions & 0 deletions _episodes/02-cuda-memory-model.md

Large diffs are not rendered by default.

Binary file added fig/Sources/UVA.pdf
Binary file not shown.
Binary file added fig/Sources/memory_hierarchy.pdf
Binary file not shown.
Binary file added fig/UVA.png
Binary file added fig/memory_hierarchy.png
28 changes: 24 additions & 4 deletions index.md
@@ -3,14 +3,34 @@ layout: lesson
root: . # Is the only page that doesn't follow the pattern /:path/index.html
permalink: index.html # Is the only page that doesn't follow the pattern /:path/index.html
---
This is a lesson template for the [Molecular Sciences Software Institute]({{ site.molssi_site }})(MolSSI). It is based on a lesson template from [Software Carpentry](https://www.software-carpentry.org)

To see the full MolSSI's education mission statement, please see
[here](http://molssi.org/education/education-mission-statement/).
This tutorial by the [Molecular Sciences Software Institute]({{ site.molssi_site }}) (MolSSI)
adopts a profile-driven approach to CUDA C/C++ programming at the intermediate level and
blends it with deeper insights from GPU architecture in order to improve the performance of
heterogeneous parallel applications.

The MolSSI's full education mission statement can be found [here](http://molssi.org/education/education-mission-statement/).

> ## Prerequisites
>
> - Students should be familiar with opening a Terminal window and creating and navigating files in Bash.
> - Previous knowledge of basic High-Performance Computing (HPC) concepts is helpful but not required for starting this course.
>   Nevertheless, we encourage students to take a glance at our [Parallel Programming](https://education.molssi.org/parallel-programming)
>   tutorial, specifically Chapters 1, 2 and 5, for a brief overview of some of the fundamental concepts in HPC.
> - Basic familiarity with the Bash, C and C++ programming languages is required.
> - [MolSSI's Fundamentals of Heterogeneous Parallel Programming with CUDA C/C++ at the beginner level](http://education.molssi.org/gpu_programming_beginner)
>   is a prerequisite for the present tutorial.
{: .prereq}

> ## Software/Hardware Specifications {#sh-specifications}
>
> The following NVIDIA CUDA-enabled GPU devices have been used throughout this tutorial:
> - Device 0: [GeForce GTX 1650](https://www.nvidia.com/en-us/geforce/graphics-cards/gtx-1650)
> with Turing architecture (Compute Capability = 7.5)
> - Device 1: [GeForce GT 740M](https://www.techpowerup.com/gpu-specs/geforce-gt-740m.c2299)
> with Kepler architecture (Compute Capability = 3.5)
>
> Ubuntu 18.04 (Bionic Beaver) is the target OS platform for CUDA Toolkit v11.2.0 on the two host
> machines equipped with devices 0 and 1.
{: .callout}

{% include links.md %}
14 changes: 12 additions & 2 deletions reference.md
@@ -2,8 +2,18 @@
layout: reference
---

## Glossary
## Further Readings

1. [Cheng, J.; Grossman, M.; McKercher, T. **Professional CUDA C Programming** (Wiley, Indianapolis IN, USA, 2014), ISBN: 978-1-118-73932-7](https://www.wiley.com/en-us/Professional+CUDA+C+Programming-p-9781118739327)

2. [Wilt, N. **The CUDA Handbook: A Comprehensive Guide to GPU Programming** (Addison-Wesley Professional, Crawfordsville IN, USA, 2013), ISBN-13: 978-0321809469](https://www.pearson.com/us/higher-education/program/Wilt-CUDA-Handbook-A-Comprehensive-Guide-to-GPU-Programming-The/PGM260208.html)

3. [Sanders, J.; Kandrot, E. **CUDA by Example: An Introduction to General-Purpose GPU Programming** (Addison-Wesley Professional, Boston MA, USA 2011), ISBN: 9780132180160](https://developer.nvidia.com/cuda-example)

4. [Storti, D.; Yurtoglu, M. **CUDA for Engineers: An Introduction to High-Performance Parallel Computing** (Addison-Wesley Professional, New York NY, USA 2015), ISBN-13: 978-0134177410](https://www.pearson.com/us/higher-education/program/Storti-CUDA-for-Engineers-An-Introduction-to-High-Performance-Parallel-Computing/PGM4858.html)

5. [Han, J.; Sharma, B. **Learn CUDA Programming** (Packt Publishing Ltd., Birmingham, UK 2019) ISBN: 9781788996242](https://www.packtpub.com/product/learn-cuda-programming/9781788996242)

## Glossary

{% include links.md %}
185 changes: 160 additions & 25 deletions setup.md
@@ -1,41 +1,176 @@
---
title: Setup
---
## Installing Python through Anaconda
[Python](https://python.org/) is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, however, so we recommend the all-in-one installer Anaconda.

Regardless of how you choose to install it, *please make sure you install Python version 3.x (e.g., 3.4 is fine, 2.7 is not)*. Also, please set up your python environment at least a day in advance of the workshop. If you encounter problems with the installation procedure, the instructors will be available 30 minutes before the workshop begins to help you.
> ## Table of Contents
> - [1. Linux](#1-linux)
> - [2. Windows](#2-windows)
> - [3. Mac OS](#3-mac-os)
{: .prereq}

## Windows - [Video Tutorial](https://www.youtube.com/watch?v=xxQ0mzZ8UvA)
In this section, we briefly overview the necessary steps for setting up a CUDA development
environment. At the time of writing this tutorial, **CUDA Toolkit v11.2** is the latest
official release. Therefore, this version will be the center of our focus throughout the tutorial.

1. Open the [Anaconda Windows download page](https://www.anaconda.com/download/#windows).
2. Download the installer. **Be sure you get the Python 3 version.**
3. Double-click the installer icon and follow the setup instructions on screen. You can use MOST of the default options. The only exception is to check the **Make Anaconda the default Python** option.
## 1. Linux

## Mac OS X - [Video Tutorial](https://www.youtube.com/watch?v=TcSAln46u9U)
Depending on the flavor of the Linux OS on the host machine, NVIDIA offers three
options for installing the CUDA Toolkit: *RPM*, *Debian* or *Runfile* packages.
Each of these packages is provided as a *Local* or *Network* installer.
Network installers are ideal for users with a high-speed internet connection and
limited local disk storage; they also allow users to download only those parts
of the CUDA Toolkit that they need. Local installers, on the other hand, offer a
stand-alone, large installer file that needs to be downloaded to the host machine
only once; subsequent installations from this file do not require any internet
connection. Runfiles are always Local installers, whereas RPM and Debian packages
can be either Local or Network installers, depending on the Linux distribution.

1. Open the [Anaconda MacOS download page](https://www.anaconda.com/download/#macos).
2. Download the installer. **Be sure you get the Python 3 version.**
3. Double click the installer icon and follow the setup instructions. You can use all of the default options.
Managing dependencies and prerequisites can differ considerably across operating
systems and installation methods. In comparison with Debian and RPM packages,
Runfiles offer a cleaner and more self-contained method with more control over
the installation process; on the other hand, the installed CUDA Toolkit and its
dependent software will not be updated automatically. Debian and RPM packages
provide a native and straightforward way to install the CUDA Toolkit, but
resolving dependencies, conflicts and broken packages will often be an
inseparable part of the process. Take a look at the CUDA Toolkit
[documentation](https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#linux)
for further information and details on installers.
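Whichever installer type you choose, it is good practice to verify the downloaded file against the checksums NVIDIA publishes on the download page before running it. The snippet below sketches the pattern on a dummy file so that it can be run anywhere; the installer file name in the comment is a stand-in, not an actual release artifact.

~~~
# NVIDIA publishes checksums for each installer file; comparing the
# local file's checksum against the published one catches truncated
# or corrupted downloads before a lengthy installation attempt.
installer=$(mktemp)                 # stand-in for e.g. cuda_<version>_linux.run
printf 'dummy installer bytes' > "$installer"
md5sum "$installer"                 # compare the hash against the download page
rm -f "$installer"
~~~
{: .language-bash}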

## Obtain lesson materials
1. Put link to download materials here.
2. Create a folder called `cms-workshop` on your Desktop.
3. Move the downloaded materials to the new folder.
4. Unzip the file.
Before spending time on the installation of the CUDA Toolkit on your Linux machine,
consider the following set of actions.

## Open the Terminal Window
- Windows: Click Windows Key + R, type cmd, press Enter.
- MacOS: The Terminal application can be found in Applications -> Utilities -> Terminal.
> ## Pre-installation Steps {#pre-installation-steps}
> - Make sure your system has a CUDA-capable graphics processing unit (GPU) device.
> There are multiple ways to do this task:
>
> &#9824; For a minimalist, a simple bash command will do the trick
>
> ~~~
> $ ls -l /dev/nv*
> ~~~
> {: .language-bash}
>
> If the system is equipped with a GPU accelerator device, a typical output of the
> command above would be:
>
> ~~~
> crw-rw-rw- 1 root root 195, 0 Jan 10 09:43 /dev/nvidia0 <-- This line corresponds to your active GPU accelerator device
> crw-rw-rw- 1 root root 195, 255 Jan 10 09:43 /dev/nvidiactl
> crw-rw-rw- 1 root root 195, 254 Jan 10 09:43 /dev/nvidia-modeset
> crw-rw-rw- 1 root root 236, 0 Jan 10 09:43 /dev/nvidia-uvm
> crw-rw-rw- 1 root root 236, 1 Jan 10 09:43 /dev/nvidia-uvm-tools
> crw------- 1 root root 243, 0 Jan 10 09:43 /dev/nvme0
> brw-rw---- 1 root disk 259, 0 Jan 10 09:43 /dev/nvme0n1
> brw-rw---- 1 root disk 259, 1 Jan 10 09:43 /dev/nvme0n1p1
> brw-rw---- 1 root disk 259, 2 Jan 10 09:43 /dev/nvme0n1p2
> brw-rw---- 1 root disk 259, 3 Jan 10 09:43 /dev/nvme0n1p3
> brw-rw---- 1 root disk 259, 4 Jan 10 09:43 /dev/nvme0n1p4
> ~~~
> {: .output}
>
> &#9829; Helpful information about active hardware on the host machine (including graphics card) can be obtained from
> **About This Computer** panel which can be accessed from the top-right gear icon at the top
> corner of the Ubuntu (Unity) desktop screen or through **Settings/Details** icon that can be looked up from the search bar.
>
> &#9827; The NVIDIA website provides [tables](https://developer.nvidia.com/cuda-gpus) of CUDA-enabled GPUs alongside
> their [***compute capabilities***](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability).
> The compute capability (or *streaming multiprocessor version*) consists of a version number (M.N), where M and N stand for
> the major and minor digits, respectively, and specifies the features that the GPU hardware can support. GPU devices with the same major
> number (M) belong to the same core architecture: 8 for devices based on the *Ampere* architecture, 7 for devices based
> on the *Volta* architecture (with compute capability 7.5 denoting the *Turing* architecture), 6 for devices based
> on the *Pascal* architecture, 5 for devices based on the *Maxwell* architecture, 3 for devices based on the *Kepler*
> architecture, 2 for devices based on the *Fermi* architecture, and 1 for devices based on the *Tesla* architecture.
> Older CUDA-enabled GPUs (legacy GPUs) are listed [here](https://developer.nvidia.com/cuda-legacy-gpus).
>
> &#9830; The *NVIDIA System Management Interface* (`nvidia-smi`) is a command-line tool which is derived
> from *NVIDIA Management Library (NVML)* and designed to provide control and monitoring
> capabilities over NVIDIA CUDA-enabled GPU devices. `nvidia-smi` ships with NVIDIA GPU display drivers
> on Linux and some versions of Microsoft Windows. For further details, see
> [here](https://developer.nvidia.com/nvidia-system-management-interface). In order to run `nvidia-smi`,
> simply call it through a terminal:
>
> ~~~
> $ nvidia-smi
> ~~~
> {: .language-bash}
>
> A typical output would look like:
>
> ~~~
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.2 |
> |-------------------------------+----------------------+----------------------+
> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
> | | | MIG M. |
> |===============================+======================+======================|
> | 0 GeForce GTX 1650 Off | 00000000:01:00.0 Off | N/A |
> | N/A 42C P8 2W / N/A | 438MiB / 3911MiB | 10% Default |
> | | | N/A |
> +-------------------------------+----------------------+----------------------+
> ...
> ~~~
> {: .output}
>
> where unnecessary information in the output has been replaced with ellipses.
> The result shows the driver version (455.38), the CUDA version (11.2), and the
> CUDA-enabled GPU device name (GeForce GTX 1650). Since multiple GPUs might be
> available on a machine, applications such as `nvidia-smi` adopt
> integer indices, starting from zero, for referencing GPU devices.
>
> - Because the present tutorial is based on the CUDA C/C++ programming language extensions,
> check to see if the version of Linux on the host machine is supported by CUDA.
> To do so, take a
> glance at [this](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements) page.
> In order to verify the availability of the GNU compilers (gcc and g++) on the system, try
>
> ~~~
> $ <gnu-compiler> --version
> ~~~
> {: .language-bash}
>
> where the `<gnu-compiler>` placeholder should be replaced with either `gcc` or `g++`.
>
> - Download the NVIDIA CUDA Toolkit from [here](https://developer.nvidia.com/cuda-downloads).
> Once the CUDA Toolkit installer is downloaded, follow the instructions
> [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile) based on the type of
> your Linux OS platform.
{: .prereq}
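The mapping between compute capability major numbers and core architectures described in the pre-installation steps above can be condensed into a small Bash helper. This is a teaching sketch, not an NVIDIA utility: the `cc_to_arch` name is ours, and only the architectures listed above are covered.

~~~
# Map a compute capability string (M.N) to its core architecture name,
# following the table in the pre-installation steps above.
cc_to_arch() {
  case "$1" in
    8.*) echo "Ampere"  ;;
    7.5) echo "Turing"  ;;   # e.g. GeForce GTX 1650
    7.*) echo "Volta"   ;;
    6.*) echo "Pascal"  ;;
    5.*) echo "Maxwell" ;;
    3.*) echo "Kepler"  ;;   # e.g. GeForce GT 740M
    2.*) echo "Fermi"   ;;
    1.*) echo "Tesla"   ;;
    *)   echo "unknown" ;;
  esac
}

cc_to_arch 7.5   # prints "Turing"
cc_to_arch 3.5   # prints "Kepler"
~~~
{: .language-bash}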
## Start a Jupyter notebook
In the Terminal window, type
```
$ jupyter notebook
```
It may take a few seconds to load the page, especially if it is the first time you have ever used the jupyter notebook, so don't panic if nothing loads for a few seconds. Then a new window should open in your default internet browser. Use the file navigation window to navigate to the `cms-workshop` folder. In the upper right hand corner, click New, then choose Python 3 from the dropdown list. You're ready to go!
> ## **Known Issues**
>
> A very common issue is that a previously installed version of CUDA conflicts with a
> newer version that is about to be installed. In order to resolve the conflict, check the compatibility
> [matrices](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#handle-uninstallation)
> and follow the instructions provided there.
{: .callout}
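A quick way to spot a potential version conflict before installing is to list any toolkit trees already present on the machine; the path below assumes the default location used by NVIDIA's Linux installers.

~~~
# NVIDIA's Linux installers place each toolkit under a versioned
# directory in /usr/local (e.g. /usr/local/cuda-11.2), usually with
# /usr/local/cuda symlinked to the active version.
ls -d /usr/local/cuda* 2>/dev/null \
  || echo "no CUDA Toolkit found under /usr/local"
~~~
{: .language-bash}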
## 2. Windows
Basic instructions on using Local or Network installers can be found on CUDA Toolkit's
[documentation](https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html).
NVIDIA CUDA Toolkit supports specific version combinations of Microsoft Windows OSs,
compilers and Microsoft Visual Studio environments. For further details, see
[here](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/#system-requirements).
> ## WSL Users
>
> After following the directions in the [Pre-installation Steps](#pre-installation-steps) section,
> *Windows Subsystem for Linux (WSL)* users can refer to the CUDA Toolkit
> [documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#wsl-installation)
> for instructions on setting up the CUDA Toolkit.
{: .prereq}
## 3. Mac OS
> ## **Note**
>
> CUDA Toolkit v10.2.x is the last release that supports Mac OS as a target platform
> for heterogeneous parallel code development with CUDA. However, NVIDIA still
> supports launching CUDA debugger and profiler sessions with Mac OS as the host platform.
>
> Since the present tutorial is based on the latest
> version (v11.2.0), Mac OS will not be considered further.
{: .discussion}
{% include links.md %}
