
CUDA Nsight Systems Profilers and Memory Model (#1)
* firing up the intermediate-level tutorial by refreshing the templates

* starts writing the first chapter

* starts the memory hierarchy section

* adds more text to the registers subsubsection

* adds text to the shared memory subsubsection

* starts the constant memory subsubsection

* adds more text to the shared memory subsubsection

* finishes the constant memory subsubsection

* finishes the texture memory subsubsection

* starts the memory management subsection

* adds text to device memory management subsection

* proceeds with zero-copy memory subsubsection

* starts the unified memory subsubsection

* finishes the first draft of CUDA memory model chapter

* rearranges the lessons and adds some text to the profiling chapter

* adds some text to CLI section

* adds materials regarding Nsight Compute

* finishes the CLI subsection of the Nsight Systems section

* finishes the nsys CLI section

* adds minor changes before starting the GUI section

* adds the memory hierarchy figure

* removes minor typo

* adds figures to CUDA Memory Model chapter
SinaMostafanejad authored Aug 30, 2021
1 parent 11fd308 commit 5a7036d
Showing 12 changed files with 1,037 additions and 49 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -9,4 +9,4 @@ _site
.Rproj.user
.Rhistory
.RData

*.lock
4 changes: 2 additions & 2 deletions _config.yml
@@ -10,7 +10,7 @@
carpentry: "swc"

# Overall title for pages.
title: "Lesson Title"
title: "CUDA C/C++ Programming and GPU Architecture: A Closer Look"

# Life cycle stage of the lesson
# possible values: "pre-alpha", "alpha", "beta", "stable"
@@ -26,7 +26,7 @@ kind: "lesson"
# Magic to make URLs resolve both locally and on GitHub.
# See https://help.github.com/articles/repository-metadata-on-github-pages/.
# Please don't change it: <USERNAME>/<PROJECT> is correct.
repository: MolSSI-Education/undergrad_workshop
repository: MolSSI-Education/gpu_programming_intermediate

# Email address, no mailto:
email: "education@molssi.org"
15 changes: 0 additions & 15 deletions _episodes/01-introduction.md

This file was deleted.

333 changes: 333 additions & 0 deletions _episodes/01-profiling.md

Large diffs are not rendered by default.

505 changes: 505 additions & 0 deletions _episodes/02-cuda-memory-model.md

Large diffs are not rendered by default.

Binary file added fig/Sources/UVA.pdf
Binary file not shown.
Binary file added fig/Sources/memory_hierarchy.pdf
Binary file not shown.
Binary file added fig/UVA.png
Binary file added fig/memory_hierarchy.png
28 changes: 24 additions & 4 deletions index.md
@@ -3,14 +3,34 @@ layout: lesson
root: . # Is the only page that doesn't follow the pattern /:path/index.html
permalink: index.html # Is the only page that doesn't follow the pattern /:path/index.html
---
This is a lesson template for the [Molecular Sciences Software Institute]({{ site.molssi_site }})(MolSSI). It is based on a lesson template from [Software Carpentry](https://www.software-carpentry.org)

To see the full MolSSI's education mission statement, please see
[here](http://molssi.org/education/education-mission-statement/).
This tutorial by the [Molecular Sciences Software Institute]({{ site.molssi_site }}) (MolSSI)
adopts a profile-driven approach to CUDA C/C++ programming at the intermediate level and
blends it with deeper insights from GPU architecture in order to improve the performance of
heterogeneous parallel applications.

The MolSSI's full education mission statement can be found [here](http://molssi.org/education/education-mission-statement/).

> ## Prerequisites
>
> - Students should be familiar with opening a Terminal window and creating and navigating files in Bash.
> - Previous knowledge of basic High-Performance Computing (HPC) concepts is helpful but not required for starting this course.
>   Nevertheless, we encourage students to take a glance at our [Parallel Programming](https://education.molssi.org/parallel-programming)
>   tutorial, specifically Chapters 1, 2 and 5, for a brief overview of some of the fundamental concepts in HPC.
> - Basic familiarity with the Bash, C and C++ programming languages is required.
> - [MolSSI's Fundamentals of Heterogeneous Parallel Programming with CUDA C/C++ at the beginner level](http://education.molssi.org/gpu_programming_beginner)
>   is a prerequisite for the present tutorial.
{: .prereq}

> ## Software/Hardware Specifications {#sh-specifications}
>
> The following NVIDIA CUDA-enabled GPU devices have been used throughout this tutorial:
> - Device 0: [GeForce GTX 1650](https://www.nvidia.com/en-us/geforce/graphics-cards/gtx-1650)
> with Turing architecture (Compute Capability = 7.5)
> - Device 1: [GeForce GT 740M](https://www.techpowerup.com/gpu-specs/geforce-gt-740m.c2299)
> with Kepler architecture (Compute Capability = 3.5)
>
> Ubuntu 18.04 (Bionic Beaver) is the target OS platform for CUDA Toolkit v11.2.0 on the two host
> machines equipped with devices 0 and 1.
{: .callout}

{% include links.md %}
14 changes: 12 additions & 2 deletions reference.md
@@ -2,8 +2,18 @@
layout: reference
---

## Glossary
## Further Readings

1. [Cheng, J.; Grossman, M.; McKercher, T. **Professional CUDA C Programming** (Wiley, Indianapolis IN, USA, 2014), ISBN: 978-1-118-73932-7](https://www.wiley.com/en-us/Professional+CUDA+C+Programming-p-9781118739327)

2. [Wilt, N. **The CUDA Handbook: A Comprehensive Guide to GPU Programming** (Addison-Wesley Professional, Crawfordsville IN, USA, 2013), ISBN-13: 978-0321809469](https://www.pearson.com/us/higher-education/program/Wilt-CUDA-Handbook-A-Comprehensive-Guide-to-GPU-Programming-The/PGM260208.html)

3. [Sanders, J.; Kandrot, E. **CUDA by Example: An Introduction to General-Purpose GPU Programming** (Addison-Wesley Professional, Boston MA, USA 2011), ISBN: 9780132180160](https://developer.nvidia.com/cuda-example)

4. [Storti, D.; Yurtoglu, M. **CUDA for Engineers: An Introduction to High-Performance Parallel Computing** (Addison-Wesley Professional, New York NY, USA 2015), ISBN-13: 978-0134177410](https://www.pearson.com/us/higher-education/program/Storti-CUDA-for-Engineers-An-Introduction-to-High-Performance-Parallel-Computing/PGM4858.html)

5. [Han, J.; Sharma, B. **Learn CUDA Programming** (Packt Publishing Ltd., Birmingham, UK 2019) ISBN: 9781788996242](https://www.packtpub.com/product/learn-cuda-programming/9781788996242)

## Glossary

{% include links.md %}
185 changes: 160 additions & 25 deletions setup.md
@@ -1,41 +1,176 @@
---
title: Setup
---
## Installing Python through Anaconda
[Python](https://python.org/) is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, however, so we recommend the all-in-one installer Anaconda.

Regardless of how you choose to install it, *please make sure you install Python version 3.x (e.g., 3.4 is fine, 2.7 is not)*. Also, please set up your python environment at least a day in advance of the workshop. If you encounter problems with the installation procedure, the instructors will be available 30 minutes before the workshop begins to help you.
> ## Table of Contents
> - [1. Linux](#1-linux)
> - [2. Windows](#2-windows)
> - [3. Mac OS](#3-mac-os)
{: .prereq}

## Windows - [Video Tutorial](https://www.youtube.com/watch?v=xxQ0mzZ8UvA)
In this section, we briefly overview the necessary steps for setting up a CUDA development
environment. At the time of writing this tutorial, **CUDA Toolkit v11.2** is the latest
official release. Therefore, this version will be the center of our focus throughout the tutorial.

1. Open the [Anaconda Windows download page](https://www.anaconda.com/download/#windows).
2. Download the installer. **Be sure you get the Python 3 version.**
3. Double-click the installer icon and follow the setup instructions on screen. You can use MOST of the default options. The only exception is to check the **Make Anaconda the default Python** option.
## 1. Linux

## Mac OS X - [Video Tutorial](https://www.youtube.com/watch?v=TcSAln46u9U)
Depending on the flavor of the Linux OS on the host machine, NVIDIA offers three
options for installing the CUDA Toolkit: *RPM*, *Debian* or *Runfile* packages.
Each of these packages is provided as a *Local* or *Network* installer.
Network installers are ideal for users with a high-speed internet connection and
limited local disk storage; they also allow users to download only those parts
of the CUDA Toolkit that they need. Local installers, on the other hand, offer a
stand-alone, large installer file that needs to be downloaded to the host machine
only once; subsequent installations from this file do not require any internet
connection. Runfiles are always Local installers, whereas RPM and Debian packages
can be either Local or Network installers, depending on the Linux distribution.

1. Open the [Anaconda MacOS download page](https://www.anaconda.com/download/#macos).
2. Download the installer. **Be sure you get the Python 3 version.**
3. Double click the installer icon and follow the setup instructions. You can use all of the default options.
Managing dependencies and prerequisites can differ considerably across operating
systems and installation methods. In comparison with Debian and RPM packages,
Runfiles offer a cleaner and more self-contained method with more control over
the installation process; on the other hand, the installed CUDA Toolkit and its
dependent software will not be updated automatically. Debian and RPM packages
provide a native and straightforward way to install the CUDA Toolkit, but
resolving dependencies, conflicts and broken packages will often be an
inseparable part of the process. Take a look at the CUDA Toolkit
[documentation](https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html#linux)
for further information and details on installers.
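Whichever installer type you choose, it is good practice to verify the downloaded file against the checksums NVIDIA publishes on the download page before running it. The snippet below sketches the pattern on a dummy file so that it can be run anywhere; the installer file name in the comment is a stand-in, not an actual release artifact.

~~~
# NVIDIA publishes checksums for each installer file; comparing the
# local file's checksum against the published one catches truncated
# or corrupted downloads before a lengthy installation attempt.
installer=$(mktemp)                 # stand-in for e.g. cuda_<version>_linux.run
printf 'dummy installer bytes' > "$installer"
md5sum "$installer"                 # compare the hash against the download page
rm -f "$installer"
~~~
{: .language-bash}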

## Obtain lesson materials
1. Put link to download materials here.
2. Create a folder called `cms-workshop` on your Desktop.
3. Move the downloaded materials to the new folder.
4. Unzip the file.
Before spending time on the installation of the CUDA Toolkit on your Linux machine,
consider the following set of actions.

## Open the Terminal Window
- Windows: Click Windows Key + R, type cmd, press Enter.
- MacOS: The Terminal application can be found in Applications -> Utilities -> Terminal.
> ## Pre-installation Steps {#pre-installation-steps}
> - Make sure your system has a CUDA-capable graphics processing unit (GPU) device.
> There are multiple ways to do this task:
>
> &#9824; For a minimalist, a simple bash command will do the trick
>
> ~~~
> $ ls -l /dev/nv*
> ~~~
> {: .language-bash}
>
> If the system is equipped with a GPU accelerator device, a typical output of the
> command above would be:
>
> ~~~
> crw-rw-rw- 1 root root 195, 0 Jan 10 09:43 /dev/nvidia0 <-- This line corresponds to your active GPU accelerator device
> crw-rw-rw- 1 root root 195, 255 Jan 10 09:43 /dev/nvidiactl
> crw-rw-rw- 1 root root 195, 254 Jan 10 09:43 /dev/nvidia-modeset
> crw-rw-rw- 1 root root 236, 0 Jan 10 09:43 /dev/nvidia-uvm
> crw-rw-rw- 1 root root 236, 1 Jan 10 09:43 /dev/nvidia-uvm-tools
> crw------- 1 root root 243, 0 Jan 10 09:43 /dev/nvme0
> brw-rw---- 1 root disk 259, 0 Jan 10 09:43 /dev/nvme0n1
> brw-rw---- 1 root disk 259, 1 Jan 10 09:43 /dev/nvme0n1p1
> brw-rw---- 1 root disk 259, 2 Jan 10 09:43 /dev/nvme0n1p2
> brw-rw---- 1 root disk 259, 3 Jan 10 09:43 /dev/nvme0n1p3
> brw-rw---- 1 root disk 259, 4 Jan 10 09:43 /dev/nvme0n1p4
> ~~~
> {: .output}
>
> &#9829; Helpful information about active hardware on the host machine (including graphics card) can be obtained from
> **About This Computer** panel which can be accessed from the top-right gear icon at the top
> corner of the Ubuntu (Unity) desktop screen or through **Settings/Details** icon that can be looked up from the search bar.
>
> &#9827; The NVIDIA website provides [tables](https://developer.nvidia.com/cuda-gpus) of CUDA-enabled GPUs alongside
> their [***compute capabilities***](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability).
> The compute capability (or *streaming multiprocessor version*) consists of a version number (M.N), where M and N stand for
> the major and minor digits, respectively, and specifies the features that the GPU hardware can support. GPU devices with the same major
> number (M) belong to the same core architecture: 8 for devices based on the *Ampere* architecture, 7 for devices based
> on the *Volta* architecture (with compute capability 7.5 denoting the *Turing* architecture), 6 for devices based
> on the *Pascal* architecture, 5 for devices based on the *Maxwell* architecture, 3 for devices based on the *Kepler*
> architecture, 2 for devices based on the *Fermi* architecture, and 1 for devices based on the *Tesla* architecture.
> Older CUDA-enabled GPUs (legacy GPUs) are listed [here](https://developer.nvidia.com/cuda-legacy-gpus).
>
> &#9830; The *NVIDIA System Management Interface* (`nvidia-smi`) is a command-line tool which is derived
> from *NVIDIA Management Library (NVML)* and designed to provide control and monitoring
> capabilities over NVIDIA CUDA-enabled GPU devices. `nvidia-smi` ships with NVIDIA GPU display drivers
> on Linux and some versions of Microsoft Windows. For further details, see
> [here](https://developer.nvidia.com/nvidia-system-management-interface). In order to run `nvidia-smi`,
> simply call it through a terminal:
>
> ~~~
> $ nvidia-smi
> ~~~
> {: .language-bash}
>
> A typical output would look like:
>
> ~~~
> +-----------------------------------------------------------------------------+
> | NVIDIA-SMI 455.38 Driver Version: 455.38 CUDA Version: 11.2 |
> |-------------------------------+----------------------+----------------------+
> | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
> | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
> | | | MIG M. |
> |===============================+======================+======================|
> | 0 GeForce GTX 1650 Off | 00000000:01:00.0 Off | N/A |
> | N/A 42C P8 2W / N/A | 438MiB / 3911MiB | 10% Default |
> | | | N/A |
> +-------------------------------+----------------------+----------------------+
> ...
> ~~~
> {: .output}
>
> where unnecessary information in the output has been replaced with ellipses.
> The result shows the driver version (455.38), the CUDA version (11.2), and the
> CUDA-enabled GPU device name (GeForce GTX 1650). Since multiple GPUs might be
> available on a machine, applications such as `nvidia-smi` adopt
> integer indices, starting from zero, for referencing GPU devices.
>
> - Because the present tutorial is based on the CUDA C/C++ programming language extensions,
> check to see if the version of Linux on the host machine is supported by CUDA.
> To do so, take a
> glance at [this](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements) page.
> In order to verify the availability of the GNU compilers (gcc and g++) on the system, try
>
> ~~~
> $ <gnu-compiler> --version
> ~~~
> {: .language-bash}
>
> where the `<gnu-compiler>` placeholder should be replaced with either `gcc` or `g++`.
>
> - Download the NVIDIA CUDA Toolkit from [here](https://developer.nvidia.com/cuda-downloads).
> Once the CUDA Toolkit installer is downloaded, follow the instructions
> [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile) based on the type of
> your Linux OS platform.
{: .prereq}
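The mapping between compute capability major numbers and core architectures described in the pre-installation steps above can be condensed into a small Bash helper. This is a teaching sketch, not an NVIDIA utility: the `cc_to_arch` name is ours, and only the architectures listed above are covered.

~~~
# Map a compute capability string (M.N) to its core architecture name,
# following the table in the pre-installation steps above.
cc_to_arch() {
  case "$1" in
    8.*) echo "Ampere"  ;;
    7.5) echo "Turing"  ;;   # e.g. GeForce GTX 1650
    7.*) echo "Volta"   ;;
    6.*) echo "Pascal"  ;;
    5.*) echo "Maxwell" ;;
    3.*) echo "Kepler"  ;;   # e.g. GeForce GT 740M
    2.*) echo "Fermi"   ;;
    1.*) echo "Tesla"   ;;
    *)   echo "unknown" ;;
  esac
}

cc_to_arch 7.5   # prints "Turing"
cc_to_arch 3.5   # prints "Kepler"
~~~
{: .language-bash}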
## Start a Jupyter notebook
In the Terminal window, type
```
$ jupyter notebook
```
It may take a few seconds to load the page, especially if it is the first time you have ever used the jupyter notebook, so don't panic if nothing loads for a few seconds. Then a new window should open in your default internet browser. Use the file navigation window to navigate to the `cms-workshop` folder. In the upper right hand corner, click New, then choose Python 3 from the dropdown list. You're ready to go!
> ## **Known Issues**
>
> A very common issue is that a previously installed version of CUDA conflicts with a
> newer version that is about to be installed. In order to resolve the conflict, check the compatibility
> [matrices](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#handle-uninstallation)
> and follow the instructions provided there.
{: .callout}
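A quick way to spot a potential version conflict before installing is to list any toolkit trees already present on the machine; the path below assumes the default location used by NVIDIA's Linux installers.

~~~
# NVIDIA's Linux installers place each toolkit under a versioned
# directory in /usr/local (e.g. /usr/local/cuda-11.2), usually with
# /usr/local/cuda symlinked to the active version.
ls -d /usr/local/cuda* 2>/dev/null \
  || echo "no CUDA Toolkit found under /usr/local"
~~~
{: .language-bash}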
## 2. Windows
Basic instructions on using Local or Network installers can be found on CUDA Toolkit's
[documentation](https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html).
NVIDIA CUDA Toolkit supports specific version combinations of Microsoft Windows OSs,
compilers and Microsoft Visual Studio environments. For further details, see
[here](https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/#system-requirements).
> ## WSL Users
>
> After following the directions in the [Pre-installation Steps](#pre-installation-steps) section,
> *Windows Subsystem for Linux (WSL)* users can refer to the CUDA Toolkit
> [documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#wsl-installation)
> for instructions on setting up the CUDA Toolkit.
{: .prereq}
## 3. Mac OS
> ## **Note**
>
> CUDA Toolkit v10.2.x is the last release that supports Mac OS as a target platform
> for heterogeneous parallel code development with CUDA. However, NVIDIA still
> supports launching CUDA debugger and profiler sessions with Mac OS as the host platform.
>
> Since the present tutorial is based on the latest
> version (v11.2.0), Mac OS will not be considered further.
{: .discussion}
{% include links.md %}
