Squashed 'hpc' changes from 9d08e6d~10..9d08e6d (#419)
9d08e6d Update index.html
0114189 Merge pull request #93 from RepoOps/gh-pages-20210927-032739
cd52282 [xf_hpc] update release version
cdffb7b update index
83b408e Merge pull request #92 from RepoOps/gh-pages-20210927-031807
c126cc9 Update release.rst.txt
b5ede6b [xf_hpc] build documents
96dd08b Merge pull request #82 from RepoOps/gh-pages-20210615-023421
b886cd4 update documents
a1cf259 Merge pull request #80 from RepoOps/gh-pages-20210614-075104
fb13c8b update release notes
6095532 Merge pull request #79 from RepoOps/gh-pages-20210610-095713
8f6b9e2 fix version errors
73ab5f2 Merge pull request #78 from RepoOps/gh-pages-20210610-070616
b5d0b01 update release notes
46a85d2 Merge pull request #76 from RepoOps/gh-pages-20210608-045126
32fd3e1 build documents
ceb4613 update docs

Co-authored-by: sdausr <sdausr@xilinx.com>
2 people authored and GitHub Enterprise committed Sep 28, 2021
1 parent 844d608 commit 92e9eb9
Showing 177 changed files with 104,611 additions and 2 deletions.
Binary file added hpc/2021.2/_images/fcn_cus.png
Binary file added hpc/2021.2/_images/mlp_fcn.png
Binary file added hpc/2021.2/_images/rtm2DBwdKrn.png
Binary file added hpc/2021.2/_images/rtm2DBwdStr.png
Binary file added hpc/2021.2/_images/rtm2DFwdKrn.png
Binary file added hpc/2021.2/_images/rtm2DShotPar.png
Binary file added hpc/2021.2/_images/rtm2DStreaming.png
162 changes: 162 additions & 0 deletions hpc/2021.2/_sources/benchmark.rst.txt
@@ -0,0 +1,162 @@
..
    Copyright 2019 Xilinx, Inc.
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

************
Benchmark
************

.. _performance:

Performance
################

Conjugate Gradient Algorithm
****************************

This section benchmarks the Vitis HPC Library in the Vitis environment and compares the results on several FPGA and CPU platforms.
The library supports software and hardware emulation as well as running hardware accelerators on the Alveo U250, U280, or U50.

GEMV-based CG
^^^^^^^^^^^^^^^^^^^^
The following table lists the resource utilization of the GEMV-based CG kernel, which uses 16 HBM channels to store the matrix.

.. table:: Resource Utilization on U50
    :align: center

    +----------------------------+------------------+------------------+-------------------+----------------+---------------+----------------+
    | Name                       | LUT              | LUTAsMem         | REG               | BRAM           | URAM          | DSP            |
    +============================+==================+==================+===================+================+===============+================+
    | User Budget                | 699619 [100.00%] | 369603 [100.00%] | 1447189 [100.00%] | 1112 [100.00%] | 640 [100.00%] | 5936 [100.00%] |
    +----------------------------+------------------+------------------+-------------------+----------------+---------------+----------------+
    | Used Resources             | 186448 [ 26.65%] |  17334 [  4.69%] |  325149 [ 22.47%] |  128 [ 11.51%] |   0 [  0.00%] | 1262 [ 21.26%] |
    +----------------------------+------------------+------------------+-------------------+----------------+---------------+----------------+


.. table:: Benchmark Results on U50
    :align: center

    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | Vector Size | Time per Iteration [ms] | U50 Performance [GFLOPS]  | U50 Energy Efficiency [GFLOPS/W] | CPU Performance [GFLOPS] | Acceleration Ratio |
    +=============+=========================+===========================+==================================+==========================+====================+
    | 1024        | 0.073                   | 26.938                    | 0.723                            | 12.996                   | 2.073              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | 2048        | 0.2557                  | 30.658                    | 0.766                            | 27.469                   | 1.116              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | 4096        | 0.9202                  | 34.018                    | 0.812                            | 7.776                    | 4.375              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
    | 8192        | 3.405                   | 36.742                    | 0.839                            | 8.226                    | 4.467              |
    +-------------+-------------------------+---------------------------+----------------------------------+--------------------------+--------------------+
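
The Acceleration Ratio column is consistent with dividing the U50 performance by the CPU performance. The following minimal sketch recomputes it from the table values; the helper name ``acceleration_ratio`` is illustrative and not part of the library.

.. code-block:: python

    # Rows copied from the "Benchmark Results on U50" table:
    # (vector size, U50 GFLOPS, CPU GFLOPS, reported acceleration ratio)
    ROWS = [
        (1024, 26.938, 12.996, 2.073),
        (2048, 30.658, 27.469, 1.116),
        (4096, 34.018, 7.776, 4.375),
        (8192, 36.742, 8.226, 4.467),
    ]

    def acceleration_ratio(fpga_gflops, cpu_gflops):
        """Hypothetical helper: throughput of the FPGA relative to the CPU."""
        return fpga_gflops / cpu_gflops

    for size, u50, cpu, reported in ROWS:
        ratio = acceleration_ratio(u50, cpu)
        print(f"n={size}: computed {ratio:.3f}, reported {reported}")

A ratio above 1 means the U50 delivered more GFLOPS than the CPU for that vector size.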


SPMV-based CG
^^^^^^^^^^^^^^^^^^^^^^^
The following table lists the resource utilization of the SPMV-based CG kernel.

.. table:: Resource Utilization on U280
    :align: center

    +----------------------------+-------------------+------------------+-------------------+----------------+---------------+----------------+
    | Name                       | LUT               | LUTAsMem         | REG               | BRAM           | URAM          | DSP            |
    +============================+===================+==================+===================+================+===============+================+
    | User Budget                | 1104369 [100.00%] | 552814 [100.00%] | 2217989 [100.00%] | 1693 [100.00%] | 896 [100.00%] | 9020 [100.00%] |
    +----------------------------+-------------------+------------------+-------------------+----------------+---------------+----------------+
    | Used Resources             |  285372 [ 25.84%] |  36605 [  6.62%] |  442368 [ 19.94%] |  267 [ 15.77%] |  64 [  7.14%] | 1192 [ 13.22%] |
    +----------------------------+-------------------+------------------+-------------------+----------------+---------------+----------------+

.. table:: Benchmark Results on U280
    :align: center

    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | Matrix Name    | Rows/Cols | NNZs    | Padded Rows/Cols | Padded NNZs | Padding Ratio | No. Iterations | Time per Iter [ms] | Time per Iter on CPU [ms] | Acceleration Ratio |
    +================+===========+=========+==================+=============+===============+================+====================+===========================+====================+
    | nasa2910       | 2910      | 174296  | 2912             | 297952      | 1.70946       | 1777           | 0.0511172          | 0.0692836                 | 1.36               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | ex9            | 3363      | 99471   | 3364             | 199328      | 2.00388       | 5000           | 0.0497677          | 0.0559332                 | 1.12               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk24       | 3562      | 159910  | 3564             | 222656      | 1.39238       | 5000           | 0.0598962          | 0.0581827                 | 0.97               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk15       | 3948      | 117816  | 3948             | 267488      | 2.27039       | 658            | 0.0927269          | 0.125615                  | 1.35               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk28       | 4410      | 219024  | 4412             | 319264      | 1.45767       | 4878           | 0.0586356          | 6.92198                   | 118.05             |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | s3rmt3m3       | 5357      | 207695  | 5360             | 330624      | 1.59187       | 5000           | 0.0744822          | 6.55229                   | 87.97              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | s2rmq4m1       | 5489      | 281111  | 5492             | 427648      | 1.52128       | 1779           | 0.084562           | 6.75384                   | 79.87              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | nd3k           | 9000      | 3279690 | 9000             | 4277792     | 1.30433       | 5000           | 0.363479           | 4.66861                   | 12.84              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | ted_B          | 10605     | 144579  | 10608            | 548416      | 3.79319       | 30             | 0.984467           | 6.53108                   | 6.63               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | ted_B_unscaled | 10605     | 144579  | 10608            | 548416      | 3.79319       | 16             | 1.75354            | 8.59891                   | 4.90               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | msc10848       | 10848     | 1229778 | 10848            | 2050720     | 1.66755       | 5000           | 0.230942           | 5.43921                   | 23.55              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | cbuckle        | 13681     | 676515  | 13684            | 924832      | 1.36705       | 1282           | 0.16427            | 5.48588                   | 33.40              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | olafu          | 16146     | 1015156 | 16148            | 1452320     | 1.43064       | 5000           | 0.169174           | 5.05108                   | 29.86              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | gyro_k         | 17361     | 1021159 | 17364            | 1932384     | 1.89234       | 5000           | 0.254172           | 4.85938                   | 19.12              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bodyy4         | 17546     | 121938  | 17548            | 710112      | 5.82355       | 230            | 0.174435           | 4.73164                   | 27.13              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | nd6k           | 18000     | 6897316 | 18000            | 9415552     | 1.3651        | 5000           | 0.809868           | 4.25772                   | 5.26               |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | raefsky4       | 19779     | 1328611 | 19780            | 2268704     | 1.70758       | 5000           | 0.268956           | 4.22843                   | 15.72              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
    | bcsstk36       | 23052     | 1143140 | 23052            | 1833056     | 1.60353       | 5000           | 0.253049           | 3.9882                    | 15.76              |
    +----------------+-----------+---------+------------------+-------------+---------------+----------------+--------------------+---------------------------+--------------------+
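
The derived columns in this table appear to follow two simple relations: the padding ratio is the padded NNZ count divided by the original NNZ count, and the acceleration ratio is the CPU time per iteration divided by the accelerator time per iteration. The sketch below checks both on three rows copied from the table; the helper names are hypothetical and not part of the library.

.. code-block:: python

    # (matrix, NNZs, padded NNZs, U280 ms/iter, CPU ms/iter)
    ROWS = [
        ("nasa2910", 174296, 297952, 0.0511172, 0.0692836),
        ("bcsstk24", 159910, 222656, 0.0598962, 0.0581827),
        ("bcsstk28", 219024, 319264, 0.0586356, 6.92198),
    ]

    def padding_ratio(nnz, padded_nnz):
        """Growth in stored non-zeros caused by padding."""
        return padded_nnz / nnz

    def acceleration_ratio(fpga_ms, cpu_ms):
        """Per-iteration speedup of the accelerator over the CPU."""
        return cpu_ms / fpga_ms

    for name, nnz, padded, fpga_ms, cpu_ms in ROWS:
        print(
            f"{name}: padding {padding_ratio(nnz, padded):.5f}, "
            f"acceleration {acceleration_ratio(fpga_ms, cpu_ms):.2f}"
        )

An acceleration ratio below 1, as for ``bcsstk24``, means the CPU was faster per iteration for that matrix.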


The following pages give the detailed benchmark results and usage steps.

.. toctree::
    :maxdepth: 1

    user_guide/L2/benchmark/cg_gemv_jacobi.rst
    user_guide/L2/benchmark/cg_spmv_jacobi.rst

Benchmark Overview
###################


.. _l2_vitis_hpc:

Vitis HPC Library
*****************

* **Download code**

These HPC benchmarks can be downloaded from the ``master`` branch of `Vitis Libraries <https://github.com/Xilinx/Vitis_Libraries.git>`_.

.. code-block:: bash

    git clone https://github.com/Xilinx/Vitis_Libraries.git
    cd Vitis_Libraries
    git checkout master
    cd hpc

* **Set up environment**

Specify the corresponding Vitis and XRT installations and the path to the platform repository by running the following commands.
Set up the Python environment by following the :doc:`Python environment setup guide <../pyenvguide>`.

.. code-block:: bash

    source <install_path>/installs/lin64/Vitis/2021.2/settings64.sh
    source /opt/xilinx/xrt/setup.sh
    export PLATFORM_REPO_PATHS=/opt/xilinx/platforms
88 changes: 88 additions & 0 deletions hpc/2021.2/_sources/index.rst.txt
@@ -0,0 +1,88 @@
..
    Copyright 2019 - 2021 Xilinx, Inc.
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

=====================
Vitis HPC Library
=====================

The Vitis HPC Library provides an acceleration library for applications with high
computation workloads, e.g. seismic imaging and inversion, high-precision simulations,
and genomics. Three types of components are provided in this library,
namely L1 primitives, L2 kernels, and L3 software APIs. These implementations are organized in their
corresponding directories L1, L2, and L3. The L1 primitives' implementations can be leveraged
by FPGA hardware developers. The L2 kernels' implementations provide examples for
Vitis kernel developers. The L3 APIs provide C/C++ functions for software developers to offload HPC workloads.
This library depends on the **Xilinx BLAS and SPARSE** libraries to implement some components.

Because HPC applications normally have high precision requirements, the currently supported data
types are mainly single-precision floating point (FP32) and double-precision floating point (FP64).
Although most components can be configured
to support other data types, some of the architectures are specifically optimized for
FP32 operations, e.g. accumulations.

In the current release, three types of applications are addressed by this library, namely
RTM (Reverse Time Migration), the CG (Conjugate Gradient) method, and MLP-based high-precision seismic inversion.
RTM is an important seismic imaging technique used for producing an accurate representation of the subsurface.
The basic computation unit of an RTM application is a stencil module, which is the essential
step for explicit **FDTD (Finite Difference Time Domain)** solutions. Seismic inversion is a procedure
used to reconstruct subsurface properties from seismic reflection data.
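
To make the role of the stencil concrete, here is a minimal sketch of one explicit FDTD time step for the 1D wave equation with a 3-point stencil. It is an illustration only, not the library's kernel: the library targets 2D and 3D RTM, and the function name ``fdtd_step``, the zero boundary values, and the parameter ``courant2`` are assumptions made for this example.

.. code-block:: python

    def fdtd_step(prev, curr, courant2):
        """Advance a 1D wavefield by one time step with a 3-point stencil.

        courant2 is (c * dt / dx) ** 2; boundary values are held at zero.
        """
        n = len(curr)
        nxt = [0.0] * n
        for i in range(1, n - 1):
            # Second-order central difference in space.
            laplacian = curr[i - 1] - 2.0 * curr[i] + curr[i + 1]
            # Second-order central difference in time, solved for the next step.
            nxt[i] = 2.0 * curr[i] - prev[i] + courant2 * laplacian
        return nxt

    # Start from a single pulse at rest and propagate it for a few steps.
    n = 11
    prev = [0.0] * n
    prev[n // 2] = 1.0
    curr = list(prev)
    for _ in range(3):
        prev, curr = curr, fdtd_step(prev, curr, courant2=0.25)
    print(curr)

Each output point depends only on its neighbours at the two previous time steps, which is what makes the stencil a natural unit for a streaming hardware implementation.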

Many engineering problems, such as those solved with FEM (the finite element method), are eventually transformed into a set of linear systems.
The Conjugate Gradient method, an iterative method, is widely adopted to solve linear systems,
especially those with highly sparse, large-dimension matrices.
A preconditioner matrix is necessary for most problems in order to achieve convergence and dramatically reduce the
number of iterations, which improves overall performance.
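
As an illustration of how a preconditioner enters the iteration, the following is a minimal textbook sketch of CG with a Jacobi (diagonal) preconditioner, written for small dense matrices stored as Python lists. It is not the library's implementation; the function names and the example system are assumptions chosen for readability.

.. code-block:: python

    def matvec(a, x):
        return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    def jacobi_pcg(a, b, tol=1e-10, max_iter=100):
        """Solve a x = b for a symmetric positive definite matrix a."""
        n = len(b)
        inv_diag = [1.0 / a[i][i] for i in range(n)]  # Jacobi preconditioner
        x = [0.0] * n
        r = list(b)                                   # residual for x = 0
        z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
        p = list(z)
        rz = dot(r, z)
        for it in range(max_iter):
            if dot(r, r) ** 0.5 < tol:
                return x, it
            ap = matvec(a, p)
            alpha = rz / dot(p, ap)
            x = [x[i] + alpha * p[i] for i in range(n)]
            r = [r[i] - alpha * ap[i] for i in range(n)]
            z = [inv_diag[i] * r[i] for i in range(n)]
            rz_new = dot(r, z)
            p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
            rz = rz_new
        return x, max_iter

    a = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x, iterations = jacobi_pcg(a, b)
    print(x, iterations)

The Jacobi preconditioner only needs the inverse of the matrix diagonal, so applying it costs one multiplication per vector element in each iteration.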

Modern approaches use a high-precision MLP (multilayer perceptron) based neural network to speed up seismic inversion.
The basic unit of an MLP application normally includes a fully connected neural (**FCN**) network and an activation
function, e.g. the sigmoid function.
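
The basic unit described above can be sketched as a single fully connected layer followed by a sigmoid activation. This is a plain Python illustration under assumed names (``fcn_layer``, ``sigmoid``) and made-up weights; it does not reflect the library's kernel interface and is not hardened against overflow for very large negative inputs.

.. code-block:: python

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def fcn_layer(weights, bias, inputs):
        """One fully connected layer followed by a sigmoid activation."""
        return [
            sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)
        ]

    # Two inputs, two outputs; the values are arbitrary example data.
    weights = [[0.5, -0.5], [1.0, 1.0]]
    bias = [0.0, -1.0]
    outputs = fcn_layer(weights, bias, [1.0, 1.0])
    print(outputs)

Stacking several such layers, each feeding its outputs to the next, gives a multilayer perceptron.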


In this library, you will find implementations of the stencil module, the 2D and 3D RTM forward propagation paths,
a 2D RTM application, CG solvers with a Jacobi preconditioner, a high-precision fully connected neural network, and the sigmoid activation function.


Since all the kernel code is released under the permissive Apache 2.0 license,
advanced users can easily tailor, optimize, or combine the kernels for their own needs.
Demos and usage examples at different implementation levels are also provided
for reference.

.. toctree::
    :caption: Library Overview
    :maxdepth: 1

    overview.rst
    release.rst

.. toctree::
    :caption: User Guide
    :maxdepth: 2

    pyenvguide.rst
    user_guide/L1/L1.rst
    user_guide/L2/L2.rst
    user_guide/L3/L3.rst

.. toctree::
    :caption: Benchmark
    :maxdepth: 1

    benchmark.rst


Index
-----

* :ref:`genindex`
63 changes: 63 additions & 0 deletions hpc/2021.2/_sources/overview.rst.txt
@@ -0,0 +1,63 @@
..
    Copyright 2019 - 2021 Xilinx, Inc.
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.

.. _overview:

.. toctree::
    :hidden:

Requirements
------------

Software Platform
~~~~~~~~~~~~~~~~~

This library is designed to work with Vitis 2021.2 and later, and therefore inherits the system requirements of Vitis and XRT.

Supported operating systems are RHEL/CentOS 7.4, 7.5 and Ubuntu 16.04.4 LTS, 18.04.1 LTS.
With CentOS/RHEL 7.4 and 7.5, C++11/C++14 should be enabled via
`devtoolset-6 <https://www.softwarecollections.org/en/scls/rhscl/devtoolset-6/>`_.

PCIe Accelerator Card
~~~~~~~~~~~~~~~~~~~~~

Hardware modules and kernels are designed to work with Alveo U280 and U250 cards.

License
-------

Licensed using the `Apache 2.0 license <https://www.apache.org/licenses/LICENSE-2.0>`_.

Copyright 2019 - 2021 Xilinx, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Trademark Notice
----------------

Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Zynq, and
other designated brands included herein are trademarks of Xilinx in the
United States and other countries. All other trademarks are the property
of their respective owners.