Squashed 'hpc' changes from 9d08e6d~10..9d08e6d (#419)

9d08e6d Update index.html
0114189 Merge pull request #93 from RepoOps/gh-pages-20210927-032739
cd52282 [xf_hpc] update release version
cdffb7b update index
83b408e Merge pull request #92 from RepoOps/gh-pages-20210927-031807
c126cc9 Update release.rst.txt
b5ede6b [xf_hpc] build documents
96dd08b Merge pull request #82 from RepoOps/gh-pages-20210615-023421
b886cd4 update documents
a1cf259 Merge pull request #80 from RepoOps/gh-pages-20210614-075104
fb13c8b update release notes
6095532 Merge pull request #79 from RepoOps/gh-pages-20210610-095713
8f6b9e2 fix version errors
73ab5f2 Merge pull request #78 from RepoOps/gh-pages-20210610-070616
b5d0b01 update release notes
46a85d2 Merge pull request #76 from RepoOps/gh-pages-20210608-045126
32fd3e1 build documents
ceb4613 update docs

Co-authored-by: sdausr <sdausr@xilinx.com>

2 people authored and GitHub Enterprise committed on Sep 28, 2021
1 parent 844d608, commit 92e9eb9
Showing 177 changed files with 104,611 additions and 2 deletions.


@@ -0,0 +1,162 @@

..
   Copyright 2019 Xilinx, Inc.
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

************
Benchmark
************

.. _performance:

Performance
################

Conjugate Gradient Algorithm
****************************

This section presents benchmarks of the Vitis HPC Library using the Vitis environment, comparing results on several FPGA and CPU platforms.
The benchmarks can run in software and hardware emulation as well as on hardware accelerators on Alveo U250, U280, or U50 cards.

GEMV-based CG
^^^^^^^^^^^^^^^^^^^^

The following table lists the resource utilization of the GEMV-based CG kernel, with the matrix stored across 16 HBM channels.

.. table:: Resource Utilization on U50
    :align: center

    +----------------+------------------+------------------+-------------------+----------------+---------------+----------------+
    | Name           | LUT              | LUTAsMem         | REG               | BRAM           | URAM          | DSP            |
    +================+==================+==================+===================+================+===============+================+
    | User Budget    | 699619 [100.00%] | 369603 [100.00%] | 1447189 [100.00%] | 1112 [100.00%] | 640 [100.00%] | 5936 [100.00%] |
    +----------------+------------------+------------------+-------------------+----------------+---------------+----------------+
    | Used Resources | 186448 [ 26.65%] | 17334 [  4.69%]  | 325149 [ 22.47%]  | 128 [ 11.51%]  | 0 [  0.00%]   | 1262 [ 21.26%] |
    +----------------+------------------+------------------+-------------------+----------------+---------------+----------------+

.. table:: Benchmark Results on U50
    :align: center

    +-------------+-------------------------+--------------------------+----------------------------------+--------------------------+--------------------+
    | Vector Size | Time per Iteration [ms] | U50 Performance [GFLOPS] | U50 Energy Efficiency [GFLOPS/W] | CPU Performance [GFLOPS] | Acceleration Ratio |
    +=============+=========================+==========================+==================================+==========================+====================+
    | 1024        | 0.073                   | 26.938                   | 0.723                            | 12.996                   | 2.073              |
    +-------------+-------------------------+--------------------------+----------------------------------+--------------------------+--------------------+
    | 2048        | 0.2557                  | 30.658                   | 0.766                            | 27.469                   | 1.116              |
    +-------------+-------------------------+--------------------------+----------------------------------+--------------------------+--------------------+
    | 4096        | 0.9202                  | 34.018                   | 0.812                            | 7.776                    | 4.375              |
    +-------------+-------------------------+--------------------------+----------------------------------+--------------------------+--------------------+
    | 8192        | 3.405                   | 36.742                   | 0.839                            | 8.226                    | 4.467              |
    +-------------+-------------------------+--------------------------+----------------------------------+--------------------------+--------------------+

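The derived columns of the U50 table can be recomputed from the other columns. This sketch assumes that **Acceleration Ratio** is U50 performance divided by CPU performance, which reproduces each published row to three decimals. It also shows the average power implied by dividing performance by energy efficiency. The source does not say how power was measured, and all function names are hypothetical.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>

   // Hypothetical helper. Assumption: Acceleration Ratio = U50 GFLOPS / CPU GFLOPS.
   double accelerationRatio(double fpgaGflops, double cpuGflops) {
       assert(cpuGflops > 0.0);
       return fpgaGflops / cpuGflops;
   }

   // Average power implied by the table: [GFLOPS] / [GFLOPS/W] = [W].
   double impliedPowerWatts(double gflops, double gflopsPerWatt) {
       assert(gflopsPerWatt > 0.0);
       return gflops / gflopsPerWatt;
   }

   // Rounds to a fixed number of decimals, as in the published table.
   double roundToDecimals(double value, int decimals) {
       double scale = 1.0;
       for (int i = 0; i < decimals; ++i) scale *= 10.0;
       return std::round(value * scale) / scale;
   }

For the 1024 row, 26.938 / 12.996 rounds to 2.073, and 26.938 / 0.723 implies an average of roughly 37 W.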

SPMV-based CG
^^^^^^^^^^^^^^^^^^^^^^^

The following table lists the resource utilization of the SPMV-based CG kernel.

.. table:: Resource Utilization on U280
    :align: center

    +----------------+-------------------+------------------+-------------------+----------------+---------------+----------------+
    | Name           | LUT               | LUTAsMem         | REG               | BRAM           | URAM          | DSP            |
    +================+===================+==================+===================+================+===============+================+
    | User Budget    | 1104369 [100.00%] | 552814 [100.00%] | 2217989 [100.00%] | 1693 [100.00%] | 896 [100.00%] | 9020 [100.00%] |
    +----------------+-------------------+------------------+-------------------+----------------+---------------+----------------+
    | Used Resources | 285372 [ 25.84%]  | 36605 [  6.62%]  | 442368 [ 19.94%]  | 267 [ 15.77%]  | 64 [  7.14%]  | 1192 [ 13.22%] |
    +----------------+-------------------+------------------+-------------------+----------------+---------------+----------------+

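Each iteration of an SPMV-based CG solver is built around one sparse matrix-vector product, y = A·x. The following C++ sketch shows a generic SpMV over the CSR (compressed sparse row) format, for reference only. CSR, ``CsrMatrix``, and ``spmv`` are illustrative assumptions; they are not the storage format or API of the library's SPMV kernel. In this sketch every stored entry is processed, including any explicit zeros a padding step would add. Work therefore follows the stored (padded) non-zero count.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Generic CSR matrix (illustrative; not the library's on-chip format).
   struct CsrMatrix {
       std::size_t rows = 0;
       std::vector<std::size_t> rowPtr;  // size rows + 1; row i spans [rowPtr[i], rowPtr[i + 1])
       std::vector<std::size_t> colIdx;  // column index of each stored entry
       std::vector<double> values;       // value of each stored entry
   };

   // y = A * x: one multiply and one add per stored entry (about 2 * NNZ flops).
   std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
       assert(a.rowPtr.size() == a.rows + 1);
       std::vector<double> y(a.rows, 0.0);
       for (std::size_t i = 0; i < a.rows; ++i) {
           double sum = 0.0;
           for (std::size_t k = a.rowPtr[i]; k < a.rowPtr[i + 1]; ++k) {
               assert(a.colIdx[k] < x.size());
               sum += a.values[k] * x[a.colIdx[k]];
           }
           y[i] = sum;
       }
       return y;
   }

For the matrix [[4, 1], [0, 3]] and x = [1, 2], the function returns [6, 6].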
.. table:: Benchmark Results on U280
    :align: center

    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | Matrix Name    | Rows/Cols | NNZs    | Padded Rows/Cols | Padded NNZs | Padding Ratio | Iterations | Time per Iter on U280 [ms] | Time per Iter on CPU [ms] | Acceleration Ratio |
    +================+===========+=========+==================+=============+===============+============+============================+===========================+====================+
    | nasa2910       | 2910      | 174296  | 2912             | 297952      | 1.70946       | 1777       | 0.0511172                  | 0.0692836                 | 1.36               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | ex9            | 3363      | 99471   | 3364             | 199328      | 2.00388       | 5000       | 0.0497677                  | 0.0559332                 | 1.12               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | bcsstk24       | 3562      | 159910  | 3564             | 222656      | 1.39238       | 5000       | 0.0598962                  | 0.0581827                 | 0.97               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | bcsstk15       | 3948      | 117816  | 3948             | 267488      | 2.27039       | 658        | 0.0927269                  | 0.125615                  | 1.35               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | bcsstk28       | 4410      | 219024  | 4412             | 319264      | 1.45767       | 4878       | 0.0586356                  | 6.92198                   | 118.05             |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | s3rmt3m3       | 5357      | 207695  | 5360             | 330624      | 1.59187       | 5000       | 0.0744822                  | 6.55229                   | 87.97              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | s2rmq4m1       | 5489      | 281111  | 5492             | 427648      | 1.52128       | 1779       | 0.084562                   | 6.75384                   | 79.87              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | nd3k           | 9000      | 3279690 | 9000             | 4277792     | 1.30433       | 5000       | 0.363479                   | 4.66861                   | 12.84              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | ted_B          | 10605     | 144579  | 10608            | 548416      | 3.79319       | 30         | 0.984467                   | 6.53108                   | 6.63               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | ted_B_unscaled | 10605     | 144579  | 10608            | 548416      | 3.79319       | 16         | 1.75354                    | 8.59891                   | 4.90               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | msc10848       | 10848     | 1229778 | 10848            | 2050720     | 1.66755       | 5000       | 0.230942                   | 5.43921                   | 23.55              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | cbuckle        | 13681     | 676515  | 13684            | 924832      | 1.36705       | 1282       | 0.16427                    | 5.48588                   | 33.40              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | olafu          | 16146     | 1015156 | 16148            | 1452320     | 1.43064       | 5000       | 0.169174                   | 5.05108                   | 29.86              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | gyro_k         | 17361     | 1021159 | 17364            | 1932384     | 1.89234       | 5000       | 0.254172                   | 4.85938                   | 19.12              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | bodyy4         | 17546     | 121938  | 17548            | 710112      | 5.82355       | 230        | 0.174435                   | 4.73164                   | 27.13              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | nd6k           | 18000     | 6897316 | 18000            | 9415552     | 1.3651        | 5000       | 0.809868                   | 4.25772                   | 5.26               |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | raefsky4       | 19779     | 1328611 | 19780            | 2268704     | 1.70758       | 5000       | 0.268956                   | 4.22843                   | 15.72              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+
    | bcsstk36       | 23052     | 1143140 | 23052            | 1833056     | 1.60353       | 5000       | 0.253049                   | 3.9882                    | 15.76              |
    +----------------+-----------+---------+------------------+-------------+---------------+------------+----------------------------+---------------------------+--------------------+

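Two columns of the U280 table are consistent with simple ratios: **Padding Ratio** equals padded NNZs divided by NNZs, and **Acceleration Ratio** equals CPU time per iteration divided by U280 time per iteration. These formulas are inferred from the published values, not stated in the source, and the helper names below are hypothetical. The sketch recomputes both columns for a few rows.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>

   // Hypothetical helpers; formulas inferred from the published SPMV-based CG table.
   // Padding Ratio = Padded NNZs / NNZs (storage overhead introduced by padding).
   double paddingRatio(long long nnz, long long paddedNnz) {
       assert(nnz > 0 && paddedNnz >= nnz);
       return static_cast<double>(paddedNnz) / static_cast<double>(nnz);
   }

   // Acceleration Ratio = CPU time per iteration / FPGA time per iteration.
   double accelerationRatio(double cpuMsPerIter, double fpgaMsPerIter) {
       assert(fpgaMsPerIter > 0.0);
       return cpuMsPerIter / fpgaMsPerIter;
   }

   // Rounds to a fixed number of decimals, as in the published table.
   double roundToDecimals(double value, int decimals) {
       double scale = 1.0;
       for (int i = 0; i < decimals; ++i) scale *= 10.0;
       return std::round(value * scale) / scale;
   }

For nasa2910, 297952 / 174296 rounds to 1.70946 and 0.0692836 / 0.0511172 rounds to 1.36, matching the table.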

The following pages provide detailed benchmark results and usage steps.

.. toctree::
   :maxdepth: 1

   user_guide/L2/benchmark/cg_gemv_jacobi.rst
   user_guide/L2/benchmark/cg_spmv_jacobi.rst

Benchmark Overview
###################

.. _l2_vitis_hpc:

Vitis HPC Library
*****************

* **Download code**

  These HPC benchmarks can be downloaded from the ``master`` branch of `Vitis Libraries <https://github.com/Xilinx/Vitis_Libraries.git>`_.

  .. code-block:: bash

     git clone https://github.com/Xilinx/Vitis_Libraries.git
     cd Vitis_Libraries
     git checkout master
     cd hpc

* **Setup environment**

  Specify the Vitis and XRT installations and the path to the platform repository by running the following commands.
  Set up the Python environment by following the :doc:`Python environment setup guide <../pyenvguide>`.

  .. code-block:: bash

     source <install_path>/installs/lin64/Vitis/2021.2/settings64.sh
     source /opt/xilinx/xrt/setup.sh
     export PLATFORM_REPO_PATHS=/opt/xilinx/platforms


@@ -0,0 +1,88 @@

..
   Copyright 2019 - 2021 Xilinx, Inc.
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

=====================
Vitis HPC Library
=====================

Vitis HPC Library provides an acceleration library for applications with high
computation workloads, e.g. seismic imaging and inversion, high-precision simulations,
and genomics. Three types of components are provided in this library,
namely L1 primitives, L2 kernels and L3 software APIs. These implementations are organized in their
corresponding directories L1, L2 and L3. The L1 primitives' implementations can be leveraged
by FPGA hardware developers. The L2 kernels' implementations provide examples for
Vitis kernel developers. The L3 APIs provide C/C++ functions for software developers to offload HPC workloads.
This library depends on the **Xilinx BLAS and SPARSE** libraries to implement some components.

Because HPC applications normally have high precision requirements, the currently supported data
types are mainly single-precision floating point (FP32) and double-precision floating point (FP64).
Although most components can be configured
to support other data types, some of the architectures are specifically optimized
for FP32 operations, e.g. accumulations.

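Accumulation is a precision-sensitive operation, as this small C++ sketch illustrates. It adds the same value many times using FP32 and FP64 accumulators. Each addition rounds the running sum to the accumulator's precision, so the FP32 error grows with the number of terms. This is a generic floating-point example, not a model of how the library's FP32-optimized architectures handle accumulation. The names ``accumulate`` and ``relativeError`` are illustrative.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstddef>

   // Adds `term` to a running sum `count` times. Every addition rounds the
   // running sum to the precision of T, so rounding errors compound.
   template <typename T>
   T accumulate(T term, std::size_t count) {
       T sum = T(0);
       for (std::size_t i = 0; i < count; ++i) {
           sum += term;
       }
       return sum;
   }

   // Relative error of a computed result against the exact value.
   double relativeError(double computed, double exact) {
       assert(exact != 0.0);
       return std::fabs(computed - exact) / std::fabs(exact);
   }

Summing 0.1 one million times (exact result 100000) leaves a relative error of roughly 1% with an FP32 accumulator. With an FP64 accumulator, the error stays well below 1e-9.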
In the current release, this library addresses three types of applications, namely
RTM (Reverse Time Migration), the CG (Conjugate Gradient) method, and MLP-based high-precision seismic inversion.
RTM is an important seismic imaging technique used for producing an accurate representation of the subsurface.
The basic computation unit of an RTM application is a stencil module, which is the essential
step for explicit **FDTD (Finite Difference Time Domain)** solutions. Seismic inversion is a procedure
used to reconstruct subsurface properties from seismic reflection data.

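To make the stencil idea concrete, the following C++ sketch performs one explicit FDTD time step of the constant-velocity 2D acoustic wave equation. It uses a second-order, 5-point Laplacian stencil. The stencil order, grid layout, and zero boundary are simplifying assumptions chosen for clarity. The library's RTM stencil kernels may use higher-order stencils and different boundary conditions. ``Field2D`` and ``fdtdStep`` are hypothetical names.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical 2D wavefield stored row-major: value(x, z) = data[z * nx + x].
   struct Field2D {
       std::size_t nx;
       std::size_t nz;
       std::vector<float> data;

       Field2D(std::size_t nxIn, std::size_t nzIn)
           : nx(nxIn), nz(nzIn), data(nxIn * nzIn, 0.0f) {}

       float& at(std::size_t x, std::size_t z) { return data[z * nx + x]; }
       float at(std::size_t x, std::size_t z) const { return data[z * nx + x]; }
   };

   // One explicit FDTD step with a second-order 5-point Laplacian stencil:
   //   next = 2 * curr - prev + c2 * (left + right + up + down - 4 * centre),
   // where c2 = (v * dt / h)^2 must not exceed 0.5 for stability in 2D.
   // Boundary cells are kept at zero (a simplifying assumption).
   Field2D fdtdStep(const Field2D& prev, const Field2D& curr, float c2) {
       assert(prev.nx == curr.nx && prev.nz == curr.nz);
       assert(c2 >= 0.0f && c2 <= 0.5f);
       Field2D next(curr.nx, curr.nz);
       for (std::size_t z = 1; z + 1 < curr.nz; ++z) {
           for (std::size_t x = 1; x + 1 < curr.nx; ++x) {
               const float lap = curr.at(x - 1, z) + curr.at(x + 1, z) +
                                 curr.at(x, z - 1) + curr.at(x, z + 1) -
                                 4.0f * curr.at(x, z);
               next.at(x, z) = 2.0f * curr.at(x, z) - prev.at(x, z) + c2 * lap;
           }
       }
       return next;
   }

Consider a 5 × 5 grid with a unit impulse at the centre and c2 = 0.25. One step leaves 1.0 at the centre and spreads 0.25 to each of its four neighbours.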
Many engineering problems, such as FEM (Finite Element Method) simulations, are eventually transformed into systems of linear equations.
The Conjugate Gradient method, an iterative method, is widely adopted to solve linear systems with symmetric positive-definite matrices,
especially large and highly sparse ones.
For most problems, a preconditioner is necessary to achieve convergence and to dramatically reduce the
number of iterations, which improves overall performance.

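The following C++ sketch shows CG with a Jacobi (diagonal) preconditioner, the preconditioner used by this library's CG solvers. It works on a small dense matrix so that every step of the algorithm stays visible. It is a host-side reference sketch that assumes A is symmetric positive-definite with a non-zero diagonal. ``jacobiPcg`` and its parameters are illustrative and are not the library's L2 or L3 API.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <vector>

   using Vec = std::vector<double>;
   using Mat = std::vector<Vec>;  // dense row-major matrix, for illustration only

   static double dot(const Vec& a, const Vec& b) {
       double s = 0.0;
       for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
       return s;
   }

   static Vec matVec(const Mat& A, const Vec& x) {
       Vec y(A.size(), 0.0);
       for (std::size_t i = 0; i < A.size(); ++i) y[i] = dot(A[i], x);
       return y;
   }

   // Hypothetical reference solver: CG with Jacobi preconditioner M = diag(A).
   // Updates x in place and returns the number of iterations performed.
   int jacobiPcg(const Mat& A, const Vec& b, Vec& x, double tol, int maxIter) {
       const std::size_t n = b.size();
       assert(A.size() == n && x.size() == n);
       Vec r = matVec(A, x);
       for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - r[i];     // r = b - A x
       Vec z(n);
       for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];  // z = M^-1 r
       Vec p = z;
       double rz = dot(r, z);
       const double stop = tol * std::sqrt(dot(b, b));
       for (int k = 0; k < maxIter; ++k) {
           if (std::sqrt(dot(r, r)) <= stop) return k;              // converged
           const Vec ap = matVec(A, p);
           const double alpha = rz / dot(p, ap);
           for (std::size_t i = 0; i < n; ++i) {
               x[i] += alpha * p[i];
               r[i] -= alpha * ap[i];
           }
           for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
           const double rzNew = dot(r, z);
           const double beta = rzNew / rz;
           rz = rzNew;
           for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
       }
       return maxIter;
   }

Each iteration is dominated by one matrix-vector product with A. For a dense A this is a GEMV, and for a sparse A it is an SpMV.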
Modern approaches use high-precision MLP (Multilayer Perceptron) based neural networks to speed up seismic inversion.
The basic unit of an MLP application normally includes a fully connected neural (**FCN**) network and an activation
function, e.g. the sigmoid function.

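The following C++ sketch shows that basic unit: one fully connected layer followed by the sigmoid activation. The row-per-neuron weight layout, FP64 arithmetic, and the name ``fcnSigmoidLayer`` are assumptions for illustration. They do not describe the library's FCN kernel interface.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <vector>

   // Logistic sigmoid activation: 1 / (1 + e^-x).
   double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

   // Hypothetical FCN layer: out[j] = sigmoid(bias[j] + sum_i weights[j][i] * in[i]).
   // weights holds one row per output neuron; FP64 is used throughout.
   std::vector<double> fcnSigmoidLayer(const std::vector<std::vector<double>>& weights,
                                       const std::vector<double>& bias,
                                       const std::vector<double>& in) {
       assert(weights.size() == bias.size());
       std::vector<double> out(bias.size());
       for (std::size_t j = 0; j < weights.size(); ++j) {
           assert(weights[j].size() == in.size());
           double acc = bias[j];  // accumulate the weighted sum in FP64
           for (std::size_t i = 0; i < in.size(); ++i) acc += weights[j][i] * in[i];
           out[j] = sigmoid(acc);
       }
       return out;
   }

With weights [[1, -1], [0.5, 0.5]], biases [0, -1], and input [2, 2], the layer outputs sigmoid(0) = 0.5 and sigmoid(1) ≈ 0.7311.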

In this library, you will find implementations of the stencil module, the 2D and 3D RTM forward propagation paths,
a 2D RTM application, CG solvers with a Jacobi preconditioner, a high-precision fully connected neural network, and the sigmoid activation function.

Since all the kernel code is released under the permissive Apache 2.0 license,
advanced users can easily tailor, optimize, or combine the kernels for their own needs.
Demos and usage examples at different implementation levels are also provided
for reference.

.. toctree::
   :caption: Library Overview
   :maxdepth: 1

   overview.rst
   release.rst

.. toctree::
   :caption: User Guide
   :maxdepth: 2

   pyenvguide.rst
   user_guide/L1/L1.rst
   user_guide/L2/L2.rst
   user_guide/L3/L3.rst

.. toctree::
   :caption: Benchmark
   :maxdepth: 1

   benchmark.rst

Index
-----

* :ref:`genindex`


@@ -0,0 +1,63 @@

..
   Copyright 2019 - 2021 Xilinx, Inc.
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
   http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

.. _overview:

.. toctree::
   :hidden:

Requirements
------------

Software Platform
~~~~~~~~~~~~~~~~~

This library is designed to work with Vitis 2021.2 and later, and therefore inherits the system requirements of Vitis and XRT.

Supported operating systems are RHEL/CentOS 7.4, 7.5 and Ubuntu 16.04.4 LTS, 18.04.1 LTS.
With CentOS/RHEL 7.4 and 7.5, C++11/C++14 should be enabled via
`devtoolset-6 <https://www.softwarecollections.org/en/scls/rhscl/devtoolset-6/>`_.

PCIE Accelerator Card
~~~~~~~~~~~~~~~~~~~~~

Hardware modules and kernels are designed to work with Alveo U280 and U250 cards.

License
-------

Licensed using the `Apache 2.0 license <https://www.apache.org/licenses/LICENSE-2.0>`_.

Copyright 2019 - 2021 Xilinx, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Trademark Notice
----------------

Xilinx, the Xilinx logo, Artix, ISE, Kintex, Spartan, Virtex, Zynq, and
other designated brands included herein are trademarks of Xilinx in the
United States and other countries. All other trademarks are the property
of their respective owners.