Chameleon is written in C and depends on a couple of external libraries that must be installed on the system.
Chameleon can be built and installed by the standard means of CMake. General information about CMake, as well as installation binaries and CMake source code, is available from the CMake website (http://www.cmake.org/).
To install a full distribution (Chameleon and its dependencies), we encourage users to use the morse branch of Spack.
The latest official release tarballs of Chameleon sources are available for download from the gitlab tags page.
The latest development state is available on gitlab. You need Git to clone the repository:
git clone --recursive https://gitlab.inria.fr/solverstack/chameleon.git
To install Chameleon’s libraries, header files, and executables, one needs:
- CMake (version 2.8 minimum): the build system
- C and Fortran compilers: GNU compiler suite, Clang, Intel or IBM can be used
- python: to generate files in the different precisions
- external libraries: this depends on the configuration; by default the required libraries are
  - StarPU
  - CBLAS, LAPACKE: these are interfaces and there exist several providers that can be used with Chameleon
    - Intel MKL, Netlib, OpenBlas
  - BLAS, LAPACK, TMGLIB: there exist several providers that can be used with Chameleon
    - Eigen, Intel MKL, Netlib, OpenBlas
  - pthread (libpthread)
  - math (libm)
Optional libraries:
These packages must be installed on the system before trying to configure/build chameleon. Please look at the distrib/ directory which gives some hints for the installation of dependencies for Unix systems.
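Before configuring, it can help to check that the build tools listed above are reachable from the shell. The sketch below is one possible way to do so; `missing_tools` is a hypothetical helper that is not part of Chameleon, and the tool names passed to it (gcc, gfortran, python) are assumptions to adapt to the compilers you actually use.

```shell
# missing_tools: print each command name that cannot be found on PATH
# (hypothetical helper, not shipped with Chameleon)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: the names below are assumptions, adapt them to your toolchain
missing=$(missing_tools cmake gcc gfortran python)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "missing build tools:" $missing
fi
```

This only checks that the commands exist; it does not check versions or the presence of the external libraries.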
We give here some examples for a Debian system:
# Update Debian packages list
sudo apt-get update
# Install Netlib blas, lapack, tmglib, cblas and lapacke suite
sudo apt-get install -y liblapack-dev liblapacke-dev
# Alternatively to Netlib, OpenBLAS could be used (faster kernels)
sudo apt-get install -y libopenblas-dev liblapacke-dev
# Install OpenMPI
sudo apt-get install -y libopenmpi-dev
# Install hwloc (used by StarPU or QUARK, already a dependency of OpenMPI)
sudo apt-get install -y libhwloc-dev
# Install FxT, useful to export some nice execution traces with StarPU
sudo apt-get install -y libfxt-dev
# Install cuda and cuBLAS: only if you have a CUDA-compatible GPU
sudo apt-get install -y nvidia-cuda-toolkit nvidia-cuda-dev
# Install StarPU (with MPI and FxT enabled)
mkdir -p $HOME/install
cd $HOME/install
wget http://starpu.gforge.inria.fr/files/starpu-1.2.2/starpu-1.2.2.tar.gz
tar xvzf starpu-1.2.2.tar.gz
cd starpu-1.2.2/
./configure --prefix=$HOME/install/starpu --disable-opencl --disable-cuda --with-fxt=/usr/lib/x86_64-linux-gnu/
make
make install
cd $HOME/install
rm starpu-1.2.2/ starpu-1.2.2.tar.gz -rf
# Install QUARK: to be used in place of StarPU
mkdir -p $HOME/install
cd $HOME/install
wget http://icl.cs.utk.edu/projectsfiles/quark/pubs/quark-0.9.0.tgz
tar xvzf quark-0.9.0.tgz
cd quark-0.9.0/
sed -i -e "s#prefix=\.\/install#prefix=$HOME/install/quark#g" make.inc
sed -i -e "s#CFLAGS=-O2#CFLAGS=-O2 -fPIC#g" make.inc
make
make install
cd $HOME/install
rm quark-0.9.0/ quark-0.9.0.tgz -rf
BLAS (Basic Linear Algebra Subprograms) is a de facto standard for basic linear algebra operations such as vector and matrix multiplication. A FORTRAN implementation of BLAS is available from Netlib, and a C implementation is included in GSL (GNU Scientific Library). Both are reference implementations: they are not optimized for modern processor architectures and provide an order of magnitude lower performance than optimized implementations. Highly optimized implementations of BLAS are available from many hardware vendors, such as Intel MKL, IBM ESSL and AMD ACML. Fast implementations are also available as academic packages, such as ATLAS and OpenBLAS. The standard interface to BLAS is the FORTRAN interface.
Caution about the compatibility: Chameleon has been mainly tested with the reference BLAS from NETLIB, OpenBLAS and Intel MKL.
CBLAS is a C language interface to BLAS. Most commercial and academic implementations of BLAS also provide CBLAS. Netlib provides a reference implementation of CBLAS on top of FORTRAN BLAS (Netlib CBLAS). Since GSL is implemented in C, it naturally provides CBLAS.
Caution about the compatibility: Chameleon has been mainly tested with the reference CBLAS from NETLIB, OpenBLAS and Intel MKL.
LAPACK (Linear Algebra PACKage) is a software library for numerical linear algebra, a successor of LINPACK and EISPACK and a predecessor of Chameleon. LAPACK provides routines for solving linear systems of equations, linear least squares problems, eigenvalue problems and singular value problems. Most commercial and academic BLAS packages also provide some LAPACK routines.
Caution about the compatibility: Chameleon has been mainly tested with the reference LAPACK from NETLIB, OpenBLAS and Intel MKL.
LAPACKE is a C language interface to LAPACK (or CLAPACK). It is produced by Intel in coordination with the LAPACK team and is available in source code from Netlib in its original version (Netlib LAPACKE) and from Chameleon website in an extended version (LAPACKE for Chameleon). In addition to implementing the C interface, LAPACKE also provides routines which automatically handle workspace allocation, making the use of LAPACK much more convenient.
Caution about the compatibility: Chameleon has been mainly tested with the reference LAPACKE from NETLIB, OpenBLAS and Intel MKL.
libtmg is a component of the LAPACK library, containing routines for generation of input matrices for testing and timing of LAPACK. It is required by the testing and timing suites of LAPACK, but not by the LAPACK library itself, which can be built and used without libtmg.
Caution about the compatibility: Chameleon has been mainly tested with the reference TMGLIB from NETLIB, OpenBLAS and Intel MKL.
QUARK (QUeuing And Runtime for Kernels) provides a library that enables the dynamic execution of tasks with data dependencies in a multi-core, multi-socket, shared-memory environment. One of the two runtime systems, QUARK or StarPU, has to be enabled in order to schedule tasks on the architecture. If QUARK is enabled then StarPU is disabled, and conversely. Note that StarPU is enabled by default. When Chameleon is linked with QUARK, it is not possible to exploit either CUDA (for GPUs) or MPI (distributed-memory environment); use StarPU to do so.
Caution about the compatibility: Chameleon has been mainly tested with the QUARK library 0.9.
StarPU is a task programming library for hybrid architectures. StarPU handles run-time concerns such as:
- Task dependencies
- Optimized heterogeneous scheduling
- Optimized data transfers and replication between main memory and discrete memories
- Optimized cluster communications
StarPU can be used to benefit from GPUs and distributed-memory environments. One of the two runtime systems, QUARK or StarPU, has to be enabled in order to schedule tasks on the architecture. If StarPU is enabled then QUARK is disabled, and conversely. Note that StarPU is enabled by default.
Caution about the compatibility: Chameleon has been mainly tested with StarPU-1.1 and 1.2 releases.
FxT stands for both FKT (Fast Kernel Tracing) and FUT (Fast User Tracing). This library provides efficient support for recording traces. Chameleon can trace kernel execution on the different workers and produce .paje files if FxT is enabled. FxT can only be used through StarPU, and StarPU must be compiled with FxT enabled; see the section Execution trace using StarPU for how to use this feature.
Caution about the compatibility: FxT should be compatible with the version of StarPU used.
hwloc (Portable Hardware Locality) is a software package for accessing the topology of a multicore system, including components like cores, sockets, caches and NUMA nodes. The topology discovery library, hwloc, is not mandatory to use StarPU but is strongly recommended. It helps increase performance and enables topology-aware scheduling. hwloc is available in major distributions and for most OSes, and can be downloaded from http://www.open-mpi.org/software/hwloc.
The POSIX threads library is required to run Chameleon on Unix-like systems. It is a standard component of any such system.
OpenMPI is an open source Message Passing Interface implementation for execution on multiple nodes with distributed-memory environment. MPI can be enabled only if the runtime system chosen is StarPU (default). To use MPI through StarPU, it is necessary to compile StarPU with MPI enabled.
Caution about the compatibility: OpenMPI should be built with the --enable-mpi-thread-multiple option.
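To check whether an existing OpenMPI installation was built with thread-multiple support, one option is to inspect the output of `ompi_info`. The sketch below assumes that `ompi_info` reports this on its "Thread support" line as `MPI_THREAD_MULTIPLE: yes`; that output format is an assumption that may vary between OpenMPI releases, and `has_thread_multiple` is a hypothetical helper.

```shell
# has_thread_multiple: read `ompi_info` output on stdin and succeed only if
# it reports MPI_THREAD_MULTIPLE support (the output format is an assumption)
has_thread_multiple() {
    grep -q 'MPI_THREAD_MULTIPLE: yes'
}

# Only query OpenMPI when ompi_info is actually installed
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_thread_multiple; then
        echo "OpenMPI reports MPI_THREAD_MULTIPLE support"
    else
        echo "OpenMPI does not report MPI_THREAD_MULTIPLE support"
    fi
fi
```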
Nvidia CUDA Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. Chameleon can use a set of low-level optimized kernels coming from cuBLAS to accelerate computations on GPUs. The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the Nvidia CUDA runtime. cuBLAS is normally distributed with the Nvidia CUDA Toolkit. CUDA/cuBLAS can be enabled in Chameleon only if the runtime system chosen is StarPU (default). To use CUDA through StarPU, it is necessary to compile StarPU with CUDA enabled.
Caution about the compatibility: Chameleon has been mainly tested with CUDA releases from versions 4 to 7.5. Your compiler must be compatible with CUDA.
To install a full distribution (Chameleon and its dependencies), we encourage users to use the morse branch of Spack.
Please read these documentations:
git clone https://github.com/solverstack/spack.git
. ./spack/share/spack/setup-env.sh
spack install -v chameleon
# chameleon is installed here:
`spack location -i chameleon`
Compilation of Chameleon libraries and executables is done with CMake (http://www.cmake.org/). This version has been tested with CMake 3.5.1, but any version later than 2.8 should be fine.
Here are the steps to configure, build, test and install:
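The installed CMake version can be compared against the 2.8 minimum from the shell. The following is a minimal sketch: `version_ge` is a hypothetical helper that compares dotted version numbers field by field, and the parsing assumes that the first line of `cmake --version` reads "cmake version X.Y.Z".

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B
# (hypothetical helper; compares numeric fields from left to right)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Only run the check when cmake is installed
if command -v cmake >/dev/null 2>&1; then
    # assumed format of the first line: "cmake version X.Y.Z"
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" 2.8; then
        echo "CMake $found is recent enough"
    else
        echo "CMake $found is older than 2.8"
    fi
fi
```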
- configure:
cmake path/to/chameleon -DOPTION1= -DOPTION2= ...
# see the "Configuration options" section to get list of options
# see the "Dependencies detection" for details about libraries detection
- build:
make
# do not hesitate to use the -j[ncores] option to speed up the compilation
- test (optional, requires CHAMELEON_ENABLE_TESTING=ON and/or CHAMELEON_ENABLE_TIMING=ON):
make test # or ctest
- install (optional):
make install
Do not forget to specify the install directory with -DCMAKE_INSTALL_PREFIX at configure.
cmake /home/jdoe/chameleon -DCMAKE_INSTALL_PREFIX=/home/jdoe/install/chameleon
Note that the install process is optional. You are free to use Chameleon binaries compiled in the build directory.
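The configure, build, test and install steps can be chained in a small script. What follows is a sketch under stated assumptions, not an official build script: `run` and `build_chameleon` are hypothetical helpers, the paths in the usage line are placeholders, and the number of parallel jobs is taken from `getconf _NPROCESSORS_ONLN` when available. Setting `DRY_RUN=1` only prints the commands so that the sequence can be reviewed before anything is executed.

```shell
# run: execute a command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_chameleon SRC BUILD PREFIX: out-of-source configure, build, test, install
# The body runs in a subshell so the caller's working directory is unchanged.
build_chameleon() (
    src=$1 build=$2 prefix=$3
    # number of parallel make jobs; falls back to 1 if getconf cannot tell
    ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    run mkdir -p "$build" &&
    run cd "$build" &&
    run cmake "$src" -DCMAKE_INSTALL_PREFIX="$prefix" &&
    run make -j"$ncores" &&
    run make test &&
    run make install
)

# Usage (paths are placeholders):
#   DRY_RUN=1 build_chameleon /home/jdoe/chameleon /home/jdoe/build/chameleon /home/jdoe/install/chameleon
```

Each step runs only if the previous one succeeded, so a failed configure or a failing test stops the sequence before installation.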
You can optionally activate some options at cmake configure (like CUDA, MPI, …) by invoking
cmake path/to/your/CMakeLists.txt -DOPTION1= -DOPTION2= ...
For example:
cmake /home/jdoe/chameleon/ -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_INSTALL_PREFIX=/home/jdoe/install/ \
      -DCHAMELEON_USE_CUDA=ON \
      -DCHAMELEON_USE_MPI=ON \
      -DBLA_VENDOR=Intel10_64lp \
      -DSTARPU_DIR=/home/jdoe/install/starpu-1.2/ \
      -DCHAMELEON_ENABLE_TRACING=ON
You can get the full list of options with the -L[A][H] options of the cmake command:
cmake -LH /home/jdoe/chameleon/
You can also set the options thanks to the ccmake interface.
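Once a build directory has been configured, CMake stores the chosen values in its CMakeCache.txt file as NAME:TYPE=VALUE lines, so they can also be read back from the shell. A minimal sketch, where `cache_value` is a hypothetical helper and the sample file stands in for a real cache:

```shell
# cache_value NAME [CACHE_FILE]: print the value stored for NAME in a CMake cache
# (hypothetical helper; CACHE_FILE defaults to ./CMakeCache.txt)
cache_value() {
    sed -n "s/^$1:[^=]*=//p" "${2:-CMakeCache.txt}"
}

# Example on a two-line sample cache; a real one is written by cmake
# in the build directory
sample=$(mktemp)
printf '%s\n' 'CMAKE_BUILD_TYPE:STRING=Debug' 'CHAMELEON_USE_MPI:BOOL=ON' > "$sample"
cache_value CHAMELEON_USE_MPI "$sample"   # prints ON
rm -f "$sample"
```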
- CMAKE_BUILD_TYPE=Debug|Release|RelWithDebInfo|MinSizeRel: level of compiler optimization, enable/disable debug information
- CMAKE_INSTALL_PREFIX=path/to/your/install/dir: where headers, libraries, executables, etc, will be copied when invoking make install
- BUILD_SHARED_LIBS=ON|OFF: indicate whether CMake has to build Chameleon static (OFF) or shared (ON) libraries.
- CMAKE_C_COMPILER=gcc|icc|…: to choose the C compiler if several exist in the environment
- CMAKE_Fortran_COMPILER=gfortran|ifort|…: to choose the Fortran compiler if several exist in the environment
- BLA_VENDOR=All|Eigen|Open|Generic|Intel10_64lp|Intel10_64lp_seq: to use Intel MKL for example; see the list of BLA_VENDOR values in FindBLAS.cmake in cmake_modules/morse_cmake/modules/find
- STARPU_DIR=path/to/root/starpu/install, see Dependencies detection
- STARPU_INCDIR=path/to/root/starpu/install/headers, see Dependencies detection
- STARPU_LIBDIR=path/to/root/starpu/install/libs, see Dependencies detection
- List of packages that can be searched just like STARPU (with _DIR, _INCDIR and _LIBDIR):
  - BLAS, CBLAS, EZTRACE, FXT, HWLOC, LAPACK, LAPACKE, QUARK, SIMGRID, TMG
Libraries detected with an official cmake module (see module files in CMAKE_ROOT/Modules/): CUDA - MPI - Threads.
Libraries detected with our cmake modules (see module files in cmake_modules/morse_cmake/modules/find/ directory of Chameleon sources): BLAS - CBLAS - EZTRACE - FXT - HWLOC - LAPACK - LAPACKE - QUARK - SIMGRID - STARPU - TMG.
- CHAMELEON_SCHED_STARPU=ON|OFF (default ON): to link with StarPU library (runtime system)
- CHAMELEON_SCHED_QUARK=ON|OFF (default OFF): to link with QUARK library (runtime system)
- CHAMELEON_USE_MPI=ON|OFF (default OFF): to link with MPI library (message passing implementation for use of multiple nodes with distributed memory), can only be used with StarPU
- CHAMELEON_USE_CUDA=ON|OFF (default OFF): to link with CUDA runtime (implementation paradigm for accelerated codes on GPUs) and cuBLAS library (optimized BLAS kernels on GPUs), can only be used with StarPU
- CHAMELEON_ENABLE_DOC=ON|OFF (default OFF): to control build of the documentation contained in doc/ sub-directory
- CHAMELEON_ENABLE_EXAMPLE=ON|OFF (default ON): to control build of the examples executables (API usage) contained in example/ sub-directory
- CHAMELEON_ENABLE_PRUNING_STATS=ON|OFF (default OFF)
- CHAMELEON_ENABLE_TESTING=ON|OFF (default ON): to control build of testing executables (numerical check) contained in testing/ sub-directory
- CHAMELEON_ENABLE_TIMING=ON|OFF (default ON): to control build of timing executables (performances check) contained in timing/ sub-directory
- CHAMELEON_ENABLE_TRACING=ON|OFF (default OFF): to enable trace generation during execution of timing drivers. It requires StarPU to be linked with FxT library (trace execution of kernels on workers), see also Execution tracing with StarPU.
- CHAMELEON_SIMULATION=ON|OFF (default OFF): to enable simulation mode, meaning that Chameleon will not really execute tasks; see details in section Use simulation mode with StarPU-SimGrid. This option must be used with StarPU compiled with SimGrid, which allows estimating the execution time on any architecture. This feature should be used to experiment with scheduler behavior and performance, not to produce solutions of linear systems.
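The constraints between the runtime options above (StarPU and QUARK are mutually exclusive, and MPI and CUDA can only be used with StarPU) can be checked before calling cmake. The sketch below encodes only those rules; `check_runtime_options` is a hypothetical helper and not something Chameleon provides.

```shell
# check_runtime_options SCHED MPI CUDA
#   SCHED is "starpu" or "quark"; MPI and CUDA are ON or OFF.
# Succeeds when the combination respects the rules stated in the option list.
check_runtime_options() {
    case $1 in
        starpu)
            return 0 ;;   # StarPU accepts both MPI and CUDA
        quark)
            if [ "$2" = ON ] || [ "$3" = ON ]; then
                echo "QUARK cannot be combined with MPI or CUDA" >&2
                return 1
            fi
            return 0 ;;
        *)
            echo "unknown runtime: $1" >&2
            return 1 ;;
    esac
}

# Print the matching CMake flags for a valid combination
if check_runtime_options starpu ON OFF; then
    echo "-DCHAMELEON_SCHED_STARPU=ON -DCHAMELEON_USE_MPI=ON -DCHAMELEON_USE_CUDA=OFF"
fi
```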
You have different choices to detect dependencies on your system, either by setting some environment variables containing paths to the libs and headers or by specifying them directly at cmake configure. Different cases:
- detection of dependencies through environment variables:
  - LD_LIBRARY_PATH should contain the list of paths where to find the libraries:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:install/path/to/your/lib
  - INCLUDE should contain the list of paths where to find the header files of libraries:
    export INCLUDE=$INCLUDE:install/path/to/your/headers
- detection with user’s given paths:
  - you can specify the path at cmake configure by invoking
    cmake path/to/your/CMakeLists.txt -DLIB_DIR=path/to/your/lib
    where LIB stands for the name of the lib to look for, for example
    cmake path/to/your/CMakeLists.txt -DSTARPU_DIR=path/to/starpudir \
          -DCBLAS_DIR= ...
    it is also possible to specify headers and library directories separately
    cmake path/to/your/CMakeLists.txt -DSTARPU_INCDIR=path/to/libstarpu/include/starpu/1.1 \
          -DSTARPU_LIBDIR=path/to/libstarpu/lib
  - note: BLAS and LAPACK detection can be tedious, so we provide a verbose mode; set -DBLAS_VERBOSE=ON or -DLAPACK_VERBOSE=ON to enable it
- detection with custom environment variables: all variables like _DIR, _INCDIR, _LIBDIR can be set as environment variables instead of CMake options; they will be read
- using pkg-config for libraries that provide .pc files
  - update your PKG_CONFIG_PATH to the paths where to find .pc files of installed external libraries like hwloc, starpu, some blas/lapack, etc
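The environment-based detection modes above can be prepared in one place. The sketch below is only an illustration: `append_path` is a hypothetical helper that avoids leaving an empty entry when a variable is initially unset, the install prefix is a placeholder, and the `lib/pkgconfig` location of the .pc files is an assumption about how the dependency was installed.

```shell
# append_path VAR DIR: append DIR to the colon-separated variable VAR,
# without leaving an empty entry when VAR is unset or empty
append_path() {
    eval "current=\${$1:-}"
    if [ -n "$current" ]; then
        eval "$1=\$current:\$2"
    else
        eval "$1=\$2"
    fi
    export "$1"
}

# Placeholder install prefix, adapt it to your system
prefix=$HOME/install/starpu
append_path LD_LIBRARY_PATH "$prefix/lib"
append_path INCLUDE "$prefix/include"
# assumed location of the .pc files under the install prefix
append_path PKG_CONFIG_PATH "$prefix/lib/pkgconfig"
# custom environment variable, used in place of -DSTARPU_DIR
export STARPU_DIR=$prefix
```

Any one of these mechanisms may be enough on its own; the sketch sets all of them only to show each form.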