allow for conda package distribution
yumengch committed Jan 25, 2024
1 parent 32ac92a commit 63bd275
Showing 13 changed files with 175 additions and 123 deletions.
37 changes: 24 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
@@ -4,19 +4,30 @@ A Python interface to the Fortran-written data assimilation library - [PDAF](htt
![GitHub Workflow Status](https://github.com/yumengch/pyPDAF/actions/workflows/test_build.yaml/badge.svg)


## Prerequisite:
- `Fortran compiler: e.g.:gfortran/intel fortran`
- `a message passing interface (MPI) implementation: e.g. openMPI/MPICH`
- `Python>=3.8`


## Installation:
- pyPDAF uses `[PDAF V2.1](https://github.com/PDAF/PDAF/tree/PDAF_V2.1)` which can be obtained by:
`git submodule update --init --recursive`
- Currently, Fortran-written PDAF is compiled together with pyPDAF. Hence, the Fortran compiler options need to be specified in the PDAF section of [`setup.cfg`](setup.cfg).
- Options in pyPDAF section of `setup.cfg` are related to the current pyPDAF directory (`pwd`) and C compiler used by Cython, e.g. (`CC=mpicc` for GNU compiler or `CC=mpiicc` for Intel compiler)
- It is recommended to use a clean conda environment to install pyPDAF to avoid any package conflicts
- Install Python package: ```pip install .```
There are two ways of installing pyPDAF.
- The easiest approach is using `conda`. Currently, `pyPDAF` is available from `conda` for `Windows`, `Linux`, and `MacOS (arm64)`. It can be installed via:
```bash
conda create -n pyPDAF -c yumengch -c conda-forge pyPDAF
```
You can start using `pyPDAF` with `conda activate pyPDAF`.
- In an HPC or cluster environment, it may be undesirable to use the compilers and MPI implementation provided by conda. In this case, pyPDAF can be installed from source:
```bash
git clone https://github.com/yumengch/pyPDAF.git
cd pyPDAF
git submodule update --init --recursive
pip install -v .
```
The `pip` command compiles both `PDAF V2.1` and its C interface. To customise the build for the local machine, the compiler, compiler options, and paths to the dependent libraries must be specified. Here, the dependencies are `BLAS`, `LAPACK`, and an `MPI` implementation.
- The installation requires the `Cython`, `mpi4py`, and `numpy` packages.
- The Fortran compiler options need to be specified in the PDAF section of [`setup.cfg`](setup.cfg). Note that the `-fPIC` compiler option is required to build a Python package. These options are only relevant on non-Windows machines; on Windows, `MSVC` and the `Intel Fortran compiler` are used by default, and adapting to other compilers requires changes to [PDAFBuild/CMakeLists.txt](PDAFBuild/CMakeLists.txt) and [pyPDAF/fortran/CMakeLists.txt](pyPDAF/fortran/CMakeLists.txt).
- The pyPDAF section of `setup.cfg` requires the following options:
- `pwd` is the absolute path to the pyPDAF repository directory
- `CC` is the C compiler used by Cython, e.g. `CC=mpicc` for the GNU compiler or `CC=mpiicc` for the Intel compiler. This option is ignored on Windows, where only `MSVC` is supported.
- `condaBuild` -- ignore this option, as it is only relevant for the `conda build` scenario
- `use_MKL` decides whether Intel's Math Kernel Library (MKL) is used. If `True`, `MKLROOT` must be set to the absolute path of the static MKL library
- `LAPACK_PATH` and `LAPACK_Flag` are the path to the BLAS/LAPACK directory and the link flags respectively. Multiple entries can be delimited by `,`, e.g. `LAPACK_Flag=lapack,blas`. Do not write `-llapack`, as `setuptools` handles the linker format.
- `MPI_INC_PATH`, `MPI_MOD_PATH`, and `MPI_LIB_PATH` are only relevant on Windows; they are the paths to the `.h`, `.f90`, and `.lib` files respectively. These are usually `C:\Program Files (x86)\Microsoft SDKs\MPI\Include\x64`, `C:\Program Files (x86)\Microsoft SDKs\MPI\Include`, and `C:\Program Files (x86)\Microsoft SDKs\MPI\Lib\x64` respectively.
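As an illustrative sketch, a pyPDAF section of `setup.cfg` for a Linux machine with GNU compilers and a shared LAPACK could look like the following (the `pwd` and `LAPACK_PATH` values are made-up examples, not defaults):

```ini
[pyPDAF]
# absolute path to the cloned pyPDAF repository (illustrative)
pwd = /home/user/pyPDAF/
# C compiler used by Cython; mpicc for GNU, mpiicc for Intel
CC = mpicc
# leave empty unless building a conda package
condaBuild =
# empty use_MKL means shared BLAS/LAPACK is used instead of MKL
use_MKL =
MKLROOT =
# directory containing the shared BLAS/LAPACK and the link flags,
# comma-delimited and without the -l prefix
LAPACK_PATH = /usr/lib/x86_64-linux-gnu
LAPACK_Flag = lapack,blas
# MPI paths are only needed on Windows
MPI_INC_PATH =
MPI_MOD_PATH =
MPI_LIB_PATH =
```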

## Run example:
```bash
@@ -32,6 +43,6 @@ Currently, it interfaces with subroutines of ```PDAF-V2.1``` with an example for
## Contributors:
Yumeng Chen, Lars Nerger

pyPDAF is mainly developed and maintainde by National Centre for Earth Observation and University of Reading.
pyPDAF is mainly developed and maintained by National Centre for Earth Observation and University of Reading.

<img src="https://github.com/nansencenter/DAPPER/blob/master/docs/imgs/UoR-logo.png?raw=true" height="50" /> <img src="https://github.com/nansencenter/DAPPER/blob/master/docs/imgs/nceologo1000.png?raw=true" height="50">
3 changes: 3 additions & 0 deletions conda.recipe/bld.bat
@@ -0,0 +1,3 @@
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 vs2022

%PYTHON% -m pip install . --no-deps --ignore-installed --no-cache-dir -vvv
1 change: 1 addition & 0 deletions conda.recipe/build.sh
@@ -1,6 +1,7 @@
#!/usr/bin/env bash
set -ex


# Install the Python package, but without dependencies,
# because Conda takes care of that
$PYTHON -m pip install . --no-deps --ignore-installed --no-cache-dir -vvv
7 changes: 7 additions & 0 deletions conda.recipe/conda_build_config.yaml
@@ -3,3 +3,10 @@ python:
- 3.9
- 3.10
- 3.11

c_compiler:
- vs2022 # [win]

mpi:
- mpich # [not win]
- msmpi # [win]
11 changes: 7 additions & 4 deletions conda.recipe/meta.yaml
@@ -18,13 +18,16 @@ requirements:
- pip
- setuptools
- numpy
- blas-devel
- liblapack
- mpi4py
- {{ mpi }}
- mkl-static # [x86]
- blas-devel # [not x86]
- liblapack # [not x86]
build:
- make # [not win]
- {{ compiler('c') }}
- {{ compiler('fortran') }} # [not win]
- cmake
run:
- python
- {{ pin_compatible('numpy') }}
@@ -36,5 +39,5 @@ about:
description: |
pyPDAF is a python interface to the Fortran-based PDAF library
license: GPL
doc_url: https://github.com/BoldingBruggeman/eat/wiki
dev_url: https://github.com/BoldingBruggeman/eat
doc_url: https://yumengch.github.io/pyPDAF/index.html
dev_url: https://github.com/yumengch/pyPDAF
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -3,8 +3,8 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to pyPDAF's documentation!
==================================
pyPDAF - A Python interface to Parallel Data Assimilation Framework
===================================================================
.. include:: introduction.rst

.. toctree::
37 changes: 21 additions & 16 deletions docs/source/install.md
@@ -1,20 +1,25 @@
# Installation

## Prerequisite:
- `Fortran compiler: e.g.:gfortran/intel fortran`
- `a message passing interface (MPI) implementation: e.g. openMPI/MPICH/MS-MPI`
- `BLAS and LAPACK installation or Intel MKL library compatiable with the Fortran compiler`
- `Python>=3.8`

---
**NOTE**
- pyPDAF uses [MPI4py](https://mpi4py.readthedocs.io/en/stable/). The MPI4py and the compile-time MPI should use the same MPI implementations to avoid any issues. To specify the MPI implementation for MPI4py, the following method can be used:
There are two ways of installing pyPDAF.
- The easiest approach is using `conda`. Currently, `pyPDAF` is available from `conda` for `Windows`, `Linux`, and `MacOS (arm64)`. It can be installed via:
```bash
export CC=/path/to/mpicc python
env MPICC=/path/to/mpicc python -m pip install mpi4py
conda create -n pyPDAF -c yumengch -c conda-forge pyPDAF
```
---

## Install pyPDAF:
- First, provide path to the compiler and libraries in `setup.cfg`
- ```pip install -e .```
You can start using `pyPDAF` with `conda activate pyPDAF`.
- In an HPC or cluster environment, it may be undesirable to use the compilers and MPI implementation provided by conda. In this case, pyPDAF can be installed from source:
```bash
git clone https://github.com/yumengch/pyPDAF.git
cd pyPDAF
git submodule update --init --recursive
pip install -v .
```
The `pip` command compiles both `PDAF V2.1` and its C interface. To customise the build for the local machine, the compiler, compiler options, and paths to the dependent libraries must be specified. Here, the dependencies are `BLAS`, `LAPACK`, and an `MPI` implementation.
- The installation requires the `Cython`, `mpi4py`, and `numpy` packages.
- The Fortran compiler options need to be specified in the PDAF section of [`setup.cfg`](setup.cfg). Note that the `-fPIC` compiler option is required to build a Python package. These options are only relevant on non-Windows machines; on Windows, `MSVC` and the `Intel Fortran compiler` are used by default, and adapting to other compilers requires changes to [PDAFBuild/CMakeLists.txt](PDAFBuild/CMakeLists.txt) and [pyPDAF/fortran/CMakeLists.txt](pyPDAF/fortran/CMakeLists.txt).
- The pyPDAF section of `setup.cfg` requires the following options:
- `pwd` is the absolute path to the pyPDAF repository directory
- `CC` is the C compiler used by Cython, e.g. `CC=mpicc` for the GNU compiler or `CC=mpiicc` for the Intel compiler. This option is ignored on Windows, where only `MSVC` is supported.
- `condaBuild` -- ignore this option, as it is only relevant for the `conda build` scenario
- `use_MKL` decides whether Intel's Math Kernel Library (MKL) is used. If `True`, `MKLROOT` must be set to the absolute path of the static MKL library
- `LAPACK_PATH` and `LAPACK_Flag` are the path to the BLAS/LAPACK directory and the link flags respectively. Multiple entries can be delimited by `,`, e.g. `LAPACK_Flag=lapack,blas`. Do not write `-llapack`, as `setuptools` handles the linker format.
- `MPI_INC_PATH`, `MPI_MOD_PATH`, and `MPI_LIB_PATH` are only relevant on Windows; they are the paths to the `.h`, `.f90`, and `.lib` files respectively. These are usually `C:\Program Files (x86)\Microsoft SDKs\MPI\Include\x64`, `C:\Program Files (x86)\Microsoft SDKs\MPI\Include`, and `C:\Program Files (x86)\Microsoft SDKs\MPI\Lib\x64` respectively.
6 changes: 4 additions & 2 deletions docs/source/introduction.rst
@@ -1,6 +1,8 @@
pyPDAF
======

pyPDAF is a Python interface to the `Parallel Data Assimilation Framework (PDAF) <http://pdaf.awi.de/trac/wiki>`_ library written in Fortran. The latest pyPDAF supports PDAF-V2.0.
pyPDAF is a Python interface to the `Parallel Data Assimilation Framework (PDAF) <http://pdaf.awi.de/trac/wiki>`_ library written in Fortran. The latest pyPDAF supports PDAF-V2.1.

With a variety of packages in Python, it allows a simpler coding style for user-supplied functions, such as I/O of observations and post-processing. It can also benefit many Python-based numerical models with parallel and efficient data assimilation capability.
With a variety of packages in Python, it allows a simpler coding style for user-supplied functions, such as I/O of observations and post-processing. This is helpful for prototyping data assimilation systems, including offline ones. It can also benefit many Python-based numerical models, or models that can be interfaced with Python, with parallel and efficient data assimilation capability.

In the interface, the core DA algorithm is as efficient as the Fortran implementation. The efficiency of the Python-based user-supplied functions can be improved if sufficient optimisations are applied.
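As an illustration of this simpler coding style, a user-supplied observation operator could be written with `numpy` in a few lines. The function name and signature below are hypothetical, chosen for illustration, and are not pyPDAF's actual API:

```python
import numpy as np

# Hypothetical user-supplied observation operator: it maps a model
# state vector to observation space by selecting observed grid points.
# The name and signature are illustrative, not pyPDAF's actual API.
def obs_op(state: np.ndarray, obs_indices: np.ndarray) -> np.ndarray:
    return state[obs_indices]

state = np.arange(10.0)                  # toy model state vector
obs = obs_op(state, np.array([2, 5, 7])) # observed entries
print(obs)  # [2. 5. 7.]
```

Such functions can lean on the whole scientific Python ecosystem (e.g. `numpy` indexing above) rather than hand-written Fortran loops.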
10 changes: 10 additions & 0 deletions setup.cfg
@@ -24,5 +24,15 @@ CPP_DEFS = -DUSE_PDAF
[pyPDAF]
pwd = /home/runner/work/pyPDAF/pyPDAF/
CC = mpicc
condaBuild =
# if MKL is used, give the path to the static MKL library
use_MKL=
MKLROOT=
# if dynamic/shared liblapack and libblas library is used,
# give the library path and flags
LAPACK_PATH=
LAPACK_Flag=lapack,blas
# GIVE MPI information
MPI_INC_PATH=
MPI_MOD_PATH=
MPI_LIB_PATH=
130 changes: 72 additions & 58 deletions setup.py
@@ -45,6 +45,7 @@
if not os.path.isabs(PDAFdir):
PDAFdir = os.path.join(pwd, PDAFdir)
print ('input PDAF directory is not absolute path, changing to: ', PDAFdir)

# set up C compiler for cython and Python
if os.name == 'nt':
compiler = 'msvc'
@@ -67,78 +68,91 @@
else:
print ('....using GNU compiler....')

# compiler options for cython
condaBuild = dist.get_option_dict('pyPDAF')['condaBuild'][1]

extra_compile_args=[]
extra_link_args = []
extra_objects = []
library_dirs=[]
libraries = []

# compiler options for cython
if compiler == 'gnu':
extra_compile_args+=['-Wno-unreachable-code-fallthrough']
# linking static PDAF library and interface objects
extra_objects = []
if sys.platform == 'darwin':
extra_objects+=['-Wl,-force_load', f'{PDAFdir}/lib/libpdaf-var.a',
'-Wl,-force_load', f'{pwd}/lib/libPDAFc.a',]
elif os.name != 'nt':
extra_objects+=['-Wl,--whole-archive', f'{PDAFdir}/lib/libpdaf-var.a',
f'{pwd}/lib/libPDAFc.a', '-Wl,--no-whole-archive']

if compiler == 'intel':
MKLROOT=dist.get_option_dict('pyPDAF')['MKLROOT'][1]
extra_objects+=['-Wl,--start-group',
f'{MKLROOT}/lib/intel64/libmkl_intel_lp64.a',
f'{MKLROOT}/lib/intel64/libmkl_sequential.a',
f'{MKLROOT}/lib/intel64/libmkl_core.a',
'-Wl,--end-group']
# linking static PDAF library and interface objects
if os.name == 'nt':
library_dirs+=[os.path.join(PDAFdir, 'lib', 'Release'),
os.path.join(pwd, 'pyPDAF', 'fortran', 'build', 'Release'),
]
libraries += ['pdaf-var', 'pdafc']
else:
if sys.platform == 'darwin':
extra_objects+=['-Wl,-force_load', f'{PDAFdir}/lib/libpdaf-var.a',
'-Wl,-force_load', f'{pwd}/lib/libPDAFc.a',]
else:
extra_objects+=['-Wl,--whole-archive', f'{PDAFdir}/lib/libpdaf-var.a',
f'{pwd}/lib/libPDAFc.a', '-Wl,--no-whole-archive']

# PDAF library contains multiple same .o files
# multiple-definition is thus necessary
extra_link_args = []
# setup library to MPI-fortran
LAPACK_PATH=dist.get_option_dict('pyPDAF')['LAPACK_PATH'][1]
print ('LAPACK_PATH', LAPACK_PATH)
library_dirs=[]
if LAPACK_PATH != '': library_dirs += LAPACK_PATH.split(',')
# add mpi library path
if os.name != 'nt':
if compiler == 'intel':
result = subprocess.run(['mpiifort', '-show'], stdout=subprocess.PIPE)
else:
result = subprocess.run(['mpifort', '-show'], stdout=subprocess.PIPE)
if os.name == 'nt':
# always use external msmpi as msmpi from conda cannot be linked
MPI_LIB_PATH=dist.get_option_dict('pyPDAF')['MPI_LIB_PATH'][1]
if MPI_LIB_PATH != '': library_dirs += MPI_LIB_PATH.split(',')
libraries += ['msmpi', 'msmpifec']
else:
mpifortran = 'mpiifort' if compiler == 'intel' else 'mpifort'
result = subprocess.run([mpifortran, '-show'], stdout=subprocess.PIPE)
result = result.stdout.decode()[:-1].split(' ')
s = [l[2:].replace('"', '') for l in result if l[:2] == '-L']
if len(s) > 0: library_dirs += s
# add gfortran library path
if sys.platform == 'darwin':
result = subprocess.run(['gfortran', '--print-file', 'libgfortran.dylib'], stdout=subprocess.PIPE)
result = result.stdout.decode()[:-18]
s = [l[2:] for l in result if l[:2] == '-l']
if len(s) > 0: libraries += s

# linking BLAS/LAPACK
use_MKL=dist.get_option_dict('pyPDAF')['use_MKL'][1]
if use_MKL == 'True':
if condaBuild == 'True':
MKLROOT = os.environ['LIBRARY_LIB'] if os.name == 'nt' else \
os.path.join(os.environ['PREFIX'], 'lib')
else:
result = subprocess.run(['gfortran', '--print-file', 'libgfortran.so'], stdout=subprocess.PIPE)
result = result.stdout.decode()[:-15]
library_dirs+=[result,]
library_dirs+=['/usr/lib', ]
MKLROOT = dist.get_option_dict('pyPDAF')['MKLROOT'][1]
assert MKLROOT != '', 'MKLROOT must not be empty, check setup.cfg file'
if os.name == 'nt':
library_dirs+=[MKLROOT,]
libraries += ['mkl_core', 'mkl_sequential', 'mkl_intel_lp64']
else:
extra_objects+=['-Wl,--start-group',
f'{MKLROOT}/libmkl_intel_lp64.a',
f'{MKLROOT}/libmkl_sequential.a',
f'{MKLROOT}/libmkl_core.a',
'-Wl,--end-group']
else:
library_dirs+=[os.path.join(PDAFdir, 'lib', 'Release'),
os.path.join(pwd, 'pyPDAF', 'fortran', 'build', 'Release'),
]
print ('library_dirs', library_dirs)
# setup library to MPI-fortran
LAPACK_PATH=dist.get_option_dict('pyPDAF')['LAPACK_PATH'][1]
if LAPACK_PATH != '': library_dirs += LAPACK_PATH.split(',')
LAPACK_Flag=dist.get_option_dict('pyPDAF')['LAPACK_Flag'][1]
print ('LAPACK_Flag', LAPACK_Flag)
if LAPACK_Flag != '': libraries += LAPACK_Flag.split(',')

# add fortran library to the linking
if os.name != 'nt':
if compiler == 'intel':
# somehow gfortran is always necessary
libraries = ['ifcore', 'ifcoremt', 'gfortran', 'm']
else:
libraries=['gfortran', 'm']
suffix = 'dylib' if sys.platform == 'darwin' else 'so'
FC = os.environ['FC'] if condaBuild == 'True' else 'gfortran'
result = subprocess.run([FC, '--print-file',
'libgfortran.'+suffix], stdout=subprocess.PIPE)
result = result.stdout.decode()
result = result[:-18] if sys.platform == 'darwin' else result[:-15]
library_dirs+=[result,]
library_dirs+=['/usr/lib', ]
# somehow gfortran is always necessary
libraries += ['gfortran', 'm']
if compiler == 'intel': libraries += ['ifcore', 'ifcoremt']

if compiler == 'intel':
result = subprocess.run(['mpiifort', '-show'], stdout=subprocess.PIPE)
else:
result = subprocess.run(['mpifort', '-show'], stdout=subprocess.PIPE)
result = result.stdout.decode()[:-1].split(' ')
s = [l[2:] for l in result if l[:2] == '-l']
if len(s) > 0: libraries += s
else:
libraries = ['msmpi', 'msmpifec', 'pdaf-var', 'pdafc', 'mkl_core', 'mkl_sequential', 'mkl_intel_lp64']
LAPACK_Flag=dist.get_option_dict('pyPDAF')['LAPACK_Flag'][1]
print ('LAPACK_Flag', LAPACK_Flag)
if LAPACK_Flag != '': libraries += LAPACK_Flag.split(',')
print ('extra_compile_args', extra_compile_args)
print ('extra_link_args', extra_link_args)
print ('extra_objects', extra_objects)
print ('library_dirs', library_dirs)
print ('libraries', libraries)

def compilePDAFLibraryInterface():
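The `setup.py` changes above derive library search paths and link flags by parsing the output of `mpifort -show` (or `mpiifort -show` for Intel). A minimal standalone sketch of that parsing logic is shown below; the sample `-show` string is a made-up example, not output from a real system:

```python
# Sketch of the -L/-l extraction that setup.py applies to `mpifort -show`.
def parse_mpi_show(show_output: str):
    tokens = show_output.strip().split(' ')
    # '-L<dir>' tokens become library search directories
    library_dirs = [t[2:].replace('"', '') for t in tokens if t[:2] == '-L']
    # '-l<name>' tokens become library names for the linker
    libraries = [t[2:] for t in tokens if t[:2] == '-l']
    return library_dirs, libraries

# Illustrative sample; a real `mpifort -show` line will differ per system.
sample = 'gfortran -I/opt/mpich/include -L/opt/mpich/lib -lmpifort -lmpi'
dirs, libs = parse_mpi_show(sample)
print(dirs)  # ['/opt/mpich/lib']
print(libs)  # ['mpifort', 'mpi']
```

This keeps the Cython extension linked against the same MPI implementation that compiled the Fortran sources.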
22 changes: 0 additions & 22 deletions setup_intel.cfg

This file was deleted.

16 changes: 13 additions & 3 deletions setup_mac.cfg
@@ -26,14 +26,24 @@ LD = mpif90
AR = ar
RANLIB = ranlib
CPP = /usr/bin/cpp
OPT = -O3 -fdefault-real-8 -fPIC
OPT = -O3 -fdefault-real-8 -fPIC -mmacosx-version-min=10.6
OPT_LNK =
INC = -IPDAF_V2.1/include
LINK_LIBS = -llapack -lblas
LINK_LIBS = -llapack -lblas
CPP_DEFS = -DUSE_PDAF

[pyPDAF]
pwd = /Users/runner/work/pyPDAF/pyPDAF/
CC = mpicc
condaBuild =
# if MKL is used, give the path to the static MKL library
use_MKL=
MKLROOT=
# if dynamic/shared liblapack and libblas library is used,
# give the library path and flags
LAPACK_PATH=
LAPACK_Flag=lapack,blas
LAPACK_Flag=lapack,blas
# GIVE MPI information
MPI_INC_PATH=
MPI_MOD_PATH=
MPI_LIB_PATH=
