CHANGELOG

v 1.3 (06/10/23)

* Move second half of onedim_fseries_kernel() to GPU (with a simple heuristic 
  basing on nf1 to switch between the CPU and the GPU version).
* Melody fixed bug in MAX_NF being 0 due to typecasting 1e11 to int (thanks
  Elliot Slaughter for catching that).
* Melody fixed kernel eval so done w*d not w^d times, speeds up 2d a little, 3d
  quite a lot! (PR#130)
* Melody added 1D support for both types 1 (GM-sort and SM methods) 2 (GM-sort),
  in C++/CUDA and their test executables (but not Python interface).
* Various fixes to package config.
* Miscellaneous bug fixes.

v 1.2 (02/17/21)

* Warning: Following are Python interface changes -- not backwards compatible
  with v 1.1 (See examples/example2d1,2many.py for updated usage)

    - Made opts a kwarg dict instead of an object:
         def __init__(self, ... , opts=None, dtype=np.float32)
      => def __init__(self, ... , dtype=np.float32, **kwargs)
    - Renamed arguments in plan creation `__init__`:
         ntransforms => n_trans, tol => eps
    - Changed order of arguments in plan creation `__init__`:
         def __init__(self, ... ,isign, eps, ntransforms, opts, dtype)
      => def __init__(self, ... ,ntransforms, eps, isign, opts, dtype)
    - Removed M in `set_pts` arguments:
         def set_pts(self, M, kx, ky=None, kz=None)
      => def set_pts(self, kx, ky=None, kz=None)

* Python: added multi-gpu support (in beta)
* Python: added more unit tests (wrong input, kwarg args, multi-gpu)
* Fixed various memory leaks
* Added index bound check in 2D spread kernels (Spread_2d_Subprob(_Horner))
* Added spread/interp tests to `make check`
* Fixed user request tolerance (eps) to kernel width (w) calculation
* Default kernel evaluation method set to 0, ie exp(sqrt()), since faster
* Removed outdated benchmark codes, cleaner spread/interp tests

v 1.1 (09/22/20)

* Python: extended the mode tuple to 3D and reorder from C/python
  ndarray.shape style input (nZ, nY, nX) to to the (F) order expected by the
  low level library (nX, nY, nZ).
* Added bound checking on the bin size
* Dual-precision support of spread/interp tests
* Improved documentation of spread/interp tests
* Added dummy call of cuFFTPlan1d to avoid timing the constant cost of cuFFT
  library.
* Added heuristic decision of maximum batch size (number of vectors with the
  same nupts to transform at the same time)
* Reported execution throughput in the test codes
* Fixed timing in the tests code
* Professionalized handling of too-small-eps (requested tolerance)
* Rewrote README.md and added cuFINUFFT logo.
* Support of advanced Makefile usage, e.g. make -site=olcf_summit
* Removed FFTW dependency

v 1.0 (07/29/20)