Releases: mitsuba-renderer/drjit

Release (v1.0.4)

28 Jan 19:26
  • Workaround for OptiX linking issue in driver version R570+ 0c9c54e

Release (v1.0.3)

15 Jan 23:58

Release (v1.0.2)

14 Jan 10:59
  • Warning about NVIDIA drivers v565+ b5fd886
  • Support for boolean Python arguments in drjit.select d0c8811
  • Backend refactoring: vectorized calls are now also isolated per variant 17bc707
  • Fixes to dr::safe_cbrt 2f8a3ab

Release (v1.0.1)

22 Nov 23:15
  • Fixes to various edge cases of drjit.dda.dda() (commit 4ce97d).

Release (v1.0.0)

21 Nov 09:24

The 1.0 release of Dr.Jit marks a major new phase of this project. We addressed long-standing limitations and thoroughly documented every part of Dr.Jit. Due to the magnitude of the changes, some incompatibilities are unavoidable: bullet points with an exclamation mark highlight changes with an impact on source-level compatibility.

Here is what's new:

  • Python bindings: Dr.Jit comes with an all-new set of Python bindings created using the nanobind library. This has several consequences:

    • Tracing Dr.Jit code written in Python is now significantly faster (we've observed speedups by a factor of ~10-20×). This should help in situations where performance is limited by tracing rather than kernel evaluation.

    • Thorough type annotations improve static type checking and code completion in editors like VS Code.

    • Dr.Jit can now target Python 3.12's stable ABI. This means that binary wheels will work on future versions of Python without recompilation.

  • Natural syntax: vectorized loops and conditionals can now be expressed using natural Python syntax. To see what this means, consider the following function that computes an integer power of a floating point array:

    import drjit as dr
    from drjit.cuda import Int, Float
    
    @dr.syntax # <-- new!
    def ipow(x: Float, n: Int):
        result = Float(1)
    
        while n != 0:       # <-- vectorized loop ('n' is an array)
            if n & 1 != 0:  # <-- vectorized conditional
                result *= x
            x *= x
            n >>= 1
    
        return result

    Given that this function processes arrays, we expect that the condition of the if statement may disagree among elements. Also, each element may need a different number of loop iterations. However, such component-wise conditionals and loops aren't supported by normal Python. Previously, Dr.Jit provided ways of expressing such code using masking and a special dr.cuda.Loop object, but this was rather tedious.

    The new @drjit.syntax decorator greatly simplifies the development of programs with complex control flow. It performs an automatic source code transformation that replaces conditionals and loops with array-compatible variants (drjit.while_loop, drjit.if_stmt). The transformation leaves everything else as-is, including line number information that is relevant for debugging.
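
    To make the per-element behavior concrete, here is a minimal pure-Python sketch that emulates what the vectorized ipow() computes, lane by lane, with explicit masks. It is not Dr.Jit's actual transformation or API: the helper name ipow_lanes and the use of plain lists are assumptions made purely for illustration.

```python
def ipow_lanes(xs, ns):
    """Emulate the vectorized ipow() lane by lane (illustrative only)."""
    xs = list(xs)
    ns = list(ns)
    result = [1.0] * len(xs)

    # A vectorized loop keeps running while *any* lane is still active.
    while any(n != 0 for n in ns):
        for i in range(len(xs)):
            active = ns[i] != 0           # per-lane loop condition
            if active and (ns[i] & 1):    # per-lane 'if' condition
                result[i] *= xs[i]
            if active:                    # finished lanes are left untouched
                xs[i] *= xs[i]
                ns[i] >>= 1
    return result


print(ipow_lanes([2.0, 3.0, 5.0], [3, 2, 0]))  # [8.0, 9.0, 1.0]
```

    Each lane needs a different number of iterations (two, two, and zero here), which is exactly the situation that plain Python control flow cannot express for arrays.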

  • Differentiable control flow: symbolic control flow constructs (loops) previously failed with an error message when they detected differentiable variables. In the new version of Dr.Jit, symbolic operations (loops, function calls, and conditionals) are now differentiable in both forward and reverse modes, with one exception: the reverse-mode derivative of loops is still incomplete and will be added in the next version of Dr.Jit.

  • Documentation: every Dr.Jit function now comes with extensive reference documentation that clearly specifies its behavior and accepted inputs. The behavior with respect to tensors and arbitrary object graphs (referred to as "PyTrees") was made consistent.

  • Half-precision arithmetic: Dr.Jit now provides float16-valued arrays and tensors on both the LLVM and CUDA backends (e.g., drjit.cuda.ad.TensorXf16 or drjit.llvm.Float16).

  • Mixed-precision optimization: Dr.Jit now maintains one global AD graph for all variables, enabling differentiation of computation combining single-, double-, and half-precision variables. Previously, there was a separate graph per type, and gradients did not propagate through casts between them.

  • Multi-framework computations: The @drjit.wrap decorator provides a differentiable bridge to other AD frameworks. In this new release of Dr.Jit, its capabilities were significantly revamped. Besides PyTorch, it now also supports JAX, and it consistently handles both forward and backward derivatives. The new interface admits functions with arbitrary fixed/variable-length positional and keyword arguments containing arbitrary PyTrees of differentiable and non-differentiable arrays, tensors, etc.

  • Debug mode: A new debug validation mode (drjit.JitFlag.Debug) inserts a number of additional checks to identify sources of undefined behavior. Enable it to catch out-of-bounds reads, writes, and calls to undefined callables. Such operations will trigger a warning that includes the responsible source code location.

    The following built-in assertion checks are also active in debug mode. They support both regular and symbolic inputs in a consistent fashion.

    • drjit.assert_true
    • drjit.assert_false
    • drjit.assert_equal
  • Symbolic print statement: A new high-level symbolic print operation drjit.print enables deferred printing from any symbolic context (i.e., within symbolic loops, conditionals, and function calls). It is compatible with Jupyter notebooks and displays arbitrary PyTrees in a structured manner. This operation replaces the function drjit.print_async() provided in previous releases.

  • Swizzling: swizzle access and assignment operators are now provided. You can use them to arbitrarily reorder, grow, or shrink the input array.

    a = Array4f(...)
    b = Array2f(...)
    a.xyw = a.xzy + b.xyx
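
    The semantics of such a swizzle expression can be sketched in plain Python. The Vec class below is a hypothetical stand-in written for this note, not a Dr.Jit type: it only shows how reading a.xzy reorders components, how b.xyx grows a two-component array to three, and how assigning to a.xyw writes back into selected components.

```python
class Vec:
    """Tiny illustrative vector with swizzle read/write (not a Dr.Jit type)."""
    _AXES = "xyzw"

    def __init__(self, *values):
        object.__setattr__(self, "values", list(values))

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for 'xzy' or 'xyx'.
        if name and all(c in self._AXES for c in name):
            return Vec(*(self.values[self._AXES.index(c)] for c in name))
        raise AttributeError(name)

    def __setattr__(self, name, other):
        if name and all(c in self._AXES for c in name):
            # Swizzle assignment: write each source component to its target.
            for c, v in zip(name, other.values):
                self.values[self._AXES.index(c)] = v
        else:
            object.__setattr__(self, name, other)

    def __add__(self, other):
        return Vec(*(p + q for p, q in zip(self.values, other.values)))


a = Vec(1.0, 2.0, 3.0, 4.0)
b = Vec(10.0, 20.0)
a.xyw = a.xzy + b.xyx
print(a.values)  # [11.0, 23.0, 3.0, 12.0]
```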
  • Scatter-reductions: the performance of atomic scatter-reductions (drjit.scatter_reduce, drjit.scatter_add) has been significantly improved. Both functions now provide a mode= parameter to select between different implementation strategies. The new strategy drjit.ReduceMode.Expand offers a speedup of over 10× on the LLVM backend compared to the previously used local reduction strategy. Furthermore, improved code generation for drjit.ReduceMode.Local brings a roughly 20-40% speedup on the CUDA backend. See the documentation section on atomic reductions for details and benchmarks with plots.
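
    As a rough intuition for why different strategies can exist at all, the following pure-Python sketch contrasts a direct scatter-add with an "expand"-style variant in which each worker accumulates into a private copy of the target that is merged at the end. This is an assumption about the general idea only: the function names, the num_workers parameter, and the round-robin work assignment are hypothetical and do not describe Dr.Jit's implementation.

```python
def scatter_add_direct(target_size, values, indices):
    """Reference semantics: target[indices[k]] += values[k] for every k."""
    target = [0.0] * target_size
    for v, i in zip(values, indices):
        target[i] += v
    return target


def scatter_add_expand(target_size, values, indices, num_workers=4):
    """Each (hypothetical) worker owns a private copy, so writes never conflict."""
    copies = [[0.0] * target_size for _ in range(num_workers)]
    for k, (v, i) in enumerate(zip(values, indices)):
        copies[k % num_workers][i] += v
    # Final merge step: sum the private copies element by element.
    return [sum(column) for column in zip(*copies)]


values = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 1, 0, 1, 0]
print(scatter_add_direct(2, values, indices))
print(scatter_add_expand(2, values, indices))
```

    Both functions return the same result for this input; the second trades extra memory for conflict-free writes.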

  • Packet memory operations: programs often gather or scatter several memory locations that are directly next to each other in memory. In principle, it should be possible to do such reads or writes more efficiently.

    Dr.Jit now features improved code generation to realize this optimization for calls to dr.gather() and dr.scatter() that access a power-of-two-sized chunk of contiguous array elements. On the CUDA backend, this operation leverages native packet memory instructions, which can produce small speedups on the order of ~5-30%. On the LLVM backend, packet loads/stores now compile to aligned packet loads/stores with a transpose operation that brings data into the right shape. Speedups here are dramatic (up to >20× for scatters, 1.5 to 2× for gathers). See the drjit.JitFlag.PacketOps flag for details. On the LLVM backend, packet scatter-additions furthermore compose with the drjit.ReduceMode.Expand optimization explained in the previous point, which combines the benefits of both steps. This is particularly useful when computing the reverse-mode derivative of packet reads.
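
    The relationship between a component-wise gather and a "packet load plus transpose" can be sketched in plain Python. The helper names and the layout assumption (packet i occupies elements i*n to i*n+n-1 of the source) are illustrative choices for this note, not Dr.Jit API.

```python
def gather_componentwise(source, indices, n):
    """One separate read per component: n * len(indices) scalar loads."""
    return [[source[i * n + c] for i in indices] for c in range(n)]


def gather_packets(source, indices, n):
    """Read n contiguous elements at once, then transpose into components."""
    rows = [source[i * n:(i + 1) * n] for i in indices]  # one packet per index
    return [list(column) for column in zip(*rows)]       # transpose step


source = list(range(12))   # three packets of four elements each
indices = [2, 0]
print(gather_componentwise(source, indices, 4))
print(gather_packets(source, indices, 4))
```

    Both variants produce the same component arrays; the second issues one contiguous read per index instead of n scattered ones.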

  • Reductions: reduction operations previously existed as regular (e.g., drjit.all) and nested (e.g. drjit.all_nested) variants. Both are now subsumed by an optional axis argument similar to how this works in other array programming frameworks like NumPy. Reductions can now also process any number of axes on both regular Dr.Jit arrays and tensors.

    The reduction functions (drjit.all, drjit.any, drjit.sum, drjit.prod, drjit.min, drjit.max) have different default axis values depending on the input type. For tensors, axis=None by default and the reduction is performed along the entire underlying array recursively, analogous to the previous nested reduction. For all other types, the reduction is performed over the outermost axis (axis=0) by default.

    Aliases for the _nested function variants still exist to help porting but are deprecated and will be removed in a future release.
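
    The difference between the two defaults can be illustrated with nested Python lists. The reduce_sum helper below is a hypothetical sketch limited to at most two levels of nesting; it mirrors the described axis=0 and axis=None behavior but is not the Dr.Jit implementation.

```python
def reduce_sum(value, axis=0):
    """Sum nested lists along the outermost axis (0) or over everything (None)."""
    if axis is None:
        # Recursive reduction over the entire structure.
        if isinstance(value, list):
            return sum(reduce_sum(v, axis=None) for v in value)
        return value
    if axis == 0:
        if value and isinstance(value[0], list):
            # Outermost axis of a 2D structure: add the rows element by element.
            return [sum(column) for column in zip(*value)]
        return sum(value)
    raise NotImplementedError("this sketch only handles axis=0 and axis=None")


data = [[1, 2, 3], [4, 5, 6]]
print(reduce_sum(data, axis=0))     # [5, 7, 9]
print(reduce_sum(data, axis=None))  # 21
```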

  • Prefix reductions: the functions drjit.cumsum, drjit.prefix_sum compute inclusive or exclusive prefix sums along arbitrary axes of a tensor or array. They wrap the more general drjit.prefix_reduce that also supports other arithmetic operations (e.g. minimum/maximum/product/and/or reductions), reverse reductions, etc.
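
    For readers unfamiliar with the terminology, the following standard-library sketch shows inclusive, exclusive, and reverse prefix reductions on a flat list. The function name and its parameters (exclusive, reverse, identity) are chosen for this illustration and are not claimed to match the signature of drjit.prefix_reduce.

```python
from itertools import accumulate
import operator


def prefix_reduce_ref(op, values, exclusive=False, reverse=False, identity=0):
    """Reference prefix reduction over a flat list (illustrative only)."""
    seq = list(values)
    if not seq:
        return []
    if reverse:
        seq.reverse()
    out = list(accumulate(seq, op))          # inclusive scan
    if exclusive:
        out = [identity] + out[:-1]          # shift right, start at identity
    if reverse:
        out.reverse()
    return out


data = [1, 2, 3, 4]
print(prefix_reduce_ref(operator.add, data))                  # [1, 3, 6, 10]
print(prefix_reduce_ref(operator.add, data, exclusive=True))  # [0, 1, 3, 6]
print(prefix_reduce_ref(operator.add, data, reverse=True))    # [10, 9, 7, 4]
print(prefix_reduce_ref(max, [3, 1, 4, 1, 5]))                # [3, 3, 4, 4, 5]
```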

  • Block reductions: the new functions drjit.block_reduce and drjit.block_prefix_reduce compute reductions within contiguous blocks of an array.
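
    A block reduction restarts the accumulation at every block boundary. The pure-Python sketch below illustrates this for sums; the helper names and the requirement that the input length be a multiple of the block size are assumptions of the sketch, not statements about the Dr.Jit functions.

```python
def block_sum(values, block_size):
    """Sum each contiguous block of 'block_size' elements."""
    if len(values) % block_size != 0:
        raise ValueError("sketch assumes the length is a multiple of block_size")
    return [sum(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def block_prefix_sum(values, block_size):
    """Inclusive prefix sum that restarts at the beginning of every block."""
    out = []
    running = 0
    for k, v in enumerate(values):
        if k % block_size == 0:
            running = 0            # new block: reset the accumulator
        running += v
        out.append(running)
    return out


data = [1, 2, 3, 4, 5, 6]
print(block_sum(data, 2))         # [3, 7, 11]
print(block_prefix_sum(data, 3))  # [1, 3, 6, 4, 9, 15]
```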

  • Local memory: kernels can now allocate temporary thread-local memory and perform arbitrary indexed reads and writes. This is useful to implement a stack or other types of scratch space that might be needed by a calculation. See the separate documentation section about local memory for details.

  • DDA: a newly added digital differential analyzer (drjit.dda.dda) can be used to traverse the intersection of a ray segment and an n-dimensional grid. The function drjit.dda.integrate() builds on this functionality to compute analytic differentiable line integrals of bi- and trilinear interpolants.
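
    The underlying idea of a grid DDA can be sketched with the classic incremental traversal over a unit grid. The function below is an illustrative pure-Python version with a made-up name and return format; it does not reflect the signature or implementation of drjit.dda.dda. It returns each visited cell together with the parametric interval of the segment that lies inside it.

```python
import math


def traverse_grid(p0, p1):
    """Visit unit-grid cells along the segment p0 -> p1 (t runs from 0 to 1)."""
    d = [b - a for a, b in zip(p0, p1)]
    cell = [math.floor(a) for a in p0]
    step, t_next, t_delta = [], [], []
    for a, di, c in zip(p0, d, cell):
        if di > 0:
            step.append(1)
            t_next.append((c + 1 - a) / di)   # t at the next cell boundary
            t_delta.append(1 / di)            # t needed to cross one cell
        elif di < 0:
            step.append(-1)
            t_next.append((c - a) / di)
            t_delta.append(-1 / di)
        else:
            step.append(0)                    # never crosses along this axis
            t_next.append(math.inf)
            t_delta.append(math.inf)

    t, visited = 0.0, []
    while t < 1.0:
        axis = min(range(len(d)), key=lambda k: t_next[k])
        t_exit = min(t_next[axis], 1.0)
        visited.append((tuple(cell), t, t_exit))
        t = t_exit
        cell[axis] += step[axis]
        t_next[axis] += t_delta[axis]
    return visited


for entry in traverse_grid((0.5, 0.5), (2.5, 0.5)):
    print(entry)
```

    The length of the segment inside a cell is (t_exit - t_enter) times the segment length, which is the quantity a line integral over the grid would weight each cell by.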

  • Loop compression: the implementation of evaluated loops (previously referred to as wavefront mode) visits all entries of the loop state variables at every iteration, even when most of them have already finished executing the loop. Dr.Jit now provides an optional compress=True parameter in drjit.while_loop to prune away inactive entries and accelerate later loop iterations.
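
    The benefit of pruning inactive entries can be seen in a small pure-Python emulation that counts how many entries are visited in total. The run_loop helper and its countdown "workload" are hypothetical and only model the bookkeeping described above, not Dr.Jit's evaluated loops.

```python
def run_loop(counters, compress):
    """Count entry visits for a loop where entry i needs counters[i] iterations."""
    remaining = list(counters)
    entries = list(range(len(remaining)))   # entries processed each iteration
    visited = 0
    while any(remaining[i] > 0 for i in entries):
        for i in entries:
            visited += 1                     # the entry is touched even if done
            if remaining[i] > 0:
                remaining[i] -= 1
        if compress:
            # Prune entries that have finished so later iterations skip them.
            entries = [i for i in entries if remaining[i] > 0]
    return visited


work = [1, 1, 1, 8]
print(run_loop(work, compress=False))  # 32
print(run_loop(work, compress=True))   # 11
```

    With one long-running entry among four, the uncompressed loop touches all four entries in each of the eight iterations, while the compressed loop touches only the remaining one after the first iteration.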

  • The new release has a strong focus on error resilience and leak avoidance. Exceptions raised in custom operations, function dispatch, symbolic loops, etc., should not cause failures or leaks. Both Dr.Jit and nanobind are very noisy if they detect that objects are still alive when the Python interpreter shuts down.

  • Terminology cleanup: Dr.Jit has two main...

Release (v0.4.6)

05 Jun 08:00

Most likely the last release which uses pybind11.

Changes in this patch release:

  • Set maximum version requirement for pybind11 dependency during wheel builds c69f3159

Release (v0.4.5)

04 Jun 14:05

Changes in this patch release:

  • Fix wavefront loops which would occasionally create new kernels 8f09760
  • Fix source of CUDA segfaults 9aa2d87
  • Fix C++ dr::binary_search, which could unexpectedly create new kernels b48701e
  • Minor changes to support Nvidia v555 drivers 216d921

Release (v0.4.4)

07 Dec 14:36

Changes in this patch release:

  • Added new dr.prefix_sum operation for inclusive and exclusive prefix sums 4be7aa0
  • Added new dr.scatter_inc operation for stream compaction 754a541
  • Fix dr.dispatch when an instance of the class has been deleted 1f908cc
  • Support for dr.PCG32 samplers in recorded loops' state 58c8485
  • Extend dr.binary_search to additionally support non-scalar and multi-dimensional indices 79de06a .. 5fc5750
  • Fix race condition in jit_sync_thread() 6690923
  • Switch jitc_vcall_prepare() allocation method to avoid deadlocks c13ef93
  • Various minor bug fixes

Release (v0.4.3)

29 Aug 06:39

Changes in this patch release:

  • Fix nested recorded virtual function calls 7e8c13c
  • Fix dr.gather/dr.scatter operations on quaternion types 9a7ac4e
  • Fix gradient propagations on "special" types (quaternions, matrices, complex numbers) fe624de
  • Add support for JIT types in dr.transform_decompose and dr.transform_compose 1244530
  • Various minor bug fixes

Release (v0.4.2)

25 Apr 14:25

Changes in this patch release:

  • Fix dlpack conversions 16b3882
  • Fix virtual function call recursions 49ea1e0
  • Various fixes to quaternions handling 8bbf831 .. 02e5b47
  • Various fixes for minor bugs, memory leaks, and undefined behaviour