SKIP 5 — Initial Array API adoption#

Author:: Evgeni Burovski <evgeny.burovskiy@gmail.com>
Status:: Draft
Type:: Standards Track
Created:: 2026-05-18
Resolved:: <null>
Resolution:: <null>
Version effective:: 2.0

Abstract#

The Array API standard [3] aims to standardize a common subset of the functionality of most array libraries, such as NumPy, PyTorch, CuPy and JAX. Its goal is to remedy the fragmentation of the array computing ecosystem caused by the accumulated divergences among these similar, but not quite compatible, array/tensor libraries.

Adopting the Array API standard in scikit-image gives users flexibility in choosing their software stack for array computing, and enables performance improvements from using low-level implementations of hardware accelerated algorithms [4].

Motivation and Scope#

Historically, NumPy set the stage for array computing in Python and serves as the base of the whole scientific computing ecosystem. With time, multiple alternative array/tensor computing libraries appeared and became popular: CuPy for “NumPy on GPUs”, PyTorch and JAX, to name just a few. All these low-level libraries are similar but have enough differences to make moving between libraries difficult. Each of these libraries serves as a base of its respective collection of domain-specific packages and software stacks. As a result, the ecosystem is fractured.

The Array API standard aims to unify the ecosystem by specifying the minimal common subset that all array/tensor computing libraries implement. Then, when domain-specific libraries support this minimal useful subset, end users automatically benefit from hardware-specific implementations “hidden” in the array libraries.

Ongoing work on adopting the Array API standard in SciPy [5], [6] and scikit-learn [7] has shown performance improvements of up to 50x or more from using CuPy or PyTorch GPU for the array compute layer (see [4] for a partial overview). Note that these performance gains are essentially “free” for end users, who simply feed the right array types to scipy/scikit-learn functions; the work of enabling them is borne by the library authors and maintainers. The rest of this document details the changes needed in scikit-image for initial Array API support.

Detailed description#

All of Array API processing is still experimental: users need to define the environment variable SCIPY_ARRAY_API=1 before importing SciPy and scikit-image. This follows SciPy and scikit-learn, and is necessary as long as SciPy requires the environment variable to enable the Array API dispatch (likely, until SciPy 2.0 is released).
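
The opt-in step can be sketched as follows, assuming (as described above) that the flag must be set before the libraries are imported. The helper name `enable_array_api` and its `loaded_modules` parameter are hypothetical and are not part of SciPy or scikit-image.

```python
import os
import sys


def enable_array_api(loaded_modules=None):
    """Set SCIPY_ARRAY_API=1, refusing if SciPy or scikit-image is already imported.

    Hypothetical helper, for illustration only.
    """
    if loaded_modules is None:
        loaded_modules = sys.modules
    # The flag only takes effect if it is set before these packages are imported.
    too_late = [name for name in ("scipy", "skimage") if name in loaded_modules]
    if too_late:
        raise RuntimeError(
            "SCIPY_ARRAY_API must be set before importing: " + ", ".join(too_late)
        )
    os.environ["SCIPY_ARRAY_API"] = "1"


# Typical use: call enable_array_api() first, then `import skimage`.
```

The same effect is achieved by exporting SCIPY_ARRAY_API=1 in the shell before starting Python.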

If Array API dispatch is not activated via the environment variable, the behavior is exactly backwards compatible. When the Array API dispatch is active:

- A high-level goal is that a scikit-image function can receive array arguments from a namespace X and return results as arrays of the same namespace X (with X being, for example, NumPy, CuPy or PyTorch).
- Where non-CPU devices are involved, no silent device transfers are permitted by default (exceptions to this rule should be rare and need an explicit decision and documentation).
- Mixing arrays from different namespaces is not allowed: func(cupy_array, numpy_array) raises an error instead of guessing the user's intent or silently transferring one of the arrays to another device.
- For backwards compatibility, “array_like” arguments (e.g. lists) are treated as NumPy arrays.
- Two new runtime dependencies are added: array-api-compat and array-api-extra. Both are pure Python packages, available from PyPI and conda-forge. If adding runtime dependencies creates an undue burden for users, these two packages can be vendored instead. The need for these dependencies is as follows:
  - Not all array libraries implement the Array API spec completely. array-api-compat is a lightweight compatibility layer which smooths over the deviations from the spec for NumPy, CuPy and PyTorch.
  - array-api-extra provides a collection of non-standard functions which were found to be broadly useful in previous adoption work in SciPy, scikit-learn and other packages (e.g. atleast_nd as a replacement for atleast_{1d,2d}, testing helpers to replace those from np.testing, and so on).
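
The namespace rules above can be illustrated with a simplified, dependency-free sketch. The function `common_namespace` and its `default` parameter are hypothetical. In practice this job is done by array_namespace from array-api-compat, which also handles libraries that do not expose the `__array_namespace__` protocol directly.

```python
def common_namespace(*args, default):
    """Return the one namespace shared by all array arguments, or raise TypeError.

    `default` stands in for the NumPy namespace (hypothetical parameter).
    """
    found = []
    for arg in args:
        # Python scalars and None carry no namespace information; skip them.
        if arg is None or isinstance(arg, (bool, int, float, complex)):
            continue
        if hasattr(arg, "__array_namespace__"):
            namespace = arg.__array_namespace__()
        else:
            # Lists and tuples ("array_like") are treated as NumPy input.
            namespace = default
        if namespace not in found:
            found.append(namespace)
    if len(found) > 1:
        # Mixing namespaces is an error: no guessing, no silent transfers.
        raise TypeError(f"arrays from multiple namespaces are not allowed: {found}")
    return found[0] if found else default
```

Under this sketch, a call that passes an array from a non-NumPy namespace together with a plain list raises an error, because the list counts as NumPy input.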

The initial Array API adoption effort targets the following array libraries: NumPy, CuPy and PyTorch (in eager mode). For testing, it is convenient to also rely on array-api-strict, a strict implementation of the Array API that follows the specification to the letter and thus serves as a tool for flagging deviations from the standard. The package is available from PyPI and conda-forge, and will be added as a new test-only dependency.

Potential follow-ups#

Potential targets for follow-up efforts include JAX, marray, and the JIT modes of JAX and PyTorch. These are not included in the initial support for the following reasons:

- JAX arrays are immutable, and scikit-image relies extensively on in-place modification of NumPy arrays. While there is a wealth of experience with adding JAX support in SciPy, and array-api-extra contains useful primitives for supporting both mutable and immutable array libraries, we feel that implementing this support is best left for a follow-up.
- Supporting marray masked arrays would be a separate enhancement, if there is sufficient interest.
- Supporting the JIT modes of PyTorch and JAX can potentially provide non-trivial performance benefits; however, adding this support increases the scope significantly and is best left for a follow-up effort.

Implementation#

For pure Python code, the majority of implementation changes amount to cleaning up old-style NumPy idioms. For instance, np.issubdtype(x.dtype.type, np.complexfloating) becomes xp.isdtype(x.dtype, "complex floating"). Here xp is the array namespace, which is typically computed in public functions by calling the array_namespace function from the array-api-compat package:

from array_api_compat import array_namespace

def foo(x, y, mode):
    # x, y: array_like; mode: bool
    xp = array_namespace(x, y)   # the namespace shared by x and y
    ...

See [8], [9] for a discussion and worked examples, and/or [14] for a SciPy-specific discussion.

Using functions from upstream libraries which have already adopted the Array API standard does not need any changes. For example, scipy.ndimage.sobel supports NumPy arrays, CuPy arrays and PyTorch CPU tensors, and adheres to the “namespace in = namespace out” design rule. This is, in fact, the behavior of the majority of the scipy.ndimage API [10].

Compiled code written in C or Cython becomes “CPU-only”, in Array API jargon, via the basic “sandwich” pattern:

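import numpy as np
from array_api_compat import array_namespace

def foo(x):
    # x: array_like
    xp = array_namespace(x)  # remember the input namespace
    x_np = np.asarray(x)     # convert to numpy
    result_np = cython_implemented_function(x_np)   # compute (placeholder for a compiled kernel)
    return xp.asarray(result_np)   # convert back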

Note that this pattern is zero-copy for arrays on CPU [11], and raises a clear error if converting to NumPy requires a device transfer. See [14] for a detailed discussion.

Backend-specific code paths. In rare cases, there is a need to use backend-specific code paths. A common occurrence is calling NumPy-specific code as a performance optimization. For example, a linear solve is xp.linalg.solve(a, b); for symmetric matrices, SciPy allows an optimization, scipy.linalg.solve(a, b, assume_a="sym"), but 1) the additional assume_a argument is not available in Array API compatible libraries, and 2) the SciPy version is NumPy-specific and does not handle non-NumPy arrays. One can either use the “sandwich” and settle for CPU-only code, or branch explicitly with the is_numpy function (exported by array-api-compat as is_numpy_namespace):

import scipy.linalg
from array_api_compat import array_namespace, is_numpy_namespace as is_numpy

xp = array_namespace(a, b)
# ...
if is_numpy(xp):
    # use the symmetric solve
    x = scipy.linalg.solve(a, b, assume_a="sym")
else:
    # fall back to a general solve
    x = xp.linalg.solve(a, b)

The use of this pattern (and its analogs, is_torch, is_cupy) should be limited to an absolute minimum.

Testing practice. Adapting tests to validate the behavior across backends is relatively straightforward if laborious. Following the established SciPy practice [14]:

Test functions acquire a new xp fixture; during the test run, it parameterizes the test over the installed backends:

def test_foo(xp):
    x_np = np.random.uniform(size=42)
    x = xp.asarray(x_np)
    result = foo(x)
    # assert properties of `result`

Use xp-aware assertions from array-api-extra instead of those from np.testing:
```
from array_api_extra import xp_assert_close  # instead of np.testing.assert_allclose
# ...
def test_foo(xp):
    x = xp.arange(12)
    xp_assert_close(x[-2:], xp.asarray([10, 11]), atol=1e-15)
```
These assertions gracefully handle GPU arrays, and by default check for common errors (returning an array from a wrong namespace, wrong shapes/dtypes).
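
One way the xp fixture described above could be supplied is through pytest's pytest_generate_tests hook in conftest.py. This is only a sketch: the helper `installed_backends` and the list of candidate module names are assumptions made for illustration, and an actual conftest.py (such as SciPy's) is likely to be more elaborate.

```python
import importlib


def installed_backends(names=("numpy", "cupy", "torch")):
    """Import and return those candidate array libraries that are installed."""
    found = []
    for name in names:
        try:
            found.append(importlib.import_module(name))
        except ImportError:
            # Backend not installed (or not importable here): skip it.
            pass
    return found


def pytest_generate_tests(metafunc):
    """Parametrize every test that requests `xp` over the installed backends."""
    if "xp" in metafunc.fixturenames:
        backends = installed_backends()
        metafunc.parametrize("xp", backends, ids=[m.__name__ for m in backends])
```

With this in place, a test such as test_foo(xp) runs once per importable backend, and each run is labelled with the backend's module name.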

CI practice. The CPU CI can reuse the SciPy setup, scipy/scipy. The CUDA/GPU CI can also follow SciPy once a solution to scipy/scipy#24990 emerges.

Compiled code: beyond the sandwich pattern#

The “convert to NumPy, compute, convert back” pattern recommended above for compiled code has an obvious drawback: it does not allow non-CPU arrays. There are several ways of removing this limitation. All of them come with maintainability costs, and are thus excluded from the initial proposal of this SKIP. Briefly:

- Delegate to a compatible library with the relevant functionality, if one exists. Specific delegation / dispatch patterns of varying degrees of sophistication are under active discussion in the community.
- Maintain parallel “backends”: typically this means writing a “generic”, pure Python analog of the compiled function and calling it for non-NumPy array inputs. Examples of this approach include scipy.spatial.Rotation and scipy.interpolate.RBFInterpolator. The cost of maintaining two parallel implementations has to be balanced against the performance benefits (if any). See [4] and [12] for demos of this approach.
- For functions implemented in Pythran, use Pythran's “dual mode” [13], where the same source code is used both for ahead-of-time compilation for NumPy arrays and as a “generic” Array API backend for other array types. This functionality is very new in Pythran and still needs extensive investigation.

Backward compatibility#

There is no pressing need to break backwards compatibility for the Array API support itself. Alternative backends have some limitations relative to NumPy, so NumPy-specific parts will remain NumPy-specific (for instance, longdouble dtypes do not exist in PyTorch).

Alternatives#

The proposed approach is heavily informed by the experience of adopting the Array API in SciPy and scikit-learn.

Discussion#

Much of scikit-image relies on computational kernels written in Cython. A frequent concern is whether adopting the Array API brings meaningful benefits: these handwritten kernels are, and will remain, NumPy-only, while the majority of gains reported in [4] come from using hardware accelerators.

First of all, to the extent that scikit-image uses scipy.ndimage, the latter already benefits from GPU execution for CuPy arrays today. As a result, scikit-image functions which call scipy.ndimage functions and whose internals are Array API compatible use CUDA automatically for CuPy array inputs. See [15] for a worked example.

More generally, Array API compatibility gives a generic framework for, and implements the foundational infrastructure of, dispatching to specialized accelerator-enabled implementations (such as cuCIM and similar GPU libraries). Specific details of the dispatching can take multiple forms, and are under discussion elsewhere. What Array API compatibility provides, however, is a general and ecosystem-aligned framework for working through these (both fascinating and difficult) details.

References and Footnotes#

All SKIPs should be declared as dedicated to the public domain with the CC0 license [1], as in Copyright, below, with attribution encouraged with CC0+BY [2].

Copyright#

This document is dedicated to the public domain with the Creative Commons CC0 license [1]. Attribution to this source is encouraged where appropriate, as per CC0+BY [2].