Implementing Improved Slicing in ndarray 0.11

ndarray 0.11 has a convenient new feature: combined slicing and subviews as a single operation. ndarray is a Rust crate that provides an n-dimensional array type and associated functionality for general elements and for numerics.

This article discusses the implementation of the new feature. It’s a story of type hacking: zero-sized types, associated types, traits for compile-time type manipulation, and pointer casting.

Introduction

The primary type in ndarray is the ArrayBase<S, D> container type, which is represented internally as:

the data representation (e.g. a contiguous memory block for owned arrays),
a pointer to the first element,
a sequence of n axis lengths, and
a sequence of n axis strides.

The two generic type parameters in ArrayBase<S, D> are:

S: the data ownership (owned arrays own the block of data; array views only have a pointer to it)
D: the dimensionality of the array (e.g. Ix1 for 1-D arrays, Ix2 for 2-D arrays, etc.)

ArrayBase is used for both owned arrays and views of arrays. Note that the number of dimensions of the array is typically fixed at compile time, although there is a dynamic dimension type IxDyn if needed.

Slicing

Slicing methods such as .slice() create a view a subset of the data in an array. With a step other than 1, it’s also possible to do things such as view every third index along an axis or reverse the order of an axis.

For example, let’s say you have a 3-dimensional array of shape 2 × 2 × 3 (i.e. 2 submatrices of 2 rows and 3 columns):

let arr = array![[[ 1,  2,  3],
                  [ 4,  5,  6]],
                 [[ 7,  8,  9],
                  [10, 11, 12]]];
assert_eq!(arr.shape(), &[2, 2, 3]);

You could create a slice with

Both of the submatrices, but in reverse order: 0..2;-1
The first row in each submatrix: ..1
The even-index columns in each submatrix: ..;2

let slice = arr.slice(s![0..2;-1, 0..1, ..;2]);
assert_eq!(
    slice,
    array![[[7, 9]],
           [[1, 3]]]
);
assert_eq!(slice.shape(), &[2, 1, 2]);

Here, slice is a view of a subset of arr (without copying).

Subviews

Subview methods such as .subview() create a view restricted to a single index along an axis, and then remove that axis from the view. This produces a view with one dimension less than the input array.

For example, let’s select the first (index 0) row in each submatrix (axis 1) of arr. The new shape is [2, 3] because axis 1 was removed from the view.

let subview = arr.subview(Axis(1), 0);
assert_eq!(
    subview,
    array![[1, 2, 3],
           [7, 8, 9]]
);
assert_eq!(subview.shape(), &[2, 3]);

Combining slicing and subviews

ndarray 0.11 makes it possible to combine slicing and subviews in a single operation. To get the same result in earlier versions of ndarray, you’d have to chain together multiple .slice() and .into_subview() calls while being careful about their order. For example,

let arr = Array4::<f64>::zeros((12, 10, 6, 9));

// old
let slice = arr.slice(s![3..7, .., ..;2, ..]).into_subview(Axis(3), 5).into_subview(Axis(1), 4);

// new
let slice = arr.slice(s![3..7, 4, ..;2, 5]);

In this example, 3..7 and ..;2 indicate slices of axes 0 and 2, and 4 and 5 indicate subviews of axes 1 and 3. The new version is more concise and is easier to use correctly than a series of .slice() and .into_subview() calls.

Discussion

Implementing this feature was interesting because so much needed to be handled at compile time. The goals were:

Allow a combination of slicing and subviews in a single operation.
Provide compile-time checking of the number of dimensions in the slicing argument. For example, it should be a compile-time error to call Array1::zeros(5).slice(s![.., ..]).
Determine at compile time the number of dimensions of the output array/view after slicing. (Otherwise, we’d always have to return a dynamic-dimensional array.)
Provide a representation of slicing information that can be passed around or stored in a variable.
Minimize the use of generic type parameters.

One option was to create a macro that expands to a series of .slice() and .into_subview() calls, like this:

slice!(arr, [3..7, 4, ..;2, 5])
=> arr.slice(s![3..7, .., ..;2, ..]).into_subview(Axis(3), 5).into_subview(Axis(1), 4)

This approach was backwards-compatible, but it didn’t address goal 4 (providing an object that could represent the slicing information). Additionally,

slice_move!(5 + &slice!(arr, [3..7, 4]) - 9, [..;2])

wasn’t quite as easy to read as

(5 + &arr.slice(s![3..7, 4]) - 9).slice_move(s![..;2])

So, I decided to focus on modifying the s![] macro to support subviews.

Representing slicing information

The first step was to define a representation of the information necessary to perform a slice.

In earlier versions of ndarray, this was &[Si; n], where each instance of Si

pub struct Si(pub Ixs, pub Option<Ixs>, pub Ixs);

represented a slice of one axis. The signature of .slice() was

impl<A, S, D> ArrayBase<S, D>
where
    S: Data<Elem=A>,
    D: Dimension,
{
    pub fn slice(&self, indexes: &D::SliceArg) -> ArrayView<A, D>;
}

where D::SliceArg was [Si; n] for fixed-dimension arrays and [Si] for dynamic-dimension arrays. The s![] macro simply provided a compact way to construct a &[Si; n] instance.

In ndarray 0.11, the replacement for Si had to be able to represent either a range with step (slice) or an index (subview). Rust made this easy with an enum:

pub enum SliceOrIndex {
    Slice {
        start: isize,
        end: Option<isize>,
        step: isize,
    },
    Index(isize),
}

Now that .slice() could take subviews in addition to slicing axes, the dimension of the output array/view could be different from the dimension of self. As a result, the function signature needed to have a generic parameter Do:

pub fn slice<Do>(&self, info: &SOMETHING_HERE<Do>) -> ArrayView<A, Do>
where
    Do: Dimension;

So, s![] couldn’t just return &[SliceOrIndex; n]; it needed to include Do. The slicing information had two parts: the output dimension and the actual sequence of SliceOrIndex. I combined these in a struct

pub struct SliceInfo<T: ?Sized, D: Dimension> {
    out_dim: PhantomData<D>,
    indices: T,
}

where T was [SliceOrIndex; n] when the SliceInfo was produced by s![].

Constructing `SliceInfo`

In order to construct a SliceInfo instance, the s![] macro had to build an array [SliceOrIndex; n] and determine the output dimension type after slicing. Both problems were solved with traits:

Use Into<SliceOrIndex> to convert each argument to SliceOrIndex.
Use a new trait SliceNextDim to determine the output dimension type.

The SliceNextDim trait provided a way to update the output dimension while iterating over the arguments.

pub trait SliceNextDim<D1, D2> {
    fn next_dim(&self, PhantomData<D1>) -> PhantomData<D2>;
}

Starting with $dim = PhantomData<Ix0> (0-dimensional) and calling SliceNextDim(&$arg, $dim) on each argument $arg to update $dim determined the output dimension. The implementation of SliceNextDim for index (subview) arguments specified that D2 = D1 since the corresponding axes were not included in the output, while the implementation for range (slice) arguments specified that D2 = D1::Larger since the corresponding axes were included in the output.

This was great because all the types are determined automatically at compile time! The return type from s![] was &SliceInfo<[SliceOrIndex; n], D>.

Argument coercions

The signature for .slice() was now

pub fn slice<Do>(&self, info: &SliceInfo<D::SliceArg, Do>) -> ArrayView<A, Do>
where
    Do: Dimension;

where D::SliceArg was [SliceOrIndex; n] for fixed dimensions and [SliceOrIndex] for the dynamic dimension (IxDyn). Automatic coercion of &SliceInfo<[SliceOrIndex; n], Do> to &SliceInfo<[SliceOrIndex], Do> made the output of s![] work for both fixed and dynamic dimensions.

The .slice_inplace() method didn’t change the dimensionality of the array/view, so it didn’t need the Do type argument. It’s signature remained

pub fn slice_inplace(&mut self, indices: &D::SliceArg);

This worked nicely when self had a fixed dimension because SliceInfo<T, D> implemented Deref { type Target = T; ... } so that &SliceInfo<[SliceOrIndex; n], Do> coerced to &[SliceOrIndex; n]. Unfortunately, the compiler was not able to combine the deref coercion and an automatic coercion to coerce &SliceInfo<[SliceOrIndex; n], Do> to &[SliceOrIndex]. However, the caller could manually call .as_ref() to perform the conversion.

It would be possible to work around coercion limitations by using a generic type argument I that implemented Into<&D::SliceArg>, but a design goal was to minimize the use of generics.

Other type conversions

I also wanted to support SliceInfo<Vec<SliceOrIndex>, D> because older versions of ndarray cleanly handled Vec<Si>. To make this convenient, an implementation of AsRef was provided to convert &SliceInfo<Vec<SliceOrIndex>, D> to &SliceInfo<[SliceOrIndex], D>. Specifically,

impl<T, D> AsRef<SliceInfo<[SliceOrIndex], D>> for SliceInfo<T, D>
where
    T: AsRef<[SliceOrIndex]>,
    D: Dimension,
{
    fn as_ref(&self) -> &SliceInfo<[SliceOrIndex], D>;
}

At first glance, this appeared to be impossible to implement. It seemed like the body of the method needed to create a new SliceInfo instance. However, if the function returned a reference to the new SliceInfo, the reference would be dangling because the SliceInfo would be dropped at the end of the function body.

bluss made the astute observation that &SliceInfo<T, D> had the same bitwise representation as &T because the out_dim: PhantomData<D> field in SliceInfo was a zero-sized type, and the only remaining field was indices: T. This meant that the conversion could be implemented with pointer casts

unsafe {
    &*(self.indices.as_ref() as *const [SliceOrIndex]
        as *const SliceInfo<[SliceOrIndex], D>)
}

We also added a #[repr(C)] attribute to SliceInfo for extra safety.

Evaluation of macro arguments

My first implementation of the s![] macro had a bug. Try to spot the bug in this piece of the implementation. The $r:expr;$s:expr pattern correctly matched a range;step argument.

macro_rules! s(
    // ...
    (@parse $dim:expr, [$($stack:tt)*] $r:expr;$s:expr, $($t:tt)*) => {
        s![@parse
            $crate::SliceNextDim::next_dim(&$r, $dim),
            [$($stack)* s!(@convert $r, $s),]
            $($t)*
        ]
    };
    // ...
);

The issue was that the $r expression was expanded in two places, so it was evaluated twice at runtime. Unfortunately, this issue wasn’t a problem in the common case (literal range and index arguments) so it took me a while to notice it.

The solution was to use match to evaluate $r once and create a binding r that could be used in both places:

macro_rules! s(
    // ...
    (@parse $dim:expr, [$($stack:tt)*] $r:expr;$s:expr, $($t:tt)*) => {
        match $r {
            r => {
                s![@parse
                   $crate::SliceNextDim::next_dim(&r, $dim),
                   [$($stack)* s!(@convert r, $s),]
                   $($t)*
                ]
            }
        }
    };
    // ...
);

This illustrates an important difference between function arguments and macro expression arguments. If you need to use the value of an expression in multiple places in a macro, you have to be careful not to accidentally evaluate the expression multiple times.

When using a macro that acts like a function (e.g. s![] or vec![]), it’s easy to ignore that the macro call is really an expansion into an expression, not a function call. This is usually okay when calling macros, but it’s important to keep the difference in mind when writing macros.

Note that the solution uses match $r { r => expr } instead of { let r = $r; expr } because match preserves temporary objects created in $r for the entire body. Thanks to bluss for pointing this out. For example,

// error
{
    let r = vec![1, 2, 3].iter();
    r.count()
};

// ok
match vec![1, 2, 3].iter() {
    r => {
        r.count()
    }
};

Conclusion

Implementing the combined slicing and subviews feature in ndarray 0.11 was a lot of fun and a great learning experience. There were a few important lessons:

Macros and traits are a surprisingly powerful combination for doing complex things at compile time. For example, the s![] macro is able to determine at compile time the output dimension of the sliced array/view from the number and types of its arguments.
Keep in mind low-level solutions to problems. Rust is a safe, high-level language, so it’s easy to forget what’s actually happening under-the-hood. In some cases, low-level solutions such as pointer casts are a simple way to work around high-level limitations.
Be careful not to treat macro arguments the same as function arguments, even if the macro looks a lot like a function. If you’re using an expression in two places and don’t want it to be evaluated twice, use match to evaluate it and create a binding, and then use that binding.