KOKKOS_LAMBDA(int i, int j, int k, float &lsum) { lsum += f(i, j, k); },
Kokkos::Sum<float>(sum)
);

Tiling: add a third set of parameters for the tile sizes, e.g. MDRangePolicy<Rank<3>>({nx_s, ny_s, nz_s}, {nx_e, ny_e, nz_e}, {ts_x, ts_y, ts_z}), where ts_x, ts_y and ts_z are the tile sizes along the x, y and z loops. On GPUs, each tile is handled by a single thread block.
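Tiling only changes how the index space is partitioned, not which indices are visited. As a rough serial mental model (plain C++, no Kokkos; tiled_order and Index2 are hypothetical names), a 2-D range [b0, e0) x [b1, e1) with tile sizes t0 x t1, i.e. what MDRangePolicy<Rank<2>>({b0, b1}, {e0, e1}, {t0, t1}) describes, can be enumerated tile by tile as below. Edge tiles are clipped when an extent is not a multiple of the tile size. Kokkos itself runs the tiles in parallel rather than in this order.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Conceptual, serial sketch of a tiled 2-D index range (NOT the Kokkos implementation).
using Index2 = std::array<int, 2>;

// Visit [b0, e0) x [b1, e1) tile by tile with tiles of size t0 x t1.
// Both between tiles and within a tile, the last index varies fastest.
std::vector<Index2> tiled_order(int b0, int e0, int b1, int e1, int t0, int t1) {
  std::vector<Index2> order;
  for (int ti = b0; ti < e0; ti += t0)                  // tile origin along dim 0
    for (int tj = b1; tj < e1; tj += t1)                // tile origin along dim 1
      for (int i = ti; i < ti + t0 && i < e0; ++i)      // clip partial edge tiles
        for (int j = tj; j < tj + t1 && j < e1; ++j)
          order.push_back({i, j});
  return order;
}
```

For example, a 4 x 4 range with 2 x 2 tiles visits (0,0), (0,1), (1,0), (1,1) before moving on to the tile that starts at (0,2).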

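By default, the iteration pattern matches the default memory layout of the execution space. You can change the iteration order between tiles (IterateOuter) and within a tile (IterateInner) with Kokkos::Rank<ndim, IterateOuter, IterateInner>, where each of the two is Kokkos::Iterate::Left (first index varies fastest) or Kokkos::Iterate::Right (last index varies fastest).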
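To see why matching the iteration pattern to the layout matters, here is a plain C++ sketch (no Kokkos). The Layout and Iterate enums and both functions are hypothetical stand-ins for Kokkos::LayoutLeft/LayoutRight and Kokkos::Iterate::Left/Right. It lists the memory offsets a 2-D loop touches in visiting order: when the pattern matches the layout, the offsets are consecutive (stride 1); otherwise they jump by the row or column length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins, not Kokkos types.
enum class Layout { Left, Right };   // Left: column-major (first index stride 1); Right: row-major
enum class Iterate { Left, Right };  // Left: first index varies fastest; Right: last index varies fastest

// Linear offset of element (i, j) in an n0 x n1 array.
std::size_t offset(Layout l, int i, int j, int n0, int n1) {
  return l == Layout::Left ? static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * n0
                           : static_cast<std::size_t>(i) * n1 + static_cast<std::size_t>(j);
}

// Memory offsets touched, in visiting order, when iterating with pattern `it`.
std::vector<std::size_t> visited_offsets(Layout l, Iterate it, int n0, int n1) {
  std::vector<std::size_t> out;
  if (it == Iterate::Left) {
    for (int j = 0; j < n1; ++j)
      for (int i = 0; i < n0; ++i) out.push_back(offset(l, i, j, n0, n1));
  } else {
    for (int i = 0; i < n0; ++i)
      for (int j = 0; j < n1; ++j) out.push_back(offset(l, i, j, n0, n1));
  }
  return out;
}
```

For a 2 x 3 array in row-major (LayoutRight-like) storage, iterating with the last index fastest touches offsets 0, 1, 2, 3, 4, 5, while iterating with the first index fastest touches 0, 3, 1, 4, 2, 5.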

Work tags: one functor can provide several operator() overloads, and the tag type given to the execution policy selects which one a kernel calls:


struct foo
{
  struct Tag1 {}; struct Tag2 {};

  KOKKOS_FUNCTION void operator()(Tag1, int i) const { /* ... */ }
  KOKKOS_FUNCTION void operator()(Tag2, int i) const { /* ... */ }

  void run_both(int n)
  {
    // The tag in the policy picks the matching operator() overload.
    Kokkos::parallel_for(Kokkos::RangePolicy<Tag1>(0, n), *this);
    Kokkos::parallel_for(Kokkos::RangePolicy<Tag2>(0, n), *this);
  }
};
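The mechanism behind work tags is ordinary C++ overload resolution on an empty tag type. The sketch below shows it without Kokkos: serial_for is a hypothetical serial stand-in for Kokkos::parallel_for(RangePolicy<Tag>(0, n), f), and Foo with its data buffer is likewise made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical serial stand-in for Kokkos::parallel_for with a tagged RangePolicy.
template <class Tag, class Functor>
void serial_for(int n, const Functor& f) {
  for (int i = 0; i < n; ++i) f(Tag{}, i);  // the tag object selects the overload
}

struct Foo {
  struct Tag1 {};
  struct Tag2 {};
  int* data;  // hypothetical output buffer of length n

  void operator()(Tag1, int i) const { data[i] = i; }   // first pass: fill with i
  void operator()(Tag2, int i) const { data[i] *= 2; }  // second pass: double each entry

  void run_both(int n) const {
    serial_for<Tag1>(n, *this);
    serial_for<Tag2>(n, *this);
  }
};
```

Running run_both(4) on a buffer of four ints leaves it holding 0, 2, 4, 6: the first pass used the Tag1 overload, the second the Tag2 overload, on the same functor object.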

6. Subview

A subview is a slice of a View (a range or a single index in each dimension) that points to the same data as the original View, so no data is copied. It can be constructed on the host or within a kernel, and is similar to the “colon” notation provided by MATLAB, Fortran and Python.
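To see why a subview costs no copy, here is a minimal sketch in which a slice is just a new (pointer, extent, stride) triple into the parent's memory. View1D, View2D, row and col are hypothetical names, not Kokkos types; real Kokkos views also track layouts and reference counts (a non-contiguous slice such as a column of a row-major view generally ends up with LayoutStride).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D strided view: element k lives at ptr[k * stride].
struct View1D {
  double* ptr;
  std::size_t extent;
  std::size_t stride;
  double& operator()(std::size_t k) const { return ptr[k * stride]; }
};

// Hypothetical row-major 2-D view: element (i, j) lives at ptr[i * n1 + j].
struct View2D {
  double* ptr;
  std::size_t n0, n1;
  double& operator()(std::size_t i, std::size_t j) const { return ptr[i * n1 + j]; }

  // Fix row i, keep all columns: contiguous slice, no copy.
  View1D row(std::size_t i) const { return {ptr + i * n1, n1, 1}; }
  // Keep all rows, fix column j: strided slice, no copy.
  View1D col(std::size_t j) const { return {ptr + j, n0, n1}; }
};
```

Writing through a column slice of a 2 x 3 view, e.g. a.col(2)(1) = 7.0, changes a(1, 2) in the parent, because both refer to the same element of the underlying storage.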

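Kokkos::subview takes the view followed by one slice argument per dimension; each argument is one of three types:

Index: a scalar; only that index is kept, and the dimension is dropped from the result

Kokkos::pair: a half-open range of indices [first, second)

Kokkos::ALL: the entire range of that dimension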
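Kokkos indices are 0-based and Kokkos::pair is half-open, while MATLAB indices are 1-based and its ranges are inclusive, so translating a MATLAB-style slice needs an off-by-one adjustment. A small sketch (matlab_index, matlab_range and range_length are hypothetical helpers; std::pair stands in for Kokkos::pair):

```cpp
#include <cassert>
#include <utility>

// Hypothetical helpers: translate 1-based MATLAB indices into 0-based, Kokkos-style slice arguments.
int matlab_index(int k) { return k - 1; }  // scalar index: 1-based -> 0-based

// MATLAB first:last is inclusive on both ends; return the half-open range [first - 1, last).
std::pair<int, int> matlab_range(int first, int last) { return {first - 1, last}; }

// Number of indices covered by a half-open range.
int range_length(std::pair<int, int> r) { return r.second - r.first; }
```

With these rules, MATLAB's scalar index 3 becomes 2, and 5:10 becomes the half-open pair (4, 10), which still covers six indices.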

For example, the following code is equivalent to the MATLAB expression norm(tensor(3, 5:10, :), 'fro'):