Function File: ocl_mat = oclArray (octave_mat)

Construct an OCL matrix from an Octave matrix, preserving the elements’ data type.

oclArray takes as input a conventional numeric Octave matrix octave_mat (more precisely, an N-dimensional array) of any numeric data type (double, single, int32, etc.). The function creates a new OCL matrix ocl_mat (also an N-dimensional array) of the data type corresponding to octave_mat by calling the matching OCL matrix constructor function (e.g., ocl_double). In effect, the function allocates storage on the OpenCL device and copies the Octave matrix data into OpenCL device memory. The data then remains in device memory until the OCL matrix is cleared from the Octave workspace (and at most as long as the OpenCL context exists).

The reverse operation can be performed with the ocl_to_octave function.

Example:

mat = magic (4);

# transfer mat to OpenCL memory
ocl_mat = oclArray (mat);

# perform computations with ocl_mat on OpenCL device
ocl_mat2 = ocl_mat + 1;
ocl_mat3 = ocl_mat2 .^ 3;
ocl_mat3(:,3) = 5;
ocl_mat2 = mean (floor (ocl_mat3 / 2), 2);

# transfer ocl_mat2 to Octave memory
mat2 = ocl_to_octave (ocl_mat2);

disp (mat2)
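
The example above starts from a double matrix. As a further sketch of the type preservation described earlier, the following round trip uses single and int32 inputs. It assumes that ocl_to_octave returns an Octave array of the class matching the OCL matrix, which this page implies but does not state explicitly; the variable names are illustrative only.

```octave
pkg load ocl   # assumes the ocl package is installed and an OpenCL device is available

a = single ([1.5 2.5; 3.5 4.5]);
b = int32 (reshape (1:24, 2, 3, 4));   # N-dimensional input

# copy both arrays to OpenCL device memory
ocl_a = oclArray (a);
ocl_b = oclArray (b);

# copy them back to Octave memory
a_back = ocl_to_octave (ocl_a);
b_back = ocl_to_octave (ocl_b);

# assumption: class, size, and contents survive the round trip
disp (class (a_back))
disp (class (b_back))
disp (isequal (a, a_back))
disp (isequal (size (b), size (b_back)))
```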

For compatibility with MATLAB, the gpuArray function is an alias to oclArray.
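
Code written with the MATLAB-style names might then look like the following sketch. It assumes that gather (listed under "See also") is the counterpart of ocl_to_octave; this page itself only confirms the gpuArray alias.

```octave
pkg load ocl   # assumes the ocl package is installed and an OpenCL device is available

mat = magic (4);

g = gpuArray (mat);      # alias for oclArray (mat), per the text above
g = g + 1;               # computed on the OpenCL device

result = gather (g);     # assumption: gather mirrors ocl_to_octave
disp (result)
```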

Two kinds of operation are possible with OCL matrices to perform numeric computations:

First, many (but not all) built-in operations known from Octave matrices are available (e.g., multiplication by *, indexing by ranges; standard functions like reshape, repmat, ndgrid; numeric functions like cos, sumsq; searching functions like max and OCL’s special findfirst / findlast). All of these operations are performed by small OpenCL C subprograms (kernels) internal to OCL, which are restricted to the SIMD principle (Single Instruction, Multiple Data). Because of this, built-in operations on OCL matrices are subject to various restrictions (e.g., indexing by ranges must result in data which is contiguous in OpenCL memory; broadcasting is not supported). In particular, math functions which are expected to give complex-valued results require complex input matrices. See the ocl_tests.m file for details of the implemented functionality.
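
To illustrate the broadcasting restriction, here is a minimal sketch that subtracts the column means from a matrix by expanding the means explicitly with repmat. It assumes that repmat and mean accept OCL matrices with the same argument conventions as their Octave counterparts, and that elementwise subtraction of two equally sized OCL matrices is implemented; ocl_tests.m is the place to confirm this.

```octave
pkg load ocl   # assumes the ocl package is installed and an OpenCL device is available

A = oclArray (magic (4));

# 1x4 OCL matrix holding the mean of each column
col_means = mean (A, 1);

# "A - col_means" would rely on broadcasting, which is not supported,
# so the row of means is expanded to the size of A first
centered = A - repmat (col_means, 4, 1);

disp (ocl_to_octave (centered))
```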

Using the built-in operations on OCL matrices eases the transition from a CPU-based computation to an OpenCL-based computation, since little Octave code needs to change (mostly the beginning and final parts). However, be aware that the internal work done by both Octave and the OpenCL driver to handle the built-in operations may cause significant overhead. Also be aware that all OCL matrix operations are computed asynchronously, and that any intermediate copying of data to or from the OpenCL device interrupts and potentially delays this asynchronous workflow.

Keep in mind that OCL data effectively lives in another world, in both space and time: in space, because it is generally stored in memory which is physically separate from the CPU memory used by Octave; in time, because the data resulting from an operation generally does not exist immediately after the Octave command that scheduled it returns, but only later, due to the asynchronous workflow.
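
In practice this suggests keeping intermediate results on the device and transferring only the final result. The following sketch shows that pattern using only scalar operations of the kind shown in the example above; the loop and variable names are illustrative. It assumes that ocl_to_octave waits for all pending operations on its argument before returning.

```octave
pkg load ocl   # assumes the ocl package is installed and an OpenCL device is available

x = oclArray (rand (1000, 1));

# every iteration is scheduled on the device; nothing is copied back
for k = 1:20
  x = (x + 1) / 2;
endfor

# one transfer at the end (assumed to wait for the scheduled operations)
result = ocl_to_octave (x);

disp (max (abs (result - 1)))
```

Calling ocl_to_octave inside the loop would give the same numbers, but each call would interrupt the asynchronous workflow described above.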

The second kind of operation is to use an OCL matrix as an argument when calling a user-written OpenCL C program (i.e., calling a kernel in an OCL program, see ocl_program). User-written OpenCL C programs make the OCL functionality easily extensible.

oclArray automatically ensures that the OpenCL library is loaded (see ocl_lib) and that an OpenCL context is created with an OpenCL device (see ocl_context).

See also: ocl_to_octave, gpuArray, gather, ocl_tests, ocl_program, ocl_context, ocl_lib, ocl_constant, ocl_ones, ocl_zeros, ocl_eye, ocl_cat, ocl_linspace, ocl_logspace, ocl_double, ocl_single, ocl_int8, ocl_int16, ocl_int32, ocl_int64, ocl_uint8, ocl_uint16, ocl_uint32, ocl_uint64.

Package: ocl