Construct an OCL matrix from an Octave matrix, preserving the elements’ data type.
oclArray takes as input a conventional numeric Octave matrix octave_mat (actually an N-dimensional array) of any numeric data type (single, int32, etc.). The function creates a new OCL matrix ocl_mat (as an N-dimensional array) of the data type corresponding to octave_mat, by calling the corresponding OCL matrix constructor function (e.g., ocl_double).
In effect, the function allocates storage space on the OpenCL device hardware and copies the Octave matrix data into the OpenCL device memory. The data then remains in device memory until the OCL matrix is cleared from the Octave workspace, or until the OpenCL context ceases to exist, whichever comes first. The reverse operation can be performed with the ocl_to_octave function.
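The type-preserving round trip described above can be checked with a short sketch. It assumes the package is loaded with pkg load ocl, that an OpenCL device is available, and that ocl_to_octave returns an Octave array of the class matching the OCL matrix; the variable names are illustrative only.

```octave
pkg load ocl

# Host data of two different numeric classes.
x_single = single ([1.5 2.5 3.5]);
x_int32  = int32 ([10 20 30]);

# Copy to the OpenCL device; the OCL matrix type follows the input class.
dev_single = oclArray (x_single);
dev_int32  = oclArray (x_int32);

# Copy back to Octave memory (assumed to restore the matching Octave class).
y_single = ocl_to_octave (dev_single);
y_int32  = ocl_to_octave (dev_int32);

# Print the class before and after the round trip.
printf ("%s -> %s\n", class (x_single), class (y_single));
printf ("%s -> %s\n", class (x_int32), class (y_int32));
```

If the two class names on a line differ, the assumption about ocl_to_octave does not hold for the installed package version.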
Example:
    mat = magic (4);
    # transfer mat to OpenCL memory
    ocl_mat = oclArray (mat);
    # perform computations with ocl_mat on OpenCL device
    ocl_mat2 = ocl_mat + 1;
    ocl_mat3 = ocl_mat2 .^ 3;
    ocl_mat3(:,3) = 5;
    ocl_mat2 = mean (floor (ocl_mat3 / 2), 2);
    # transfer ocl_mat2 to octave memory
    mat2 = ocl_to_octave (ocl_mat2);
    disp (mat2)
For compatibility with MATLAB, the gpuArray function is an alias for oclArray.
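As a sketch of the MATLAB-style spelling, the snippet below uses gpuArray in place of oclArray. It assumes that gather, which appears only in the See also list, is the matching counterpart of ocl_to_octave; if it is not, ocl_to_octave can be substituted.

```octave
pkg load ocl

A = magic (4);
gpu_A = gpuArray (A);   # alias, same effect as oclArray (A)
gpu_B = gpu_A + 1;      # computed on the OpenCL device
B = gather (gpu_B);     # assumption: behaves like ocl_to_octave (gpu_B)
disp (B)
```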
Two kinds of operation are available for numeric computations with OCL matrices.
First, many (but not all) built-in operations known from Octave matrices are possible (e.g., multiplication by *, indexing by ranges; standard functions like reshape, repmat, ndgrid; numeric functions like cos, sumsq; searching functions like max and OCL’s special findfirst/findlast). All of these operations are performed via small OCL-internal OpenCL C subprograms (kernels), which are restricted to the SIMD principle (Single Instruction Multiple Data). Because of this, built-in operations with OCL matrices are subject to various restrictions (e.g., indexing by ranges must result in data which is contiguous in OpenCL memory; there is no broadcasting). In particular, math functions which are expected to give complex-valued results require complex input matrices. See the ocl_tests.m file for details of the implemented functionality.
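Two of the restrictions above can be illustrated with a hedged sketch. It assumes that repmat accepts the usual (matrix, rows, columns) arguments for OCL matrices, that sqrt is among the implemented numeric functions, and that oclArray accepts complex input; none of these details are confirmed here, so ocl_tests.m remains the authority.

```octave
pkg load ocl

# No broadcasting: expand the smaller operand explicitly with repmat.
m = magic (4);
ocl_m = oclArray (m);
row_mean = mean (ocl_m, 2);                  # 4x1 OCL matrix
centered = ocl_m - repmat (row_mean, 1, 4);  # both operands are now 4x4
disp (ocl_to_octave (centered))

# Complex-valued results need complex input: convert before the transfer.
x = [-1 -4 -9];
ocl_x = oclArray (complex (x));              # complex data, zero imaginary part
root = ocl_to_octave (sqrt (ocl_x));
disp (root)
```

With plain Octave matrices, m - mean (m, 2) would broadcast automatically and sqrt (x) would switch to complex output on its own; with OCL matrices both steps have to be made explicit.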
Using the built-in operations on OCL matrices eases the transition from a CPU-based computation to an OpenCL-based computation, since little Octave code needs to change (mostly the beginning and final parts). However, be aware that the work done internally by both Octave and the OpenCL driver to handle the built-in operations may cause significant overhead. Also be aware that all OCL matrix operations are computed asynchronously, and that any intermediate copying of data to or from the OpenCL device interrupts and potentially delays this asynchronous workflow.
Keep in mind that OCL data effectively lives in another world, in both space and time: in space, because it is generally stored in memory that is physically separate from the CPU memory used by Octave; in time, because the result of an operation generally does not exist as soon as the Octave command that scheduled it returns, but only later, owing to the asynchronous workflow.
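The advice on avoiding intermediate copies can be sketched as follows: transfer once at the start, chain the operations on the device, and transfer once at the end. The sketch uses only operations that appear elsewhere on this page (cos, .^, +); the variable names are illustrative.

```octave
pkg load ocl

x = linspace (0, 1, 1000);
ocl_x = oclArray (x);        # single host-to-device copy

# Each statement only schedules work on the device;
# no data is copied back to Octave in between.
ocl_c  = cos (ocl_x);
ocl_sq = ocl_c .^ 2;
ocl_y  = ocl_sq + 1;

# Single device-to-host copy at the end of the chain.
y = ocl_to_octave (ocl_y);

# Compare against the same computation done on the CPU.
disp (max (abs (y - (cos (x) .^ 2 + 1))))
```

Calling ocl_to_octave on ocl_c or ocl_sq inside the chain would still give correct values, but each such call is an intermediate copy of the kind the text warns about.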
The second kind of operation is to use an OCL matrix as an argument when calling a user-written OpenCL C program (i.e., calling a kernel in an OCL program, see ocl_program). User-written OpenCL C programs make the OCL functionality easily extensible.
oclArray automatically ensures that the OpenCL library is loaded (see ocl_lib) and that an OpenCL context is created with an OpenCL device (see ocl_context).
See also: ocl_to_octave, gpuArray, gather, ocl_tests, ocl_program, ocl_context, ocl_lib, ocl_constant, ocl_ones, ocl_zeros, ocl_eye, ocl_cat, ocl_linspace, ocl_logspace, ocl_double, ocl_single, ocl_int8, ocl_int16, ocl_int32, ocl_int64, ocl_uint8, ocl_uint16, ocl_uint32, ocl_uint64.
Package: ocl