Loadable Function: ocl_prog = ocl_program (src_str)
Loadable Function: ocl_prog = ocl_program (src_str, build_opts_str)

Construct and compile an OCL program from an OpenCL C source code string.

ocl_program takes an OpenCL C source code string src_str and compiles it with the OpenCL online compiler. If given, the build options in the string build_opts_str are applied during compilation. If a compilation error occurs, the function prints the compiler build log with its error messages and aborts. Otherwise, the OCL program ocl_prog is returned. For the OpenCL C language, consult the OpenCL specification; we recommend using OpenCL C version 1.1.

ocl_program prepends one line to the provided source code that may enable 64-bit floating point (double precision), depending on the capabilities of the current OpenCL context; the provided source code must therefore remain valid when this line is added at its beginning.

An OCL program can contain multiple sub-programs, so-called kernels, which are referenced either by their names (taken from the source code) or by their indices in a list of all kernels.

Access to the OCL program is provided by way of indexing. Information on the OCL program can be read from the following fields:

.valid

An integer value, with non-zero meaning that the OCL program is valid (compiled successfully and the corresponding OpenCL context is still active).

.num_kernels

The number of kernels (sub-programs) in the program.

.kernel_names

A cell array of strings holding the names of all kernels.
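A minimal sketch of compiling a program and reading the fields above, assuming the ocl package is installed and an OpenCL device is available. The kernel names "fill" and "scale", the variable demo_src, and the helper describe_program are hypothetical; "-cl-mad-enable" is a standard OpenCL build option used only to illustrate build_opts_str. Output arguments come first in each kernel's parameter list, as required for kernel calls.

```octave
% OpenCL C source with two kernels, "fill" and "scale" (hypothetical names).
% Output arguments come first in each parameter list.
demo_src = ["__kernel void fill (__global double *y, const double v)\n" ...
            "{ y[get_global_id (0)] = v; }\n" ...
            "__kernel void scale (__global double *y,\n" ...
            "                     const __global double *x, const double a)\n" ...
            "{ size_t i = get_global_id (0); y[i] = a * x[i]; }\n"];

function info = describe_program (src)
  % Hypothetical helper: compile SRC with an extra build option and collect
  % the documented fields. Calling it needs the ocl package and an OpenCL device.
  prog = ocl_program (src, "-cl-mad-enable");
  info.valid        = prog.valid;         % non-zero if the program is usable
  info.num_kernels  = prog.num_kernels;   % 2 for demo_src
  info.kernel_names = prog.kernel_names;  % cell array containing "fill" and "scale"
endfunction
```

Only the source string is built unconditionally; the compilation happens when describe_program (demo_src) is called.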

Furthermore, the user can enqueue specific OpenCL commands that control the command-queue workflow by issuing statements with the following fields (see the OpenCL specification for details):

.clEnqueueBarrier
.clFlush
.clFinish

In OpenCL, a kernel is executed by setting the kernel’s arguments and enqueueing the kernel into the (asynchronous) command queue. With an OCL program in octave, both steps are performed by a single indexing statement with parentheses:

[argout1, argout2, ...] = ocl_prog (kernel_index, work_size, cellout, argin1, argin2, ..., opt) 

The parameters have the following meaning:

kernel_index

Either the kernel index (0 <= kernel_index < num_kernels), or a kernel name string (which is slightly slower).

work_size

Either a single positive integer specifying the total number of work-items for parallel execution (SIMD principle, i.e., Single Instruction Multiple Data), or a matrix with at most three rows whose number of columns is the number of work-item dimensions. The first row specifies the number of work-items per dimension; their product corresponds to the single integer mentioned above. The second row, if given, specifies an offset for the work-item indices per dimension. The third row, if given, specifies the number of work-items per dimension that make up a work-group. For details, consult the OpenCL specification.

cellout

A cell array describing the output arguments. Output arguments are OCL matrices whose number, sizes, and types must be specified in advance so that they can be allocated automatically before the actual kernel call. To specify N output arguments, the cell array must have size 1xN, Nx1, 2xN, or Nx2. It must contain either only the matrices’ sizes (each as an octave row vector), in which case the default type 'double' is assumed, or, in a second row or column, also the matrices’ data types (e.g., 'single') as strings. For complex-valued output arguments, the type must state this explicitly (e.g., 'double_complex'). In the kernel’s OpenCL C declaration, these output arguments must be the first arguments, preceding the input parameters. Complex-valued (output and input) arguments to OpenCL C kernels must be declared as global pointers to 'double2' or 'float2' (e.g., __global float2 *arg).

argin1, argin2, ...

A list of input arguments to the kernel. Each can be an OCL matrix, a single octave scalar, or a (small) octave matrix. In the first case, no type checking is possible, so it is the user’s responsibility to match the matrix data types in octave and in the kernel code. In the latter cases, type matching is also essential; often, one will want to convert parameters explicitly before passing them (e.g., uint64(n) to convert an octave double scalar to a kernel argument of type ulong). Finally, passing an octave matrix is subject to tight data size limitations, whereas passing an OCL matrix is not.

opt

(Optional) An option string specifying how input OCL matrices are handled. "make_unique" (the default) is the safest and easiest, but may, in some cases, involve deep data copying before the kernel call. It is recommended for kernel prototyping and simple calls (e.g., with OCL matrices created just before the call). "slice_ofs" is the more elaborate and efficient alternative, which needs small modifications to the kernel declaration and code (for an example, see ocl_tests.m). This option is recommended for any new function that accepts OCL matrices to be passed to kernels (e.g., library functions working on OCL data).
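The parameters above can be combined as in the following sketch, which prepares a three-row work_size matrix, a 2xN cellout with explicit types, and scalar inputs converted to their exact kernel types, then passes them in a single call. The kernel "demo", the variable names, and the helper run_demo_kernel are hypothetical, and actually calling the helper assumes the ocl package and an OpenCL device. The kernel computes a linear index that matches octave’s column-major storage of the 512 x 256 output. The divisibility check reflects the OpenCL 1.x rule that the global size must be a multiple of the work-group size.

```octave
% OpenCL C kernel (hypothetical): outputs first (a float matrix and one
% complex double stored as double2), then the two scalar inputs
demo_src = ["__kernel void demo (__global float *y, __global double2 *z,\n" ...
            "                    const float scale, const ulong count)\n" ...
            "{\n" ...
            "  size_t i = get_global_id (0) + get_global_size (0) * get_global_id (1);\n" ...
            "  y[i] = scale * (float) i;\n" ...
            "  if (i == 0) z[0] = (double2) ((double) count, 0.0);\n" ...
            "}\n"];

% work_size for a 2-D launch over a 512 x 256 grid:
%   row 1: work-items per dimension, row 2: index offset per dimension,
%   row 3: work-group size per dimension
work_size = [512 256;
               0   0;
              16  16];
total_items = prod (work_size(1,:));   % 512 * 256 = 131072 work-items

% cellout for two outputs in the 2xN form: sizes in row 1, types in row 2
cellout = {[512 256], [1 1];
           "single",  "double_complex"};

% scalar inputs converted to the exact kernel parameter types
scale = single (0.5);          % kernel parameter "float scale"
count = uint64 (total_items);  % kernel parameter "ulong count"

function [y, z] = run_demo_kernel (src, work_size, cellout, scale, count)
  % Hypothetical helper; needs the ocl package and an OpenCL device.
  prog = ocl_program (src);
  % kernel index 0 selects the first (here: only) kernel;
  % the input arguments follow cellout, in declaration order
  [y, z] = prog (0, work_size, cellout, scale, count);
endfunction
```

Calling run_demo_kernel (demo_src, work_size, cellout, scale, count) returns y and z as OCL matrices.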

For convenience, a call with only the kernel name string specified does not execute a kernel but returns its kernel index (which might be stored in a persistent variable for all future kernel calls):

kernel_index = ocl_prog (kernel_name) 
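A minimal sketch of the persistent-index pattern, assuming the ocl package and an OpenCL device are available when the function is called. The helper ocl_fill and its kernel "fill" are hypothetical. Keeping the program itself in a persistent variable as well, and recompiling when .valid reports that the context is gone, is a design choice of this sketch, not something the ocl package prescribes.

```octave
1;  % script marker so that the function below can be defined in a script

function y = ocl_fill (n, v)
  % Hypothetical helper: return an n x 1 OCL double matrix filled with v.
  % Calling it needs the ocl package and an OpenCL device.
  persistent prog k_fill   % both start out as []
  if (isempty (k_fill) || ! prog.valid)
    prog = ocl_program (["__kernel void fill (__global double *y, const double v)\n" ...
                         "{ y[get_global_id (0)] = v; }\n"]);
    k_fill = prog ("fill");  % name -> index lookup; no kernel is executed
  endif
  % later calls use the cached numeric index instead of the name string
  y = prog (k_fill, n, {[n 1]}, double (v));
endfunction
```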

ocl_program automatically ensures that the OpenCL library is loaded (see ocl_lib) and that an OpenCL context with an OpenCL device is created (see ocl_context).

Be aware that running your own OpenCL C code comes with a certain risk. If your code contains an infinite loop, there is no way of stopping it; similarly, a memory access bug may crash or stall the octave interpreter, which then has to be stopped through the operating system, losing all data that existed only in octave’s workspace.

See also: oclArray, ocl_tests, ocl_context, ocl_lib, ocl_double, ocl_single, ocl_int8, ocl_int16, ocl_int32, ocl_int64, ocl_uint8, ocl_uint16, ocl_uint32, ocl_uint64.

Package: ocl