Construct and compile an OCL program from an OpenCL C source code string.
ocl_prog = ocl_program (src_str)
ocl_prog = ocl_program (src_str, build_opts_str)
ocl_program takes an OpenCL C source code string src_str and compiles it with the
OpenCL online compiler (i.e., at run time). If given, the build options in the
string build_opts_str are applied during compilation. If a compilation error occurs,
the function prints the compiler build log with its error messages and aborts with
an error. Otherwise, the compiled OCL program ocl_prog is returned.
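As a minimal sketch, assuming the ocl package is installed and an OpenCL device is available, the following compiles a source string holding one kernel and passes one build option. The kernel name ramp and its arguments are hypothetical and serve only as an illustration; -cl-fast-relaxed-math is a standard build option defined by the OpenCL specification.

```octave
## Hypothetical kernel "ramp": y[i] = a + h * i for the first n work-items.
## Octave double-quoted strings interpret "\n" as a newline.
src = [ ...
  "__kernel void ramp (__global float *y, const float a,\n" ...
  "                    const float h, const ulong n)\n"     ...
  "{\n"                                                      ...
  "  const size_t i = get_global_id (0);\n"                 ...
  "  if (i < n)\n"                                           ...
  "    y[i] = a + h * (float) i;\n"                          ...
  "}\n"];

## Compile with one build option; on a build error, ocl_program prints the
## build log and aborts.
prog = ocl_program (src, "-cl-fast-relaxed-math");
```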
For the OpenCL C language, consult the OpenCL specification. We recommend using
OpenCL C version 1.1.
ocl_program prepends one line to the provided source code that, depending on the
capabilities of the current OpenCL context, may enable 64-bit floating point (double
precision) support; the provided source code must therefore remain valid when this
line is added in front of it.
An OCL program can contain multiple sub-programs, called kernels, which are referenced either by their names (as declared in the source code) or by their (zero-based) indices in the list of all kernels.
Access to the OCL program is provided by means of indexing. Information on the OCL program can be read from the following fields:
.valid
    An integer value; non-zero means that the OCL program is valid (compiled successfully and its OpenCL context is still active).
.num_kernels
    The number of kernels (sub-programs) in the program.
.kernel_names
    A cell array of strings holding the names of all kernels.
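The fields can be queried right after compilation. This minimal sketch, assuming the ocl package and an OpenCL device are available, compiles two hypothetical kernels, checks .valid, prints .num_kernels, and looks up the index of each name in .kernel_names by calling the program with only a kernel name (the convenience call described further below).

```octave
## Two hypothetical kernels: fill y with a constant, and negate x into y.
src = [ ...
  "__kernel void k_fill (__global float *y, const float v)\n"         ...
  "{ y[get_global_id (0)] = v; }\n"                                    ...
  "__kernel void k_neg (__global float *y, __global const float *x)\n" ...
  "{ const size_t i = get_global_id (0); y[i] = -x[i]; }\n"];
prog = ocl_program (src);

if (! prog.valid)
  error ("OCL program is not valid");
endif
printf ("%d kernel(s)\n", prog.num_kernels);

## Copy the cell array out of the program first, then index it with braces.
names = prog.kernel_names;
for k = 1:numel (names)
  ## Calling prog with only a kernel name returns that kernel's index.
  printf ("  %s -> kernel index %d\n", names{k}, prog (names{k}));
endfor
```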
Furthermore, the user can issue specific OpenCL commands that control the command queue workflow by means of statements with the following fields (see the OpenCL specification for details):
.clEnqueueBarrier
    Enqueues a barrier: commands enqueued after it begin execution only once all previously enqueued commands have finished.
.clFlush
    Submits all previously enqueued commands to the device.
.clFinish
    Blocks until all previously enqueued commands have completed.
Executing a kernel is performed in OpenCL by setting the kernel’s arguments and enqueueing the kernel into the (asynchronous) command queue. With an OCL program in octave, both steps are performed by a single indexing statement with parentheses:
[argout1, argout2, ...] = ocl_prog (kernel_index, work_size, cellout, argin1, argin2, ..., opt)
The parameters have the following meaning:
kernel_index
    Either the kernel index (0 <= kernel_index < num_kernels) or a kernel name string (which is slightly slower).
work_size
    Either a single positive integer specifying the total number of work-items for parallel execution (SIMD principle, i.e., Single Instruction Multiple Data), or a matrix with at most three rows, whose number of columns is the number of dimensions of the work-item index space. The first row specifies the number of work-items per dimension; the product of its entries is the total number of work-items (the single integer of the first form). The second row, if given, specifies an offset, per dimension, for the work-item indices. The third row, if given, specifies the number of work-items, per dimension, that make up a work-group; in OpenCL 1.x, each entry of the first row must then be divisible by the corresponding entry of the third row. For details, consult the OpenCL specification.
cellout
    A cell array describing the output arguments. Output arguments are OCL matrices
    that are allocated automatically before the actual kernel call; therefore, their
    number, sizes (and types) must be specified in advance. To specify N output
    arguments, the cell array must be of size 1xN, Nx1, 2xN, or Nx2. It must contain
    either only the matrices’ sizes (each as an octave row vector), in which case the
    default type 'double' is assumed, or, in a second row / column, also the matrices’
    data types (e.g., 'single') as strings. For complex-valued output arguments, the
    type must indicate this explicitly (e.g., 'double_complex'). In the kernel’s
    OpenCL C declaration, these output arguments must be the first arguments,
    preceding the input arguments.
    Complex-valued (output and input) arguments to OpenCL C kernels must be declared
    as global pointers to 'double2' or 'float2' (e.g., __global float2 *arg).
argin1, argin2, ...
    A list of input arguments to the kernel. Each can be an OCL matrix, an octave
    scalar, or a (small) octave matrix. Note that in the first case, no type checking
    is possible, so it is the user’s responsibility to match the matrix data types in
    octave and in the kernel code. Note also that in the latter cases, type matching
    is equally essential; often, one will want to convert parameters explicitly before
    passing them as arguments (e.g., uint64 (n) to convert an octave double scalar to
    a kernel argument of type ulong). Note finally that passing an octave matrix is
    subject to tight data size limits, whereas passing an OCL matrix is not.
opt
    (Optional) An option string specifying how input OCL matrices are handled. "make_unique" (the default) is the safest and easiest option, but may, in some cases, involve deep copying of data before the kernel call. It is recommended for kernel prototyping and simple calls (e.g., with OCL matrices created just before the call). "slice_ofs" is the more elaborate and efficient alternative, which requires small modifications to the kernel declaration and code (for an example, see ocl_tests.m). This option is recommended for any new function that passes OCL matrices on to kernels (e.g., library functions working on OCL data).
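The parameters can be combined as in the following minimal sketch, assuming the ocl package and an OpenCL device are available; the kernel ramp is hypothetical. It requests a 1-D range of 1024 work-items (n = 1000 rounded up to a multiple of the work-group size 64, which is why the kernel checks i < n), allocates one single-precision n x 1 output via a 2 x 1 cell array (size in the first row, type in the second), converts the inputs explicitly to match float, float and ulong, and finally issues .clFinish to wait until the enqueued kernel has completed.

```octave
## Hypothetical kernel "ramp": y[i] = a + h * i for i < n.
src = [ ...
  "__kernel void ramp (__global float *y, const float a,\n" ...
  "                    const float h, const ulong n)\n"     ...
  "{\n"                                                      ...
  "  const size_t i = get_global_id (0);\n"                 ...
  "  if (i < n)\n"                                           ...
  "    y[i] = a + h * (float) i;\n"                          ...
  "}\n"];
prog = ocl_program (src);

n = 1000;
## One column = one dimension: 1024 work-items, offset 0, work-groups of 64.
work_size = [1024; 0; 64];
## One output argument: its size in the first row, its data type in the second.
cellout = {[n, 1]; "single"};

## The output comes first in the kernel declaration; the inputs follow and are
## converted explicitly so that their types match float, float and ulong.
y = prog ("ramp", work_size, cellout, single (0), single (0.5), uint64 (n));

## Block until all enqueued commands, including this kernel, have completed.
prog.clFinish;
```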
For convenience, a call with only the kernel name string does not execute a kernel but returns the kernel index (which can be stored, e.g., in a persistent variable and reused for all subsequent kernel calls):
kernel_index = ocl_prog (kernel_name)
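A sketch of the persistent-index pattern, using a hypothetical helper ramp_values and assuming the ocl package and an OpenCL device are available: the program is compiled and the kernel index is looked up only on the first call (or when the program has become invalid), and every later call passes the cached numeric index. The leading 1; only marks the code as a script so that it can be run as shown; in a function file ramp_values.m it is not needed.

```octave
1;  # script marker, so that a function can be defined in this file

function [y, kidx] = ramp_values (n, a, h)
  ## Hypothetical helper: returns an n x 1 single OCL matrix a + h * (0:n-1)'.
  persistent prog kidx_cache
  if (isempty (kidx_cache) || ! prog.valid)
    src = [ ...
      "__kernel void ramp (__global float *y, const float a,\n" ...
      "                    const float h, const ulong n)\n"     ...
      "{\n"                                                      ...
      "  const size_t i = get_global_id (0);\n"                 ...
      "  if (i < n)\n"                                           ...
      "    y[i] = a + h * (float) i;\n"                          ...
      "}\n"];
    prog = ocl_program (src);
    kidx_cache = prog ("ramp");   # name lookup only, no kernel execution
  endif
  kidx = kidx_cache;
  ## A single integer work_size: exactly n work-items.
  y = prog (kidx, n, {[n, 1]; "single"}, single (a), single (h), uint64 (n));
endfunction

y = ramp_values (1000, 0, 0.5);
```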
ocl_program automatically ensures that the OpenCL library is loaded (see ocl_lib)
and that an OpenCL context with an OpenCL device has been created (see ocl_context).
Be aware that running your own OpenCL C code comes with a certain risk. If your code contains an infinite loop, there is no way to stop it; similarly, a memory access bug may crash or stall the octave interpreter, which then has to be terminated by means of the operating system, losing all data that existed only in octave’s workspace.
See also: oclArray, ocl_tests, ocl_context, ocl_lib, ocl_double, ocl_single, ocl_int8, ocl_int16, ocl_int32, ocl_int64, ocl_uint8, ocl_uint16, ocl_uint32, ocl_uint64.
Package: ocl