clRNG
An OpenCL library for random number generators (RNG)
Introduction

We introduce clRNG, an OpenCL library for generating uniform random numbers. It provides multiple streams that are created on the host computer and used to generate random numbers either on the host or on computing devices by work items. Such multiple streams are essential for parallel simulation [6] and are often useful as well for simulation on a single processing element (or within a single work item), for example when comparing similar systems via simulation with common random numbers [1], [5], [9], [10]. Streams can also be divided into segments of equal length called substreams, as in [2], [5], [10]. Currently, the library implements the following four generators: MRG31k3p [4], MRG32k3a [7], LFSR113 [8], and Philox-4×32-10 [11].

Generators and prefixes

The API is, in large part, the same for every generator; only the prefix of the type and function names changes from one generator to another. For example, to use the MRG31k3p generator, one needs to include the corresponding header file (which is normally the lowercase name of the generator with a .h extension on the host, or a .clh extension on the device) and use type and function names that start with clrngMrg31k3p:

#include <mrg31k3p.h>
cl_double foo(clrngMrg31k3pStream* stream) {
    return clrngMrg31k3pRandomU01(stream);
}

(The above function just returns a number uniformly distributed in \((0,1)\) generated using the stream passed as its argument.) To use the LFSR113 generator instead of MRG31k3p, one must change the include directive and use type and function names that start with clrngLfsr113:

#include <lfsr113.h>
cl_double foo(clrngLfsr113Stream* stream) {
    return clrngLfsr113RandomU01(stream);
}

In the generator API reference (given in clRNG_template.h), the generator-specific part of the prefix is not shown. The clRNG.h header file declares the functions that are common to all generators, as well as utility library functions.

Future versions of the library will support more generic interfaces that work across generators.

Environment variables

For all features of the library to work properly, the CLRNG_ROOT environment variable must be set to the installation path of the clRNG package, that is, the directory that contains the cl/include subdirectory. How to set an environment variable depends on the operating system used.
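As an illustration of how this variable is typically consumed, the following C sketch builds a compiler option that adds the device-side header directory to the include path of an OpenCL program. The helper names example_include_option and example_include_option_from_env are hypothetical and are not part of clRNG; only the CLRNG_ROOT variable and the cl/include subdirectory come from the text above, and -I is the standard OpenCL compiler option for include directories. The sketch assumes that the installation path contains no spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of clRNG): writes "-I<root>/cl/include"
 * into buf. Returns 0 on success, or -1 if root is missing or empty, or
 * if buf is too small to hold the whole option. */
int example_include_option(const char* root, char* buf, size_t bufSize)
{
    assert(buf != NULL && bufSize > 0);
    if (root == NULL || strlen(root) == 0)
        return -1;
    int n = snprintf(buf, bufSize, "-I%s/cl/include", root);
    return (n < 0 || (size_t)n >= bufSize) ? -1 : 0;
}

/* Same as above, taking the root from the CLRNG_ROOT environment variable. */
int example_include_option_from_env(char* buf, size_t bufSize)
{
    return example_include_option(getenv("CLRNG_ROOT"), buf, bufSize);
}
```

The resulting string could then be passed as the options argument of clBuildProgram when compiling a kernel that includes a .clh header.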

Small examples

In the examples given below, we use the MRG31k3p generator from [4]. In general, a stream object contains three states: the initial state of the stream (or seed), the initial state of the current substream (by default it is equal to the seed), and the current state. With MRG31k3p, each state consists of six 31-bit integers. Each time a random number is generated, the current state advances by one position. There are also functions to reset the state to the initial one, or to the beginning of the current substream, or to the start of the next substream. Streams can be created and manipulated in arrays of arbitrary sizes. For a single stream, one uses an array of size 1. One can separately declare and allocate memory for an array of streams, create (initialize) the streams, clone them, copy them to preallocated space, etc.
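To make the three-state layout concrete, here is a minimal self-contained C sketch of a stream object with its reset operations. It is not the clRNG implementation: the ToyStream type, the toy_ functions, and the substream length are hypothetical, and a small linear congruential generator stands in for MRG31k3p, whose state has six integers rather than one. The functions follow the convention of taking the array size before the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical substream length; real substreams are far longer. */
#define TOY_SUBSTREAM_LENGTH 1000u

typedef struct {
    uint32_t current;    /* current state */
    uint32_t initial;    /* initial state of the stream (seed) */
    uint32_t substream;  /* initial state of the current substream */
} ToyStream;

/* One step of a small LCG. This is a stand-in, NOT MRG31k3p. */
static uint32_t toy_step(uint32_t s) { return s * 1664525u + 1013904223u; }

/* Initializes count streams whose seeds lie 10 substreams apart. */
void toy_init_streams(size_t count, ToyStream* streams, uint32_t seed)
{
    assert(streams != NULL);
    for (size_t i = 0; i < count; i++) {
        streams[i].initial = streams[i].substream = streams[i].current = seed;
        for (unsigned k = 0; k < 10 * TOY_SUBSTREAM_LENGTH; k++)
            seed = toy_step(seed);
    }
}

/* Advances the current state by one position; returns a value in (0,1). */
double toy_random_u01(ToyStream* s)
{
    s->current = toy_step(s->current);
    return (s->current + 0.5) / 4294967296.0;
}

/* Resets each stream to its initial state (seed). */
void toy_rewind_streams(size_t count, ToyStream* streams)
{
    for (size_t i = 0; i < count; i++)
        streams[i].current = streams[i].substream = streams[i].initial;
}

/* Resets each stream to the beginning of its current substream. */
void toy_rewind_substreams(size_t count, ToyStream* streams)
{
    for (size_t i = 0; i < count; i++)
        streams[i].current = streams[i].substream;
}

/* Moves each stream to the start of its next substream. */
void toy_forward_to_next_substreams(size_t count, ToyStream* streams)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t s = streams[i].substream;
        for (unsigned k = 0; k < TOY_SUBSTREAM_LENGTH; k++)
            s = toy_step(s);
        streams[i].substream = streams[i].current = s;
    }
}
```

In this sketch, rewinding a substream replays exactly the numbers generated since the substream began, which is the property that simulations with common random numbers rely on.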

Using streams on the host

We start with a small example in which we just create a few streams, then use them to generate numbers on the host computer and compute some quantity. This could be done as well by using only a single stream, but we use more just for the purpose of illustration.

The code includes the header for the MRG31k3p RNG.

#include <mrg31k3p.h>

We create an array of two streams named streams and a single stream named single.

clrngMrg31k3pStream* streams = clrngMrg31k3pCreateStreams(NULL, 2, NULL, NULL);
clrngMrg31k3pStream* single = clrngMrg31k3pCreateStreams(NULL, 1, NULL, NULL);

Then we repeat the following 100 times: we generate a uniform random number in \((0,1)\) and an integer in \(\{1,\dots,6\}\), and compute the indicator that the product is less than 2.

int count = 0;
for (int i = 0; i < 100; i++) {
    double u = clrngMrg31k3pRandomU01(&streams[i % 2]);
    int x = clrngMrg31k3pRandomInteger(single, 1, 6);
    if (x * u < 2) count++;
}

The uniform random numbers over \((0,1)\) are generated by alternating the two streams from the array. We then print the average of those indicators.

printf("Average of indicators = %f\n", (double)count / 100.0);
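As a sanity check on the printed value, the expected value of the indicator can be worked out by hand. Assuming that the integer \(X\) is uniform over \(\{1,\dots,6\}\), that \(U\) is uniform over \((0,1)\), and that the two are independent (they come from different streams), we have

\[
P(XU < 2) = \frac{1}{6}\sum_{x=1}^{6} \min\left(1, \frac{2}{x}\right)
= \frac{1}{6}\left(1 + 1 + \frac{2}{3} + \frac{1}{2} + \frac{2}{5} + \frac{1}{3}\right)
= \frac{3.9}{6} = 0.65,
\]

so the average over 100 repetitions should be close to 0.65, up to sampling noise.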

Using streams in work items

In our second example, we create an array of streams and use them in work items that execute in parallel on a GPU device, one distinct stream per work item. Note that it is also possible (and sometimes useful) to use more than one stream per work item. We show only fragments of the code, to illustrate what we do with the streams. This code is only for illustration; the program does no useful computation.

In the host code, we first include the clRNG header for the MRG31k3p RNG:

#include <mrg31k3p.h>

Now suppose we have an integer variable numWorkItems that indicates the number of work items we want to use. We create an array of numWorkItems streams (and allocate memory for both the array and the stream objects). The creator stores in the variable streamBufferSize the size of the buffer that this array occupies (it depends on how much space is required to store the stream states), and returns an error code in err.

size_t streamBufferSize;
clrngMrg31k3pStream* streams = clrngMrg31k3pCreateStreams(NULL, numWorkItems,
                                   &streamBufferSize, (clrngStatus *)&err);
check_error(err, "cannot create random stream array");

Then we create an OpenCL buffer of size streamBufferSize and fill it with a copy of the array of streams, to pass to the device. We also create and pass a buffer that will be used by the device to return an array of numWorkItems values of type cl_float. (OpenCL buffer creation is not specific to clRNG, so it is not discussed here.)

// Create buffer to transfer streams to the device.
cl_mem buf_in = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               streamBufferSize, streams, &err);
// Create buffer to transfer output back from the device.
cl_mem buf_out = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_HOST_READ_ONLY,
                                numWorkItems * sizeof(cl_float), NULL, &err);

We finally enqueue the kernel with these two buffers as kernel arguments (not shown here).

In the device code, we include the device-side clRNG header for the chosen RNG (it ends with .clh instead of .h).

#include <mrg31k3p.clh>

Pointers to the global memory buffer of streams received from the host and to the output array are passed to the kernel as arguments. (The correspondence between the kernel arguments and the buffers on the host is specified in the host code, not shown here.) Each work item creates a private copy of its stream, named private_stream_d, in its private memory, so that it can generate random numbers on the device. Private memory must be allocated at compile time; this is why private_stream_d is declared as a stream object rather than as a pointer, so that the declaration itself allocates the memory. The kernel just generates two random numbers and stores their sum in the output array as a float.

__kernel void example(__global clrngMrg31k3pHostStream* streams, __global float* out) {
    int gid = get_global_id(0);
    clrngMrg31k3pStream private_stream_d; // This is not a pointer!
    clrngMrg31k3pCopyOverStreamsFromGlobal(1, &private_stream_d, &streams[gid]);
    out[gid] = clrngMrg31k3pRandomU01(&private_stream_d) +
               clrngMrg31k3pRandomU01(&private_stream_d);
}

The host can then recover the array of size numWorkItems that contains these sums.

RNG-Specific APIs

clRNG_template.h describes the random streams API as it is intended to be implemented using different types of RNGs or even using quasi-Monte Carlo (QMC) point sets.

In the description of this API, every data type and function name is assigned the prefix clrng. It is understood that, in the implementation for each RNG type, the prefix clrng is to be expanded with another prefix that indicates the type of RNG (or other method) used.

As this API is not polymorphic, replacing an RNG type with another one in client code requires changing the clRNG function names and data types in that code to those of the replacement RNG. We also intend to propose a generic (in the polymorphic sense) interface to the clRNG library.

Stream Objects and Stream States

The library defines, among others, two closely related types of structures: stream objects (clrngStream) and stream states (clrngStreamState). The definitions of both structures depend on the specific type of RNG that they pertain to. Stream states correspond to the seeds of conventional RNGs, to counter values in counter-based RNGs, or to point and coordinate indices in QMC methods. Normally, the client should not deal with stream states directly, but should instead use the higher-level stream objects. Stream objects are intended to store several stream states: the current and initial stream states, and also the current substream state when support for substreams is available. Stream objects may also store other properties of the RNG, such as encryption keys for cryptography-based RNGs.

Arrays of Stream Objects

Many functions are defined only for arrays of stream objects and not for single stream objects. It is always possible to use these functions for single stream objects by specifying a unit array size.

Also, in order to comply with the OpenCL API and be consistent with the clBLAS API, functions that take an array as an argument have the array size argument come before the array argument.

Storing Stream States on the Device

When a kernel is called, the stream states it needs are normally passed by the host and stored in global memory. For efficiency reasons, it is desirable to first copy the current stream state into the work item's private memory, and to work with that copy inside the kernel. In the current implementation of clRNG, to avoid wasting device resources, only the current stream state is stored in the work item's private memory. The initial stream state is left in global memory and a pointer to it is stored in the work item's private memory. When substream support is turned on (by defining CLRNG_ENABLE_SUBSTREAMS before including the device-side header file; see clRNG_template.h), the initial state of the substream is stored in the work item's private memory together with the current state.
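The storage scheme described above can be pictured with the following plain C sketch. It is only an illustration under stated assumptions: the Toy types, the toy_ functions, and the TOY_ENABLE_SUBSTREAMS macro are hypothetical and do not reproduce the actual clRNG structures. Plain C has no address spaces, so the pointer that would refer to global memory in OpenCL C is an ordinary pointer here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A state made of six integers, as for MRG31k3p. */
typedef struct { uint32_t g1[3]; uint32_t g2[3]; } ToyState;

/* Full stream object, as created on the host and passed to the device. */
typedef struct {
    ToyState current;
    ToyState initial;
    ToyState substream;
} ToyHostStream;

/* Per-work-item copy: only the current state is duplicated. */
typedef struct {
    ToyState current;          /* private copy, updated at each draw */
#ifdef TOY_ENABLE_SUBSTREAMS
    ToyState substream;        /* kept only when substreams are enabled */
#endif
    const ToyState* initial;   /* points back into "global" memory */
} ToyDeviceStream;

/* Copies the current states and records where the initial states live. */
void toy_copy_over_streams(size_t count, ToyDeviceStream* dst,
                           const ToyHostStream* src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        dst[i].current = src[i].current;
#ifdef TOY_ENABLE_SUBSTREAMS
        dst[i].substream = src[i].substream;
#endif
        dst[i].initial = &src[i].initial;
    }
}

/* Rewinds by reading the initial state through the stored pointer. */
void toy_rewind_device_streams(size_t count, ToyDeviceStream* streams)
{
    for (size_t i = 0; i < count; i++) {
        streams[i].current = *streams[i].initial;
#ifdef TOY_ENABLE_SUBSTREAMS
        streams[i].substream = *streams[i].initial;
#endif
    }
}
```

With this layout, the private copy is smaller than the full stream object, at the cost of one global memory read whenever a stream is rewound.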

Device-Side Code

To use the clRNG library on a device from within a user-defined kernel, the user must include the clRNG header file corresponding to the desired RNG, using an include preprocessor directive.

If the default settings are not suitable for the user's needs, optional library behavior can be selected by defining specific preprocessor macros before including the clRNG header. For example, to enable substream support on the device while using the MRG31k3p generator, use:

#define CLRNG_ENABLE_SUBSTREAMS
#include <mrg31k3p.clh>

A comprehensive list of the supported device-side library options is given in clRNG_template.h.