SpatialOps
Nebo

Note: Work in progress. Highlighted items need links or other fixing.


Memory Locations of Fields

Using Fields on Multiple Devices

Introduction: Fields as an Abstraction over Device Memory

Since a mesh is simply stored as an array of data, we can abstract which device(s) that data resides in by using a single “field.” For example, a field representing temperature may be stored on the CPU, a GPU, or both. Alternatively, on a system with multiple GPUs, the temperature field could be on any subset of the available GPUs.

Nebo reads or modifies data on a single device, marked as active, while other devices are designated as either valid or invalid according to whether their data is current with that of the active device.

Active Device

Each field has exactly one device marked as active. This is the device upon which the data can currently be read and/or modified–unless the field is constant, in which case the data can only be read. If any modifications are made on the active device, all other devices containing the data will be flagged as invalid.

Valid and Invalid Flags

Because data can reside on multiple devices, we may need to perform transfers between those devices. To keep track of the integrity of that data, memory on each device can be flagged as valid, or invalid.

Invalid Flags

Memory on a device flagged as invalid contains outdated data; therefore, it cannot be read or modified. To make an invalid device valid again, the data must be copied from the active device to the invalid device. The functions in [Figure Number (below)] which can perform this copy will update the flag.

Valid Flags

Memory on a device flagged as valid can be read but not modified (unless it is designated as the active device). Any device marked as valid can be designated as the active device.

Device Indices

When passing a device into a function, the CPU and first GPU may be represented by either a name or an index number. For example, f.add_device(GPU_INDEX) and f.add_device(0) are equivalent. Additional GPUs must be represented as an index number of N, assuming a base of 0. This matches NVidia’s device indices for the GPUs.

Device Name Index
CPU CPU_INDEX -1
GPU 0 GPU_INDEX 0
GPU 1 1
GPU 2 2
... ...
GPU n n

Initial Device for a Field

Whenever a field is created, either by a constructor or a call to the SpatialFieldStore, the new field has an initial device. The memory for the default device is flagged as active. Thus, it can be read or modified at will; unless, of course, the field is constant, in which case the memory can only be read.

Initializing a Field with the Default Device

A field can be created, either by a constructor or a call to the spatial field store, without explicitly passing in a device. In these instances, the device will generally default to the CPU.

Example: Field Initialization with Default Device Using an SVolField

These values are assumed for all of the examples following them.

MemoryWindow mw = …;
BoundaryCellInfo bc = …;
GhostData ghosts = …;

Using the constructor (This will default to the CPU)

SVolField g(mw, bc, ghosts);

Using the constructor with externally allocated storage (This assumes the given pointer (ptrToMemory) is on the CPU)

SVolField g(mw, bc, ghosts, ptrToMemory, ExternalStorage);

Using the spatial field store (This will default to the CPU)

SpatFldPtr<SVolField> g = SpatialFieldStore::get_from_window(mw, bc, ghosts);

Initializing a field with the spatial field store while using a prototype will default to the device currently flagged as active:

Using a field with the SpatialFieldStore (This will default to the active device of f)

SpatFldPtr<SVolField> g = SpatialFieldStore::get<SVolField, SVolField>(f);

 

Initializing a Field with an Explicit Device Parameter

To initialize a field on a specific device, some additional arguments must be passed in to the constructor.

Example: Initializing a Field with an Explicit Device

These values are assumed for all of the examples following them.

MemoryWindow mw = …;
BoundaryCellInfo bc = …;
GhostData ghosts = …;

By constructor (Explicitly CPU)

SVolField g(mw, bc, ghosts, NULL, InternalStorage, CPU_INDEX);

By constructor (Explicitly GPU 0)

SVolField g(mw, bc, ghosts, NULL, InternalStorage, GPU_INDEX);

By constructor with externally allocated memory (Explicitly CPU)

SVolField g(mw, bc, ghosts, ptrToMemory, ExternalStorage, CPU_INDEX);

By constructor with externally allocated memory (Explicitly GPU)

SVolField g(mw, bc, ghosts, ptrToMemory, ExternalStorage, GPU_INDEX);

By SpatialFieldStore (Explicitly CPU)

SpatFldPtr<SVolField> g = SpatialFieldStore::get_from_window(mw, bc, ghosts, CPU_INDEX);

By SpatialFieldStore (Explicitly GPU 0)

SpatFldPtr<SVolField> g = SpatialFieldStore::get_from_window(mw, bc, ghosts, GPU_INDEX);

By SpatialFieldStore with prototype (Explicitly CPU, regardless of f’s active device)

SpatFldPtr<SVolField> g = SpatialFieldStore::get<SVolField, SVolField>(f, CPU_INDEX);

By SpatialFieldStore with prototype (Explicitly GPU 0, regardless of f’s active device)

SpatFldPtr<SVolField> g = SpatialFieldStore::get<SVolField, SVolField>(f, GPU_INDEX);

Adding More Devices to a Field

To add an additional device to a field, use the SpatialOps::SpatialField::add_device() member function on the field and pass in the index of the device to be added:

field.add_device( deviceIndex );

For example:

Field f; // f is avaliable on CPU
f.add_device(GPU_INDEX); // f is also available on first (or only) GPU

After calling add_device, the memory on the added device is available, up-to-date with the memory on the active device, and flagged as valid.

If the field already has memory on the specified device, then add_device will not allocate any additional memory. It will, however, validate the current memory if it is not already valid.

Adding Devices Asynchronously

The add_device member function is synchronous; that is, it will finish validating the device before returning. Because this can be a long operation and the device may not be needed immediately, we also provide an asynchronous method to add other devices, use the SpatialOps::SpatialField::add_device_async() method:

field.add_device_async( deviceIndex );

Validating a Device

To validate a device on a field, call SpatialOps::SpatialField::validate_device() on the field and pass in the device index of the device to be validated.

field.validate_device( deviceIndex );

Validating a device involves a data transfer between the field's active device and another added device. Only one device can be validated at a time. Because the data transfer can be a very costly operation, it is recommended to validate devices early and use the asynchronous version whenever possible.

Validating Devices Asynchronously

Similar to the add_device function, validate_device is synchronous. For asynchronous execution, we provide an asynchronous version of validate_device: SpatialOps::SpatialField::validate_device_async()

field.validate_device_async( deviceIndex );

Changing the Active Device

To set a device on a field as active, call the SpatialOps::SpatialField::set_device_as_active() member function and pass in the device index of the device to be made active.

field.set_device_as_active( deviceIndex );

There is a restriction that for a device to be made active, the device memory must already be valid. If the device memory is not valid, set_device_as_active will call add_device. Thus if the device is not already valid (or even allocated), set_device_as_active will be an expensive call.

Changing the Active Device Asynchronously

Similar to the add_device function, set_device_as_active is synchronous. For asynchronous execution, we provide SpatialOps::SpatialField::set_device_as_active(), an asynchronous version of set_device_as_active:

field.set_device_as_active_async( deviceIndex );

Single Value Fields

Single value fields are essentially scalars that behave as fields. This means that they can be allocated on different devices, and they can use the same model as described above to be used on multiple devices.

Example: Initializing a SingleValueField with an Explicit Device

These values are assumed for all of the examples following them.

MemoryWindow mw(IntVec(1,1,1));
BoundaryCellInfo bc(false, false, false);
GhostData ghosts(0);

By constructor (Explicitly CPU)

SingleValueField g(mw, bc, ghosts, NULL, InternalStorage, CPU_INDEX);

By constructor (Explicitly GPU 0)

SingleValueField g(mw, bc, ghosts, NULL, InternalStorage, GPU_INDEX);

 


Creating a New Stencil Operator

This section uses the new convention currently in this branch: new-stencil-convention

A stencil operator applies a specific field operation while converting a field from one type into another. Some commonly used stencil operations in Nebo include gradient, divergence, and interpolation. A complete list of stencils defined in SpatialOps can be found in spatialops/structured/stencil/StencilBuilder.cpp. Other projects that use Nebo/SpatialOps often define more; see each project’s documentation for more details.

Summary

The following code provides a quick sequential example of how to create and use a stencil to calculate a gradient in Nebo. Each step is explained in detail in the following sections.

Create a stencil point collection (link to section):
typedef IndexTriplet<-1,0,0> MinusPoint;
typedef IndexTriplet<1,0,0> PlusPoint;
typedef NEBO_FIRST_POINT(PlusPoint)::NEBO_ADD_POINT(MinusPoint)
FullCollection;
Create a coefficient collection (link to section):
NeboStencilCoefCollection<2> coefs = build_two_point_coef_collection( -1.0/dx, 1.0/dx);
Define the type of the stencil (link to section):
typedef NeboStencilBuilder<Gradient,
FullCollection,
GradX;
Create an instance of the stencil (link to section):
GradX gradX(coefs);
Use the stencil in Nebo assignment (link to section):
SVolField f;
//…
g <<= gradX(f);

Requirements

This section assumes an understanding of how to use Pre-defined Nebo Stencils [Link here to Nebo Expressions: Pre-defined Stencils].

To create a new stencil, the following items must be determined:

  • The types of the source and destination fields ([Link here to Fields and Meshes section])
  • The stencil’s shape (via stencil points)
  • Coefficients (if any) for each stencil point, and how to compute them at runtime

As an example, we will build a gradient operator, which calculates the difference between the cells on either side of the destination point, weighted by the distance between them. The source field of this operator will be SVolField (scalar volume field) and the destination field will be SSurfXField (X-face surfaces between scalar volume cells). The shape of this operator is a 3-wide one-dimensional stencil: it will read the neighbors in the X direction to the “left” and “right” of the destination point. Finally, it will calculate the difference between the values of its neighbors, so the coefficients will be the inverse of the change in the X direction (dx).

Defining Stencil Points

Each stencil point needs to be defined in terms of distance from the destination point. Thus, if the stencil point and the destination point share the same index, the stencil point will be 0,0,0. If the stencil point is one cell to the left of the destination point (in the X direction), the stencil point will be -1,0,0. If the stencil point is one cell above the destination point (in the Y direction), the stencil point will be 0,1,0. If the stencil point is two cells to the left of the destination point (in the X direction), the stencil point will be -2,0,0. There is no limit to the distance that can be between the stencil point and the destination point.

When creating a stencil from one field type to a different field type with a different offset [link to documentation on field types and their offsets], the origin stencil point (0,0,0) cannot be used. The origin stencil point can only be used when creating stencils of the same field type or of types with the same offset.

To construct a stencil point in code, one uses an index triplet. An index triplet is simply a type with three integer template arguments, one for each axis. The offsets are exactly what was determined for the index references as in the paragraph above.

For our gradient operator, we have two stencil points, which are one cell away to either side of the destination point, along the X direction. These points are -1,0,0 and 1,0,0. The index triplets are:

typedef IndexTriplet<-1,0,0> MinusPoint;
typedef IndexTriplet<1,0,0> PlusPoint;

Defining Point Collections

Once the stencil points are defined, they need to be gathered into a single type/collection. A Nebo stencil point collection is a list of stencil points (a list of index triplets). Here, the order of the stencil points does matter, for it will change the rounding error of the final result.

To create the point collection for our gradient stencil, we need to combine MinusPoint and PlusPoint into a single collection. For this example, we will create a collection of a single point (PlusPoint) first, and then add MinusPoint; and for the sake of clarity, we will do this creation and addition in separate type defintions:

typedef NEBO_FIRST_POINT(PlusPoint) FirstCollection;
typedef FirstCollection::NEBO_ADD_POINT(MinusPoint) FullCollection;

Alternatively, we can define these in a single line:

typedef NEBO_FIRST_POINT(PlusPoint)::NEBO_ADD_POINT(MinusPoint) FullCollection;

NEBO_FIRST_POINT and NEBO_ADD_POINT are C++ preprocessor macros, which construct a list of points. The definitions of these macros shows how to construct this list by hand. The details of manually constructing stencil point collections is left out of this section to avoid confusion.

Defining Coefficient Collections

For stencils that need coefficients at each point, the coefficients must be packaged in a Nebo Stencil Coefficient Collection. The order of coefficients in the Nebo Stencil Coefficient Collection must match the order of stencil offset points in the Nebo stencil point collection exactly.

Back to the example, our gradient stencil can be calculated with coefficients. (The gradient stencil can be calculated without stencils as explained below.) Since we want to average our two stencil points together, the coefficients will both be .5. For convenience, we will use the build_two_point_coef_collection function to build the coefficient collection:

NeboStencilCoefCollection<2> coefs = build_two_point_coef_collection( -1.0/dx, 1.0/dx);

The scalar/double value “dx” is assumed to be defined/known at this point. Because we want the difference between the values of our two stencil points, one must be negated; by convention, we negate the first. If both coefficients were positive, we would calculate the sum of the stencil points rather than the difference.

In addition to build_two_point_coef_collection(), there are four other related functions that build coefficient collections ranging in size from one coefficient to five coefficients:

Collection of size 1

NeboStencilCoefCollection<1> coef1 = build_coef_collection(0.5);

Collection of size 2

NeboStencilCoefCollection<2> coefs2 = build_two_point_coef_collection(0.5, 0.5);

Collection of size 3

NeboStencilCoefCollection<3> coefs3 = build_three_point_coef_collection(0.5, 0.5, 0.5);

Collection of size 4

NeboStencilCoefCollection<4> coefs4 = build_four_point_coef_collection(0.5, 0.5, 0.5, 0.5);

Collection of size 5

NeboStencilCoefCollection<5> coefs5 = build_five_point_coef_collection(0.5, 0.5, 0.5, 0.5, 0.5);

Creating a coefficient collection larger than five points can be done by “growing” a collection through application. For example, the above NeboStencilCoefCollection object, called coefs, for our gradient stencil could be defined as:

NeboStencilCoefCollection<2> coefs = build_coef_collection(-1.0/dx)(1.0/dx);

Creating a collection with six coefficients can be either

Using build_five_point_coef_collection function

NeboStencilCoefCollection<6> sixCoefs = build_five_point_coef_collection(0.1, 0.2, 0.3, 0.4, 0.5)(0.6);

or

Using build_coef_collection function

NeboStencilCoefCollection<6> sixCoefs = build_coef_collection(0.1)(0.2)(0.3)(0.4)(0.5)(0.6);

Defining the Type of a New Stencil

There are several types/styles of Nebo stencil. Each one defines a different calculation, and most take different template arguments. Briefly, the styles are:

Stencil Builder Name Description Template Arguments Runtime Arguments
Nebo stencil builder The most common Nebo stencil. It uses coefficients. Operator Type, Stencil Point Collection, Source Type, Destination Type Coefficient Collection
Nebo edgeless stencil builder The same as the common Nebo stencil, except it does not invalidate ghost cells. This is useful for boundary conditions and conditional application of stencils. Operator Type, Stencil Point Collection, Source Type, Destination Type Coefficient Collection
Nebo boundary condition builder Takes another stencil operator and inverts it along a boundary. Thus, this operator writes to an index point that is traditionally read from. Other Stencil Type Other Stencil Object/Instance
Nebo sum stencil builder Adds up the values at each stencil point and writes the sum to the destination point. It does not use coefficients. Stencil Point Collection, Source Type, Destination Type None
Nebo average stencil builder Adds up the values at each point and then writes the average to the destination point. It does not use coefficients. Stencil Point Collection, Source Type, Destination Type None
The Nebo mask shift builder Takes only one point, operates on masks only, and converts from one mask type to another by simply shifting the mask. It does not use coefficients. Operator Type, Source Type, Destination Type None

Template Parameter Explanations

Operator Type: This type is used to describe the operator at a high level. For example, Gradient, Divergence, and Interpolant (include links to definitions) are the most common Operator Types. In general, Nebo does nothing with this template parameter, but it is often used for type inference.
Stencil Point Collection: This type is simply a StencilPointCollection (link) as explained above in Section Defining Point Collections (link).
Source Type: This type is a field type that the stencil should expect as input.
Destination Type: This type is a field type that the stencil will return as output.
Other Stencil Type: Nebo Boundary Conditions take a pre-existing stencil and invert it. Thus, when building a Nebo Boundary Condition, another stencil must be supplied.

Runtime Parameter Explanations

Coefficient Collection: This argument is simply a CoefficientCollection object with the same number of coefficients as there are stencil points in the Stencil Point Collection. A CoefficientCollection object is built as explained above in Section Defining Coefficient Collections (link).
Other Stencil Object/Instance: This argument is the actual stencil object to invert in a Nebo Boundary Condition.

Our gradient stencil operator is a common Nebo stencil with coefficients, so we use NeboStencilBuilder:

typedef NeboStencilBuilder<Gradient, FullCollection, SVolField, SSurfXField> GradX;

Creating a Stencil Object

To create an instance of a stencil, simply provide the proper runtime arguments to the constructor. For our example of GradX, this becomes:

GradX gradX(coefs);

where coefs is the coefficient collection we defined above.

Using stencils

The stencil object itself is a functor, meaning it can be used like a function:

SVolField f;
//…
g <<= gradX(f);

See section (link) on using predefined stencils for more detail.