Note: Work in progress. Highlighted items need links or other fixing.
Since a mesh is simply stored as an array of data, we can abstract which device(s) that data resides in by using a single “field.” For example, a field representing temperature may be stored on the CPU, a GPU, or both. Alternatively, on a system with multiple GPUs, the temperature field could be on any subset of the available GPUs.
Nebo reads or modifies data on a single device, marked as active, while other devices are designated as either valid or invalid according to whether their data is current with that of the active device.
Each field has exactly one device marked as active. This is the device upon which the data can currently be read and/or modified–unless the field is constant, in which case the data can only be read. If any modifications are made on the active device, all other devices containing the data will be flagged as invalid.
Because data can reside on multiple devices, we may need to perform transfers between those devices. To keep track of the integrity of that data, memory on each device can be flagged as valid, or invalid.
Memory on a device flagged as invalid contains outdated data; therefore, it cannot be read or modified. To make an invalid device valid again, the data must be copied from the active device to the invalid device. The functions in [Figure Number (below)] which can perform this copy will update the flag.
Memory on a device flagged as valid can be read but not modified (unless it is designated as the active device). Any device marked as valid can be designated as the active device.
When passing a device into a function, the CPU and first GPU may be represented by either a name or an index number. For example,
f.add_device(0) are equivalent. Additional GPUs must be represented as an index number of N, assuming a base of 0. This matches NVidia’s device indices for the GPUs.
Whenever a field is created, either by a constructor or a call to the SpatialFieldStore, the new field has an initial device. The memory for the default device is flagged as active. Thus, it can be read or modified at will; unless, of course, the field is constant, in which case the memory can only be read.
A field can be created, either by a constructor or a call to the spatial field store, without explicitly passing in a device. In these instances, the device will generally default to the CPU.
Example: Field Initialization with Default Device Using an SVolField
These values are assumed for all of the examples following them.MemoryWindow mw = …;BoundaryCellInfo bc = …;GhostData ghosts = …;
Using the constructor (This will default to the CPU)SVolField g(mw, bc, ghosts);
Using the constructor with externally allocated storage (This assumes the given pointer (ptrToMemory) is on the CPU)SVolField g(mw, bc, ghosts, ptrToMemory, ExternalStorage);
Using the spatial field store (This will default to the CPU)SpatFldPtr<SVolField> g = SpatialFieldStore::get_from_window(mw, bc, ghosts);
Initializing a field with the spatial field store while using a prototype will default to the device currently flagged as active:
Using a field with the SpatialFieldStore (This will default to the active device of f)SpatFldPtr<SVolField> g = SpatialFieldStore::get<SVolField, SVolField>(f);
To initialize a field on a specific device, some additional arguments must be passed in to the constructor.
Example: Initializing a Field with an Explicit Device
These values are assumed for all of the examples following them.MemoryWindow mw = …;BoundaryCellInfo bc = …;GhostData ghosts = …;
By constructor (Explicitly CPU)SVolField g(mw, bc, ghosts, NULL, InternalStorage, CPU_INDEX);
By constructor (Explicitly GPU 0)SVolField g(mw, bc, ghosts, NULL, InternalStorage, GPU_INDEX);
By constructor with externally allocated memory (Explicitly CPU)SVolField g(mw, bc, ghosts, ptrToMemory, ExternalStorage, CPU_INDEX);
By constructor with externally allocated memory (Explicitly GPU)SVolField g(mw, bc, ghosts, ptrToMemory, ExternalStorage, GPU_INDEX);
By SpatialFieldStore (Explicitly CPU)SpatFldPtr<SVolField> g = SpatialFieldStore::get_from_window(mw, bc, ghosts, CPU_INDEX);
By SpatialFieldStore (Explicitly GPU 0)SpatFldPtr<SVolField> g = SpatialFieldStore::get_from_window(mw, bc, ghosts, GPU_INDEX);
By SpatialFieldStore with prototype (Explicitly CPU, regardless of f’s active device)SpatFldPtr<SVolField> g = SpatialFieldStore::get<SVolField, SVolField>(f, CPU_INDEX);
By SpatialFieldStore with prototype (Explicitly GPU 0, regardless of f’s active device)SpatFldPtr<SVolField> g = SpatialFieldStore::get<SVolField, SVolField>(f, GPU_INDEX);
To add an additional device to a field, use the SpatialOps::SpatialField::add_device() member function on the field and pass in the index of the device to be added:
add_device, the memory on the added device is available, up-to-date with the memory on the active device, and flagged as valid.
If the field already has memory on the specified device, then
add_device will not allocate any additional memory. It will, however, validate the current memory if it is not already valid.
The add_device member function is synchronous; that is, it will finish validating the device before returning. Because this can be a long operation and the device may not be needed immediately, we also provide an asynchronous method to add other devices, use the SpatialOps::SpatialField::add_device_async() method:
To validate a device on a field, call SpatialOps::SpatialField::validate_device() on the field and pass in the device index of the device to be validated.
Validating a device involves a data transfer between the field's active device and another added device. Only one device can be validated at a time. Because the data transfer can be a very costly operation, it is recommended to validate devices early and use the asynchronous version whenever possible.
Similar to the add_device function,
validate_device is synchronous. For asynchronous execution, we provide an asynchronous version of
To set a device on a field as active, call the SpatialOps::SpatialField::set_device_as_active() member function and pass in the device index of the device to be made active.
There is a restriction that for a device to be made active, the device memory must already be valid. If the device memory is not valid,
set_device_as_active will call
add_device. Thus if the device is not already valid (or even allocated),
set_device_as_active will be an expensive call.
Similar to the
set_device_as_active is synchronous. For asynchronous execution, we provide SpatialOps::SpatialField::set_device_as_active(), an asynchronous version of
Single value fields are essentially scalars that behave as fields. This means that they can be allocated on different devices, and they can use the same model as described above to be used on multiple devices.
Example: Initializing a SingleValueField with an Explicit Device
These values are assumed for all of the examples following them.MemoryWindow mw(IntVec(1,1,1));BoundaryCellInfo bc(false, false, false);GhostData ghosts(0);
By constructor (Explicitly CPU)SingleValueField g(mw, bc, ghosts, NULL, InternalStorage, CPU_INDEX);
By constructor (Explicitly GPU 0)SingleValueField g(mw, bc, ghosts, NULL, InternalStorage, GPU_INDEX);
This section uses the new convention currently in this branch: new-stencil-convention
A stencil operator applies a specific field operation while converting a field from one type into another. Some commonly used stencil operations in Nebo include gradient, divergence, and interpolation. A complete list of stencils defined in SpatialOps can be found in spatialops/structured/stencil/StencilBuilder.cpp. Other projects that use Nebo/SpatialOps often define more; see each project’s documentation for more details.
The following code provides a quick sequential example of how to create and use a stencil to calculate a gradient in Nebo. Each step is explained in detail in the following sections.
This section assumes an understanding of how to use Pre-defined Nebo Stencils [Link here to Nebo Expressions: Pre-defined Stencils].
To create a new stencil, the following items must be determined:
As an example, we will build a gradient operator, which calculates the difference between the cells on either side of the destination point, weighted by the distance between them. The source field of this operator will be SVolField (scalar volume field) and the destination field will be SSurfXField (X-face surfaces between scalar volume cells). The shape of this operator is a 3-wide one-dimensional stencil: it will read the neighbors in the X direction to the “left” and “right” of the destination point. Finally, it will calculate the difference between the values of its neighbors, so the coefficients will be the inverse of the change in the X direction (dx).
Each stencil point needs to be defined in terms of distance from the destination point. Thus, if the stencil point and the destination point share the same index, the stencil point will be 0,0,0. If the stencil point is one cell to the left of the destination point (in the X direction), the stencil point will be -1,0,0. If the stencil point is one cell above the destination point (in the Y direction), the stencil point will be 0,1,0. If the stencil point is two cells to the left of the destination point (in the X direction), the stencil point will be -2,0,0. There is no limit to the distance that can be between the stencil point and the destination point.
When creating a stencil from one field type to a different field type with a different offset [link to documentation on field types and their offsets], the origin stencil point (0,0,0) cannot be used. The origin stencil point can only be used when creating stencils of the same field type or of types with the same offset.
To construct a stencil point in code, one uses an index triplet. An index triplet is simply a type with three integer template arguments, one for each axis. The offsets are exactly what was determined for the index references as in the paragraph above.
For our gradient operator, we have two stencil points, which are one cell away to either side of the destination point, along the X direction. These points are -1,0,0 and 1,0,0. The index triplets are:
Once the stencil points are defined, they need to be gathered into a single type/collection. A Nebo stencil point collection is a list of stencil points (a list of index triplets). Here, the order of the stencil points does matter, for it will change the rounding error of the final result.
To create the point collection for our gradient stencil, we need to combine MinusPoint and PlusPoint into a single collection. For this example, we will create a collection of a single point (PlusPoint) first, and then add MinusPoint; and for the sake of clarity, we will do this creation and addition in separate type defintions:
Alternatively, we can define these in a single line:
NEBO_FIRST_POINT and NEBO_ADD_POINT are C++ preprocessor macros, which construct a list of points. The definitions of these macros shows how to construct this list by hand. The details of manually constructing stencil point collections is left out of this section to avoid confusion.
For stencils that need coefficients at each point, the coefficients must be packaged in a Nebo Stencil Coefficient Collection. The order of coefficients in the Nebo Stencil Coefficient Collection must match the order of stencil offset points in the Nebo stencil point collection exactly.
Back to the example, our gradient stencil can be calculated with coefficients. (The gradient stencil can be calculated without stencils as explained below.) Since we want to average our two stencil points together, the coefficients will both be .5. For convenience, we will use the build_two_point_coef_collection function to build the coefficient collection:
The scalar/double value “dx” is assumed to be defined/known at this point. Because we want the difference between the values of our two stencil points, one must be negated; by convention, we negate the first. If both coefficients were positive, we would calculate the sum of the stencil points rather than the difference.
In addition to build_two_point_coef_collection(), there are four other related functions that build coefficient collections ranging in size from one coefficient to five coefficients:
Collection of size 1NeboStencilCoefCollection<1> coef1 = build_coef_collection(0.5);
Collection of size 2NeboStencilCoefCollection<2> coefs2 = build_two_point_coef_collection(0.5, 0.5);
Collection of size 3NeboStencilCoefCollection<3> coefs3 = build_three_point_coef_collection(0.5, 0.5, 0.5);
Collection of size 4NeboStencilCoefCollection<4> coefs4 = build_four_point_coef_collection(0.5, 0.5, 0.5, 0.5);
Collection of size 5NeboStencilCoefCollection<5> coefs5 = build_five_point_coef_collection(0.5, 0.5, 0.5, 0.5, 0.5);
Creating a coefficient collection larger than five points can be done by “growing” a collection through application. For example, the above NeboStencilCoefCollection object, called coefs, for our gradient stencil could be defined as:
Creating a collection with six coefficients can be either
There are several types/styles of Nebo stencil. Each one defines a different calculation, and most take different template arguments. Briefly, the styles are:
|Stencil Builder Name||Description||Template Arguments||Runtime Arguments|
|Nebo stencil builder||The most common Nebo stencil. It uses coefficients.||Operator Type, Stencil Point Collection, Source Type, Destination Type||Coefficient Collection|
|Nebo edgeless stencil builder||The same as the common Nebo stencil, except it does not invalidate ghost cells. This is useful for boundary conditions and conditional application of stencils.||Operator Type, Stencil Point Collection, Source Type, Destination Type||Coefficient Collection|
|Nebo boundary condition builder||Takes another stencil operator and inverts it along a boundary. Thus, this operator writes to an index point that is traditionally read from.||Other Stencil Type||Other Stencil Object/Instance|
|Nebo sum stencil builder||Adds up the values at each stencil point and writes the sum to the destination point. It does not use coefficients.||Stencil Point Collection, Source Type, Destination Type||None|
|Nebo average stencil builder||Adds up the values at each point and then writes the average to the destination point. It does not use coefficients.||Stencil Point Collection, Source Type, Destination Type||None|
|The Nebo mask shift builder||Takes only one point, operates on masks only, and converts from one mask type to another by simply shifting the mask. It does not use coefficients.||Operator Type, Source Type, Destination Type||None|
Our gradient stencil operator is a common Nebo stencil with coefficients, so we use NeboStencilBuilder:
To create an instance of a stencil, simply provide the proper runtime arguments to the constructor. For our example of GradX, this becomes:
where coefs is the coefficient collection we defined above.
The stencil object itself is a functor, meaning it can be used like a function:
See section (link) on using predefined stencils for more detail.