CUDA example particles

This is a code review of particle simulation CUDA template . Several changes are also documented afterwards.

Main references

Chapter 29. Real-Time Rigid Body Simulation on GPUs^[1]

This is a Real-Time Particle Simulation implemented in CUDA. Discrete element method(DEM)^[2] is implemented with uniform grid data structure. Real-time granular materials rendering is applied by openGL.

This framework can be extended to more complicated problems and methods such as SPH^[3]^[4] model for fluid.

The uniform grid^[5] technique could be extended to more sophisticated structures such as hierarchical grids.

Code structure

Directory tree

The code directory is rather flat, all the necessaries are located in the root,

.
├── findgllib.mk
├── Makefile
├── NsightEclipse.xml
├── particles.cpp
├── particles_kernel.cuh
├── particles_kernel_impl.cuh
├── particleSystem.cpp
├── particleSystem_cuda.cu
├── particleSystem.cuh
├── particleSystem.h
├── readme.txt
├── render_particles.cpp
├── render_particles.h
├── shaders.cpp
└── shaders.h

Main Structure Overview

The source file particles.cpp can be better understood with the workflow below that includes the primary functions. Note that only rendering mode branch is presented i.e. !BenchMark == true. The conditioning statements are colored with navy blue. And the functions needed for attention are colored with dark red.

class: particleSystem

This is the main class for particles, including set particle position, updating... The DEM^[3] method is implemented in the collideSpheres function.

Update particles

There are three main steps to perform the updating: Integration, Building the grid data structure and Processing collisions, which are illustrated in Particle Simulation using CUDA - NVIDIA. Note for the grid data structure, sorting is used instead of the atomic operations.

Extensions

After understanding the code structure, there is no reason for not adding some new features:

Active gravity mode

One of benefits of the real-time rendering is the ability of interaction. In active gravity mode, the gravity rotates at the opposite direction against the camera, as shown below.

Code

step 1: setGravity3d

original

In original code, the gravity direction is fixed (norm: \([0,-1,0]\)), the global variable gravity is defined as scalar (float):

float gravity = 0.0003f;

the function to set gravity is defined in class ParticleSystem as:

void setGravity(float x)
{
    m_params.gravity = make_float3(0.0f, x, 0.0f);
}

when use it:

psystem->setGravity(-gravity)

update

The good news is that the class variable gravity is defined as float3. we only need to upgraded the class function to be

void setGravity3d(float x, float y, float z)
{
    m_params.gravity = make_float3(x, y, z);
}

and the global variable and its usage:

float gravity3d[] = {0, -0.001, 0};

psystem->setGravity3d(gravity3d[0], gravity3d[1], gravity3d[2]);

step 2: rotateGravity

original

Recall the Code Structure, the camera rotation is applied in glutDisplayFunc(display). The code (in function display()) is such as:

for (int c = 0; c < 3; ++c)
{
    camera_trans_lag[c] += (camera_trans[c] - camera_trans_lag[c]) * inertia;
    camera_rot_lag[c] += (camera_rot[c] - camera_rot_lag[c]) * inertia;
}

glTranslatef(camera_trans_lag[0], camera_trans_lag[1], camera_trans_lag[2]);
glRotatef(camera_rot_lag[0], 1.0, 0.0, 0.0);
glRotatef(camera_rot_lag[1], 0.0, 1.0, 0.0);

The camera rotation is applied by glRotatef(camera_rot_lag[0, 1]), where the camera normal vector is rotated from \([0,1,0]\) every time step.

update

Define new global variables gravity3dOrg and rotateGravity

float gravity3dOrg[] = {0, -0.001, 0};
bool rotateGravity = false;

Add code to the display(), after the sampled original code above as

if (rotateGravity)
{
    // rotate gravity if camera rotate
    float angleX = -M_PI*camera_rot_lag[0]/180.0;
    float angleY = -M_PI*camera_rot_lag[1]/180.0;
    float temp[3] = {0,0,0};
    rotateVectorX(gravityOrg, temp, angleX);
    gravity3d[1] = 0.0;
    rotateVectorY(temp, gravity3d, angleY);

    // // test code: print point data
    // printf("%.2f %.2f\t",camera_rot_lag[0], camera_rot_lag[1]);
    // for (int i=0; i<3; i++)
    // {
    //     printf("%.2f ",1000*gravity3d[i]);
    // }
    // printf("\t%.2f\n",1000*sqrt(gravity3d[0]*gravity3d[0]+gravity3d[1]*gravity3d[1]+gravity3d[2]*gravity3d[2]));
}

Define functions rotateVectorX and rotateVectorY in a new file vectorRotate.hpp

#include <math.h>

//function to calculate rotate vectors
template <class T2>
void rotateVectorX(T2 *vector_a, T2 *vector_rotated, T2 &angle)
{
    vector_rotated[1] = vector_a[1] * cos(angle);
    vector_rotated[2] = vector_a[1] * sin(angle);
}

template <class T2>
void rotateVectorY(T2 *vector_a, T2 *vector_rotated, T2 &angle)
{
    vector_rotated[0] = vector_a[2] * sin(angle);
    vector_rotated[1] = vector_a[1];
    vector_rotated[2] = vector_a[2] * cos(angle);
}

step3: toggle rotateGravity

Switches can be toggled by keyboard in glutKeyboardFunc(key). Add another case in key() as:

      case 'w':
          wireframe = !wireframe;
          break;

      case 'h':
          displaySliders = !displaySliders;
          break;    
	
// new added
      case 'f':
          rotateGravity = !rotateGravity;
          break;

In order to add this option to the right button menu, go to initMenus() and add afterwards:

glutAddMenuEntry("Toggle sliders [h]", 'h');
glutAddMenuEntry("Toggle active gravity (f)", 'f');
glutAttachMenu(GLUT_RIGHT_BUTTON);

Algorithm

The only algorithm related is vector rotation. Mimicking the glRotatef, two-step algorithm is adopted.

step 1: rotate original gravity \([0,-0.001,0]\) around the 1^st dimension (\(x\) axis) by \(angleX\)
step 2: then rotate resulting gravity vector toward 2^st dimension (\(y\) axis) by \(angleY\)

step 1

Recall the vector rotation matrix, see: wikipedia on rotation matrices

With vector \(V_1(x_1,y_1,z_1)\) rotates by angle \(\beta\), the resulting vector \(V_2(x_2,y_2,z_2)\): \[ V_2 = \mathbf{R} V_1 \] when the rotation axis is \(x\), as in step 1, the rotation matrix: \[ \mathbf{R_x} = \left[ {\begin{array}{ccc} 1 & 0 & 0 \\ 0 &\cos\beta & -\sin\beta \\ 0 &\sin\beta & \cos\beta \end{array} } \right] \] that is to say: \[ \begin{aligned} x_2 &= x_1 \\ y_2 &= \cos\beta y_1 - \sin\beta z_1 \\ z_2 &= \sin\beta y_1 + \cos\beta z_1 \\ \end{aligned} \] the code is:

vector_rotated[0] = vector_a[0];
vector_rotated[1] = vector_a[1] * cos(angle) - vector_a[2] * sin(angle);
vector_rotated[2] = vector_a[1] * sin(angle) + vector_a[2] * cos(angle);

In this case, \(V_1\) aligns to the negative y axis i.e. vector_a[0] = vector_a[2] = 0. So the code is simplified as shown above.

step 2

the rotation matrix around \(y\) axis is: \[ \mathbf{R_{y}}= \begin{bmatrix} \cos\beta & 0 & \sin\beta\\ 0 & 1 & 0\\ -\sin\beta & 0 & \cos\beta \end{bmatrix} \] in code that is:

vector_rotated[0] = vector_a[0] * cos(angle) + vector_a[2] * sin(angle);
vector_rotated[1] = vector_a[1];
vector_rotated[2] = -vector_a[0] * sin(angle) + vector_a[2] * cos(angle);

in this case, vector_a[0] = 0, the code is thus simplified

Periodic bottom boundary

Instead pf Dirichlet boundary condition on wall, it possible to impose another type of boundary condition. In this example, when the particle y position exceeds the bottom, the position will reset to the top, and the velocity remains.

code

step 1: periodic boundary

original

Recall the Code Structure, the boundary condition is applied in integrate_functor. The bottom boundary is defined as rigid:

if (pos.y < -1.0f + params.particleRadius)
{
    pos.y = -1.0f + params.particleRadius;
    vel.y *= params.boundaryDamping;
}

update

Define a extra conditional statement using a new parameter

if (params.rigidBottom)
{
    // origial rigid boundary
    if (pos.y < -1.0f + params.particleRadius)
    {
    pos.y = -1.0f + params.particleRadius;
    vel.y *= params.boundaryDamping;
    }
}
else
{
    // periodic boundary
    if (pos.y < -1.0f + params.particleRadius)
    {
        pos.y = 1.0f - params.particleRadius;
    }
}

params.rigidBottom should to be add the the parameter list.

step 2: parameters

original

params is the type of SimParams defined in particles_kernel.cuh

// simulation parameters
struct SimParams
{ 
    float3 colliderPos;
    float  colliderRadius;

    float3 gravity;
	//...
};

SimParams m_params is a parameter of class ParticleSystem. It is initialized in constructor ParticleSystem::ParticleSystem such as:

ParticleSystem::ParticleSystem(uint numParticles, uint3 gridSize, bool bUseOpenGL):
    m_bUseOpenGL(bUseOpenGL),
    m_numParticles(numParticles),
	// ...
    m_gridSize(gridSize),
{
	// ...
    // set simulation parameters
    m_params.gridSize = m_gridSize;
    //...
}

update

Mimicking the gridSize parameter,

add new global parameter bool rigidBottom = true; in particles.cpp

define new member parameter bool m_rigidBottom in class ParticleSystem

add new input parameter and initialization list of constructor function as

ParticleSystem::ParticleSystem(uint numParticles, uint3 gridSize, bool bUseOpenGL, bool rigidBottom) :
    //...
	m_rigidBottom(rigidBottom),
    m_gridSize(gridSize),
	//...
{
    //...
    // set simulation parameters
    m_params.rigidBottom = m_rigidBottom;
    m_params.gridSize = m_gridSize;
    //...
}

There is another way to change parameters, suitable for non-constant parameters.

For example define a function and call it every step in display like:

void setGravity3d(float x, float y, float z)
{
    m_params.gravity = make_float3(x, y, z);
}

void display()
{    
	if (!bPause)
    {
        psystem->setIterations(iterations);
        psystem->setDamping(damping);
        psystem->setGravity3d(gravity3d[0], gravity3d[1], gravity3d[2]);
        psystem->setCollideSpring(collideSpring);
        //...
    }
    //...
}

Directory house keeping

This is only a update to the directory tree for clearer developing afterwards.

The updated directory tree after make (with updated Makefile) is shown as,

.
├── bin
│   └── x86_64
│       └── linux
│           └── release
│               └── particles // exe
├── data
│   └── ref_particles.bin
├── doc
│   ├── FULLTEXT01.pdf
│   ├── Novel\ hierarchical\ strategies\ for\ SPH-centric\ algorithms\ on\ GPGPU.pdf
│   ├── particles.pdf
│   ├── periodic_bottom.mp4
│   ├── R.W\ Hockney,\ J.W\ Eastwood\ -\ Computer\ simulation\ using\ particles-A.\ Hilger\ (1988).pdf
│   ├── screenshot_lg.png
│   ├── screenshot_md.png
│   ├── screenshot_sm.png
│   └── SPH\ on\ GPU\ with\ CUDA.pdf
├── inc
│   ├── // contain cuda normal inc in SDK
│   └── rendercheck_gl.h
├── make
│   ├── findgllib.mk
│   ├── Makefile
│   ├── particles
│   ├── particles.o
│   ├── particleSystem_cuda.o
│   ├── particleSystem.o
│   ├── render_particles.o
│   └── shaders.o
├── readme.txt
└── src // source
    ├── NsightEclipse.xml
    ├── particles.cpp
    ├── particles_kernel.cuh
    ├── particles_kernel_impl.cuh
    ├── particleSystem.cpp
    ├── particleSystem_cuda.cu
    ├── particleSystem.cuh
    ├── particleSystem.h
    ├── render_particles.cpp
    ├── render_particles.h
    ├── shaders.cpp
    ├── shaders.h
    └── vectorRotate.hpp

Debug on reset

original

When calling psystem -> reset, the particle number gridSize in each side of block is defined by

uint s = (int) ceilf(powf((float) m_numParticles, 1.0f / 3.0f));

uint gridSize[3];
gridSize[0] = gridSize[1] = gridSize[2] = s;
initGrid(gridSize, m_params.particleRadius*2.0f, jitter, m_numParticles);

But it shows problematic when assigning particles number \(64*64*64=262144\).

test

Writing test code:

float jitter = m_params.particleRadius*0.01f;
// uint s = (int) ceilf(powf((float) m_numParticles, 1.0f / 3.0f));

// <<< test ceiling bug>>>
// split this compacted line in two steps and test on each
// step 1, calulate cubic root and print the result
float temp = powf((float) m_numParticles, 1.0f / 3.0f);
std::cout.precision(20);
std::cout << temp << std::endl;
// step 2, ceil to int and print the result
uint s = (int) ceilf(temp);
std::cout << s << std::endl;
// <<< endtest >>>

uint gridSize[3];
gridSize[0] = gridSize[1] = gridSize[2] = s;
initGrid(gridSize, m_params.particleRadius*2.0f, jitter, m_numParticles);

make and run with:

$./particles n=262144
CUDA Particles Simulation Starting...

grid: 64 x 64 x 64 = 262144 cells
particles: 262144
GPU Device 0: "Ampere" with compute capability 8.6

64.00000762939453125
65

update

The bug is because the low precision of float operations. Change the second step into below to larger the truncation error of the ceil command.

// step 2, ceil to int and print the result
// uint s = (int) ceilf(temp);
uint s = (int) ceilf(floor(temp*1000)/1000);

Problem solved!

References

Harada, T.: Real-Time Rigid Body Simulation on GPUs. GPU Gems 3. Addison Wesley, 2007 ↩︎
Mishra, B. K. 2003. A Review of Computer Simulation of Tumbling Mills by the Discrete Element Method: Part 1—Contact Mechanics. International Journal of Mineral Processing 71(1), pp. 73–93. ↩︎
Monaghan J.: Smoothed particle hydrodynamics. Annu. Rev. Astron. Physics 30 (1992), 543. 12, 13 ↩︎
Müller M., Charypar D., Gross M.: Particle-based fluid simulation for interactive applications. Proceedings of 2003 ACM SIGGRAPH Symposium on Computer Animation (2003), 154–159. ↩︎
Ericson, C., Real-Time Collision Detection, Morgan Kaufmann 2005 ↩︎

#CUDA

CUDA example particles

https://daydreamatnight.github.io/2022/09/15/CUDA-example-particles/

Author

Ryan LI

Posted on

September 15, 2022

Licensed under

hand written SPH method Previous

CUDA Performance Next