Christian Dick, Joachim Georgii, Rüdiger Westermann
Computer Graphics and Visualization Group, Technische Universität München, Germany
We present a multigrid approach for simulating elastic deformable objects in real time on recent NVIDIA GPU architectures. To accurately simulate large deformations we consider the co-rotated strain formulation. Our method is based on a finite element discretization of the deformable object using hexahedra. It draws upon recent work on multigrid schemes for the efficient numerical solution of partial differential equations on such discretizations. Due to the regular shape of the numerical stencil induced by the hexahedral discretization, and since we use matrix-free formulations of all multigrid steps, computations and data layout can be restructured to avoid execution divergence of concurrently running threads and to enable coalescing of memory accesses into single memory transactions. This makes it possible to effectively exploit the GPU's parallel processing units and high memory bandwidth via the CUDA parallel programming API. We demonstrate performance gains of up to a factor of 27 and 4 compared to a highly optimized CPU implementation running on a single CPU core and on 8 CPU cores, respectively. For hexahedral models consisting of as many as 269,000 elements, our approach achieves physics-based simulation at 11 time steps per second.
The first author is funded by the International Graduate School of Science and Engineering (IGSSE) of the Technische Universität München.
A Real-Time Multigrid Finite Hexahedra Method for Elasticity Simulation using CUDA
C. Dick, J. Georgii, R. Westermann,
Simulation Modelling Practice and Theory 19(2):801-816, 2011
Left: A deformable object consisting of 31,000 hexahedral finite elements (119,000 DOF) is shown. The CUDA-based simulation runs entirely on the GPU at 66 time steps per second.
Right: To render a visually smooth surface of the object, a high-resolution mesh is bound to the finite elements.
Deformable objects consisting of 30,000 (left) and 28,000 (right) hexahedral finite elements (112,000 and 98,000 DOF). The CUDA-based simulation runs entirely on the GPU at 69 and 74 time steps per second, respectively.
Simulation time steps per second (left) and speed-ups (right) achieved on the Fermi GPU and on the CPU using 1, 2, 4, and 8 cores, for models of different sizes and for single and double floating-point precision. Each time step includes the re-assembly of the system of equations required by the co-rotated strain formulation, as well as two multigrid V-cycles, each with 2 pre-smoothing and 1 post-smoothing Gauss-Seidel steps. The speed-ups are measured with respect to 1 CPU core.
Convergence behavior of our multigrid solver (red curves) with respect to computing time (using 1 CPU core) for the Stanford bunny model consisting of 33,000 (left) and 269,000 (right) hexahedral finite elements. For comparison, the convergence behavior of a conjugate gradient solver with Jacobi preconditioner (green curves) is included.