2 code implementations • 18 Nov 2016 • Ansel L. Blumers, Yu-Hang Tang, Zhen Li, Xuejin Li, George E. Karniadakis
We observe a speedup of 10.1 on one GPU over all 16 cores within a single node, and a weak scaling efficiency of 91% across 256 nodes.
Computational Physics • Biological Physics